[Openais] does openais need to consider the error happens in the process of receiving a mcast message

Mark Haverkamp markh at osdl.org
Mon Jul 25 09:32:49 PDT 2005


On Mon, 2005-07-25 at 13:01 +0800, Li Huanghai wrote:
> Hi,
>     I am puzzled by openais's error handling.
> When a node sends a message to all nodes, it doesn't
> wait for the other nodes to report whether they
> handled the message correctly. That means that if
> a node fails while handling the message, most commonly on a malloc
> error, the other nodes won't detect it and will assume it succeeded.
> The cluster is then in an inconsistent state, and subsequent
> operations will produce wrong results that the application treats
> as correct. This is a big problem for high-availability
> software.
> 
>     How should this problem be handled? Can it be ignored? If not,
> how do we deal with it? Does it need a rollback policy to keep
> all nodes in a consistent state?
> 

The protocol keeps track of messages by sequence number.  If a message
can't be received for some reason, the protocol will notice that it has
a missing message and request that the missing message be retransmitted.
In a way the protocol does wait for the nodes' responses, because the
token carries the highest sequence number up to which messages have been
received with no sequence holes, plus a list of message sequence numbers
that need to be retransmitted because some node hasn't received them
yet.
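
As a rough illustration of the idea above, here is a minimal Python
sketch of a token carrying a "no holes up to here" number and a
retransmit list.  It is not the openais code: the names Token, Node,
seq, aru, rtr, on_token and rotate are all hypothetical, and the way
aru is reset on each rotation is a simplification made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    # All field names are hypothetical, chosen for this sketch only.
    seq: int = 0   # highest sequence number assigned to any message
    aru: int = 0   # highest sequence number received with no holes
    rtr: set = field(default_factory=set)  # sequence numbers to retransmit


class Node:
    def __init__(self, name):
        self.name = name
        self.received = {}  # sequence number -> payload

    def deliver(self, seq, payload):
        self.received[seq] = payload

    def highest_contiguous(self):
        # Largest n such that messages 1..n have all been received.
        n = 0
        while n + 1 in self.received:
            n += 1
        return n

    def on_token(self, token):
        # Answer retransmit requests this node is able to satisfy.
        resend = {s: self.received[s] for s in token.rtr if s in self.received}
        # Ask for anything this node is still missing.
        missing = {s for s in range(1, token.seq + 1) if s not in self.received}
        token.rtr = (token.rtr - set(resend)) | missing
        # Lower aru if this node has a hole below the current value.
        token.aru = min(token.aru, self.highest_contiguous())
        return resend


def rotate(nodes, token):
    # Simplification: start each rotation optimistic and let nodes lower aru.
    token.aru = token.seq
    for node in nodes:
        for seq, payload in node.on_token(token).items():
            for other in nodes:  # retransmissions are multicast to everyone
                other.deliver(seq, payload)
    return token


def demo():
    a, b, c = Node("A"), Node("B"), Node("C")
    ring = [a, b, c]
    token = Token()
    for payload in ("m1", "m2", "m3"):
        token.seq += 1
        for node in ring:
            # Simulate node B losing message 2 on the wire.
            if node is b and token.seq == 2:
                continue
            node.deliver(token.seq, payload)
    rotate(ring, token)
    first_aru = token.aru
    rotate(ring, token)
    return first_aru, token.aru, token.rtr, 2 in b.received


if __name__ == "__main__":
    print(demo())
```

In this sketch node B puts the missing sequence number on the token,
the next node that holds that message multicasts it again, and on the
following rotation the token shows that every node has everything.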


-- 
Mark Haverkamp <markh at osdl.org>



