[Openais] does openais need to consider the error happens in
the process of receiving a mcast message
Mark Haverkamp
markh at osdl.org
Mon Jul 25 09:32:49 PDT 2005
On Mon, 2005-07-25 at 13:01 +0800, Li Huanghai wrote:
> Hi,
> I am puzzled by openais's exception handling.
> When a node sends a message to all nodes, it doesn't
> wait for the other nodes to report whether they
> handled the message correctly. That means that once
> a node fails to handle the message, most commonly because of a malloc
> error, the other nodes won't notice and will assume it succeeded.
> Then the cluster is in an inconsistent state, and subsequent
> operations will return wrong results that the application treats as
> correct. This is a big problem for high-availability
> software.
>
> How should this problem be handled? Can it be ignored? If not,
> how should it be dealt with? Does it need a rollback policy to keep
> all nodes in a consistent state?
>
The protocol keeps track of messages by sequence number. If a message
can't be received for some reason, the protocol will notice that it has
a missing message and request that the missing message be retransmitted.
In a way the protocol does wait for the nodes' responses, because the
token carries two pieces of information: the highest sequence number up
to which messages have been received with no sequence holes, and a list
of message sequence numbers that need to be retransmitted because some
node hasn't received them yet.
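
To make the idea concrete, here is a rough Python sketch of the
bookkeeping described above. It is not the openais code: the names
Token, Node, aru, retransmit and handle_token are made up for
illustration, and it assumes sequence numbers start at 1 and that each
node keeps every message it has received.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    # Hypothetical token layout, not the real openais structure.
    # aru: lowest "received everything up to here" value seen so far.
    aru: int
    # Sequence numbers that some node is still missing.
    retransmit: set = field(default_factory=set)


class Node:
    def __init__(self, name):
        self.name = name
        self.received = {}  # seq -> payload

    def deliver(self, seq, payload):
        self.received[seq] = payload

    def highest_contiguous(self):
        # Highest seq such that 1..seq have all arrived (0 if none).
        seq = 0
        while seq + 1 in self.received:
            seq += 1
        return seq

    def missing(self, high_seq):
        # Holes in the sequence up to the highest seq sent so far.
        return {s for s in range(1, high_seq + 1) if s not in self.received}

    def handle_token(self, token, high_seq):
        # Resend anything we hold that another node asked for,
        # and take those entries off the token's list.
        resend = {s: self.received[s]
                  for s in token.retransmit if s in self.received}
        token.retransmit.difference_update(resend)
        # Record our own holes and lower aru if we are behind.
        token.retransmit |= self.missing(high_seq)
        token.aru = min(token.aru, self.highest_contiguous())
        return resend


# Node b misses message 3; the token rotation exposes and repairs the hole.
a, b = Node("a"), Node("b")
for seq in (1, 2, 3, 4):
    a.deliver(seq, f"m{seq}")
for seq in (1, 2, 4):
    b.deliver(seq, f"m{seq}")

token = Token(aru=4)
b.handle_token(token, high_seq=4)           # b records that it lacks 3
resent = a.handle_token(token, high_seq=4)  # a sees the request and resends
for seq, payload in resent.items():
    b.deliver(seq, payload)
```

After b handles the token, the token's list holds 3 and aru drops to 2,
so every node that sees the token knows that not everyone has message 3
yet. Once a resends it, b has no holes left.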
--
Mark Haverkamp <markh at osdl.org>