[Openais] does openais need to consider the error happens in the process of receiving a mcast message

Steven Dake sdake at mvista.com
Mon Jul 25 10:30:18 PDT 2005


On Mon, 2005-07-25 at 09:32 -0700, Mark Haverkamp wrote:
> On Mon, 2005-07-25 at 13:01 +0800, Li Huanghai wrote:
> > Hi,
> >     I am puzzled by openais's exception handling.
> > When a node sends a message to all nodes, it doesn't
> > wait for the other nodes to report whether they
> > handled the message correctly. That means that if
> > a node fails to handle the message, most commonly because of a
> > malloc error, the other nodes won't notice and will assume it
> > succeeded. The cluster is then in an inconsistent state, and
> > subsequent operations will return wrong results that the
> > application treats as correct. This is a big problem for
> > high-availability software.
> > 
> >     How should this problem be handled? Can it be ignored? If not,
> > how should it be dealt with? Does it need a rollback policy to keep
> > all nodes in a consistent state?
> > 
> 
> The protocol keeps track of messages by sequence number.  If a message
> can't be received for some reason, the protocol will notice that it has
> a missing message and request that the missing message be retransmitted.
> In a way the protocol does wait for the nodes' responses, because the
> token carries the highest sequence number received with no sequence
> holes, plus a list of message sequence numbers that need to be
> retransmitted because someone hasn't received them yet.
> 
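
To make the sequence-number bookkeeping above concrete, here is a rough
sketch in C. It is not the openais code; rx_state, rx_mark,
rx_contiguous, rx_missing and MAX_SEQ are all hypothetical names, and the
fixed-size table stands in for whatever window the real protocol uses.
It only shows how a receiver could derive "highest sequence number with
no holes" and a retransmit list from the set of messages it has stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEQ 64 /* hypothetical table size, not an openais constant */

/* Hypothetical receive state: got[s] is nonzero once message s is stored.
 * Sequence numbers run from 1 to MAX_SEQ; index 0 is unused. */
struct rx_state {
    unsigned char got[MAX_SEQ + 1];
};

/* Record that message `seq` was stored. Returns -1 if out of range. */
int rx_mark(struct rx_state *rx, uint32_t seq)
{
    if (seq == 0 || seq > MAX_SEQ)
        return -1;
    rx->got[seq] = 1;
    return 0;
}

/* Highest sequence number such that it and everything below it
 * has been stored (0 if message 1 is still missing). */
uint32_t rx_contiguous(const struct rx_state *rx)
{
    uint32_t s = 0;
    while (s < MAX_SEQ && rx->got[s + 1])
        s++;
    return s;
}

/* Collect the sequence numbers up to `high_seen` that are not stored.
 * These are the ones a node would ask to have retransmitted.
 * Returns how many were written to `out` (at most `out_cap`). */
size_t rx_missing(const struct rx_state *rx, uint32_t high_seen,
                  uint32_t *out, size_t out_cap)
{
    size_t n = 0;
    uint32_t s;
    assert(high_seen <= MAX_SEQ);
    for (s = 1; s <= high_seen && n < out_cap; s++) {
        if (!rx->got[s])
            out[n++] = s;
    }
    return n;
}
```

With messages 1, 2 and 4 stored, rx_contiguous reports 2 and rx_missing
reports message 3 as the one to request again.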

So Mark is correct; the protocol handles out-of-memory errors by
"ignoring" the packet, which is then resent.  It is important to note
that poll and other system calls may have problems in low-memory
situations, in which case the affected nodes are likely to be bounced
from the configuration.
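
As an illustration of "ignoring" a packet on allocation failure, here is
a minimal sketch in C. It is not taken from the openais sources; struct
store, store_receive, alloc_fn and always_fail are hypothetical names,
and the allocator is passed in only so the failure path can be
exercised. The point is that a failed allocation leaves the slot empty,
so the message looks exactly like one that never arrived and can be
picked up again when it is resent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SEQ 64 /* hypothetical table size, not an openais constant */

/* Allocator signature, so a failing allocator can be injected. */
typedef void *(*alloc_fn)(size_t);

/* Hypothetical message store: msg[s] is a private copy of message s,
 * or NULL if it has not been stored. Index 0 is unused. */
struct store {
    void *msg[MAX_SEQ + 1];
};

/* Try to store a copy of message `seq`.
 * Returns 1 if stored, 0 if the packet was dropped (out of range,
 * already stored, or the allocation failed). */
int store_receive(struct store *st, alloc_fn alloc, uint32_t seq,
                  const void *data, size_t len)
{
    void *copy;
    assert(st != NULL && alloc != NULL && data != NULL);
    if (seq == 0 || seq > MAX_SEQ || st->msg[seq] != NULL)
        return 0;
    copy = alloc(len ? len : 1);
    if (copy == NULL)
        return 0; /* out of memory: behave as if the packet never arrived */
    memcpy(copy, data, len);
    st->msg[seq] = copy;
    return 1;
}

/* Hypothetical allocator that always fails, to simulate low memory. */
void *always_fail(size_t n)
{
    (void)n;
    return NULL;
}
```

Calling store_receive with always_fail leaves the slot NULL; a later
call for the same sequence number with malloc stores the message, which
is what a retransmission would do.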

The services themselves (such as checkpoint) are not very tolerant of
out-of-memory errors.  I had planned to solve this problem through the
use of memory pools, but that remains unimplemented.

Look at mempool.c/.h for more details.
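
For readers unfamiliar with the general idea of a memory pool, here is a
small generic sketch in C. It does not reproduce the interface in
mempool.c/.h; struct pool, pool_init, pool_get, pool_put and
pool_destroy are hypothetical names. It assumes fixed-size slots that
are all allocated once at startup, so later requests either succeed
from the pool or fail predictably without calling malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size-slot pool. All memory is obtained in
 * pool_init; pool_get and pool_put never call malloc or free. */
struct pool {
    unsigned char *mem; /* one block holding every slot */
    void **free_list;   /* stack of pointers to free slots */
    size_t free_count;  /* number of entries on the stack */
    size_t slot_size;
    size_t slots;
};

/* Allocate the pool up front. Returns 0 on success, -1 on failure.
 * The caller should choose slot_size as a multiple of the alignment
 * its objects need; overflow of slot_size * slots is not checked. */
int pool_init(struct pool *p, size_t slot_size, size_t slots)
{
    size_t i;
    assert(p != NULL && slot_size > 0 && slots > 0);
    p->mem = malloc(slot_size * slots);
    p->free_list = malloc(slots * sizeof *p->free_list);
    if (p->mem == NULL || p->free_list == NULL) {
        free(p->mem);
        free(p->free_list);
        return -1;
    }
    for (i = 0; i < slots; i++)
        p->free_list[i] = p->mem + i * slot_size;
    p->free_count = slots;
    p->slot_size = slot_size;
    p->slots = slots;
    return 0;
}

/* Take one slot from the pool, or NULL if the pool is exhausted. */
void *pool_get(struct pool *p)
{
    return p->free_count ? p->free_list[--p->free_count] : NULL;
}

/* Return a slot previously obtained from pool_get. */
void pool_put(struct pool *p, void *slot)
{
    assert(p->free_count < p->slots);
    p->free_list[p->free_count++] = slot;
}

/* Release the memory obtained in pool_init. */
void pool_destroy(struct pool *p)
{
    free(p->mem);
    free(p->free_list);
}
```

A service would call pool_init once at startup and treat a NULL from
pool_get as a bounded, expected condition rather than a surprise
malloc failure in the middle of handling a message.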

regards




