[Openais] [corosync trunk] fix process pausing issue with membership algorithm

Steven Dake sdake at redhat.com
Fri Jun 26 14:20:48 PDT 2009


On Fri, 2009-06-26 at 09:30 +0100, Chrissie Caulfield wrote:
> Steven Dake wrote:
> > When a process pauses for longer than the token timeout, the other
> > processors in the system form a new ring.  The paused processor then
> > eventually reschedules and processes the pending membership multicast
> > messages in its kernel queues.  This wreaks havoc on the membership of
> > the other nodes.
> > 
> > While a proper kernel shouldn't pause for long periods, it's a reality
> > that many kernels still have long periods of spinlocking without
> > scheduling and no proper preemption.
> > 
> > This patch resolves the scenario by creating a timer which records a
> > timestamp at an interval of token timeout / 5.  Then, if a process
> > executes the membership algorithm by receiving a join message, the
> > current time is retrieved and compared to the timestamp.  If they
> > differ by more than token timeout / 2, the process is assumed to have
> > been unable to schedule (because it couldn't trigger the timer
> > callbacks via poll), and it calls totemnet to flush any pending
> > multicasts in the file descriptor responsible for receiving multicast
> > messages.  This results in the old membership messages being thrown
> > away, allowing the new membership to form properly.
> > 
> > This can be tested by suspending a corosync process with ctrl-z in an
> > 8-node cluster, then using fg to bring it back into the foreground.
> > Pre-patch: bad news.  Post-patch: it prints a notice and proceeds
> > properly.
> > 
> 
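A minimal sketch of the timestamp check described above, for illustration
only.  The struct, the function names and the millisecond units are
assumptions made for this example and are not taken from the patch; a
monotonic clock is also assumed, so the current time never precedes the
recorded timestamp.

```c
#include <stdint.h>

/* Hypothetical state for this sketch; not the real corosync structures. */
struct pause_state {
	uint64_t token_timeout_ms;   /* configured token timeout */
	uint64_t pause_timestamp_ms; /* time the periodic timer last fired */
};

/* Interval at which the periodic timer refreshes the timestamp. */
static uint64_t pause_timer_interval_ms (const struct pause_state *st)
{
	return (st->token_timeout_ms / 5);
}

/* Timer callback: record that the process was able to run at now_ms. */
static void pause_timer_fired (struct pause_state *st, uint64_t now_ms)
{
	st->pause_timestamp_ms = now_ms;
}

/*
 * Called when a join message arrives.  Returns 1 if the timestamp is older
 * than token timeout / 2, meaning the timer callbacks could not run and the
 * process was probably paused; returns 0 otherwise.
 * Assumes now_ms >= pause_timestamp_ms (monotonic clock).
 */
static int pause_detected (const struct pause_state *st, uint64_t now_ms)
{
	return ((now_ms - st->pause_timestamp_ms) > (st->token_timeout_ms / 2));
}
```

With a 1000 ms token timeout the timer would fire every 200 ms, and a join
message arriving more than 500 ms after the last recorded timestamp would be
treated as evidence of a pause.
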
> At the bottom of the patch:
> 
> +       if (pause_flush (instance)) {
> +               return (0);
> +       }
> 
> will skip the rest of the routine if pause_flush encounters an error, as
> well as if it flushes some messages ... is that the intended behaviour?
> 

The correct behaviour is for pause_flush to return 1 only if there was a
pause and all pending messages were flushed.  If that doesn't happen in
totemnet, the totemnet code should block until it reaches a no-error
condition.

Thanks for pointing this out, I'll sort out a fix for it.

> It's a consequence of overloading the return code to indicate not only
> whether the operation succeeded, but also whether it flushed any
> messages.  Perhaps there should be a pass-by-reference parameter for
> &messages_flushed to keep them separate?
> 
> 
> Chrissie
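
One way the pass-by-reference suggestion could look, as a rough sketch rather
than the actual fix.  The struct, its fields, the handle_join wrapper and the
choice to propagate the error to the caller are all assumptions made for this
example; only the pause_flush name and the messages_flushed idea come from
the thread.

```c
/* Hypothetical stand-in for the real instance; fields exist only to
 * drive this sketch. */
struct instance {
	int paused;      /* non-zero if a pause was detected */
	int pending;     /* stale multicasts waiting in the socket */
	int flush_fails; /* non-zero to simulate a flush error */
};

/*
 * Returns 0 on success and -1 on error.  The number of discarded messages
 * is reported separately through *messages_flushed, so the return code
 * carries only success or failure.
 */
static int pause_flush (struct instance *instance, int *messages_flushed)
{
	*messages_flushed = 0;
	if (!instance->paused) {
		return (0);
	}
	if (instance->flush_fails) {
		return (-1);
	}
	*messages_flushed = instance->pending;
	instance->pending = 0;
	return (0);
}

/*
 * Hypothetical caller: skips join processing only when stale messages were
 * actually flushed, and reports a flush error to its own caller.
 * *join_processed is set to 1 only if the join would have been handled.
 */
static int handle_join (struct instance *instance, int *join_processed)
{
	int messages_flushed;
	int res;

	*join_processed = 0;
	res = pause_flush (instance, &messages_flushed);
	if (res != 0) {
		return (res);
	}
	if (messages_flushed > 0) {
		return (0);
	}
	*join_processed = 1;
	return (0);
}
```

Keeping the count in its own parameter means the early return no longer fires
on an error by accident; what the caller should really do on error is a
separate decision.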

Thanks for reviewing the patch.

Regards
-steve


