[Openais] [corosync trunk] fix process pausing issue with membership algorithm
Steven Dake
sdake at redhat.com
Fri Jun 26 14:20:48 PDT 2009
On Fri, 2009-06-26 at 09:30 +0100, Chrissie Caulfield wrote:
> Steven Dake wrote:
> > When a process pauses for longer than the token timeout, the other
> > processors in the system form a new ring. The paused processor then
> > eventually reschedules and processes the pending membership multicast
> > messages in its kernel queues. This wreaks havoc on the membership of
> > the other nodes.
> >
> > While a proper kernel shouldn't pause for long periods, it's a reality
> > that many kernels still have long periods of spinlocking without
> > scheduling and no proper preemption.
> >
> > This patch resolves the scenario by creating a timer which records a
> > time stamp at an interval that is the token timeout / 5. Then if a
> > process executes the membership algorithm by receiving a join message,
> > the current time is retrieved and compared to the timestamp. If they
> > differ by more than token timeout / 2, the process is assumed to have
> > been unable to schedule (because it couldn't trigger the timer callbacks
> > via poll), and it calls totemnet to flush any pending multicasts in the file
> > descriptor responsible for receiving multicast messages. This results
> > in the old membership messages being thrown away allowing the new
> > membership to form properly.
> >
> > This can be tested by suspending a corosync process with ctrl-z in an
> > 8 node cluster, then using fg to bring it back into the foreground.
> > Pre-patch: bad news. Post-patch: it prints a notice and proceeds properly.
> >
>
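A minimal sketch of the pause check described above, to make the arithmetic concrete. This is not the code from the patch: the struct, the function names, and the millisecond clock passed in by the caller are all assumptions made for illustration. It only shows the two ratios from the description (stamp every token timeout / 5, treat a gap of more than token timeout / 2 as a pause).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state; these names are illustrative, not from the patch. */
struct pause_detector {
	uint64_t token_timeout_ms;	/* configured token timeout */
	uint64_t last_stamp_ms;		/* last time the periodic timer ran */
};

static void pause_detector_init (struct pause_detector *pd,
	uint64_t token_timeout_ms, uint64_t now_ms)
{
	assert (token_timeout_ms > 0);
	pd->token_timeout_ms = token_timeout_ms;
	pd->last_stamp_ms = now_ms;
}

/* Interval at which the timestamp timer should fire: token timeout / 5. */
static uint64_t pause_detector_interval_ms (const struct pause_detector *pd)
{
	return (pd->token_timeout_ms / 5);
}

/* Body of the periodic timer callback: record the current time. */
static void pause_detector_tick (struct pause_detector *pd, uint64_t now_ms)
{
	pd->last_stamp_ms = now_ms;
}

/*
 * Called when a join message is received. Returns 1 if the gap since the
 * last recorded timestamp exceeds token timeout / 2, which suggests the
 * timer callbacks could not run, and 0 otherwise.
 */
static int pause_detector_was_paused (const struct pause_detector *pd,
	uint64_t now_ms)
{
	if (now_ms < pd->last_stamp_ms) {
		return (0);	/* clock went backwards; assumed not to happen with a monotonic clock */
	}
	return ((now_ms - pd->last_stamp_ms) > (pd->token_timeout_ms / 2));
}
```

With a 1000 ms token timeout the timer would fire every 200 ms, and a join message arriving more than 500 ms after the last stamp would be treated as evidence of a pause and trigger the flush.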
> At the bottom of the patch:
>
> + if (pause_flush (instance)) {
> + return (0);
> + }
>
> will skip the rest of the routine if pause_flush encounters an error, as
> well as if it flushes some messages ... is that intended behaviour ?
>
The correct behaviour is that pause_flush returns 1 if there was a pause
and all pending messages were flushed. If that doesn't happen in
totemnet, the totemnet code should block waiting for a no-error
condition.
Thanks for pointing this out. I'll sort out a fix for it.
> It's a consequence of overloading the return code to indicate not only
> whether the operation succeeded or not, but also whether it flushed any
> messages. Perhaps there should be a pass-by-reference parameter for
> &messages_flushed to keep them separate ?
>
>
> Chrissie
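One possible reading of the pass-by-reference suggestion, sketched under stated assumptions. The real pause_flush signature is not shown in this thread, so pause_flush_sketch, struct fake_queue, and the 0 / -1 return convention below are all hypothetical. The point is only that the return value carries success or failure while the out-parameter carries the flush count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the pending multicast queue. */
struct fake_queue {
	int pending;	/* stale messages waiting to be discarded */
	int fail;	/* nonzero simulates a receive error */
};

/*
 * Returns 0 on success and -1 on error. The number of discarded messages
 * is reported separately through *messages_flushed, so the caller can
 * tell "error" apart from "flushed something".
 */
static int pause_flush_sketch (struct fake_queue *q, int paused,
	unsigned int *messages_flushed)
{
	assert (q != NULL && messages_flushed != NULL);

	*messages_flushed = 0;
	if (!paused) {
		return (0);	/* nothing to do, not an error */
	}
	if (q->fail) {
		return (-1);	/* error, nothing was flushed */
	}
	*messages_flushed = (unsigned int)q->pending;
	q->pending = 0;
	return (0);
}
```

A caller could then skip the rest of the routine only when the call succeeded and messages_flushed is nonzero, and handle the -1 case on its own path.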
Thanks for the review of the patch.
Regards
-steve