[Openais] [whitetank / corosync trunk] Fix checkpoint sync in certain scenarios

Fri Nov 7 13:43:47 PST 2008

It is the starting, not the exiting, that is at issue.

Specifically, when two nodes are synchronizing and a machine with a lower
IP address starts up, it triggers an abort of the in-progress
synchronization, and those nodes then completely fail to synchronize.

So I think it probably affects you if you use the checkpoint service.
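
To make the sequence concrete, here is a minimal sketch of the scenario from
the patch description quoted below. It is not openais code: the Node class,
the "responsible" flag, the "sticky" switch and the integer node ids (1, 2, 3
standing in for A, B, C) are all hypothetical names chosen for illustration,
and it assumes the node with the lowest id in the membership is the one
expected to sync. When the flag should eventually be cleared is not modelled.

```python
# Hypothetical model of the scenario; none of these names come from openais.
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.responsible = False  # "I must sync my checkpoint database"

    def on_config_change(self, member_ids, sticky):
        # Assumption: the lowest node id in the membership is expected to sync.
        lowest = self.node_id == min(member_ids)
        if sticky:
            # Design change: once responsible, stay responsible across
            # aborts and configuration changes.
            self.responsible = self.responsible or lowest
        else:
            # Old behaviour: decide again from scratch on every change.
            self.responsible = lowest
        return self.responsible


def simulate(sticky):
    b = Node(2)                          # node B
    b.on_config_change({2}, sticky)      # A and C are gone, B syncs alone
    b.on_config_change({2, 3}, sticky)   # C starts, B begins resyncing
    # A starts during that sync; B is no longer the lowest id.
    return b.on_config_change({1, 2, 3}, sticky)


if __name__ == "__main__":
    print("old behaviour, B still responsible:", simulate(sticky=False))
    print("with design change, B still responsible:", simulate(sticky=True))
```

In this toy model the old behaviour leaves B believing it has nothing to
sync after A joins, while the sticky flag keeps B responsible.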

Regards
-steve

On Fri, 2008-11-07 at 14:48 +0100, Andrew Beekhof wrote:
> On Fri, Nov 7, 2008 at 10:24, Steven Dake <sdake at redhat.com> wrote:
> > In a certain rare scenario, the checkpoint service throws away the
> > current checkpoint database.
> >
> > An example of when this occurs is when there are 3 nodes A, B, C; nodes A
> > and C are killed
> 
> Does it have to be killed, or could shutdown trigger this too?
> 
> > then node B syncs.  After this completes, Node C is
> > started and node B again begins resyncing, but during this sync process
> > node A starts up.
> >
> > This results in node B no longer believing it is required to sync its
> > current database contents.  The abort called on node B throws away all
> > checkpoints in the system, but since node B no longer has the lowest node
> > ID in the system, it believes it doesn't have to sync.
> >
> > The design change is that once a node has been declared responsible
> > for synchronization, no abort or configuration change will ever
> > alter the fact that the node is still responsible for synchronization.
> >
> > Regards
> > -steve