[Openais] [whitetank / corosync trunk] Fix checkpoint sync in certain scenarios

Ratbag Patrick ratbag at live.com
Sat Nov 8 09:06:42 PST 2008


Hi Steve,
 
It seems this fix is related to the potential bug I reported several weeks ago in "[Openais] RE: Need help to reduce the time wait of saRecvRetry()". When a node starts to open or read a checkpoint, it tries to resync the checkpoint, which takes a long time. Is that right? If so, does this diff fix that bug? If it does, I will rebuild the test environment to see whether the saRecvRetry() delay that occurs when the physical connection is lost has gone away.
Thanks.
 
Best
Rat

> From: sdake at redhat.com
> To: beekhof at gmail.com
> Date: Fri, 7 Nov 2008 14:43:47 -0700
> CC: openais at lists.osdl.org
> Subject: Re: [Openais] [whitetank / corosync trunk] Fix checkpoint sync in certain scenarios
>
> it is the starting not the exiting that is at issue.
>
> specifically when two nodes are synchronizing and a lower IP addressed
> machine starts up, it triggers an abort in the other synchronization
> process and then those nodes completely fail to synchronize.
>
> So I think it probably effects you if you use the checkpoint service.
>
> Regards
> -steve
>
> On Fri, 2008-11-07 at 14:48 +0100, Andrew Beekhof wrote:
> > On Fri, Nov 7, 2008 at 10:24, Steven Dake <sdake at redhat.com> wrote:
> > > In a certain rare scenario, the checkpoint service throws away the
> > > current checkpoint database.
> > >
> > > An example of when this occurs is when there are 3 nodes A, B, C, node A
> > > and C are killed
> >
> > Does it have to be killed, or could shutdown trigger this too?
> >
> > > then node B syncs. After this completes, Node C is
> > > started and node B again begins resyncing, but during this sync process
> > > node A starts up.
> > >
> > > This results in node b no longer believing it is required to sync its
> > > current database contents. The abort called on node b throws away all
> > > checkpoints in the system but since node b is no longer the lowest node
> > > id in the system it believes it doesn't have to sync.
> > >
> > > The design change is that once a node has been declared as a responsible
> > > for synchronization, any aborts or configuration changes will never
> > > change the fact that node is still responsible for synchronization.
> > >
> > > Regards
> > > -steve
> > >
> > > _______________________________________________
> > > Openais mailing list
> > > Openais at lists.linux-foundation.org
> > > https://lists.linux-foundation.org/mailman/listinfo/openais
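
For readers following the quoted scenario, here is a minimal sketch of the "sticky responsibility" rule it describes. This is not the actual openais/corosync patch: the class `SyncRole`, the `sticky` flag, the function `replay`, and the numeric node ids (A=1, B=2, C=3) are all hypothetical names chosen for illustration. It models only the decision "am I responsible for syncing?", on the assumption (taken from the quoted message) that the lowest node id in the membership is the one declared responsible. When responsibility is released again is not described in the thread, so the sketch leaves that out.

```python
class SyncRole:
    """Hypothetical model of one node's view of its checkpoint-sync duty."""

    def __init__(self, node_id, sticky):
        self.node_id = node_id
        self.sticky = sticky        # True models the design change in the thread
        self.responsible = False

    def on_config_change(self, members):
        """Re-evaluate responsibility after a membership change or sync abort."""
        is_lowest = self.node_id == min(members)
        if self.sticky:
            # Once declared responsible, a later abort or configuration
            # change never clears the flag.
            self.responsible = self.responsible or is_lowest
        else:
            # Behaviour described as buggy: recomputed from scratch each time.
            self.responsible = is_lowest
        return self.responsible


def replay(sticky):
    """Replay the A/B/C scenario from node B's point of view."""
    b = SyncRole(node_id=2, sticky=sticky)
    b.on_config_change({2})                # A and C are killed; B syncs alone
    b.on_config_change({2, 3})             # C starts; B begins resyncing
    return b.on_config_change({1, 2, 3})   # A starts mid-sync and triggers an abort


if __name__ == "__main__":
    print("recomputed rule, B responsible:", replay(sticky=False))
    print("sticky rule, B responsible:", replay(sticky=True))
```

With the recomputed rule, node B drops its responsibility as soon as node A (a lower id) joins, which matches the failure described above. With the sticky rule, node B remains responsible through the abort.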