[Openais] [corosync] [patch] - Fix problems with long token timeout and cpg

David Teigland teigland at redhat.com
Thu Jul 2 07:27:55 PDT 2009


On Thu, Jul 02, 2009 at 01:15:18PM +0200, Jan Friesse wrote:
> David Teigland wrote:
> > On Wed, Jul 01, 2009 at 01:46:03PM -0500, David Teigland wrote:
> >> other nodes should immediately recognize it has
> >> previously failed and process a complete failure for it.
> > 
> > i.e. the full equivalent of what apps (using any APIs) would see if the
> > node had failed via normal token timeout.
>
> More or less agree, but does this patch fix the problem for you or not?

I haven't tried the patch, but based on its description and a quick look at
it, I don't think it helps.  Think more broadly about what's happening here
rather than focusing on one particular effect.

1. nodes 1,2,3,4: are cluster members
2. nodes 1,2,3,4: are using services A,B,C,D
3. node4: ifdown eth0, kill corosync
4. node4: ifup eth0, start corosync
5. node4: do not start/use any services
6. nodes 1,2,3: never see node4 removed from membership
7. nodes 1,2,3: services A,B,C,D never see node4 removed/fail
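
The sequence above can be sketched as a toy model. This is only an
illustration under stated assumptions, not corosync code: the function name
observed_events, the detect_restart flag, the event strings and the timings
in seconds are all hypothetical. It assumes a peer is declared failed only
when it stays silent for longer than the token timeout.

```python
def observed_events(token_timeout, down_at, up_at, detect_restart=False):
    """Return the membership events nodes 1,2,3 would see for node4.

    Toy model (an assumption, not corosync's actual logic): a peer is
    declared failed only if it stays silent longer than token_timeout.
    All times are in seconds.
    """
    outage = up_at - down_at
    if outage > token_timeout:
        # Normal case: the token timeout expires, so the failure is seen.
        return ["node4 left", "node4 joined"]
    if detect_restart:
        # Behaviour argued for in this thread: a node that comes back
        # after restarting is treated as having failed, even though the
        # token timeout never expired.
        return ["node4 left", "node4 joined"]
    # Problem case (steps 6 and 7): node4 is back before the timeout
    # expires, so no membership change is ever reported.
    return []


if __name__ == "__main__":
    # Long token timeout, short outage: the restart goes unnoticed.
    print(observed_events(token_timeout=60, down_at=10, up_at=20))
    # Same outage, with restart detection: a full failure is processed.
    print(observed_events(60, 10, 20, detect_restart=True))
```

With a long token timeout and a short outage, the first call reports no
events at all, which is the situation in steps 6 and 7; the second call shows
what the services on nodes 1,2,3 would need to see instead.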

Dave


