[Openais] Re: strange behaviour with testclm

Steven Dake sdake at mvista.com
Tue Jun 29 12:50:19 PDT 2004


On Tue, 2004-06-29 at 11:20, Chris Friesen wrote:
> I'm seeing some strange stuff, and I'm wondering if this is expected.  I'm 
> working with current bk version.
> 
> 
> I've got two nodes, a and b.  I run the executive on a, then on b.  a shows b 
> joining.  I then kill the executive on b. I get the following output on a:
> 
> L(3): Token loss in OPERATIONAL.
> L(4): entering GATHER state.
> L(4): SENDING attempt join because this node is ring rep.
> L(4): No members sent join, keeping old ring and transitioning to operational.
> 
> 
> I don't see anything there about b leaving the cluster.
> 

The executive won't print messages about a node leaving the cluster.  It
either prints the new configuration or the message above indicating that
the node is the only member of the configuration.  This is a little
confusing.  I'd like to clean up these notices from the executive.

> If I then run testclm on a, it prints out information about both a and b.  Why? 
>   b shouldn't be in the cluster.

Indeed it does.  This is a bug.  At least the executive knows what the
membership is :)  It looks like the code in exec/clm.c is not
working correctly.  This code handles communication with the API and
keeps track of the membership.

> 
> If I start, then stop the executive on b again, testclm does not print anything 
> out.  Presumably it should show b joining then leaving.

At one time this stuff was working perfectly :)  I think I missed a
checkin somewhere, since the auth code for the membership service was
completely missing.  I'll take a look at getting this fixed.

> If I stop, then start the executive on a, testclm is stuck in an infinite loop 
> on select() and is unable to reconnect with the executive.
> 

This is an issue that I'm not sure how to solve (and it isn't supposed to
happen except in a failure case, in which case the components will
fail over).  One possibility is to reconnect all of the API connections,
but this is pretty complicated and I'm not sure you would want to
proceed on that processor after such an error.

The API should return SA_ERR_LIBRARY in this case.  What was the error
code you received?

Thanks
-steve

> 
> Is this design intent?
> 
> 
> Chris



