[Openais] SOLVED Re: I think I have hit a bug, but need confirmation

Jerome Martin jmartin at longphone.fr
Sun Aug 10 14:13:07 PDT 2008


Hi Steven, Andrew,

On Sun, 2008-08-10 at 13:24 -0700, Steven Dake wrote:
> Our policy regarding your issue is that we require the operating
> system
> multicast to operate properly (which it doesn't), we require the
> multicast hardware switch to operate properly, and we require a basic
> posix api to make all this a reality.

I read you loud and clear.

However, at the risk of being picky, I make a difference between a member
node OS/network malfunction (which is not what happens in that
situation) preventing it from being part of the cluster (which would be
only fair :-) ) and a misbehaving node preventing OTHER nodes (in that
case ALL of them) from continuing to function properly. Please read
further, because I am not jumping on openAIS's back here...

The case at hand, however, is not as simple as that. We are in a
situation where one misbehaving node triggers what I will call a
"membership event storm". That storm does not in fact prevent openAIS
from functioning, and membership for operational cluster nodes is
preserved. However, when used in conjunction with pacemaker (and I would
bet that other services might be impacted by this), the storm being
forwarded to the service level has very bad consequences, preventing the
WHOLE cluster from functioning properly. It is, IMHO, a weakness caused
by two factors:

1) Lack of robustness on the pacemaker side (Andrew, this one's for
you :-) ).

2) Useless forwarding of events which DO NOT CHANGE THE CLUSTER STATE
(this one is on openAIS: the membership storm is being propagated
instead of just being "buffered", even though it does not change the
state of the membership world; see the sketch just below this list)
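
To make point 2 a bit more concrete, here is a minimal, purely
illustrative sketch (in C, with made-up names and types, NOT the actual
openAIS code or API) of the kind of filter I have in mind: remember the
last member list that was delivered to the services, and only forward a
new configuration change when that list actually differs:

    /* Hypothetical sketch: suppress configuration-change deliveries
     * that do not alter the membership the services last saw.
     * All identifiers below are invented for illustration only. */

    #include <string.h>

    #define MAX_MEMBERS 64

    struct member_list {
        unsigned int count;
        unsigned int node_ids[MAX_MEMBERS]; /* assumed sorted by caller */
    };

    static struct member_list last_delivered; /* last list handed to services */

    /* Returns 1 when the new list should be forwarded, 0 when it is a repeat. */
    static int membership_changed(const struct member_list *current)
    {
        if (current->count != last_delivered.count)
            return 1;
        if (memcmp(current->node_ids, last_delivered.node_ids,
                   current->count * sizeof(unsigned int)) != 0)
            return 1;
        return 0; /* identical membership: the "storm" event can be swallowed */
    }

    static void maybe_forward_config_change(const struct member_list *current)
    {
        if (!membership_changed(current))
            return;                /* drop/buffer the redundant event */
        last_delivered = *current; /* remember what the services last saw */
        /* ... deliver the configuration change to pacemaker & friends ... */
    }

Again, I am not claiming this is how it should be wired into totem or
the service handlers; it is just the shape of the "only tell the upper
layer when something really changed" idea.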

The way I see it, as an end-user of a stack of various software aimed at
providing HA (openAIS + pacemaker + heartbeat lrmd), including tolerance
for node faults and a high level of resilience/robustness, this is
typically the kind of node malfunction that should be handled
gracefully by my cluster. Which is not the case.

Of course, if I were Andrew, I could point at STONITH to kill that
misbehaving node. But what if the storm prevents pacemaker from actually
scheduling the STONITH event (as in fact it prevented pacemaker from
scheduling ANY action until crmd crashed in my test case)?

If I were you, Steven, I could easily point a finger at pacemaker as
being solely responsible for this...

But we all know that in order to achieve best-in-class cluster
resiliency, we need each and every link of the chain to be as robust and
forgiving as possible. Forgiving here, for me, means trying to avoid
putting unnecessary burden on the next link, in this case by not
forwarding that membership event storm.

Please bear with my limited insight into the philosophy and internals of
the openAIS stack; maybe I am overlooking a design prerequisite that
requires forwarding membership events even if they do not actually
change the members database contents...

[...]
> I'd suggest reporting a bug to the maintainer of vserver on the
> multicast not being bound properly.  That is a bug in their driver
> software.  Mutlicast is required by ipv6 networks.

Multicast is working fine in the dev branch of linux-vservers, and I was
not in fact trying to make it work on the vserver machines I use for
compiling. This was just a side-effect of installing openAIS packages
there to compile pacemaker. Still, I think this is only one example of
how such a one-way broadcast can happen, and vserver is not the point of
focus in the broader question I am raising now (totally separate from
understanding WHY the initial issue happened, thanks to your help in
decoding my logs, Steven).

Please, Steven and Andrew, talk to me and to each other about this,
because I am in no position to decide at which level(s) it is more
meaningful to improve your software's behavior, but as an end-user, I
clearly see a robustness issue which I feel should be addressed if we
agree on the notion of "fault tolerance" (literally), which is at the
very heart of any HA cluster.

Side note: Andrew, should I cross-post this to the linux-ha ML with a
summary of the actual scenario?

Regards,
-- 
Jérôme Martin | LongPhone
Responsable Architecture Réseau
122, rue la Boetie | 75008 Paris
Tel :  +33 (0)1 56 26 28 44
Fax : +33 (0)1 56 26 28 45
Mail : jmartin at longphone.fr
Web : www.longphone.com


