[Openais] SOLVED Re: I think I have hit a bug, but need confirmation

Jerome Martin jmartin at longphone.fr
Sat Aug 9 03:15:41 PDT 2008


Hi Steven, Andrew,

I identified the issue described in "[Openais] I think I have hit a bug,
but need confirmation". It was my fault; however, I wonder whether what
happens should be handled more gracefully by openais and/or pacemaker
(note for beekhof: I am not sure whether openais's behavior under this
misconfiguration is wrong or whether pacemaker should be more resilient
to it). The cause of my problem could occur on a production network
through a faulty config or as a DoS attack, and it is VERY bad that it
crashes pacemaker or renders it unusable.

What happens is this: apart from the bare-metal machines used as cluster
nodes, I have some compile machines running as linux-vservers on other
bare-metal servers (same eth segment, same subnet). On those, I compile
openAIS, pacemaker, etc. In order to compile pacemaker, I installed the
openAIS package that I built on them, so openAIS was in fact running on
those vservers. BUT, in the vserver setup I am using, the mcast
addresses cannot be set on the interface from inside the vserver. Those
vservers were therefore sending notifications to the configured mcast
address (triggering the config change and the GATHER from 11 seen in my
logs on the bare-metal nodes), but were unable to receive any replies
via the mcast group.
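
To illustrate the "can send but cannot receive" condition described
above, here is a minimal Python sketch, not part of openais or
pacemaker, that checks whether a host can bind a UDP socket and join an
IPv4 multicast group. The function name, the group address 226.94.1.1
and the port 5405 are placeholders chosen for illustration; substitute
the values from your own configuration. A successful join does not prove
that multicast traffic is actually delivered, so treat this as a rough
pre-flight check only.

```python
import socket
import struct


def can_join_mcast_group(group, port, iface_addr="0.0.0.0"):
    """Return True if this host can bind a UDP socket and join `group`.

    Hypothetical helper for illustration only. `iface_addr` is the local
    interface address used for the membership request ("0.0.0.0" lets
    the kernel pick one).
    """
    sock = None
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                             socket.IPPROTO_UDP)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        # struct ip_mreq: multicast group address, then local interface.
        mreq = struct.pack("4s4s",
                           socket.inet_aton(group),
                           socket.inet_aton(iface_addr))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        return True
    except OSError:
        # Binding or joining was refused, e.g. by a restricted container.
        return False
    finally:
        if sock is not None:
            sock.close()


if __name__ == "__main__":
    # Placeholder group and port; replace with your configured values.
    ok = can_join_mcast_group("226.94.1.1", 5405)
    print("multicast join possible" if ok else "multicast join FAILED")
```

If such a check fails on a machine, that machine presumably should not
be running aisexec against the production mcast address.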

Note that I can reproduce this at will by running the aisexec with
pacemaker patches on the vserver (Andrew, please have a look!), but not
when running the aisexec from the stock openais.org 0.80.3 tarball. I do
not really know why, and being short on time I won't investigate for
now. All I know is that one misconfigured openais on an eth segment with
production machines can wreak havoc in a pacemaker network, and probably
in any openais production setup.

Thanks a lot, Steven, for your answers, which definitely helped me nail
the issue. It was in fact totally unrelated to 32/64 bits and/or the
libc6 version.

I would appreciate your thoughts on the potential risk that the
resolution of this issue highlights, and your advice/philosophy
regarding it. If you consider it not a bug but a part of the openAIS
design, please explain the administrative policy/configuration you
recommend to avoid this risk.

Regards,
-- 
Jérôme Martin | LongPhone
Responsable Architecture Réseau
122, rue la Boetie | 75008 Paris
Tel :  +33 (0)1 56 26 28 44
Fax : +33 (0)1 56 26 28 45
Mail : jmartin at longphone.fr
Web : www.longphone.com <http://www.longphone.com>


