[Openais] I think I have hit a bug, but need confirmation

Jerome Martin jmartin at longphone.fr
Fri Aug 8 10:30:26 PDT 2008


Hi Steven,

Thanks for the answer.
My observation below ...

On Fri, 2008-08-08 at 09:31 -0700, Steven Dake wrote:
> The gather from state 11 means a node sent a join message.  The pastebin
> output makes it look like one node can't communicate with the others in
> one direction but possibly the other.

Ok, that clarifies it, thanks.

> There could be one of three problems:
> 1) firewall configured for the port on which your running openais
root@cougar1 ~ # iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Plus, there is only an L2 switch between the two nodes, and the problem is
the same with a local run (only one node).

> 2) Multiple cluster nodes with different security keys on the same
> network
This could be the case, I'll double check that after sending this
answer. But please note that I have the exact same behavior with one
node only... so I doubt this is the problem. Furthermore I do not
remember using authkeys, and secauth is off in the config file (sorry, I
am not familiar with this so I hope that we are talking about the same
thing here).

> 3) Your multicast switch is defective.  Have you tried a non-managed
> switch?
I've been using the same switches since the beginning without a glitch.
Moreover, I use 224.0.0.1 as the mcast address, so I am pretty confident
it is not a misconfigured mcast group problem.
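
For reference, a minimal sketch of how multicast delivery on a single node could be checked independently of openais, assuming Python 3 is available. The port number and all function names below are illustrative assumptions, not taken from the configuration discussed in this thread. It first confirms that 224.0.0.1 falls in the link-local block 224.0.0.0/24 (which routers do not forward), then sends one probe datagram to the group and waits for it to come back on the same host.

```python
import ipaddress
import socket
import struct

GROUP = "224.0.0.1"  # multicast address mentioned in this message
PORT = 54050         # assumption: any unused UDP port, NOT the openais port


def is_link_local_multicast(addr):
    """Return True if addr lies in 224.0.0.0/24 (never forwarded by routers)."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network("224.0.0.0/24")


def membership_request(group, iface="0.0.0.0"):
    """Build the 8-byte ip_mreq structure used with IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def loopback_check(group=GROUP, port=PORT, timeout=2.0):
    """Send one probe to the group; return True if this host receives it."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Receiver: bind to the port and join the group on the default interface.
        rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        rx.bind(("", port))
        rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                      membership_request(group))
        rx.settimeout(timeout)
        # Sender: keep the probe on the local segment and loop it back locally.
        tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
        tx.sendto(b"mcast-probe", (group, port))
        try:
            while True:
                data, _sender = rx.recvfrom(1024)
                if data == b"mcast-probe":
                    return True
        except socket.timeout:
            return False
    finally:
        rx.close()
        tx.close()
```

Calling `loopback_check()` on the affected node only shows whether the kernel delivers multicast back to a local listener on the default interface; it says nothing about openais itself, and a bonded or VLAN interface may need its address passed as `iface` to `membership_request`.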

> do you have a tcpdump?
As I said, the problem is the same with only one node.
Note that I use a VLAN interface, itself on top of a bond of two physical
Ethernet interfaces in failover mode (this setup was working previously).
Anyway, the dump is attached. There seems to be direct communication
between both nodes, plus the expected mcast traffic, but as I do not know
what the data is supposed to look like, I cannot say more :-)

Just to add a little more background: I originally thought the issue was
due to a combination of running 64-bit (which I always did, and it was
working before with both stock whitetank and whitetank patched for
pacemaker) and upgrading libc6. But the libc6 upgrade was minimal and I
reverted it to no effect. I have upgraded a few other packages (libkrb53,
openssl, openssh) since it was last working fine, but I do not see openais
linking against those, so I really am lost.

Of course the version I currently use is compiled against the libs
currently installed ....

Regards,
-- 
Jérôme Martin | LongPhone
Responsable Architecture Réseau
122, rue la Boetie | 75008 Paris
Tel :  +33 (0)1 56 26 28 44
Fax : +33 (0)1 56 26 28 45
Mail : jmartin at longphone.fr
Web : www.longphone.com <http://www.longphone.com>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: openais.cap
Type: application/octet-stream
Size: 362036 bytes
Desc: not available
Url : http://lists.linux-foundation.org/pipermail/openais/attachments/20080808/9aef2489/attachment-0001.obj 

