[Openais] Strange behaviour of corosync

Andreas Mock Andreas.Mock at web.de
Mon Mar 22 11:31:43 PDT 2010


-----Original Message-----
From: Steven Dake <sdake at redhat.com>
Sent: 22.03.2010 18:12:34
To: Andreas Mock <Andreas.Mock at web.de>
Subject: Re: [Openais] Strange behaviour of corosync

>On Mon, 2010-03-22 at 17:50 +0100, Andreas Mock wrote:
>> Hi all,
>>
>> I'm using corosync 1.2 together with pacemaker 1.0.8 as found at clusterlabs.org
>> on openSuSE 11.2.
>>
>> Now I have a situation where /etc/init.d/corosync status replies with a running instance
>> of corosync.
>> Log shows that the child services have stopped and "detached" but ps axf shows
>> the following:
>>
>> 7211 ? Ssl 98:04 corosync
>> 7219 ? Zs 0:00 \_ [stonithd] <defunct>
>> 7220 ? Z 0:00 \_ [cib] <defunct>
>> 7222 ? Z 0:00 \_ [attrd] <defunct>
>> 7224 ? Z 0:00 \_ [crmd] <defunct>
>>
>>
>
>My first guess is you're shutting down pacemaker and running into a rare
>deadlock that happens only during shutdown of corosync. You're in luck
>though, because it is fixed in whitetank and pending release.

By the way: top shows the following
7211 root RT 0 150m 4268 1616 S 200 0.0 297:31.99 corosync

200% CPU: That's what I call overbooking!

Is this another symptom of this deadlock situation?
Something polling like crazy?

>You can verify that, if you would like, by doing the following:
>install the corosync-debuginfo package, then:
>gdb
>attach 7211
>thread apply all bt

I'll try, but I don't get enough CPU cycles to install the additional package.
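
For reference, the gdb steps quoted above could also be run non-interactively, which may help on a box that is starved for CPU. A minimal sketch in Python, assuming gdb is installed and the script runs with enough privileges to attach to the process; the function names are made up for illustration and are not part of corosync or gdb:

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to `pid`, prints a
    backtrace of every thread, and then exits (batch mode)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtraces(pid, timeout=60):
    """Run gdb against `pid` and return whatever it printed to stdout.

    The timeout (an assumed default of 60 seconds) keeps the call from
    hanging forever if the machine is too busy to respond.
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

Calling dump_backtraces(7211) would correspond to the attach and "thread apply all bt" steps above. Without the corosync-debuginfo package the backtraces will most likely lack symbol names.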

Best regards
Andreas Mock

