[Openais] Strange behaviour of corosync

Steven Dake sdake at redhat.com
Mon Mar 22 10:12:34 PDT 2010


On Mon, 2010-03-22 at 17:50 +0100, Andreas Mock wrote:
> Hi all,
> 
> I'm using corosync 1.2 together with pacemaker 1.0.8 as found at clusterlabs.org
> on openSuSE 11.2.
> 
> Now I have a situation where /etc/init.d/corosync status replies with a running instance
> of corosync.
> Log shows that the child services have stopped and "detached" but ps axf shows
> the following:
> 
> 7211 ?        Ssl   98:04 corosync
>  7219 ?        Zs     0:00  \_ [stonithd] <defunct>
>  7220 ?        Z      0:00  \_ [cib] <defunct>
>  7222 ?        Z      0:00  \_ [attrd] <defunct>
>  7224 ?        Z      0:00  \_ [crmd] <defunct>
> 
> 

My first guess is that you're shutting down pacemaker and running into a rare
deadlock that happens only during shutdown of corosync.  You're in luck
though, because it is fixed in whitetank and pending release.

You can verify that, if you would like, by doing the following.
Install the corosync-debuginfo package, then start gdb and enter the
attach and thread commands at its prompt:
gdb
attach 7211
thread apply all bt
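
The same steps can probably be run in one go, which makes it easier to
capture the output and mail it to the list. This is only a rough sketch:
the function name dump_backtraces and the output file name are made up
for illustration, 7211 is the corosync PID from the ps output quoted above,
and it assumes gdb is installed and that you have permission to attach to
the process (usually root).

```shell
#!/bin/sh
# Sketch: dump backtraces of all threads of a running process without
# an interactive gdb session.
# dump_backtraces and the default file name are hypothetical names
# chosen for this example.

dump_backtraces() {
    pid="$1"
    out="${2:-corosync-bt.txt}"
    # -p PID   attach to the already running process
    # -batch   exit gdb once the commands have run
    # -ex CMD  run one gdb command (same as typing it at the prompt)
    gdb -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1
}

# Example call (not executed here):
#   dump_backtraces 7211 corosync-bt.txt
```

Without the debuginfo package installed, the backtraces will mostly lack
function names and will be much less useful.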

I can tell from the backtraces whether you're hitting this problem.

Regards
-steve


> Uaaah, how can this be?
> corosync is logging nothing anymore. See the last log entries
> http://paste.org/pastebin/view/16608
> 
> Any hints?
> 
> Best regards
> Andreas Mock
> _______________________________________________
> Openais mailing list
> Openais at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais


