[Openais] Strange behaviour of corosync
Steven Dake
sdake at redhat.com
Mon Mar 22 10:12:34 PDT 2010
On Mon, 2010-03-22 at 17:50 +0100, Andreas Mock wrote:
> Hi all,
>
> I'm using corosync 1.2 together with pacemaker 1.0.8 as found at clusterlabs.org
> on openSuSE 11.2.
>
> Now I have a situation where /etc/init.d/corosync status replies with a running instance
> of corosync.
> Log shows that the child services have stopped and "detached" but ps axf shows
> the following:
>
> 7211 ? Ssl 98:04 corosync
> 7219 ? Zs 0:00 \_ [stonithd] <defunct>
> 7220 ? Z 0:00 \_ [cib] <defunct>
> 7222 ? Z 0:00 \_ [attrd] <defunct>
> 7224 ? Z 0:00 \_ [crmd] <defunct>
>
>
My first guess is that you're shutting down pacemaker and running into a rare
deadlock that happens only during shutdown of corosync. You're in luck
though, because it is fixed in whitetank and pending release.
You can verify that, if you would like, by doing the following:
Install the corosync-debuginfo package, then run:
gdb
attach 7211
thread apply all bt
I can tell from the backtraces whether you're hitting this problem.
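If an interactive session is inconvenient, the same backtraces can probably be
captured in one step and saved to a file for posting to the list. This is only
a sketch: the function name dump_backtraces and the GDB override variable are
made up for this example, and it assumes gdb is installed, that you run it as
root, and that 7211 is still the corosync PID from the ps output.

```shell
#!/bin/sh
# Sketch: dump backtraces of every thread in a running process without an
# interactive gdb session. GDB and dump_backtraces are hypothetical names
# used only for this example.

# Debugger binary to run; overridable from the environment (assumption).
GDB="${GDB:-gdb}"

dump_backtraces() {
    pid="$1"
    # -batch : run the given commands, then exit (gdb detaches on exit)
    # -p PID : attach to the already running process
    # -ex CMD: execute one gdb command, here the one from the steps above
    "$GDB" -batch -p "$pid" -ex "thread apply all bt" 2>&1
}

# Example use (not executed here):
#   dump_backtraces 7211 > corosync-backtrace.txt
```

Without the corosync-debuginfo package the output will still list the threads,
but the frames are likely to lack function names and line numbers.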
Regards
-steve
> Uaaah, how can this be?
> corosync is logging nothing anymore. See the last log entries
> http://paste.org/pastebin/view/16608
>
> Any hints?
>
> Best regards
> Andreas Mock
> _______________________________________________
> Openais mailing list
> Openais at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais