[Openais] Strange behaviour of corosync

Steven Dake sdake at redhat.com
Mon Mar 22 11:25:27 PDT 2010


On Mon, 2010-03-22 at 19:31 +0100, Andreas Mock wrote:
> -----Original Message-----
> From: Steven Dake <sdake at redhat.com>
> Sent: 22.03.2010 18:12:34
> To: Andreas Mock <Andreas.Mock at web.de>
> Subject: Re: [Openais] Strange behaviour of corosync
> 
> >On Mon, 2010-03-22 at 17:50 +0100, Andreas Mock wrote:
> >> Hi all,
> >>
> >> I'm using corosync 1.2 together with pacemaker 1.0.8 as found at clusterlabs.org
> >> on openSuSE 11.2.
> >>
> >> Now I have a situation where /etc/init.d/corosync status replies with a running instance
> >> of corosync.
> >> Log shows that the child services have stopped and "detached" but ps axf shows
> >> the following:
> >>
> >> 7211 ? Ssl 98:04 corosync
> >> 7219 ? Zs 0:00 \_ [stonithd] <defunct>
> >> 7220 ? Z 0:00 \_ [cib] <defunct>
> >> 7222 ? Z 0:00 \_ [attrd] <defunct>
> >> 7224 ? Z 0:00 \_ [crmd] <defunct>
> >>
> >>
> >
> >My first guess is that you're shutting down pacemaker and running into a
> >rare deadlock that happens only during shutdown of corosync. You're in
> >luck though, because it is fixed in whitetank and pending release.
> 
> By the way: top shows the following
> 7211 root RT 0 150m 4268 1616 S 200 0.0 297:31.99 corosync
> 
> 200% CPU: That's what I call overbooking!
> 
> Is this another symptom of this deadlock situation?
> Something polling like crazy?
> 

I am not certain whether the CPU pegs in that deadlock.  A backtrace would
be very helpful to ensure you haven't uncovered a different issue.

Regards
-steve

> >You can verify that, if you would like, by doing the following:
> >install the corosync-debuginfo package, then:
> >gdb
> >attach 7211
> >thread apply all bt
> 
> I'll try, but I don't get enough CPU cycles to install the additional package.
> 
> Best regards
> Andreas Mock
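
On a machine that is short on CPU time, the gdb steps quoted above can
also be run without an interactive session. The following is only a
minimal sketch: the function name dump_backtraces, the GDB override
variable and the default output file name are assumptions made for
illustration, not part of corosync or gdb. It relies on gdb's -p, -batch
and -ex options, and the backtrace is only readable once the
corosync-debuginfo package is installed.

```shell
#!/bin/sh
# Hypothetical helper: write the backtraces of all threads of a running
# process to a file, using gdb in batch mode.
dump_backtraces() {
    pid="$1"
    out="${2:-corosync-bt.txt}"   # assumed default output file name

    if [ -z "$pid" ]; then
        echo "usage: dump_backtraces PID [OUTFILE]" >&2
        return 1
    fi

    # -p PID   attach to the running process
    # -batch   exit after the given commands have run
    # -ex CMD  run one gdb command
    # GDB is a hypothetical override so another gdb binary can be used.
    "${GDB:-gdb}" -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1
}

# Example call (7211 is the corosync PID from the ps output in this thread):
#   dump_backtraces 7211 corosync-bt.txt
```

The resulting file can then be attached to a reply on the list.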


