[Openais] Corosync enters endless loop after hiccup in system

Dejan Muhamedagic dejan at suse.de
Tue Mar 30 04:00:25 PDT 2010


Hi,

On Tue, Mar 30, 2010 at 11:43:22AM +0200, Colin wrote:
> Hi All,
> 
> we are running Corosync 1.2.0-0ubuntu1 on Ubuntu 10.04 beta with current
> updates; the cluster consists of two systems running in KVM, each on a
> dedicated host.
> 
> We have observed several times, though we have been unable to pin down
> the exact cause, that when the virtualised system running corosync has
> a "hiccup", i.e. hangs for a couple of seconds when we introduce a
> delay into its storage access, the corosync process enters an endless
> loop from which it never seems to recover.
> 
> In this endless loop the process uses 193% CPU on the 2-CPU
> virtualised system and issues a stream of wait4() system calls
> (with an occasional nanosleep() and some futex calls).
> 
> ...?

It'd be good to kill -ABRT the process and then get the
backtrace from the resulting core dump with gdb. If you're running
pacemaker, hb_report collects all the relevant information (incl.
the backtraces). Make sure that core dumps are allowed and install
the packages which contain the debugging information.
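
Roughly, the manual steps could look like the sketch below. It is
only an illustration: the function names, the GDB override variable
and the example paths are made up here, and where the core file ends
up depends on your system (see /proc/sys/kernel/core_pattern).

```shell
#!/bin/sh
# Sketch only. abort_daemon, backtrace_from_core and the GDB variable are
# hypothetical helpers, not part of corosync or pacemaker.

# Send SIGABRT to a running daemon so the kernel writes a core file.
# $1 = process name (defaults to corosync)
abort_daemon() {
    name=${1:-corosync}
    # -o picks the oldest matching process, -x matches the name exactly.
    pid=$(pgrep -o -x "$name") || { echo "$name is not running" >&2; return 1; }
    # The core size limit is fixed when the daemon starts; raising it in
    # this shell with ulimit would not affect the already running process.
    grep 'Max core file size' "/proc/$pid/limits"
    kill -ABRT "$pid"
}

# Print full backtraces of all threads from a core file.
# $1 = path to the binary, $2 = path to the core file
backtrace_from_core() {
    "${GDB:-gdb}" -batch -ex 'thread apply all bt full' "$1" "$2"
}
```

After sourcing it, something like "abort_daemon corosync" followed by
"backtrace_from_core /usr/sbin/corosync /path/to/core > bt.txt" should
give a backtrace that can be posted to the list. Both paths are
assumptions and need to be adjusted to your installation.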

Thanks,

Dejan

> Thanks, Colin