[Openais] Strange behaviour of corosync

Tue Mar 23 08:35:01 PDT 2010

On Tue, Mar 23, 2010 at 12:28 AM, Steven Dake <sdake at redhat.com> wrote:
> On Tue, 2010-03-23 at 00:18 +0100, Andreas Mock wrote:
>> -----Original Message-----
>> From: Steven Dake <sdake at redhat.com>
>> Sent: 22.03.2010 22:56:03
>> To: Andreas Mock <Andreas.Mock at web.de>
>> Subject: Re: [Openais] Strange behaviour of corosync
>>
>> >
>> >Thank you for going to the trouble of gathering a backtrace.  This is a
>> >different defect fixed in openais which we couldn't duplicate in
>> >corosync.  The problem is frame #18, pthread_join() after an exit
>> >function.  This means pthread_join() was called from an atexit() handler,
>> >which POSIX is iffy about.
>>
>>
>> Hi Steven,
>>
>> IMHO this error showed room for improvement in another piece of code.
>> After your response I knew that the corosync process was no longer needed, and
>> I wanted to relieve the CPU of its 200% CPU usage burden. ;-)
>>
>> A /etc/init.d/corosync stop ended in printing:
>> Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
>> Waiting for corosync services to unload:.......   many many many dots
>>
>> Perhaps the rc script should be changed so that, after waiting a certain
>> amount of time for corosync to stop gracefully, it hits corosync with a
>> kill -9. What do you think?
>>
>
> Andreas,
>
> Andrew really did all the work on the init script, so he should comment.
> I believe it is designed to let pacemaker shut down in an orderly
> fashion so as not to stonith the node (which may happen with a kill -9).

Correct.  kill -9 == bad.
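Steven's diagnosis earlier in the thread concerns pthread_join() being reached from an atexit() handler. The C sketch here is not corosync or openais code; it only illustrates, assuming a single worker thread, the commonly recommended alternative: tell the thread to stop and join it on the normal shutdown path, before exit() begins running atexit() handlers. The `struct worker` type and the `worker_start()`/`worker_stop()` helpers are hypothetical names chosen for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical worker state; not taken from corosync or openais. */
struct worker {
    pthread_t thread;
    atomic_bool stop_requested; /* set by the shutdown path */
    atomic_int iterations;      /* counts units of "work" done */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    /* do/while: the worker performs at least one unit of work,
     * then keeps going until a stop is requested. */
    do {
        atomic_fetch_add(&w->iterations, 1); /* stand-in for real work */
    } while (!atomic_load(&w->stop_requested));
    return NULL;
}

/* Start the worker thread; returns 0 on success (pthread_create result). */
int worker_start(struct worker *w)
{
    atomic_init(&w->stop_requested, false);
    atomic_init(&w->iterations, 0);
    return pthread_create(&w->thread, NULL, worker_main, w);
}

/* Request a stop and wait for the thread to finish.
 * Call this from the ordinary shutdown path (e.g. before main() returns),
 * not from a handler registered with atexit(). */
int worker_stop(struct worker *w)
{
    atomic_store(&w->stop_requested, true);
    return pthread_join(w->thread, NULL);
}
```

A caller would run `worker_start()` at startup and `worker_stop()` once shutdown is decided, so that every thread has already been joined by the time the process calls exit().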