[Openais] corosync crash and a hang

Steven Dake sdake at redhat.com
Mon Nov 3 16:27:46 PST 2008


Chrissie,

Thanks a bunch for the trace...  I was unable to reproduce the issue
after 50ish runs, but the following patch should fix it.  Could you give
it a spin?

The issue is that the signal on the condition variable can be missed at
startup if the new thread runs immediately instead of being scheduled
later (a race condition).
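
For anyone following along without the attachment, here is a minimal sketch
of this kind of race and the usual way to close it. It is not the attached
patch and does not use the real logsys.c code; the names start_gate, worker
and start_worker_and_wait are made up for illustration. The assumption is
that the creator waits on a condition variable for the new thread to
announce itself. If the creator only calls pthread_cond_wait() and the
thread signals first, the wakeup is lost and the creator sleeps forever.
Guarding the wait with a flag checked under the mutex avoids that:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the identifiers in logsys.c. */
struct start_gate {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool started;   /* predicate: stays set even if the signal fires early */
};

static void *worker(void *arg)
{
    struct start_gate *gate = arg;

    /* Record the state change under the mutex, then signal. */
    pthread_mutex_lock(&gate->lock);
    gate->started = true;
    pthread_cond_signal(&gate->cond);
    pthread_mutex_unlock(&gate->lock);
    return NULL;
}

/* Returns 0 on success, otherwise the error code from pthread_create()
 * or pthread_join(). */
static int start_worker_and_wait(void)
{
    struct start_gate gate;
    pthread_t tid;
    int rc;

    pthread_mutex_init(&gate.lock, NULL);
    pthread_cond_init(&gate.cond, NULL);
    gate.started = false;

    rc = pthread_create(&tid, NULL, worker, &gate);
    if (rc == 0) {
        /* The worker may already have run and signalled by now.  The
         * loop checks the flag first, so an early signal is not lost,
         * and it also copes with spurious wakeups. */
        pthread_mutex_lock(&gate.lock);
        while (!gate.started)
            pthread_cond_wait(&gate.cond, &gate.lock);
        pthread_mutex_unlock(&gate.lock);

        rc = pthread_join(tid, NULL);
    }

    pthread_cond_destroy(&gate.cond);
    pthread_mutex_destroy(&gate.lock);
    return rc;
}
```

Calling start_worker_and_wait() should return 0 whether the worker runs
before or after the creator reaches the wait, since the flag rather than
the signal carries the state.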

thanks
-steve

On Mon, 2008-11-03 at 14:35 +0000, Christine Caulfield wrote:
> Since logsys2 was committed I can easily make corosync crash with any
> corosync-objctl command. It crashes in a very unhelpful way ... with
> mention of a GDB bug. So it might be stack related I suppose. In
> particular 'corosync-objctl -a' or 'corosync-objctl -w quorum.quorate=1'
> 
> Also sometimes I see startup hangs where corosync doesn't even get
> going. These look to be threads deadlocked on logsys condition variables:
> 
> (gdb) thr a a bt
> 
> 
> Thread 3 (Thread 0xb7b39b90 (LWP 6929)):
> #0  0x00110416 in __kernel_vsyscall ()
> #1  0x009f6ba5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> #2  0x0806e3e9 in wthread_wait () at logsys.c:434
> #3  0x0806e27a in logsys_worker_thread (data=0x0) at logsys.c:452
> #4  0x009f332f in start_thread () from /lib/libpthread.so.0
> #5  0x0092e20e in clone () from /lib/libc.so.6
> 
> Thread 2 (Thread 0xb7f36230 (LWP 6928)):
> #0  0x00110416 in __kernel_vsyscall ()
> #1  0x00923a57 in poll () from /lib/libc.so.6
> #2  0x080532be in prioritized_timer_thread (data=0x0) at timer.c:123
> #3  0x009f332f in start_thread () from /lib/libpthread.so.0
> #4  0x0092e20e in clone () from /lib/libc.so.6
> 
> Thread 1 (Thread 0xb7f0b6c0 (LWP 6925)):
> #0  0x00110416 in __kernel_vsyscall ()
> #1  0x009f6ba5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> #2  0x0806e3e9 in wthread_wait () at logsys.c:434
> #3  0x0806e47f in wthread_create () at logsys.c:511
> #4  0x0806e554 in _logsys_wthread_create () at logsys.c:540
> #5  0x0806ef8e in logsys_fork_completed () at logsys.c:790
> #6  0x0804dd89 in main (argc=2, argv=0xbfa36124) at main.c:644
> (gdb)
> 
> 
> Neither of these happened to me before the logsys2 commit.
> 
> Chrissie
> _______________________________________________
> Openais mailing list
> Openais at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wthread-wait-deadlock.patch
Type: text/x-patch
Size: 792 bytes
Desc: not available
Url : http://lists.linux-foundation.org/pipermail/openais/attachments/20081103/6a380807/attachment.bin 