[Openais] corosync crash and a hang
Steven Dake
sdake at redhat.com
Mon Nov 3 16:27:46 PST 2008
Chrissie,
Thanks a bunch for the trace... I was unable to reproduce the issue
after 50ish runs, but the following patch should fix it. Could you give
it a spin?
The issue is a race condition: the signal to the condition variable can be
missed at startup if the new thread runs immediately instead of being
scheduled later.
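For anyone following along without the attachment, here is a minimal sketch of the general pattern that avoids a missed wakeup. It is not the patch itself and does not use the real logsys.c code: the names startup_gate, gate_signal, gate_wait and start_worker_and_wait are made up for illustration. The assumption is that the creator waits for the worker to announce it has started. The signal is recorded in a flag under the mutex, and the waiter checks that flag in a loop, so it does not matter whether the worker runs before or after the creator reaches pthread_cond_wait().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names; none of this is taken from logsys.c or the patch. */
struct startup_gate {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool ready; /* predicate: remembers that the signal was sent */
};

static void gate_init(struct startup_gate *g)
{
    pthread_mutex_init(&g->mutex, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->ready = false;
}

/* Worker side: set the flag and signal while holding the mutex. */
static void gate_signal(struct startup_gate *g)
{
    pthread_mutex_lock(&g->mutex);
    g->ready = true;
    pthread_cond_signal(&g->cond);
    pthread_mutex_unlock(&g->mutex);
}

/* Creator side: only sleep while the flag is still unset. If the worker
 * already ran, the loop body is skipped and nothing is missed. */
static void gate_wait(struct startup_gate *g)
{
    pthread_mutex_lock(&g->mutex);
    while (!g->ready)
        pthread_cond_wait(&g->cond, &g->mutex);
    assert(g->ready);
    pthread_mutex_unlock(&g->mutex);
}

static void *worker(void *arg)
{
    gate_signal(arg); /* may execute before the creator starts waiting */
    return NULL;
}

/* Returns 0 once the worker has started and been joined, -1 on error. */
static int start_worker_and_wait(void)
{
    struct startup_gate g;
    pthread_t tid;

    gate_init(&g);
    if (pthread_create(&tid, NULL, worker, &g) != 0)
        return -1;
    gate_wait(&g);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&g.cond);
    pthread_mutex_destroy(&g.mutex);
    return 0;
}
```

A bare pthread_cond_wait() with no flag would block forever whenever the worker signals first, which is consistent with what the backtrace shows for thread 1 in wthread_create().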
thanks
-steve
On Mon, 2008-11-03 at 14:35 +0000, Christine Caulfield wrote:
> Since logsys2 was committed I can easily make corosync crash with any
> corosync-objctl command. It crashes in a very unhelpful way ... with
> mention of a GDB bug. So it might be stack related I suppose. In
> particular 'corosync-objctl -a' or 'corosync-objctl -w quorum.quorate=1'
>
> Also sometimes I see startup hangs where corosync doesn't even get
> going. These look to be threads deadlocked on logsys condition variables:
>
> (gdb) thr a a bt
>
>
> Thread 3 (Thread 0xb7b39b90 (LWP 6929)):
> #0 0x00110416 in __kernel_vsyscall ()
> #1 0x009f6ba5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> #2 0x0806e3e9 in wthread_wait () at logsys.c:434
> #3 0x0806e27a in logsys_worker_thread (data=0x0) at logsys.c:452
> #4 0x009f332f in start_thread () from /lib/libpthread.so.0
> #5 0x0092e20e in clone () from /lib/libc.so.6
>
> Thread 2 (Thread 0xb7f36230 (LWP 6928)):
> #0 0x00110416 in __kernel_vsyscall ()
> #1 0x00923a57 in poll () from /lib/libc.so.6
> #2 0x080532be in prioritized_timer_thread (data=0x0) at timer.c:123
> #3 0x009f332f in start_thread () from /lib/libpthread.so.0
> #4 0x0092e20e in clone () from /lib/libc.so.6
>
> Thread 1 (Thread 0xb7f0b6c0 (LWP 6925)):
> #0 0x00110416 in __kernel_vsyscall ()
> #1 0x009f6ba5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> #2 0x0806e3e9 in wthread_wait () at logsys.c:434
> #3 0x0806e47f in wthread_create () at logsys.c:511
> #4 0x0806e554 in _logsys_wthread_create () at logsys.c:540
> #5 0x0806ef8e in logsys_fork_completed () at logsys.c:790
> #6 0x0804dd89 in main (argc=2, argv=0xbfa36124) at main.c:644
> (gdb)
>
>
> Neither of these happened to me before the logsys2 commit.
>
> Chrissie
> _______________________________________________
> Openais mailing list
> Openais at lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/openais
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wthread-wait-deadlock.patch
Type: text/x-patch
Size: 792 bytes
Desc: not available
Url : http://lists.linux-foundation.org/pipermail/openais/attachments/20081103/6a380807/attachment.bin