[Openais] corosync crash and a hang

Steven Dake sdake at redhat.com
Wed Nov 5 08:51:45 PST 2008


On Tue, 2008-11-04 at 08:22 +0000, Christine Caulfield wrote:
> Steven Dake wrote:
> > Find attached a patch which fixes the segfault you are seeing with
> > objctl.
> > 
> > The issue was that strncpy does not null-terminate the destination when
> > the src length is equal to or greater than the copy length (as per
> > manpage).  This resulted in negative-length memcpy operations (which are
> > actually pretty big memcpys :) in the formatting routines, which caused
> > all kinds of havoc.
> > 
> > This patch also increases the subsystem width to 6 bytes wide instead of
> > 5.  I had forgotten about the added length of confdb in our subsystems.
> > 
> 
> Thanks Steve,
> 
> Both of those patches work fine for me.
> 
> I wonder if it's worth adding a warning about long logsys names. I did
> have one called "TESTQUORUM" for a while !


Long logsys names will be truncated to the logging field width (6 bytes)
in the log file.
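
As a rough illustration of both points (the truncation and the strncpy
termination pitfall from the patch description), here is a minimal sketch.
It is not the actual logsys code: SUBSYS_WIDTH and subsys_name_copy() are
hypothetical names, and the only thing taken from this thread is the
6-byte field width.

```c
#include <string.h>

/* Hypothetical constant: the 6-byte subsystem field width mentioned above. */
#define SUBSYS_WIDTH 6

/*
 * Hypothetical helper: copy a subsystem name into dst, truncating it to
 * SUBSYS_WIDTH characters.  dst must have room for SUBSYS_WIDTH + 1 bytes.
 *
 * strncpy() writes no terminating NUL when strlen(name) >= SUBSYS_WIDTH,
 * so the terminator is stored explicitly after the copy.
 */
void subsys_name_copy(char *dst, const char *name)
{
	strncpy(dst, name, SUBSYS_WIDTH);
	dst[SUBSYS_WIDTH] = '\0';
}
```

With a buffer of SUBSYS_WIDTH + 1 bytes, copying "TESTQUORUM" this way
leaves "TESTQU" in the buffer, and a later strlen() on it stays within
bounds.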

Regards
-steve


> 
> Chrissie
> > 
> > On Mon, 2008-11-03 at 14:35 +0000, Christine Caulfield wrote:
> >> Since logsys2 was committed I can easily make corosync crash with any
> >> corosync-objctl command. It crashes in a very unhelpful way ... with
> >> mention of a GDB bug. So it might be stack related I suppose. In
> >> particular 'corosync-objctl -a' or 'corosync-objctl -w quorum.quorate=1'
> >>
> >> Also sometimes I see startup hangs where corosync doesn't even get
> >> going. These look to be threads deadlocked on logsys condition variables:
> >>
> >> (gdb) thr a a bt
> >>
> >>
> >> Thread 3 (Thread 0xb7b39b90 (LWP 6929)):
> >> #0  0x00110416 in __kernel_vsyscall ()
> >> #1  0x009f6ba5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> >> #2  0x0806e3e9 in wthread_wait () at logsys.c:434
> >> #3  0x0806e27a in logsys_worker_thread (data=0x0) at logsys.c:452
> >> #4  0x009f332f in start_thread () from /lib/libpthread.so.0
> >> #5  0x0092e20e in clone () from /lib/libc.so.6
> >>
> >> Thread 2 (Thread 0xb7f36230 (LWP 6928)):
> >> #0  0x00110416 in __kernel_vsyscall ()
> >> #1  0x00923a57 in poll () from /lib/libc.so.6
> >> #2  0x080532be in prioritized_timer_thread (data=0x0) at timer.c:123
> >> #3  0x009f332f in start_thread () from /lib/libpthread.so.0
> >> #4  0x0092e20e in clone () from /lib/libc.so.6
> >>
> >> Thread 1 (Thread 0xb7f0b6c0 (LWP 6925)):
> >> #0  0x00110416 in __kernel_vsyscall ()
> >> #1  0x009f6ba5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
> >> #2  0x0806e3e9 in wthread_wait () at logsys.c:434
> >> #3  0x0806e47f in wthread_create () at logsys.c:511
> >> ---Type <return> to continue, or q <return> to quit---
> >> #4  0x0806e554 in _logsys_wthread_create () at logsys.c:540
> >> #5  0x0806ef8e in logsys_fork_completed () at logsys.c:790
> >> #6  0x0804dd89 in main (argc=2, argv=0xbfa36124) at main.c:644
> >> (gdb)
> >>
> >>
> >> Neither of these happened to me before the logsys2 commit.
> >>


