[Openais] Re: FW: Evt Deadlock

Steven Dake sdake at mvista.com
Tue Jan 24 11:27:22 PST 2006


Muni,

If this happens again, instruct your testers to send a SIGSEGV to your
application via kill.  Make sure to run ulimit -c unlimited beforehand.
Then you can use gdb to debug the core that is created and we can see
which call paths the deadlock occurs on.  You can use "info threads" to
list the threads and the "thread" command to switch between them
(thread 1, thread 2, etc.).  This is the technique I used to find the
AMF crash.
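
If changing the shell limit is awkward in the test setup, the
application can raise its own core limit at startup.  The following is
only a sketch under that assumption; demo_enable_core_dumps is a
made-up name and is not part of openais.  It raises the soft limit to
whatever the hard limit allows, so unlike ulimit -c unlimited it will
not help if the hard limit is already 0.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper (not part of openais): raise the soft core-file
 * size limit of this process to its hard limit.
 * Returns 0 on success, -1 if getrlimit() or setrlimit() fails.
 */
int demo_enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        return -1;
    }

    /* An unprivileged process may raise the soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;

    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling this once early in main(), before any threads are started,
would be enough for the kill/gdb procedure above.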

This information would help us considerably in finding which locks are
contended (or whether it is actually a mutex that is contended).

Also, defect 1029 (merged) could result in this deadlock situation if
the check failed in the handle destroy.  It would leave the handle
database mutex locked in an error condition (the handle passed to
saHandleDestroy was invalid).  Later accesses to this mutex would lock
up the multithreaded app.  This would point to another problem you may
be having in a caller of saHandleDestroy.  It sure would be nice to know
where that saHandleDestroy call failed (the call stack), as it points at
a bug in the evt library if this is the cause of the deadlock.  One rule
we have is that handles passed to saHandleDestroy should always be
valid.
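
To illustrate the failure mode described above, here is a minimal
sketch.  It is not the openais source; struct demo_handle_db and the
demo_* functions are hypothetical stand-ins.  The first destroy
function returns from its error check without unlocking, so the mutex
stays held; the second sends every exit path through the unlock.

```c
#include <pthread.h>

/* Hypothetical stand-in for a handle database; not the openais type. */
struct demo_handle_db {
    pthread_mutex_t mutex;
    unsigned int handle_count;
};

/* Leaky shape: the invalid-handle check returns with the mutex held. */
int demo_destroy_leaky(struct demo_handle_db *db, unsigned int handle)
{
    pthread_mutex_lock(&db->mutex);
    if (handle >= db->handle_count) {
        return -1;              /* error path forgets to unlock */
    }
    /* ... a real implementation would release the handle entry here ... */
    pthread_mutex_unlock(&db->mutex);
    return 0;
}

/* Fixed shape: every exit path goes through the unlock. */
int demo_destroy_fixed(struct demo_handle_db *db, unsigned int handle)
{
    int res = 0;

    pthread_mutex_lock(&db->mutex);
    if (handle >= db->handle_count) {
        res = -1;
        goto out;
    }
    /* ... a real implementation would release the handle entry here ... */
out:
    pthread_mutex_unlock(&db->mutex);
    return res;
}

/* Returns 1 if the database mutex is currently locked, 0 otherwise. */
int demo_db_is_locked(struct demo_handle_db *db)
{
    if (pthread_mutex_trylock(&db->mutex) == 0) {
        pthread_mutex_unlock(&db->mutex);
        return 0;
    }
    return 1;
}
```

After demo_destroy_leaky fails on an invalid handle, any thread that
later calls pthread_mutex_lock on the same mutex blocks forever, which
is the kind of lockup a backtrace would show.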

If you want to help find the source of this handle destroy problem in
0.70.1, please apply the attached patch to your 0.70.1 tree and make
sure to save your core/sources if the assert occurs.

Mark, I'd take a second look at your saHandleDestroy calls as they may
have some kind of problem.

Regards
-steve

On Tue, 2006-01-24 at 08:54 -0600, Muni Bajpai wrote:
> Steve, 
> 
> Posting to the group as well.
> 
> -----Original Message-----
> From: Bajpai, Muni [RICH1:B670:EXCH] 
> Sent: Monday, January 23, 2006 4:25 PM
> To: 'Mark Haverkamp'
> Subject: RE: Evt Deadlock
> 
> So we have one evt thread writing events, and then there is the thread
> in question, which was dispatching and was then told to exit by the
> application.
> 
> So it is definitely possible that a lock was held by the other thread,
> which at regular intervals calls
> saEvtEventAllocate
> saEvtEventAttributesSet
> saEvtEventPublish
> saEvtEventFree
> 
> I'll do some more research too
> 
> Thanks
> 
> Muni
> 
> 
> -----Original Message-----
> From: Mark Haverkamp [mailto:markh at osdl.org] 
> Sent: Monday, January 23, 2006 4:12 PM
> To: Bajpai, Muni [RICH1:B670:EXCH]
> Subject: Re: Evt Deadlock
> 
> On Mon, 2006-01-23 at 15:37 -0600, Muni Bajpai wrote:
> > Hey Mark,
> > 
> >  
> > 
> > One of our testers came up with this issue after running about 24
> > hours of traffic. This is the version without your Evt fixes, which I
> > have just merged and started testing. What I wanted to know is
> > whether this issue is fixed by your changes. Basically, we were in
> > shutdown mode and were trying to do an saEvtChannelClose.
> > 
> 
> This particular thing wasn't addressed by my previous fixes.  
> 
> I don't see where something could have the event handle database locked
> forever since it is taken and released inside the handle functions.  Do
> you know what the other threads were doing at the time?  Is it possible
> that some other thread was killed while it held the mutex?  Anyway, I'll
> keep looking at the code and see if I can figure out how it could
> deadlock.
> 
> Mark.
> 
> 
> 
> >  
> > 
> > Looks like saEvtEventFree is deadlocked on
> > 
> > error = saHandleInstanceGet(&event_handle_db, eventHandle,
> >             (void*)&edi);
> > 
> > #0  0xb747e2ab in saEvtEventFree (eventHandle=0) at evt.c:1378
> > #1  0xb747f67c in chanHandleInstanceDestructor (instance=0x80bd14c) at
> > evt.c:266
> > #2  0xb74785c7 in saHandleInstancePut (handleDatabase=0xb74801c0,
> > inHandle=7222815479134420992) at util.c:687
> > #3  0xb747dbac in saEvtChannelClose
> > (channelHandle=7222815479134420992) at evt.c:1074
> > #4  0x08054b02 in EvtHandler::cleanupEVT (this=0x80b76ec) at
> > EvtHandler.cpp:1013
> > #5  0x0805ce7d in HalManager::shutdown (this=0xb3fe3bb0,
> > reason=0x808756c "The heartbeat to the Sig has failed.") at
> > HalManager.cpp:1062
> > #6  0x0806c82f in SigHandler::handle_exception (this=0x80c5e40) at
> > SigHandler.cpp:907
> > #7  0xb754f5e6 in ACE_Select_Reactor_Notify::dispatch_notify ()
> > from /opt/mcp/lib/libACE.so.5.3.1
> > #8  0xb754f6b2 in ACE_Select_Reactor_Notify::handle_input ()
> > from /opt/mcp/lib/libACE.so.5.3.1
> > #9  0xb754f47e in ACE_Select_Reactor_Notify::dispatch_notifications ()
> > from /opt/mcp/lib/libACE.so.5.3.1
> > #10 0xb7542b83 in
> > ACE_Select_Reactor_T<ACE_Select_Reactor_Token_T<ACE_Token>
> > >::dispatch_notification_handlers ()
> >    from /opt/mcp/lib/libACE.so.5.3.1
> > #11 0xb7542a57 in
> > ACE_Select_Reactor_T<ACE_Select_Reactor_Token_T<ACE_Token> >::dispatch
> > () from /opt/mcp/lib/libACE.so.5.3.1
> > #12 0xb753fff4 in
> > ACE_Select_Reactor_T<ACE_Select_Reactor_Token_T<ACE_Token>
> > >::handle_events () from /opt/mcp/lib/libACE.so.5.3.1
> > #13 0xb754d6e8 in ACE_Reactor::run_reactor_event_loop ()
> > from /opt/mcp/lib/libACE.so.5.3.1
> > #14 0x0805978c in main (argc=3, argv=0xbfffeff4) at halMain.cpp:976
> > Current language:  auto; currently c
> > 
> > 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: debug-patch-for-muni.patch
Type: text/x-patch
Size: 418 bytes
Desc: not available
Url : http://lists.linux-foundation.org/pipermail/openais/attachments/20060124/e1a462de/debug-patch-for-muni-0001.bin

