[Openais] RE: FW: Evt Deadlock
Steven Dake
sdake at mvista.com
Tue Jan 24 13:29:49 PST 2006
On Tue, 2006-01-24 at 14:34 -0600, Muni Bajpai wrote:
> Steve, defect 1029 was merged Jan 20th. This issue is from a December
> 21st picacho view, i.e. 0.70.
>
Yes, 1029 may have fixed this problem.
> I agree with the gdb technique but unfortunately this time the tester
> was in too much of a hurry and just sigkilled it.
>
> In the 0.70 view I don't see how HandleDestroy can leave a mutex locked.
>
Here is the code:
    if (check != handleDatabase->handles[handle].check) {
        error = SA_AIS_ERR_BAD_HANDLE;
        goto error_exit;
        ^^^ here is the error condition
    }

    handleDatabase->handles[handle].state =
        SA_HANDLE_STATE_PENDINGREMOVAL;

error_exit:
    pthread_mutex_unlock (&handleDatabase->mutex);

    saHandleInstancePut (handleDatabase, inHandle);
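On that error path the code still calls saHandleInstancePut on a handle
that just failed the check, so the put operates on an invalid data area.
That extra handle instance put could cause the problem.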
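
A minimal sketch of an error path that returns without doing the put,
assuming simplified stand-in types. The names handle_db, handle_entry,
handle_put and handle_destroy are hypothetical and are not the openais
API, and the bounds check on the index is an added assumption. This is
only an illustration of the idea, not the change that was merged.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the handle database. */
enum handle_state { HANDLE_ACTIVE, HANDLE_PENDING_REMOVAL };

struct handle_entry {
    uint32_t check;           /* must match the upper 32 bits of the handle */
    int ref_count;            /* references currently held on the instance */
    enum handle_state state;
};

struct handle_db {
    pthread_mutex_t mutex;
    struct handle_entry *handles;
    uint32_t count;           /* number of entries in handles[] */
};

/* Drop one reference on an entry that is already known to be valid. */
void handle_put(struct handle_db *db, uint32_t idx)
{
    pthread_mutex_lock(&db->mutex);
    db->handles[idx].ref_count -= 1;
    pthread_mutex_unlock(&db->mutex);
}

/* Returns 0 on success, -1 if the handle does not pass the check. */
int handle_destroy(struct handle_db *db, uint64_t in_handle)
{
    uint32_t check = (uint32_t)(in_handle >> 32);
    uint32_t idx = (uint32_t)(in_handle & 0xffffffff);

    pthread_mutex_lock(&db->mutex);

    /* Assumed extra guard: reject indexes outside the table as well. */
    if (idx >= db->count || check != db->handles[idx].check) {
        /* Bad handle: unlock and return, and do NOT call handle_put. */
        pthread_mutex_unlock(&db->mutex);
        return -1;
    }

    db->handles[idx].state = HANDLE_PENDING_REMOVAL;
    pthread_mutex_unlock(&db->mutex);

    /* Only a handle that passed the check has its reference dropped. */
    handle_put(db, idx);
    return 0;
}
```

With a single entry whose check value is 7, calling handle_destroy with a
handle built from check value 9 returns -1 and leaves ref_count and state
untouched, while a handle built from check value 7 marks the entry as
pending removal and drops one reference.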
> I doubt this is reproducible on demand so this Info is all we have for
> now.
>
> Will try to get this reproduced though
>
> Thanks
>
> Muni
>
> SaErrorT
> saHandleDestroy (
>     struct saHandleDatabase *handleDatabase,
>     SaUint64T inHandle)
> {
>     SaAisErrorT error = SA_AIS_OK;
>     uint32_t check = inHandle >> 32;
>     uint32_t handle = inHandle & 0xffffffff;
>
>     pthread_mutex_lock (&handleDatabase->mutex);
>
>     if (check != handleDatabase->handles[handle].check) {
>         error = SA_AIS_ERR_BAD_HANDLE;
>         goto error_exit;
>     }
>
>     handleDatabase->handles[handle].state =
>         SA_HANDLE_STATE_PENDINGREMOVAL;
>
> error_exit:
>     pthread_mutex_unlock (&handleDatabase->mutex);
>
>     saHandleInstancePut (handleDatabase, inHandle);
>
>     return (error);
> }
>
> -----Original Message-----
> From: Steven Dake [mailto:sdake at mvista.com]
> Sent: Tuesday, January 24, 2006 1:27 PM
> To: Bajpai, Muni [RICH1:B670:EXCH]
> Cc: scd at broked.org; openais at lists.osdl.org
> Subject: Re: FW: Evt Deadlock
>
> Muni,
>
> If this happens again, instruct your testers to send a SIGSEGV to your
> application via kill. Make sure to run ulimit -c unlimited first. Then
> you can use gdb to debug the core that is created, and we can see which
> call paths the deadlock occurs on. You can use "info threads" to list
> the threads and the "thread N" command to switch between thread 1, 2,
> etc. This is the technique I used to find the AMF crash.
>
> This information would help us considerably in finding which locks are
> contended (or whether it is actually a mutex that is contended).
>
> Also defect 1029 (merged) could result in this deadlock situation if
> the check failed in the handle destroy. It would leave the handle
> database mutex locked in an error condition (the handle passed to
> saHandleDestroy was invalid). Later accesses to this mutex would lock up
> the multithreaded app. This would point to another problem you may be
> having in a caller to saHandleDestroy. It sure would be nice to know
> where that HandleDestroy call failed (the call stack), as it points at a
> bug in the evt library if this is the cause of the deadlock. One rule
> we have is that handles passed to saHandleDestroy should always be
> valid.
>
> If you want to help find the source of this handle destroy problem in
> 0.70.1 please apply the attached patch to your 0.70.1 and make sure to
> save your core/sources if the assert occurs.
>
> Mark, I'd take a second look at your saHandleDestroy calls as they may
> have some kind of problem.
>
> Regards
> -steve
>
> On Tue, 2006-01-24 at 08:54 -0600, Muni Bajpai wrote:
> > Steve,
> >
> > Posting to the group as well.
> >
> > -----Original Message-----
> > From: Bajpai, Muni [RICH1:B670:EXCH]
> > Sent: Monday, January 23, 2006 4:25 PM
> > To: 'Mark Haverkamp'
> > Subject: RE: Evt Deadlock
> >
> > So we have one evt thread writing events, and then there is the thread
> > in question, which was dispatching and was then told to exit by the
> > application.
> >
> > So it is definitely possible that a lock was held by the other thread,
> > which at regular time intervals does
> > saEvtEventAllocate
> > saEvtEventAttributesSet
> > saEvtEventPublish
> > saEvtEventFree
> >
> > I'll do some more research too
> >
> > Thanks
> >
> > Muni
> >
> >
> > -----Original Message-----
> > From: Mark Haverkamp [mailto:markh at osdl.org]
> > Sent: Monday, January 23, 2006 4:12 PM
> > To: Bajpai, Muni [RICH1:B670:EXCH]
> > Subject: Re: Evt Deadlock
> >
> > On Mon, 2006-01-23 at 15:37 -0600, Muni Bajpai wrote:
> > > Hey Mark,
> > >
> > >
> > >
> > > > One of our testers came up with this issue after running about 24
> > > > hours of traffic. This is the version without your Evt fixes, which
> > > > I just merged and have started testing. What I wanted to know is
> > > > whether this issue is fixed by your changes. Basically we were in
> > > > shutdown mode and were trying to do an saEvtChannelClose.
> > >
> >
> > This particular thing wasn't addressed by my previous fixes.
> >
> > I don't see where something could have the event handle database
> > locked forever since it is taken and released inside the handle
> > functions. Do you know what the other threads were doing at the time?
> > Is it possible that some other thread was killed while it held the
> > mutex? Anyway, I'll keep looking at the code and see if I can figure
> > out how it could deadlock.
> >
> > Mark.
> >
> >
> >
> > >
> > >
> > > > Looks like saEvtEventFree is deadlocked on
> > > >
> > > > error = saHandleInstanceGet(&event_handle_db, eventHandle,
> > > > (void*)&edi);
> > > >
> > > > #0 0xb747e2ab in saEvtEventFree (eventHandle=0) at evt.c:1378
> > > > #1 0xb747f67c in chanHandleInstanceDestructor (instance=0x80bd14c) at evt.c:266
> > > > #2 0xb74785c7 in saHandleInstancePut (handleDatabase=0xb74801c0, inHandle=7222815479134420992) at util.c:687
> > > > #3 0xb747dbac in saEvtChannelClose (channelHandle=7222815479134420992) at evt.c:1074
> > > > #4 0x08054b02 in EvtHandler::cleanupEVT (this=0x80b76ec) at EvtHandler.cpp:1013
> > > > #5 0x0805ce7d in HalManager::shutdown (this=0xb3fe3bb0, reason=0x808756c "The heartbeat to the Sig has failed.") at HalManager.cpp:1062
> > > > #6 0x0806c82f in SigHandler::handle_exception (this=0x80c5e40) at SigHandler.cpp:907
> > > > #7 0xb754f5e6 in ACE_Select_Reactor_Notify::dispatch_notify () from /opt/mcp/lib/libACE.so.5.3.1
> > > > #8 0xb754f6b2 in ACE_Select_Reactor_Notify::handle_input () from /opt/mcp/lib/libACE.so.5.3.1
> > > > #9 0xb754f47e in ACE_Select_Reactor_Notify::dispatch_notifications () from /opt/mcp/lib/libACE.so.5.3.1
> > > > #10 0xb7542b83 in ACE_Select_Reactor_T<ACE_Select_Reactor_Token_T<ACE_Token> >::dispatch_notification_handlers () from /opt/mcp/lib/libACE.so.5.3.1
> > > > #11 0xb7542a57 in ACE_Select_Reactor_T<ACE_Select_Reactor_Token_T<ACE_Token> >::dispatch () from /opt/mcp/lib/libACE.so.5.3.1
> > > > #12 0xb753fff4 in ACE_Select_Reactor_T<ACE_Select_Reactor_Token_T<ACE_Token> >::handle_events () from /opt/mcp/lib/libACE.so.5.3.1
> > > > #13 0xb754d6e8 in ACE_Reactor::run_reactor_event_loop () from /opt/mcp/lib/libACE.so.5.3.1
> > > > #14 0x0805978c in main (argc=3, argv=0xbfffeff4) at halMain.cpp:976
> > > Current language: auto; currently c
> > >
> > >