[Openais] Re: FW: Evt Deadlock

Muni Bajpai muniba at nortel.com
Tue Jan 24 15:57:15 PST 2006


:) As usual, thanks for your patience and willingness to share your
thoughts.

I agree that this is not a recurring condition where the paths cross but
an anomaly in error handling. Since much of that code has been updated
recently, it is better to test with the updates than to spend more
cycles on the old code.

-----Original Message-----
From: Steven Dake [mailto:sdake at mvista.com] 
Sent: Tuesday, January 24, 2006 4:13 PM
To: Bajpai, Muni [RICH1:B670:EXCH]
Cc: Mark Haverkamp; openais at lists.osdl.org; scd at broked.org; Smith,
Kristen [RICH1:B670:EXCH]
Subject: RE: [Openais] Re: FW: Evt Deadlock

On Tue, 2006-01-24 at 15:42 -0600, Muni Bajpai wrote:
> So I think the basic premise of what I saw was that In the HandlePut
> call, the thread in question was holding the channel_handle_db lock and
> requesting the event_handle_db lock. So for deadlock to happen another
> thread has to hold at least the event_handle_db lock and requesting the
> channel_handle_db lock. 
> 
> I'm not sure if any path in make_event can fulfill the above criteria
> (0.70 code) to cause a lock but then again I might be missing something
> 
> Thanks
> 

Muni

The mutexes are not held for long periods in the library.  Instead we
use reference counting to avoid the need to hold these locks for long
periods and provide better concurrency in multi-threaded apps.

The scenario you describe cannot happen.  The call code is not:
request channel handle db lock
request event handle db lock
do some operation on the event data
release event handle db lock
release channel handle db lock

instead with the reference counting code it is always:
request channel handle db lock
increase ref count on channel handle db 
release channel handle db lock
decrease ref count on channel handle db
request event handle db lock
increase ref count on event handle db 
release event handle db lock
decrease ref count on event handle db
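
As a rough sketch of that get/put pattern (the names hdb_get, hdb_put,
publish_sketch and the fixed-size table are hypothetical and are not the
actual openais saHandle* code; holding both references across the
operation is also an assumption), each database lock is taken only long
enough to adjust a reference count, so no thread ever holds one database
lock while waiting for the other:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified handle database; not the openais API. */
#define HDB_SLOTS 8

struct hdb_entry {
    int in_use;      /* slot holds a live instance */
    int ref_count;   /* references taken by hdb_get() and not yet put */
    void *instance;
};

struct hdb {
    pthread_mutex_t lock;
    struct hdb_entry slots[HDB_SLOTS];
};

#define HDB_INIT { PTHREAD_MUTEX_INITIALIZER, { { 0, 0, NULL } } }

static int hdb_valid(const struct hdb *db, int handle)
{
    return handle >= 0 && handle < HDB_SLOTS && db->slots[handle].in_use;
}

/* Allocate a slot; returns the handle, or -1 when the table is full. */
int hdb_create(struct hdb *db, void *instance)
{
    int handle = -1;
    pthread_mutex_lock(&db->lock);
    for (int i = 0; i < HDB_SLOTS; i++) {
        if (!db->slots[i].in_use) {
            db->slots[i].in_use = 1;
            db->slots[i].ref_count = 0;
            db->slots[i].instance = instance;
            handle = i;
            break;
        }
    }
    pthread_mutex_unlock(&db->lock);
    return handle;
}

/* Take a reference: lock, bump the count, unlock. Returns 0 on success. */
int hdb_get(struct hdb *db, int handle, void **instance)
{
    int rc = -1;
    pthread_mutex_lock(&db->lock);
    if (hdb_valid(db, handle)) {
        db->slots[handle].ref_count++;
        *instance = db->slots[handle].instance;
        rc = 0;
    }
    pthread_mutex_unlock(&db->lock);
    return rc;
}

/* Drop a reference: lock, lower the count, unlock. Returns 0 on success. */
int hdb_put(struct hdb *db, int handle)
{
    int rc = -1;
    pthread_mutex_lock(&db->lock);
    if (hdb_valid(db, handle) && db->slots[handle].ref_count > 0) {
        db->slots[handle].ref_count--;
        rc = 0;
    }
    pthread_mutex_unlock(&db->lock);
    return rc;
}

/* Read the current count (for inspection only); -1 for a bad handle. */
int hdb_ref_count(struct hdb *db, int handle)
{
    int count = -1;
    pthread_mutex_lock(&db->lock);
    if (hdb_valid(db, handle))
        count = db->slots[handle].ref_count;
    pthread_mutex_unlock(&db->lock);
    return count;
}

/* The two database locks are never held at the same time. */
int publish_sketch(struct hdb *channel_db, int ch, struct hdb *event_db, int ev)
{
    void *channel, *event;

    if (hdb_get(channel_db, ch, &channel) != 0)
        return -1;
    if (hdb_get(event_db, ev, &event) != 0) {
        hdb_put(channel_db, ch);   /* undo the first reference on error */
        return -1;
    }
    (void)channel;
    (void)event;   /* ... operate on the event data; no db lock held ... */
    hdb_put(event_db, ev);
    hdb_put(channel_db, ch);
    return 0;
}
```

Note the error path puts the channel reference back against channel_db,
the same database it was taken from.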

I think more likely there is a bug (like the one Mark fixed) with
saHandleDestroy being called on a hdb with an invalid handle.  This is
also totally consistent with the previous segfault we saw where the
handle was improperly passed.

I suspect Mark's fix should resolve this problem, as should upgrading to
a version that includes the fix for defect 1029.

Regards
-steve

> Muni
> 
> -----Original Message-----
> From: Mark Haverkamp [mailto:markh at osdl.org] 
> Sent: Tuesday, January 24, 2006 3:17 PM
> To: sdake at mvista.com
> Cc: Bajpai, Muni [RICH1:B670:EXCH]; openais at lists.osdl.org;
> scd at broked.org
> Subject: Re: [Openais] Re: FW: Evt Deadlock
> 
> On Tue, 2006-01-24 at 12:27 -0700, Steven Dake wrote:
> [ ... ]
> 
> > 
> > Mark, I'd take a second look at your saHandleDestroy calls as they may
> > have some kind of problem.
> 
> OK, I found a bad bug.  I don't know if it is related to anyone's
> trouble, but it is bad.  In make_event (creates an event structure in
> the library code) the new event is destroyed if there are any errors.  I
> was using the wrong handle database to destroy the event handle.  Here
> is the patch.  This will need to be checked into the picacho branch too.
> I'll create a bugzilla entry too.
> 
> Mark.
> 
> 





