[cgl_discussion] Proposal for the implementation of Robust Mutexes (ala Sun's POSIX extension)

Joe DiMartino joe at osdl.org
Fri Mar 14 09:33:43 PST 2003


I like the idea of having the mutex remain in the EOWNERDEAD state until
everyone has had a chance to fix it; however, there is a slight snag.

The way it is currently defined, the first pthread_mutex_[try]lock()
call gets EOWNERDEAD if the mutex was held when the original owner died.
pthread_mutex_consistent_np() is to be called iff the current owner can
fix it.  Otherwise, releasing the mutex will automatically convert the
state to ENOTRECOVERABLE.

Here are the snags.  First, there is no pthread_mutex_inconsistent_np()
that would set the state to ENOTRECOVERABLE.  Second, even if there were
such a call, how would any of the surviving possible owners know that all
other such owners have had a go at fixing it?  Imagine a busted mutex
with 3 queued requests.  The first gets ownership, can't fix it and
lets go (still EOWNERDEAD).  What does it do next - re-queue?  It most
likely needs this mutex to complete whatever it's working on.  Whether
it re-queues or not, the remaining two queued survivors eventually get
their turn to fix it, and if they can't, the final one still doesn't
know that everyone else has had a go.  So this mutex will remain forever
in the EOWNERDEAD state.
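
The scenario can be sketched with a toy state model.  This is not a real
lock and not anyone's implementation: it tracks only the recovery state
under the proposed rules, and every name in it (model_lock(),
model_unlock(), model_not_recoverable(), run_waiters()) is invented for
illustration.  The last_gives_up flag stands in for knowledge that no
real waiter has.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model: only the recovery state of the mutex is tracked. */
enum rstate { R_CONSISTENT, R_OWNERDEAD, R_NOTRECOVERABLE };

struct model_mutex { enum rstate state; };

/* What a lock call would report to the new owner in each state. */
static int model_lock(const struct model_mutex *m)
{
    switch (m->state) {
    case R_OWNERDEAD:      return EOWNERDEAD;
    case R_NOTRECOVERABLE: return ENOTRECOVERABLE;
    default:               return 0;
    }
}

/* Proposed unlock: an EOWNERDEAD mutex stays EOWNERDEAD. */
static void model_unlock(struct model_mutex *m)
{
    (void)m;                    /* no state change on unlock */
}

/* Stand-in for the proposed pthread_mutex_not_recoverable(). */
static void model_not_recoverable(struct model_mutex *m)
{
    if (m->state == R_OWNERDEAD)
        m->state = R_NOTRECOVERABLE;
}

/*
 * n_waiters queued threads take the mutex in turn; none can repair it.
 * Returns what the next thread to lock would see afterwards.
 */
int run_waiters(int n_waiters, bool last_gives_up)
{
    struct model_mutex m = { R_OWNERDEAD };

    for (int i = 0; i < n_waiters; i++) {
        int rc = model_lock(&m);
        assert(rc == EOWNERDEAD);
        (void)rc;
        /* Waiter i cannot fix the data.  Only an oracle could tell it
           that it is the last one and should give up for everybody. */
        if (last_gives_up && i == n_waiters - 1)
            model_not_recoverable(&m);
        model_unlock(&m);
    }
    return model_lock(&m);
}
```

Here run_waiters(3, false) leaves the model in EOWNERDEAD, and only the
oracle variant run_waiters(3, true) ends in ENOTRECOVERABLE.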

-- Joe DiMartino <joe at osdl.org>



On Thu, 2003-03-13 at 20:15, Perez-Gonzalez, Inaky wrote:
> Hi All
> 
> So after hearing all the involved parties (sorry, there was a bit of
> behind-the-scenes email exchange), the conclusions reached were these:
> 
> - There needs to be a state of ENOTRECOVERABLE when nothing can be done to
> recover the mutex and everybody should bail out
> 
> - The implementation needs to allow more than one waiter to try to recover
> the inconsistent mutex, instead of just the first one as Sun's extension
> currently does (for the case of multiple programs where only one or a few
> of them are able to perform recovery).
> 
> So would the following proposal fly? [trying to get the best of both]:
> 
> Have two states of inconsistency (as in the current Sun implementation):
> 
>  * EOWNERDEAD (inconsistent - programs should be able to recover it)
> 
>  * ENOTRECOVERABLE (inconsistent and nothing to do about it, bail out)
> 
> Now, an owner that dies makes the mutex go to EOWNERDEAD.
> 
> Waiters that claim and acquire an EOWNERDEAD mutex can do three things:
> 
> - decide they don't know what to do, just unlock it and hope somebody else
> knows - this DOES NOT SWITCH THE MUTEX to ENOTRECOVERABLE as in the Sun
> extension
> 
> - fix the mess, call pthread_mutex_consistent(), the mutex is back to
> consistency now, normal operation resumes
> 
> - try to fix the mess, decide there is nothing that can be done and the
> whole thing needs to be restarted from the beginning. Call
> pthread_mutex_not_recoverable() to set the EOWNERDEAD mutex to
> ENOTRECOVERABLE. Anybody that claims and acquires the mutex now will see
> the ENOTRECOVERABLE state and act accordingly. There is no way back to
> EOWNERDEAD or consistency from here.
> 
> Now, this gives applications flexibility in fixing the mess: the first
> waiter that gets the mutex is no longer forced to fix it.
> 
> And transparently to the user, the thread library can implement a Sun
> emulation mode. Get EOWNERDEAD ... before unlocking, if the mutex is still
> EOWNERDEAD, call pthread_mutex_not_recoverable(), then unlock. Done.
> 
> It will still be a wee bit of a PITA in the kernel, but it can be done.
> 
> Sounds reasonable?
> 
> Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
> (and my fault)
> 