[cgl_discussion] RE: Proposal for the implementation of Robust Mutexes (ala Sun's POSIX extension)

David W. McDaniel damcdani at cisco.com
Fri Mar 14 06:54:14 PST 2003

  100% agreement.

-----Original Message-----
From: Perez-Gonzalez, Inaky [mailto:inaky.perez-gonzalez at intel.com]
Sent: Thursday, March 13, 2003 10:16 PM
To: 'cgl_discussion at osdl.org'
Cc: Liu, Bing Wei; Howell, David P; 'David W. McDaniel'; 'Pradeep
Subject: Proposal for the implementation of Robust Mutexes (ala Sun's
POSIX extension)

Hi All

So after hearing from all the parties involved (sorry, there was a bit of
behind-the-scenes email exchange), the conclusions reached were these:

- There needs to be an ENOTRECOVERABLE state for when nothing can be done to
recover the mutex and everybody should bail out.

- The implementation needs to allow more than one waiter to try to recover
the inconsistent mutex, instead of just the first one as Sun's extension
currently does (for the case of multiple programs where only one or a few of
them are able to perform recovery).

So would the following proposal fly? [trying to get the best of both]:

Have two states of inconsistency (as in the current Sun implementation):

 * EOWNERDEAD (inconsistent - programs should be able to recover it)

 * ENOTRECOVERABLE (inconsistent and nothing to do about it, bail out)

Now, an owner that dies makes the mutex go to EOWNERDEAD.

Waiters that claim and acquire an EOWNERDEAD mutex can do one of three things:

- decide they don't know what to do, just unlock it and hope somebody else
can fix it; the mutex stays EOWNERDEAD

- fix the mess and call pthread_mutex_consistent(); the mutex is consistent
again and normal operation resumes

- try to fix the mess, decide there is nothing that can be done and that the
whole thing needs to be restarted from the beginning, and call
pthread_mutex_not_recoverable() to set the EOWNERDEAD mutex to
ENOTRECOVERABLE. Anybody that claims and acquires the mutex from then on
will see the ENOTRECOVERABLE state and act accordingly. There is no way back
to EOWNERDEAD or consistency from here.
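
The transitions above can be sketched as a toy, single-threaded model in C.
This is only an illustration of the proposed state machine, not thread
library or kernel code: struct rmutex and the rm_*() helpers are made-up
names, rm_consistent() and rm_not_recoverable() merely stand in for
pthread_mutex_consistent() and the proposed pthread_mutex_not_recoverable(),
and returning ENOTRECOVERABLE without holding the lock is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: no blocking, no atomics, no kernel support. */
enum rm_state { RM_CONSISTENT, RM_OWNERDEAD, RM_NOTRECOVERABLE };

struct rmutex {
    enum rm_state state;
    int locked;                 /* 1 while somebody holds the mutex */
};

/* Try to acquire. Returns 0, EOWNERDEAD (acquired, needs recovery),
 * EBUSY (a real mutex would block instead) or ENOTRECOVERABLE.
 * Assumption: on ENOTRECOVERABLE the caller does not hold the mutex. */
int rm_lock(struct rmutex *m)
{
    assert(m != NULL);
    if (m->state == RM_NOTRECOVERABLE)
        return ENOTRECOVERABLE;
    if (m->locked)
        return EBUSY;
    m->locked = 1;
    return m->state == RM_OWNERDEAD ? EOWNERDEAD : 0;
}

/* The owner died while holding the mutex: mark it EOWNERDEAD, release it. */
void rm_owner_died(struct rmutex *m)
{
    assert(m != NULL);
    if (!m->locked)
        return;
    if (m->state == RM_CONSISTENT)
        m->state = RM_OWNERDEAD;
    m->locked = 0;
}

/* Option 2: the holder fixed the mess; back to normal operation. */
int rm_consistent(struct rmutex *m)
{
    assert(m != NULL);
    if (!m->locked || m->state != RM_OWNERDEAD)
        return EINVAL;
    m->state = RM_CONSISTENT;
    return 0;
}

/* Option 3: the holder gives up for everybody; there is no way back. */
int rm_not_recoverable(struct rmutex *m)
{
    assert(m != NULL);
    if (!m->locked || m->state != RM_OWNERDEAD)
        return EINVAL;
    m->state = RM_NOTRECOVERABLE;
    return 0;
}

/* Option 1 is a plain unlock: the state is left untouched, so an
 * EOWNERDEAD mutex stays EOWNERDEAD for the next waiter. */
int rm_unlock(struct rmutex *m)
{
    assert(m != NULL);
    if (!m->locked)
        return EPERM;
    m->locked = 0;
    return 0;
}
```

A waiter that cannot recover simply calls rm_unlock() and the next rm_lock()
reports EOWNERDEAD again, which is the multiple-recoverer behaviour asked for
above.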

Now, this gives applications flexibility in fixing the mess: the first
waiter that gets the mutex is no longer forced to be the one that fixes it.

And transparently to the user, the thread library can implement a Sun
emulation mode: get EOWNERDEAD ... then, before unlocking, if the mutex is
still EOWNERDEAD, call pthread_mutex_not_recoverable() and unlock. Done.
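
A rough sketch of what that emulation mode's unlock path might look like,
again as a toy single-threaded model in C rather than real library code.
struct emu_mutex and the emu_*() helpers are hypothetical names; the state
change inside emu_unlock() stands in for the library calling the proposed
pthread_mutex_not_recoverable() on the caller's behalf.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: no blocking, no atomics, no kernel support. */
enum emu_state { EMU_CONSISTENT, EMU_OWNERDEAD, EMU_NOTRECOVERABLE };

struct emu_mutex {
    enum emu_state state;
    int locked;                 /* 1 while somebody holds the mutex */
};

/* Try to acquire; EBUSY is where a real mutex would block. */
int emu_lock(struct emu_mutex *m)
{
    assert(m != NULL);
    if (m->state == EMU_NOTRECOVERABLE)
        return ENOTRECOVERABLE;
    if (m->locked)
        return EBUSY;
    m->locked = 1;
    return m->state == EMU_OWNERDEAD ? EOWNERDEAD : 0;
}

/* Stand-in for pthread_mutex_consistent(): the holder fixed the mess. */
int emu_consistent(struct emu_mutex *m)
{
    assert(m != NULL);
    if (!m->locked || m->state != EMU_OWNERDEAD)
        return EINVAL;
    m->state = EMU_CONSISTENT;
    return 0;
}

/* Unlock in Sun emulation mode: a mutex that is still EOWNERDEAD at
 * unlock time is first marked ENOTRECOVERABLE, so only the first
 * waiter ever gets a chance to recover it. */
int emu_unlock(struct emu_mutex *m)
{
    assert(m != NULL);
    if (!m->locked)
        return EPERM;
    if (m->state == EMU_OWNERDEAD)
        m->state = EMU_NOTRECOVERABLE;
    m->locked = 0;
    return 0;
}
```

The only difference from a plain unlock is the extra state check, so the
emulation can live entirely in the library's unlock wrapper.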

It will still be a wee bit of a PITA in the kernel, but it can be done.

Sounds reasonable?

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)
