[cgl_discussion] POSIX requirements question for TEMs/ISVs/OEMs - Robust Mutex Support

Howell, David P david.p.howell at intel.com
Tue Feb 18 15:02:53 PST 2003


I am working to provide a definition for the robust mutexes feature that
was added to NGPT. This was done to satisfy a CGL requirement from a member
company. It provides new POSIX functionality for robust use of shared
mutexes, using a POSIX

My aim with this posting is to get feedback from TEMs, OEMs, and ISVs on
their need for this feature. If it is something needed by telecom space
applications I will go forward and propose it to the Austin Group for
inclusion in future POSIX

otherwise we can drop it. 


A quick sketch of the implementation:


Robust mutex extensions to POSIX


Robust mutex support permits a mutex to synchronize threads between
process groups effectively, even when processes exit or abort unexpectedly.
This gives a POSIX equivalent of the Solaris robust mutexes implementation.



- A robust mutex is initialized with the robust mutex attribute. It must
  be an inter-process shared mutex, allocated in a shared memory segment
  mapped into the processes that use it.

- Each process that uses the robust mutex must call pthread_mutex_init at
  least once to register the robust mutex with the system and initialize
  it. If pthread_mutex_init is called on a previously initialized mutex it
  will not re-initialize the mutex.

- When a process dies while holding the robust mutex, the mutex gets
  unlocked.

- The next thread that attempts to lock the mutex will acquire it, but
  with a return value from pthread_mutex_lock/pthread_mutex_trylock of
  EOWNERDEAD instead of success. If the process which got the lock with
  EOWNERDEAD dies in turn, the next locker will get the mutex with an error
  return value of EOWNERDEAD as well.

- Applications using a robust mutex must always check the return code from
  these APIs to see if this occurred. If it did, they should attempt to
  make the state protected by the mutex consistent, since this state could
  have been left inconsistent when the last owner died.

- If the new robust mutex owner is able to make the state consistent, it
  should re-initialize the mutex and then unlock it. If it can't make the
  state consistent, for whatever reason, it should not re-initialize the
  mutex but just unlock it; all subsequent calls to lock the mutex will
  then fail with the error return ENOTRECOVERABLE.

- The robust mutex can be made consistent again by un-initializing the
  mutex and reinitializing it.
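
The lock-and-recover sequence in the bullets above might look roughly like
the sketch below. It is not NGPT code. It assumes the names that POSIX.1-2008
later standardized (pthread_mutexattr_setrobust, pthread_mutex_consistent,
PTHREAD_MUTEX_ROBUST) rather than the _np names in this proposal, and it uses
pthread_mutex_consistent for the step described above as "re-initialize".
To stay short it uses a process-private mutex and a thread that terminates
while holding the lock, which relies on Linux/glibc reporting EOWNERDEAD in
that case too. All helper names (repair_fn, robust_mutex_create, robust_lock,
mark_repaired, die_holding_lock, demo_owner_death) are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 if it repaired the protected state. */
typedef int (*repair_fn)(void *state);

/* Create a robust mutex (process-private here, only to keep the sketch short). */
static int robust_mutex_create(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Lock m. Returns 0 with the lock held, or an error with the lock not held. */
static int robust_lock(pthread_mutex_t *m, repair_fn repair, void *state)
{
    int rc = pthread_mutex_lock(m);
    if (rc != EOWNERDEAD)
        return rc;              /* 0, ENOTRECOVERABLE, or another error */

    /* We hold the lock, but the previous owner died while holding it. */
    if (repair(state) == 0 && pthread_mutex_consistent(m) == 0)
        return 0;               /* state repaired, mutex usable again */

    /* Unlocking without marking the mutex consistent makes every later
     * lock attempt fail with ENOTRECOVERABLE. */
    pthread_mutex_unlock(m);
    return ENOTRECOVERABLE;
}

/* Demo repair callback: just records that it ran. */
static int mark_repaired(void *state)
{
    *(int *)state = 1;
    return 0;
}

/* Demo owner: takes the lock and terminates without unlocking. */
static void *die_holding_lock(void *arg)
{
    pthread_mutex_lock((pthread_mutex_t *)arg);
    return NULL;
}

/* Demo: the owner dies, the next locker repairs and carries on. */
static int demo_owner_death(void)
{
    pthread_mutex_t m;
    pthread_t t;
    int repaired = 0;

    if (robust_mutex_create(&m) != 0)
        return -1;
    if (pthread_create(&t, NULL, die_holding_lock, &m) != 0)
        return -1;
    pthread_join(t, NULL);

    int rc = robust_lock(&m, mark_repaired, &repaired);
    if (rc != 0)
        return rc;
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return repaired ? 0 : -1;
}
```

A caller would treat a zero return from robust_lock as "lock held, state
usable" and anything else as "lock not held", so the EOWNERDEAD check cannot
be forgotten at individual call sites.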


NGPT implementation details, not sure how useful for requirements, but it
does exist:

- POSIX APIs affected:

      - pthread_mutexattr_setrobust_np - New API for setting

      - pthread_mutexattr_getrobust_np - New API for getting the robust

      - pthread_mutex_consistent_np - New API to make a non-consistent robust mutex consistent

      - pthread_mutex_init - Handling for robust attribute

      - pthread_mutex_lock - Lock time handling of robust attribute

      - pthread_mutex_trylock - Lock time handling of robust attribute (common with pthread_mutex_lock)

      - pthread.h - Addition of PTHREAD_MUTEX_ROBUST_NP
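
As a rough illustration of how the attribute and init calls listed above fit
together, here is a sketch of setting up a process-shared robust mutex and
observing EOWNERDEAD after an owning process dies. It is written against the
names later standardized in POSIX.1-2008 (pthread_mutexattr_setrobust,
PTHREAD_MUTEX_ROBUST), not the NGPT _np names, and it assumes a Linux-style
mmap with MAP_ANONYMOUS. It does not model the per-process
pthread_mutex_init registration described for NGPT; the child simply
inherits the mapping. The helper names shared_robust_mutex_new and
lock_after_child_death are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS on glibc */
#endif
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Allocate a mutex in anonymous shared memory and mark it both
 * process-shared and robust. Returns NULL on failure. */
static pthread_mutex_t *shared_robust_mutex_new(void)
{
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc == 0) {
        rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0)
            rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        if (rc == 0)
            rc = pthread_mutex_init(m, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    if (rc != 0) {
        munmap(m, sizeof *m);
        return NULL;
    }
    return m;
}

/* Fork a child that locks the mutex and exits while still holding it,
 * then return whatever the parent's own lock attempt reports. */
static int lock_after_child_death(pthread_mutex_t *m)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(0);               /* die holding the lock */
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return pthread_mutex_lock(m);
}
```

With this setup the parent would be expected to see EOWNERDEAD from
lock_after_child_death, after which it holds the lock and can decide whether
to mark the mutex consistent or give up on it.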



Some notes from earlier discussions:

- The existing implementation that has a thread feature like this is in the
  Solaris thread APIs (not POSIX). See Solaris mutex_init,

  Attribute for details of how it's done with this implementation.

- NGPT's implementation uses extensions to the existing POSIX thread

- Without a POSIX standard initiative/tie-in this likely isn't going
  anywhere in other open source pthreads implementations (like NPTL).

- This seems to be critical functionality for the use of shared mutexes,
  key enough for Solaris to adopt and support it and key enough for it to
  be a requirement that was implemented for NGPT.

- There is no other POSIX way to do this without accessing the internals of
  the pthread_mutex_t structures and hacking other internal parts of the
  POSIX thread APIs, i.e. to find out a shared mutex owner to test if it is
  alive or dead, to allow an application restart to regain access to
  shared mutexes, and to check and correct for shared mutex consistency.

- It does provide a valuable service for hardened/restartable applications
  that use shared mutexes to synchronize between process groups.

- It would be a while (Linux 2.6, best case) before it would be in a

  of the POSIX specifications that gets ratified. So, this is beyond CGL



It'd be really nice if this could be passed down to architects/engineers
who are familiar with the apps that are provided, to comment on this. Any
help is very much appreciated.



Dave Howell



These are my opinions and not official opinions of Intel Corp.


David Howell

Intel Corporation

Telco Server Development

Server Products Division

Voice: (803) 461-6112  Fax: (803) 461-6292


Intel Corporation

Columbia Design Center, CBA-2

250 Berryhill Road, Suite 100

Columbia, SC 29210


david.p.howell at intel.com

