[cgl_discussion] Re: [PATCH] Robust and Real-time Mutexes in NPTL

Carlo Wood carlo at alinoe.com
Fri Jun 27 07:04:50 PDT 2003

On Fri, Jun 27, 2003 at 10:08:44AM +0800, Liu, Bing Wei wrote:
> This is a ---prototype implementation--- that extends NPTL with
> real-time and robust mutex features.  The intention is to have an
> approach to locking that will benefit the applications that care
> about responsiveness and robustness of the locks without affecting
> other types of applications. Currently this is work in progress, and
> the implementation is subject to much flux, both in the user or
> kernel code.

Hi Liu,

I am having some ideas to detect possible dead-locks *before*
they actually happen.  I don't think I will have time to actually
implement this though, so I thought I might run the idea past
you; perhaps you are interested to write an implementation for

Basically, the idea is to have some debug mode for the mutexes
that will detect possible dead-locks - kinda like the current

Let me explain what I mean using the simplest case first: a simple
mutex that is either locked or not locked.

Consider two of such locks: X and Y.

Then, a dead-lock would occur when thread '1' locks X, thread '2'
locks Y and then thread '1' waits for mutex Y and thread '2'
waits for mutex X.

This dead-lock could be detected before it actually happens
by noticing that at one moment (anywhere during execution) thread '1'
locks X and then, while holding X, locks Y - while later (or
vica versa) another thread first locks Y and then, while holding
Y, locks X.

So, basically one would need to make a list of all locks
involved and assign an 'ordering' to them.  If thread '1' locks
X and then locks Y, then we could "define" that X < Y.
When later another thread first locks Y and then X, we
notice that it is no longer possible to define an ording for
the mutexes and barf.

Lets make this more complex: suppose we have several simple
mutexes: A, B, C, D, E and F.

At some time during execution, thread '1' locks first C and
then D:  C < D.  Furthermore we get for example:

C < D
A < E
B < F
F < A	--> B < F < A < E
C < A
D < A	--> C < D < A
E < D	--> B < F < A < E < D < A
D < B   --> error

Not something you'd detect quickly yourself (assuming it
never happened during testing, but only after the spaceshuttle
was launced and about to transmit the message that would
have stopped the nuclear holocaust).

The order in which these pairs are locked is not important.
Note that if a thread first locks 'B' then 'F' and then 'A', then
this leads to two pairs: B<F and F<A.  While when thread '1'
first locks A and then B (A<B), thread '2' locks first B and
then C (B<C), then this implies a pair A<C.

All of the above is very easy to implement and hardly a
challenge - but, it gets more interesting when we also
include the read/write locks:

Read locks are allowed in reverse order, lets call a read lock
R1, R2 etc, and the corresponding write locks W1, W2 etc.
Lets therefore write a comma instead of '<'.

Then it is allowed to have

R1,R2 + R2,R1		(thread 'x' takes read-lock 1 and then,
                         while holding lock 1, read-lock 2.
			 While at some other moment another
			 thread takes read-lock 2 and then,
			 while holding lock 2, lock 1).

also it is allowed to have

R1,W2 + R2,R1

and even

R1,W2 + R2,W1

So we might as well denote this with:

R1,R2 + R2,R/W1

Where R/W1 means: R1 or W1.

It is not allowed to have

R1,W2 + R2,W1

If we build a complete list we get:

R1,R2 + R2,R1	allowed
R1,R2 + R2,W1	allowed
R1,R2 + W2,R1	allowed
R1,R2 + W2,W1	not allowed
R1,W2 + R2,R1	allowed
R1,W2 + R2,W1	not allowed
R1,W2 + W2,R1	allowed
R1,W2 + W2,W1	not allowed
W1,R2 + R2,R1	allowed
W1,R2 + R2,W1	allowed
W1,R2 + W2,R1	allowed
W1,R2 + W2,W1	allowed
W1,W2 + R2,R1	not allowed
W1,W2 + R2,W1	not allowed
W1,W2 + W2,R1   not allowed
W1,W2 + W2,W1	not allowed

I have to admit that I already wrote an implementation for
this, even including a "high priority read-only mutex" that
will get its lock even while another thread was waiting too,
bot in order to write (while normally a write lock gets
precedence over a read-only lock in my implementation);
It uses six "groups", represented by six bits in a
bit mask: Every possible pair of two mutexes (read-only,
write, or normal for each of them) corresponds to a fixed
bit mask: the "groups" that pair belongs to.  Every time
a new pair is locked (or implied by certain rules) the
bit mask is AND-ed with the new mask and an error condition
is raised when the resulting bit mask is 0: there is no
allowable "group" left anymore and thus we have a *potential*
deadlock situation.

The groups in the above case are (from my libcwd project):

uint16_t const group1 = 0x002;  // Instance 1 locked first.
uint16_t const group2 = 0x004;  // Instance 2 locked first.
uint16_t const group3 = 0x020;  // Instance 1 read only and high_priority when second.
uint16_t const group4 = 0x040;  // Instance 2 read only and high_priority when second.
uint16_t const group5 = 0x200;  // (Instance 1 locked first and (either one read only and high_priority when second))
                                // or (both read only and high_priority when second).
uint16_t const group6 = 0x400;  // (Instance 2 locked first and (either one read only and high_priority when second))
                                // or (both read only and high_priority when second).

In pseudo code, the decoding of the bit mask works as

    // During the decoding, we assume that the first instance is the smallest (instance 1).
    keypair_info.state = group1;
    if (first instance is read-only)
      keypair_info.state |= (group3 | group5);
    if (second instance is read-only and high priority)
      keypair_info.state |= (group4 | group5);
      if (first instance is read-only)
	keypair_info.state |= group6;
    // Put the smallest instance in instance1.
    if (instance_first < instance_second)
      keypair_key.instance1 = instance_first;
      keypair_key.instance2 = instance_second;
    else	// Oh, the first instance was NOT the smallest.
      keypair_key.instance1 = instance_second;
      keypair_key.instance2 = instance_first;
      // Correct the state by swapping groups 1 <-> 2, 3 <-> 4, and 5 <-> 6.
      keypair_info.state = ((keypair_info.state << 1) | (keypair_info.state >> 1)) & (group1|group2|group3|group4|group5|group6);

Then, if the same key pair already exists (arbitrary order: R1,W2 and R2,R1 are
both the "pair" with instance numbers (1,2) here), then we do:

    uint16_t prev_state = stored_info.state;
    stored_info.state &= keypair_info.state;                    // Limit possible groups.
    if (stored_info.state == 0)                                 // No group left?
      // Possible dead-lock

The REAL challenge is to make this work with arbitrary priority mutexes }:-).

Also, there are several complex "rules" missing in my implementation
that dictate implicit mutex pairs (involving more than two threads
that together do something that is equivalent to a single thread
taking two locks in a certain order.  For example, X1,W2 + R2,Y3 implies X1,Y3
(Where X and Y are normal mutexes).  That means that when a pair
R2,Y3 is detected and we already detected a pair X1,W2 (or vica
versa) an implicit pair X1,Y3 should be generated).

I am convinced that a debugging mode like this would be
very very valuable.  My implementation for the simple
read/write locks has helped me to detect potential deadlocks
in a very early stage before they caused problems in Real Life

Since this has everything to do with robustness, I hope
I interested you.

Carlo Wood <carlo at alinoe.com>

More information about the cgl_discussion mailing list