[cgl_discussion] Question for TEMs/ISVs/OEMs regarding pthread requirements

Perez-Gonzalez, Inaky inaky.perez-gonzalez at intel.com
Sun Feb 16 02:14:26 PST 2003

My apologies, my last message was broken; I was editing text and typed Alt-S
by mistake. I'll pick up where I stopped.

> If I understand your example, A has the mutex and there are waiters 
> when A unlocks.  This should cause another CAS failure since 
> A will be 
> looking for "A" and not "A"+flag.  The mutex stays locked and the 
> kernel then gives it (IN THE KERNEL, i.e. atomicly) to the highest 
> contender.

[This is where, instead of doing Ctrl-V, I did Alt-S when pasting the code
from Emacs ... I was reformatting what I wrote to make it clearer - sorry]

Exactly, but the unlock first CASes only for A (the fast unlock path); if
that fails (either there are waiters (A+flag) or it is not our futex), then
we test for A+flag - if so, go to the kernel (and sure, the kernel will
transfer ownership atomically to the highest contender, setting the flag if
there are more waiters); if none of the previous, it is an error, it was not
our futex. Something like this:

    int unlock (int *lock, int locker_id)
    {
      int old_value;
      atomic {  /* normal compare and swap */
        old_value = *lock;
        if (*lock == locker_id)
          *lock = 0;
      }
      if (old_value == locker_id)
        return 0; /* replace with testing for ZF if wished */
      if (old_value == (locker_id | WAITERS_PRESENT))
        return kernel_wake_up (lock, locker_id);

      /* Error, we did not own the lock! */
      return -EWOULDBLOCK;
    }

We are the owners. If the CAS succeeds, it does not matter if anybody
contends, because the value is already 0 (UNLOCKED); no races here. If it
fails, it is still something else [in a bug-free program, LID|WP], so we
will call the kernel. Anyone claiming the futex before the kernel sets the
ownership back will hit the kernel and queue.

> > Just before calling the kernel, thread E, (low or high
> > priority) atomically LOCKs from UNLOCKED, so it does not even know
> > that there are waiters in the kernel. Now thread B is woken up by
> > A in the kernel_wake() path and just before checking the *futex
> > and setting the waiter flag or whatever, it is killed [or delayed
> > and later killed]--the thing is it does not have the chance to
> > claim the *futex and mark it as having waiters present in the
> > kernel; E unlocks, without knowing about C and D, who will be
> > waiting until frogs grow beards.
> The key is that, once there are waiters, the flag is set in the mutex 
> and the exit CAS will fail, transferring control to the kernel.
> The only race is between a contender who is on the way to the kernel 
> and a second contender who just happens along just as A  unlocks.  So 

My previous point here was kind of wrong, I think - I am rethinking it. If
there is a contender on its way to the kernel, then that means *futex is
LIDA|WP; if the owner, A, unlocks -in the kernel- before the contender has a
chance to queue, it will get there, see the queue is empty [actually, find
no queue] and set *futex atomically to UNLOCKED (0). Now contender 2 will
get it and won't see any waiters. Contender 1 will then enter the kernel and
see that the value of *futex has changed since it last saw it before falling
down to the kernel, so it packs up and goes back to contend.

I guess it is, more than a race, a hole for a priority inversion that has
almost no solution. However, if contender 1 is higher priority than
contender 2, it should get there first; it is another matter if they are on
different CPUs - then they will kind of clash, and there is no way to solve
it other than having 1 contend again.

Maybe it could be optimized for the case where you go to the kernel and the
value has changed - maybe it would be interesting to contend again in that
path, inside the kernel, to speed it up.

> > I know there is a slim chance and that you have to align seven
> > planets for it to happen, but the chance is there; minimal, but
> > present. And I don't want to be blamed in the future for a
> > deadlock that risked some critical system.
> Murphy says this will happen ONLY after all tests pass and 
> the product is shipped :)

Yep, tell that to the guys who designed and tested the Mars Pathfinder ...
after so many years they probably still sweat kerosene whenever they think
about their priority inversion problem :] ...

> >>No, he should not enter the kernel with the expectation 
> that it is in 
> >>any state, as I said above.
> > 
> > 
> > Does the case I exposed invalidate this?
> No.

So, on a second read - I expect it to be X; if it is not, I bail out and try
again. I need the expectation in order to avoid the problem you proposed.

> > It does. If the mutex flag (the waiters present flag) is already 
> > set when we step on xx, we just set it again and fall down to the 
> > kernel. In ia32, it'd be something like this [it's been long since
> > I don't write asm, so I could have the arguments swapped and stuff]:
> If the compare fails the cmpxchg fails.  You don't need to do 
> anything 
> to reset it.  THAT is what prevents the race.

Again, I don't get this. I am not resetting anything. Now I want to compare
against the value that *futex had at the first CAS [and that's %eax]; if it
is the same, I want to change it to whatever it was, |WP, to flag me in
[that's %ecx] - could it be that I am screwing up the arguments (AT&T vs.
Intel syntaxes?) and this is confusing us both?

> > 
> >       1: movl          locker_id, %ecx
> >          movl          0, %eax
> >          lock cmpxchg  futex, %ecx
> >          jz            we_got_it_locked_we_are_owners
> >          movl          %eax, %ecx
> >          orl           WAITERS_PRESENT, %ecx
> >          lock cmpxchg  futex, %ecx
> >          jnz           1
> >        wait_in_kernel:
> >          ...

Ok, next algorithm is

if (too_freaking_late) {
  sleep (4:47);

too_freaking_late tests positive ...

G'd night :]

Inaky Perez-Gonzalez -- Not speaking for Intel - opinions are my own [or my
