[cgl_discussion] Question for TEMs/ISVs/OEMs regarding pthread requirements

Perez-Gonzalez, Inaky inaky.perez-gonzalez at intel.com
Sun Feb 16 01:43:48 PST 2003


> > For starters, it is a plain priority inversion. There was 
> > somebody waiting with higher priority and the lock was 
> > given to a lower priority thread. That is a bug.
> > 
> > Also, it can potentially create a deadlock: thread A acquires,
> > threads B, C and D are waiting and thread A sets to UNLOCK before
> > calling the kernel_wake() [for the sake of simplicity, let's just
> > say we only wake up one waiter, but it could happen with two or
> > more]. 
> 
> I think you misunderstand the CAS instruction.  If there are waiters, 
> the compare will fail and the swap will not be done.

Oh man! I don't know if I am confusing you or you are confusing me, but I
sure don't get this - please help me straighten it out.

Ok, if there are waiters, *futex == LID0 | WP [LID0 = Locker ID of the
owner, WP = Waiters Present bit]. Using the assembly code below, I do

ecx = MY_LID   ; MY locker ID
eax = 0
atomic {
  if (*futex == 0) {
    ZF = 1;
    *futex = ecx; // *futex = MY_LID
  } 
  else { /* We take this path */
    ZF = 0;
    eax = *futex;  // eax is LID0 | WP
  }
}
// So now ZF = 0, eax = *futex = LID0 | WP, ecx = MY_LID
jz got_it // ZF = 0, so we don't take the jump
ecx = eax;  // ecx = LID0 | WP
  ecx |= WP;  // mark the WP again, although it's already there in this case
atomic {
  if (*futex == (LID0 | WP)) { /* if still same as after previous CAS */
    ZF = 1;
    *futex = ecx;  // *futex = LID0 | WP, effectively a nop
  }
  else {
    ZF = 0;
    eax = *futex;
  }
}
jnz beginning // oops, it changed, try again

Now, sure, the first CAS fails, as intended; but I have the *futex value in
eax (because it failed), so I can verify in the next CAS that it did not
change. The second CAS will only fail if the value of *futex changes (and
thus what I am trying to write back is already stale, since it was based on
that value), so I retry from the beginning.
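
In portable C terms, a minimal sketch of that two-stage loop (using
GCC-style __sync builtins to stand in for the cmpxchg; the WP encoding
and the kernel_wait() signature are assumptions, just this thread's
placeholder names, not a real API):

    #define WP 0x80000000   /* waiters-present bit, assumed encoding */

    int lock (volatile int *futex, int my_lid)
    {
      int seen;
      for (;;) {
        /* 1st CAS: try to grab a free futex (0 -> MY_LID) */
        seen = __sync_val_compare_and_swap (futex, 0, my_lid);
        if (seen == 0)
          return 0;   /* we got it, we are the owner */
        /* 2nd CAS: set WP, but only if *futex still holds exactly
         * what the failed 1st CAS returned in eax */
        if (__sync_bool_compare_and_swap (futex, seen, seen | WP)) {
          if (kernel_wait (futex, seen | WP) == 0)
            return 0;   /* the kernel handed the lock over to us */
        }
        /* *futex changed under us, or we woke without the lock:
         * start all over again */
      }
    }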

> The only hole is 
> if there are no waiters and a contender (which will soon be a waiter) 
> fails in the CAS.  The owner still has the mutex, if he releases AND 
> another contender comes along there is a race for who will get it 
> between the one who failed the CAS and the new contender which may 
> take the lock without involving the kernel.  I contend that this is

This is what I don't see. The contender that failed the first CAS (contender
1) will go to the second CAS. In the second CAS he assumes that *futex
equals LID0 (the owner's LID, as he saw it in the first CAS), but the
'hijacking' contender (contender 2) has already set it to his LID2; so now
the second CAS of contender 1 compares *futex against the value he assumes
it should have, LID0, finds LID2 instead, fails to set it to LID0|WP, and
starts all over again. Only if *futex still holds what he saw in the failed
1st CAS does he set it to LID0|WP and go to the kernel to wait.

He depends on the value staying the same, not on it changing, so they cannot
race. Or I still cannot see it, and I need you to please shake me and explain
it to me as if I were stupid [please keep in mind I have been working almost
all Saturday ...]
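
To spell out the interleaving (LID1 and LID2 being hypothetical locker
IDs for the two contenders):

    *futex == LID0                       A owns the lock, no waiters
    C1: CAS (*futex: 0 -> LID1)          fails; C1 reads back LID0
    A:  CAS (*futex: LID0 -> 0)          succeeds; fast unlock path
    C2: CAS (*futex: 0 -> LID2)          succeeds; C2 'hijacks' the lock
    C1: CAS (*futex: LID0 -> LID0|WP)    fails, *futex is LID2; C1 retries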
 
> just too close to call and either outcome is ok.  Note that in a UP 
> machine, the higher priority task will get it anyway, as it has 
> enough priority to beat the other contender.
> 
> If I understand your example, A has the mutex and there are waiters 
> when A unlocks.  This should cause another CAS failure since A will 
> be looking for "A" and not "A"+flag.  The mutex stays locked and the 
> kernel then gives it (IN THE KERNEL, i.e. atomically) to the highest 
> contender.

Exactly, but the unlock first CASes only for A (the fast unlock path); if
that fails, then either there are waiters (A+flag) or it is not our futex.
So the next thing is to test for A+flag - if so, go to the kernel (and sure,
the kernel will do the ownership transfer atomically to the highest
contender, setting the flag if there are more).
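
Sketched with the same builtins as the lock above (WP and
kernel_wake_up() again being placeholder names), the unlock would be:

    #include <errno.h>

    int unlock (volatile int *futex, int my_lid)
    {
      int seen;

      /* Fast path: only succeeds if we own it and nobody waits */
      seen = __sync_val_compare_and_swap (futex, my_lid, 0);
      if (seen == my_lid)
        return 0;

      /* Waiters present: let the kernel transfer ownership */
      if (seen == (my_lid | WP))
        return kernel_wake_up (futex, my_lid);

      /* Error, we did not own the lock! */
      return -EWOULDBLOCK;
    }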

 
> > Just before calling the kernel, thread E, (low or high
> > priority) atomically LOCKs from UNLOCKED, so it does not even know
> > that there are waiters in the kernel. Now thread B is woken up by
> > A in the kernel_wake() path and just before checking the *futex
> > and setting the waiter flag or whatever, it is killed [or delayed
> > and later killed]--the thing is it does not have the chance to
> > claim the *futex and mark it as having waiters present in the
> > kernel; E unlocks, without knowing about C and D, who will be
> > waiting until frogs grow beards.
> 
> The key is that, once there are waiters, the flag is set in the mutex 
> and the exit CAS will fail, transferring control to the kernel.
> 
> The only race is between a contender who is on the way to the kernel 
> and a second contender who just happens along just as A  unlocks.  So 

This is why you pass to kernel_wait(), along with the futex, the value you
saw the last time you did a CAS on it. If, by the time the kernel is about
to put you to sleep, it has changed, then you go back to user space to
contend again, so you avoid this problem.
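
This is essentially what the Linux futex syscall does for FUTEX_WAIT:
the kernel re-checks the word under its own lock and refuses to sleep
if it no longer holds the value you passed in. A sketch of such a
kernel_wait() wrapper (only the "sleep iff unchanged" part; the atomic
ownership hand-over discussed above is not shown here):

    #include <errno.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/futex.h>

    /* Returns 0 when woken by an unlocker, 1 when *futex no longer
     * held 'seen' (go back to user space and contend again) */
    int kernel_wait (volatile int *futex, int seen)
    {
      int rc = syscall (SYS_futex, futex, FUTEX_WAIT, seen, NULL);
      if (rc == -1 && errno == EWOULDBLOCK)
        return 1;  /* value changed before we slept: retry the CAS loop */
      return rc;
    }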

> there will only be these three tasks (or I suppose others could be 
> added, either calling the kernel or trying to lock the free mutex. 
> Only one of the many will get the mutex; the rest will all meet in 
> the kernel).  Now the kernel will see one of three things:
> 1) A holds the lock, still
> 2) The mutex is unlocked, or
> 3) C holds the lock
> 
>  From the kernel's point of view 1 and 3 are the same, just queue the 
> new waiter.  2 is easy, give the mutex to B and keep on trucking.
> > 
> > I know there is a slim chance and that you have to align seven
> > planets for it to happen, but the chance is there; minimal, but
> > present. And I don't want to be blamed in the future for a
> > deadlock that risked some critical system.
> 
> Murphy says this will happen ONLY after all tests pass and the 
> product is shipped :)
> > 
> > 
> >>No, he should not enter the kernel with the expectation that it is 
> >>in any state, as I said above.
> > 
> > 
> > Does the case I exposed invalidate this?
> 
> No.
> > 
> > Think also that the idea behind that was to separate the actual
> > implementation of the *futex count from the futex implementation,
> > to allow architectures to choose the most effective one. For this
> > I am requiring compare-and-swap, at least. Maybe this can be 
> > stretched in ... hmmm ...
> > 
> > 
> >>But you really want to check for NULL and replace with self (now 
> >>the owner).  The sign bit would only get set if there is already 
> >>an owner.  I suppose you could do a two stage loop something like:
> >>
> >>  xx:	if (mutex->entry_wd == 0){       // this line and the next
> >>		mutex->entry_wd = TID;   // is the compare and swap
> >>                  return ;		// we now own the mutex
> >>	}else{
> >>		if(mutex->entry_wd == what it was) {
> >>			mutex->entry_wd = what it was | flag;
> >>			call waitmutex(mutex);
> >>		}else{
> >>			goto xx;
> >>		}
> >>	}
> >>
> >>The problem with this code is that an exit by the owner can race to 
> >>the kernel with the contender.  And, of course, there is the third 
> > 
> > 
> > Not if your unlock code is like this:
> > 
> >     int unlock (int *lock, int locker_id)
> >     {
> >       int old_value;
> >       atomic {  /* normal compare and swap */
> >         old_value = *lock;  /* always snapshot, or the test below
> >                              * reads uninitialized memory */
> >         if (*lock == locker_id)
> >           *lock = 0;
> >       }
> >       if (old_value == locker_id) 
> >         return 0;
> >     
> >       if (old_value == (locker_id | WAITERS_PRESENT))
> >         return kernel_wake_up (lock, locker_id);
> > 
> >       /* Error, we did not own the lock! */
> >       return -EWOULDBLOCK;
> >     }
> > 
> > At least I cannot see any races; I have looked at it left and right 
> > and can't find any ...
> > 
> > 
> >>possibility, i.e. that the mutex flag is already set, which the 
> >>above does not handle.
> > 
> > 
> > It does. If the mutex flag (the waiters present flag) is already 
> > set when we step on xx, we just set it again and fall through to 
> > the kernel. In ia32, it'd be something like this [it's been a long
> > time since I last wrote asm, so I could have the arguments swapped
> > and stuff]:
> 
> If the compare fails the cmpxchg fails.  You don't need to do 
> anything to reset it.  THAT is what prevents the race.
> > 
> >       1: movl          locker_id, %ecx
> >          movl          $0, %eax
> >          lock cmpxchg  %ecx, futex
> >          jz            we_got_it_locked_we_are_owners
> >          movl          %eax, %ecx
> >          orl           $WAITERS_PRESENT, %ecx
> >          lock cmpxchg  %ecx, futex
> >          jnz           1b
> >        wait_in_kernel:
> >          ...



Inaky Perez-Gonzalez -- Not speaking for Intel - opinions are my own [or my
fault]



