[cgl_discussion] Question for TEMs/ISVs/OEMs regarding pthread requirements
Perez-Gonzalez, Inaky
inaky.perez-gonzalez at intel.com
Sun Feb 16 01:43:48 PST 2003
> > For starters, it is a plain priority inversion. There was
> > somebody waiting with higher priority and the lock was
> > given to a lower priority thread. That is a bug.
> >
> > Also, it can potentially create a deadlock: thread A acquires,
> > threads B, C and D are waiting and thread A sets to UNLOCK before
> > calling the kernel_wake() [for the sake of simplicity, let's just
> > say we only wake up one waiter, but it could happen with two or
> > more].
>
> I think you misunderstand the CAS instruction. If there are waiters,
> the compare will fail and the swap will not be done.
Oh man! I don't know if I am confusing you or you are confusing me, but I
certainly don't get this - please help me straighten it out.
Ok, if there are waiters, *futex == LID0 | WP [LID0 = Locker ID of the
owner, WP = Waiters Present bit]. Using the assembly code below, I do
ecx = MY_LID ; MY locker ID
eax = 0
atomic {
if (*futex == 0) {
ZF = 1;
*futex = ecx; // *futex = MY_LID
}
else { /* We take this path */
ZF = 0;
eax = *futex; // eax is LID0 | WP
}
}
// So now ZF = 0, eax = *futex = LID0 | WP, ecx = MY_LID
jz got_it // ZF = 0, so we don't take the jump
ecx = eax; // ecx = LID0 | WP
ecx |= WP; // set WP again, although it is already there in this case
atomic {
if (*futex == (LID0 | WP)) { /* if still same as after previous CAS */
ZF = 1;
*futex = ecx; // *futex = LID0 | WP, effectively a nop
}
else {
ZF = 0;
eax = *futex;
}
}
jnz beginning // oops, it changed, try again
Now, sure, the first CAS fails, as intended; but I have *futex in eax
(since it failed), so I can verify in the next CAS that it did not change.
The second CAS will only fail if the value of *futex changes (which makes
what I am trying to write back invalid, because it was based on that
value), in which case I retry from the beginning.
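To make the two-CAS loop above concrete, here is a sketch using C11
atomics. The names (futex_lock, kernel_wait) and the WP bit layout are my
own choices for illustration, and the kernel side is only a stub standing
in for the real syscall:

```c
#include <stdatomic.h>
#include <stdint.h>

#define WP 0x80000000u   /* waiters-present bit (assumed layout) */

/* Stub standing in for the real kernel entry point; a real
 * implementation would sleep until woken by the owner's unlock. */
static int kernel_wait(atomic_uint *futex, uint32_t expected)
{
    (void)futex; (void)expected;
    return 1;                       /* pretend we slept and were woken */
}

int futex_lock(atomic_uint *futex, uint32_t my_lid)
{
    for (;;) {
        /* First CAS: grab a free lock, 0 -> MY_LID. */
        uint32_t seen = 0;
        if (atomic_compare_exchange_strong(futex, &seen, my_lid))
            return 0;               /* the ZF=1 path: we are the owner */

        /* CAS failed: 'seen' now holds *futex, e.g. LID0 or LID0|WP.
         * Second CAS: set WP without changing the owner; it succeeds
         * only if *futex is still exactly what we just saw. */
        if (atomic_compare_exchange_strong(futex, &seen, seen | WP))
            return kernel_wait(futex, seen | WP);

        /* *futex changed under us: retry from the beginning. */
    }
}
```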
> The only hole is
> if there are no waiters and a contender (which will soon be a waiter)
> fails in the CAS. The owner still has the mutex, if he releases AND
> another contender comes along there is a race for who will get it
> between the one who failed the CAS and the new contender which may
> take the lock without involving the kernel. I contend that this is
This is what I don't see. The contender that failed the first CAS
(contender 1) goes on to the second CAS. In the second CAS he assumes
*futex equals LID0 (the owner's LID, as he saw it in the first CAS), but
the 'hijacking' contender (contender 2) has already set it to his LID2; so
now the second CAS of contender 1 compares *futex against the value he
assumes it should have, LID0, finds LID2 instead, fails to set it to
LID0|WP, and starts all over again. If it still held what he saw in the
failed first CAS, then he sets it to LID0|WP and goes to the kernel to wait.
He depends on the value staying the same, not on changing, so they cannot
race. Or I still cannot see it and I need you to please shake me and explain
it to me as if I were stupid [please keep in mind I have been working almost
all Saturday ...]
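The hijack scenario can even be exercised single-threaded, since all that
matters is the value sequence. Here is a small demonstration of contender
1's second CAS (mark_waiters is my name for it; the WP bit layout is
assumed): it must fail once contender 2 has replaced LID0 with LID2, and
succeed only when the snapshot is still current.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define WP 0x80000000u   /* waiters-present bit (assumed layout) */

/* Contender 1's second CAS: succeed only if *futex still holds
 * exactly the value snapshotted by his failed first CAS. */
static bool mark_waiters(atomic_uint *futex, uint32_t snapshot)
{
    uint32_t expected = snapshot;
    return atomic_compare_exchange_strong(futex, &expected,
                                          snapshot | WP);
}
```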
> just too close to call and either outcome is ok. Note that in a UP
> machine, the higher priority task will get it anyway, as it has
> enough priority to beat the other contender.
>
> If I understand your example, A has the mutex and there are waiters
> when A unlocks. This should cause another CAS failure since A will
> be looking for "A" and not "A"+flag. The mutex stays locked and the
> kernel then gives it (IN THE KERNEL, i.e. atomically) to the highest
> contender.
Exactly, but the unlock first does a CAS only against A (the fast unlock
path); if that fails, then either there are waiters (A+flag) or it is not
our futex. So the next thing is to test for A+flag - if so, go to the
kernel (and sure, the kernel will do the ownership transfer atomically to
the highest contender, setting the flag if there are more waiters).
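The same unlock sequence, sketched in C11 atomics to match the lock sketch
(again, futex_unlock and kernel_wake_up are my names, the WP layout is
assumed, and the kernel side is a stub):

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define WP 0x80000000u   /* waiters-present bit (assumed layout) */

/* Stub for the kernel path; the real one transfers ownership, inside
 * the kernel and atomically, to the highest-priority waiter. */
static int kernel_wake_up(atomic_uint *futex, uint32_t locker_id)
{
    (void)futex; (void)locker_id;
    return 1;                  /* pretend a waiter was handed the lock */
}

int futex_unlock(atomic_uint *futex, uint32_t my_lid)
{
    /* Fast path: CAS only against plain MY_LID, i.e. no waiters. */
    uint32_t seen = my_lid;
    if (atomic_compare_exchange_strong(futex, &seen, 0))
        return 0;

    /* CAS failed: either waiters are present (MY_LID|WP) ... */
    if (seen == (my_lid | WP))
        return kernel_wake_up(futex, my_lid);

    /* ... or this futex was never ours to unlock. */
    return -EWOULDBLOCK;
}
```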
> > Just before calling the kernel, thread E, (low or high
> > priority) atomically LOCKs from UNLOCKED, so it does not even know
> > that there are waiters in the kernel. Now thread B is woken up by
> > A in the kernel_wake() path and just before checking the *futex
> > and setting the waiter flag or whatever, it is killed [or delayed
> > and later killed]--the thing is it does not have the chance to
> > claim the *futex and mark it as having waiters present in the
> > kernel; E unlocks, without knowing about C and D, who will be
> > waiting until frogs grow beards.
>
> The key is that, once there are waiters, the flag is set in the mutex
> and the exit CAS will fail, transfering control to the kernel.
>
> The only race is between a contender who is on the way to the kernel
> and a second contender who just happens along just as A unlocks. So
This is why you pass, along with the kernel_wait() call, the value you saw
for the futex the last time you did a CAS. If, by the time the kernel is
about to queue you, it has changed, then you go back to user space to
contend again, so you avoid this problem.
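This pass-the-value-you-saw idea is exactly what the Linux futex syscall
does today: FUTEX_WAIT re-checks *uaddr against the expected value
atomically with respect to wakeups before sleeping, and fails with EAGAIN
if it has changed. A sketch, assuming Linux (the kernel_wait name mirrors
this thread, not any real API):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep on the futex only if it still holds 'expected'; otherwise
 * return -EAGAIN so the caller re-runs the user-space CAS loop. */
static int kernel_wait(uint32_t *uaddr, uint32_t expected)
{
    long r = syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
                     NULL, NULL, 0);
    if (r == -1 && errno == EAGAIN)
        return -EAGAIN;     /* value changed under us: contend again */
    return (int)r;
}
```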
> there will only be these three tasks (or I suppose others could be
> added, either calling the kernel or trying to lock the free mutex.
> Only one of the many will get the mutex, the rest will all meet in
> the kernel). Now the kernel will see one of three things:
> 1) A holds the lock, still
> 2) The mutex is unlocked, or
> 3) C holds the lock
>
> From the kernel's point of view 1 and 3 are the same, just queue the
> new waiter. 2 is easy, give the mutex to B and keep on trucking.
> >
> > I know there is a slim chance and that you have to align seven
> > planets for it to happen, but the chance is there; minimal, but
> > present. And I don't want to be blamed in the future for a
> > deadlock that risked some critical system.
>
> Murphy says this will happen ONLY after all tests pass and the
> product is shipped :)
> >
> >
> >>No, he should not enter the kernel with the expectation that it is
> >>in any state, as I said above.
> >
> >
> > Does the case I exposed invalidate this?
>
> No.
> >
> > Think also that the idea behind that was to separate the actual
> > implementation of the *futex count from the futex implementation,
> > to allow architectures to choose the most effective one. For this
> > I am requiring compare-and-swap, at least. Maybe this can be
> > stretched in ... hmmm ...
> >
> >
> >>But you really want to check for NULL and replace with self (now
> >>the owner). The sign bit would only get set if there is already
> >>an owner.
> >> I suppose you could do a two stage loop something like:
> >>
> >> xx: if (mutex->entry_wd == 0){ // this line and the next is
> >> mutex->entry_wd = TID; // the compare and swap
> >> return ; // we now own the mutex
> >> return ; // we now own the mutex
> >> }else{
> >> if(mutex->entry_wd == what it was) {
> >> mutex->entry_wd = what it was | flag;
> >> call waitmutex(mutex);
> >> }else{
> >> goto xx;
> >> }
> >> }
> >>
> >>The problem with this code is that an exit by the owner can race to
> >>the kernel with the contender. And, of course, there is the third
> >
> >
> > Not if your unlock code is like this:
> >
> > int unlock (int *lock, int locker_id)
> > {
> > int old_value;
> > atomic { /* normal compare and swap */
> > old_value = *lock;
> > if (old_value == locker_id)
> > *lock = 0;
> > }
> > if (old_value == locker_id)
> > return 0;
> >
> > if (*lock == (locker_id | WAITERS_PRESENT))
> > return kernel_wake_up (lock, locker_id);
> >
> > /* Error, we did not own the lock! */
> > return -EWOULDBLOCK;
> > }
> >
> > At least I cannot see any races; I have looked at it left and right
> > and can't find any ...
> >
> >
> >>possibility, i.e. that the mutex flag is already set, which the
> >>above does not handle.
> >
> >
> > It does. If the mutex flag (the waiters present flag) is already
> > set when we step on xx, we just set it again and fall through to the
> > kernel. In ia32, it'd be something like this [it's been a long time
> > since I last wrote asm, so I could have the arguments swapped and such]:
>
> If the compare fails the cmpxchg fails. You don't need to do
> anything
> to reset it. THAT is what prevents the race.
> >
> > 1: movl locker_id, %ecx # ecx = MY_LID
> > movl $0, %eax # eax = 0, expect unlocked
> > lock cmpxchgl %ecx, futex # if (futex == eax) futex = ecx, ZF = 1
> > jz we_got_it_locked_we_are_owners
> > movl %eax, %ecx # ecx = value we saw (LID0 [| WP])
> > orl $WAITERS_PRESENT, %ecx # mark waiters present
> > lock cmpxchgl %ecx, futex # succeeds only if futex is unchanged
> > jnz 1b # it changed, try again
> > wait_in_kernel:
> > ...
Inaky Perez-Gonzalez -- Not speaking for Intel - opinions are my own [or my
fault]