[cgl_discussion] Question for TEMs/ISVs/OEMs regarding pthrea d requirements

Corey Minyard cminyard at mvista.com
Wed Feb 5 11:11:44 PST 2003

This is why you have to have compare-and-swap to do it in userland.  As 
George pointed out, it's possible to implement compare-and swap with a 
very quick kernel call on a processor that doesn't have it otherwise.

Please, go to http://ssthreads.sf.net, pull the design document, and 
read the mutex portion.  It talks about this in great detail.  I have 
thought of a simpler algorithm (instead of a stack), but I'm not 100% 
sure it's better, and it's not significantly simplier.  If the kernel is 
involved, though, it simplifies things a lot.

The other side of this is that if someone owns the mutex, you have to 
make sure they don't release the mutex until you are done boosting their 
priority.  So if someone owns the mutex, you have to atomically set the 
mutex to require them to go back into the kernel to release it.  There's 
also infinite loops you have to worry about, if a process deadlocks on a 
mutex, you will have a circular list of threads waiting on that mutex 
(and all inbetween).


Perez-Gonzalez, Inaky wrote:

>>Given that one can put together an atomic compare-and-swap in user 
>>space, I think one can enter an uncontested pi-mutex with just that, 
>>i.e. compare / modify + a test for success.  This assumes 
>>that one has 
>>a unique id fits in the given word size, plus one bit extra (in the 
>>kernel I used the "current" pointer and the least bit).  Exit of the 
>>area is the same overhead as entry.
>Yep, I also thought of this, but I reached the conclusion that it cannot be
>done - I could be wrong:
>The only ID we can allow ourselves is the TID, which does not offer ANY
>guarantee that is going to always have bits so-and-so unused so we can put
>the futex count in there - I mean, I know I won't have 4 G tasks in a
>system, but ... Ok, I assume I can << 2 it and it'd be ok.
>futex = (tid << 2) | (whatever_count & 0x3)
>Now, can we count on that being truly atomic? 
>The thing is, for example, initially the lock is released [lock == 0] and we
>try to acquire it:
>atomic {
>  if (lock == 0)
>    lock = (TID << 2) | 0x1; /* we got it */
>       /* Problem 1: if getting TID 
>          means going to the kernel, we are screwed;
>          let's hope the library caches it. */
>  else {
>    futex++;
>    futex_wait (&lock);
>  }
>So, this is easy; how do we put that into assembler? [pseudo c/inlinegcc/asm
>borrowed and scrambled from NPTL's lowlevellock.h)
>    lock xaddl %0, %3
>    testl      %0, %0
>    jz         uncontended
>    leal       %3, %%ecx
>    call       ___lll_mutex_lock_wait
>: "=a" (result), "=&c" (ignore1), 
>  "=&d" (ignore2), "=m" (futex)
>: "0" (1), "3" (futex)
>: "memory"
>Ok, how it works is: %3 is the futex [0 - unlocked; 1 - locked, no waiters;
>2 - locked, one or more waiters]; the first line is the juicy one; it does,
>as you guys know:
>temp = %0 + %3
>%0 = %3
>%3 = temp
>that is:
>temp = 1 + futex
>reg[0] = futex
>futex = temp
>Now if the futex was unlocked, this will set it to one and we jump to
>unconteded, as the testl on reg[0] will tell us it was zero and now we got
>it; if it was locked, no waiters, it moves to 2 and we need to go down to
>the kernel to wait [call ___lll_...]; same thing, if it was in 2, goes to 3
>and we go to the kernel - when coming back, it will be put to 2 [so it does
>not keep going up forever]. So, we need three bits, at least [and this is
>not really true; given the right conditions, it could go up to a lot until
>we reset it to two, so this needs attention -- for now let's assume it will
>stay bounded].
>Now, how do we add the TID in there? This does not work:
>movl        (TID << 2) | 1, %ecx
>lock xaddl  %ecx, futex 
>As we will overwrite the TID of the owner of the futex if it is in states 1
>and/or 2. We cannot replace the value because the operation would not be
>atomic and somebody could jump in the middle.
>This would not help either:
>movl $1,%ecx
>lock xaddl %ecx,futex
>testl %ecx,%ecx
>jnz contention
>; Uncontended, we got it
>xorl (TID << 2), futex
>For states 1 and 2 it works, but not for 0; in this case, from the time when
>the lock is acquired and the count goes from zero to 1, there is a window of
>three instructions (testl, jnz, xorl) until we set the TID, so that an
>incoming higher priority lock contender hitting it and raising our priority
>if need arises would find an invalid TID.
>So, here my assembler knowledge gets drained, and I think
>movl $1, %ecx
>movl futex, %ebx
>movl (TID << 2) | 1, %edx
>lock add-or-load %ebx,%ecx,%edx
>jnz contention ; go to the kernel
>.... wait ...
>where add-or-load means [replace futex with %ebx]:
>if (futex == 0) {
>  futex = %edx;
>  ZF = 1;
>} else {
>  futex += %ecx;
>  ZF = 0;
>But I guess this is asking for $1M or a date with Claudia Schiffer ... I
>don't think there is a similar instruction in any architecture ...
>So that's why I don't know how to do the identifier-embedded-in-the-word
>thing; I'd love to be proved wrong and shown a way to go.
>>I think we should think of this as instruction emulation where we go 
>>to the kernel VERY quickly (both in and out).  If we don't allow 
>>interruption, we would not need to save much context.  The major over 
>>head would be the system call instruction.
>Do you mean go to the kernel even in the uncontested case? I think this is
>what you meant, even if it is taking the fast path of it just being an
>interrupt, but I am sure the answer from anybody else would be something in
>the lines of "no way" - they want the uncontested case to stay in user
>Inaky Perez-Gonzalez -- Not speaking for Intel - opinions are my own [or my

More information about the cgl_discussion mailing list