[cgl_discussion] Question for TEMs/ISVs/OEMs regarding pthrea d requirements

Perez-Gonzalez, Inaky inaky.perez-gonzalez at intel.com
Wed Feb 5 10:42:47 PST 2003

> Given that one can put together an atomic compare-and-swap in user 
> space, I think one can enter an uncontested pi-mutex with just that, 
> i.e. compare / modify + a test for success.  This assumes 
> that one has 
> a unique id fits in the given word size, plus one bit extra (in the 
> kernel I used the "current" pointer and the least bit).  Exit of the 
> area is the same overhead as entry.

Yep, I also thought of this, but I reached the conclusion that it cannot be
done - I could be wrong:

The only ID we can allow ourselves is the TID, which does not offer ANY
guarantee that is going to always have bits so-and-so unused so we can put
the futex count in there - I mean, I know I won't have 4 G tasks in a
system, but ... Ok, I assume I can << 2 it and it'd be ok.

futex = (tid << 2) | (whatever_count & 0x3)

Now, can we count on that being truly atomic? 

The thing is, for example, initially the lock is released [lock == 0] and we
try to acquire it:

atomic {
  if (lock == 0)
    lock = (TID << 2) | 0x1; /* we got it */
       /* Problem 1: if getting TID 
          means going to the kernel, we are screwed;
          let's hope the library caches it. */
  else {
    futex_wait (&lock);

So, this is easy; how do we put that into assembler? [pseudo c/inlinegcc/asm
borrowed and scrambled from NPTL's lowlevellock.h)

    lock xaddl %0, %3
    testl      %0, %0
    jz         uncontended
    leal       %3, %%ecx
    call       ___lll_mutex_lock_wait
: "=a" (result), "=&c" (ignore1), 
  "=&d" (ignore2), "=m" (futex)
: "0" (1), "3" (futex)
: "memory"

Ok, how it works is: %3 is the futex [0 - unlocked; 1 - locked, no waiters;
2 - locked, one or more waiters]; the first line is the juicy one; it does,
as you guys know:

temp = %0 + %3
%0 = %3
%3 = temp

that is:

temp = 1 + futex
reg[0] = futex
futex = temp

Now if the futex was unlocked, this will set it to one and we jump to
unconteded, as the testl on reg[0] will tell us it was zero and now we got
it; if it was locked, no waiters, it moves to 2 and we need to go down to
the kernel to wait [call ___lll_...]; same thing, if it was in 2, goes to 3
and we go to the kernel - when coming back, it will be put to 2 [so it does
not keep going up forever]. So, we need three bits, at least [and this is
not really true; given the right conditions, it could go up to a lot until
we reset it to two, so this needs attention -- for now let's assume it will
stay bounded].

Now, how do we add the TID in there? This does not work:

movl        (TID << 2) | 1, %ecx
lock xaddl  %ecx, futex 

As we will overwrite the TID of the owner of the futex if it is in states 1
and/or 2. We cannot replace the value because the operation would not be
atomic and somebody could jump in the middle.

This would not help either:

movl $1,%ecx
lock xaddl %ecx,futex
testl %ecx,%ecx
jnz contention
; Uncontended, we got it
xorl (TID << 2), futex

For states 1 and 2 it works, but not for 0; in this case, from the time when
the lock is acquired and the count goes from zero to 1, there is a window of
three instructions (testl, jnz, xorl) until we set the TID, so that an
incoming higher priority lock contender hitting it and raising our priority
if need arises would find an invalid TID.

So, here my assembler knowledge gets drained, and I think

movl $1, %ecx
movl futex, %ebx
movl (TID << 2) | 1, %edx
lock add-or-load %ebx,%ecx,%edx
jnz contention ; go to the kernel
.... wait ...

where add-or-load means [replace futex with %ebx]:

if (futex == 0) {
  futex = %edx;
  ZF = 1;
} else {
  futex += %ecx;
  ZF = 0;

But I guess this is asking for $1M or a date with Claudia Schiffer ... I
don't think there is a similar instruction in any architecture ...

So that's why I don't know how to do the identifier-embedded-in-the-word
thing; I'd love to be proved wrong and shown a way to go.

> I think we should think of this as instruction emulation where we go 
> to the kernel VERY quickly (both in and out).  If we don't allow 
> interruption, we would not need to save much context.  The major over 
> head would be the system call instruction.

Do you mean go to the kernel even in the uncontested case? I think this is
what you meant, even if it is taking the fast path of it just being an
interrupt, but I am sure the answer from anybody else would be something in
the lines of "no way" - they want the uncontested case to stay in user

Inaky Perez-Gonzalez -- Not speaking for Intel - opinions are my own [or my

More information about the cgl_discussion mailing list