[cgl_discussion] Question for TEMs/ISVs/OEMs regarding pthrea d requirements

Perez-Gonzalez, Inaky inaky.perez-gonzalez at intel.com
Thu Feb 20 15:07:43 PST 2003

> >>b) The lock is held by another user and is contended, i.e. this is a
> >>second or higher contender (also normal)
> >
> >
> > You need to make sure the WAITERS_PRESENT bit is set.
> If it is not, then you have a corrupt mutex.  Time to do somthing
> drastic.  Actually I assumed that the WAITERS_PRESENT bit is what made
> the kernel think it was case b.  I suppose there could be kernel
> tables for the mutex which also say the same thing.  In case they
> don't you have corruption :(

Not necessarily, because if you are in the kernel, it means you are
contending, so the WP bit should be set; if it is not, it is the case that
you mentioned before: F was locked by A, B (me) sets the WP bit and goes
down to the kernel; before I am able to queue up, A goes down to the kernel
and unlocks, but sees nobody, so does nothing. Still before I (B) am able to
actually reach the kernel, C locks (and as it is UNLOCKED, it sets no
WAITERS_PRESENT bit). Now finally I get there (to the kernel) and re-examine
the value; it has no present bit ... oops, I need to re-contend to set it
and then queue up, so that when C unlocks it goes to the kernel and wakes me

> >>c) The lock is held by the caller (this is a recursion error and
> >>should cause termination of the caller with ext ream prejudice)
> >
> >
> > I'd return -EDEADLOCK or something like that, but probably termination
> makes
> > sense too ...
> Too many folks don't check for errors  :(

Yep, but this is a primitive, not meant to be used by one of those folks who
don't check for errors :]; actually NPTL or NGPT are the ones who are going
to call for this in their mutex code; if they see that, they need to handle
it somehow.

And also, c) is the simplest case of a deadlock condition being detected; I
will use something in the lines of Corey's algorithm for deadlock detection,
although I still need to sharpen the rough edges, regarding performance
impact and stuff [probably it makes sense to leave it as a runtime option].

> >>d) The lock is not held (i.e. the holder has gone away while the call
> >>was making its way to the kernel, fine, give the lock to the caller
> >>and continue)
> >
> > So you need to set your locker ID.
> Yes, or go back and retry from user space.  May as well do it in the
> kernel as it avoids possible contention and you are already here.

That's it. Now if I can only access a user page from kernel space using
normal pointers, I am done - something like map 'struct page *foo' in the
kernel virtual address space... 

> > This is taken care of by the robust mutex support. Any contender that
> jumps
> > in the kernel and sees the futex_q has no registered owner needs to look
> it
> > up; if it finds none, it will be assumed it is garbage and the recovery
> > process will be started if so is requested.
> >
> > If it finds it, it sets it in the futex data and also sets a link from
> the
> > task structure to the futex data, so that on process death, it can be
> > accounted for.
> Great, I think we understand and agree at this point.  I assume that
> the mapping you want is possible, but I don't know how to do it.

Well, it is kind of tricky. We use the TID (that is supposedly different for
every running thread in the application and different from any other PID in
the system). Then we do a find_task_by_pid() on it and that gives us a
'struct task_struct *'.

However, that task might not be the original locker, as it could have
miserably died and some other task reused the TID; we need to make sure it
is it.

So what I am planning is use some kind of signature. Any task that will use
owner identification needs to tell the kernel to generate a signature; that
signature will be done hashing the TID plus some values which are static
throughout the lifetime of the task (namely, task->start_time, the pointers
task, task->signature, task->proc_dentry, task->fs, task->files). Now, we
generate that signature and store it into the space pointed to by

Now, whenever I get the task pointer from find_task_by_pid(), I check it is
who it should by copying again, to a buffer, the TID from *futex, and then,
in the same order, the same data and pointers I would use to generate the
signature (task->start_time, task, task->signature ... etc). Hash that out
and compare it with the contents of task->signature. If it is the same, I
have my owner; if not, the original owner died, need to start recovery.

As a hash mechanism anything can be used, although we want a solid algorithm
that does not give us false positives. I guess MD5 is going to be kind of
heavyweight, but again, the thing can be done very modular, so it will be
easy to replace it [as long as the hash space is wide enough].

Good thing is this is only done once for each thread that requires it (once
you declare a prio protected/inheriting or robust mutex, you generate the
signature for all the threads in the process and flag that all new threads
get it generated at thread creation); then, you only need to check it when
you are the first locker and you see that futex_q->owner is NULL [it is
guaranteed that before the owner task dies, futex_q->owner will be cleaned].

What do you think?

Iñaky Pérez-Gonzalez --- Not speaking for Intel -- all opinions are my own
(and my fault)

More information about the cgl_discussion mailing list