[cgl_discussion] Question for TEMs/ISVs/OEMs regarding pthread requirements

Perez-Gonzalez, Inaky inaky.perez-gonzalez at intel.com
Fri Feb 21 13:09:31 PST 2003

Hi all

> I still contend that it is easier to NOT set the WP bit in user land,
> but to just let the kernel do it.  This eliminates the issue of unlock
> finding a WP bit and no waiters.  There would only be one visit to the
> kernel in this case, but, and more important, WP would reliably
> indicate waiters.

Mmmmm... I don't know if I agree or disagree. Makes sense if we can do the
operation in the kernel. Taking into account this is really only important
when the waiters are lower priority than the owners ... hmmm

I guess this should be measured to see the impact of doing it also in user
space or not.

> >>>>c) The lock is held by the caller (this is a recursion error and
> >>>>should cause termination of the caller with extreme prejudice)
> >>>
> >>>I'd return -EDEADLOCK or something like that, but probably termination
> >>>makes sense too ...
> >>
> >>Too many folks don't check for errors  :(
> >
> > Yep, but this is a primitive, not meant to be used by one of those
> > folks who don't check for errors :]; actually NPTL or NGPT are the ones
> > who are going to call for this in their mutex code; if they see that,
> > they need to handle it somehow.
> If the kernel bounces it, you will end up in gdb at the correct
> location...

Deadlock detection is kind of heavyweight when there is a long chain (task A
waiting on lock L held by B, who waits for G, which is locked by C, who is
waiting for H ...), so I was planning on making it optional (a flag you
pass). If we catch it without the flag, I guess we could "terminate with
extreme prejudice", else return EDEADLK.
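
To make the chain walk concrete, here is a minimal user-space sketch of the
check. It is not the actual patch: struct task, struct lock, would_deadlock()
and the max_depth bound are all made-up names for illustration, and a real
kernel version would need locking around the walk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model types; none of these names come from the kernel. */
struct lock;

struct task {
    struct lock *waiting_for;   /* lock this task is blocked on, or NULL */
};

struct lock {
    struct task *owner;         /* current owner, or NULL if unlocked */
};

/*
 * Return EDEADLK if letting `self` block on `wanted` would close a cycle
 * in the wait-for chain, 0 otherwise. The walk gives up (returns 0) after
 * max_depth hops, which also bounds cycles that do not involve `self`.
 */
int would_deadlock(const struct task *self, const struct lock *wanted,
                   int max_depth)
{
    assert(self != NULL && wanted != NULL);

    const struct lock *l = wanted;
    for (int depth = 0; depth < max_depth && l != NULL; depth++) {
        const struct task *owner = l->owner;
        if (owner == NULL)
            return 0;               /* chain ends at a free lock */
        if (owner == self)
            return EDEADLK;         /* chain leads back to the caller */
        l = owner->waiting_for;     /* follow what the owner is blocked on */
    }
    return 0;
}
```

The cost is one hop per link in the chain, which is why it is worth keeping
behind a flag. The recursion case c) above is just the one-hop case of the
same walk.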

> > However, that task might not be the original locker, as it could have
> > miserably died and some other task reused the TID; we need to make sure
> > it is it.
> The task and its threads must be connected in some way more robust and
> easier to use than find_task_by_pid().  I know that if the process
> dies or quits it takes all its threads with it.  In HRT I use the
> thread group id (tgid) which is the same for all tasks in a process.
> I think it would be sufficient that the given TID pointed to a task in
> the same "tgid"

Not really, I don't need to connect them - I just need to identify the
task_struct from the PID, and find_task_by_pid() does it. Keep in mind I only
have the PID; once I have the task_struct I can tie it to the rest of the
process through the tgid, but I don't really need that at all.

> I tend to like simple things (like pi-mutex ;).  Aren't the kernel
> futex structures owned by the process in such a way that the process
> can clean them up when it exits?  I would expect them to be in a
> linked list headed at the process task_struct.  Then the futex itself

Yep, this is something I am adding. I have in the task_struct a futex_list
with all the owned futexes and a pointer to the futex the task is waiting
for, if any. 

On cleanup, it just walks the list and starts recovery as dead owner on all
the owned futexes.

Of course, this is what I use for deadlock detection :]
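
Roughly, the bookkeeping looks like the sketch below. It is a simplified
user-space model, not the real code: the model_* names, the owner_dead flag
and the waiter counter are assumptions made up for the example, and the real
thing also has to handle locking and priority boosts.

```c
#include <assert.h>
#include <stddef.h>

struct model_task;

/* Hypothetical stand-in for the kernel-side futex bookkeeping. */
struct model_futex {
    struct model_task *owner;        /* NULL when unlocked */
    int owner_dead;                  /* set when the owner died holding it */
    int nr_waiters;                  /* tasks currently queued on it */
    struct model_futex *next_owned;  /* link in the owner's list */
};

struct model_task {
    struct model_futex *owned;       /* head of the list of owned futexes */
    struct model_futex *waiting_for; /* futex being waited for, if any */
};

/* Record `t` as owner of `f` and push `f` on the task's owned list. */
void model_acquire(struct model_task *t, struct model_futex *f)
{
    assert(f->owner == NULL);
    f->owner = t;
    f->owner_dead = 0;
    f->next_owned = t->owned;
    t->owned = f;
}

/* Record that `t` is queued on `f`. */
void model_wait(struct model_task *t, struct model_futex *f)
{
    assert(t->waiting_for == NULL);
    f->nr_waiters++;
    t->waiting_for = f;
}

/* On task exit: dequeue from the awaited futex, then mark every owned
 * futex as "owner dead". Returns how many futexes were recovered. */
int model_task_exit(struct model_task *t)
{
    int recovered = 0;

    if (t->waiting_for != NULL) {
        t->waiting_for->nr_waiters--;
        t->waiting_for = NULL;
    }
    while (t->owned != NULL) {
        struct model_futex *f = t->owned;
        t->owned = f->next_owned;
        f->next_owned = NULL;
        f->owner = NULL;
        f->owner_dead = 1;   /* the next locker learns the owner died */
        recovered++;
    }
    return recovered;
}
```

The owner and waiting_for pointers are the same links that a deadlock check
would follow, which is why one set of bookkeeping serves both purposes.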

> could have the process PID in it and you would have a circular list
> that would allow the needed checks.  Or does this fall apart when you
> allow tasks outside of the thread group to lock the mutex?  If a mutex

No, it doesn't, because I don't treat threads any differently from other
tasks. Every thread has a task_struct, and what ties them together into a
single process is the tgid they share. For me, they are all separate tasks.
However, when the process, as a whole, dies, it will have to destroy each

> is only thread group wide, it is sufficient that the futex be in the
> same thread group as the caller/user.  Process termination, for any
> reason, should free all futexes, so you should not be able to find any
> that are stale.

On each thread reap, I recover all the mutexes the thread owns and dequeue it
from any it is waiting for, undoing priority boosts and all that ...

> By the way, how are you passing the kernel address of the mutex back
> and forth to user space?

Don't need to; the conversion is done in kernel space, where the kernel
address is derived from the user address. This is from the original futex
code; check out __pin_page() in kernel/futex.c.

> I had this issue with posix timers.  When a timer is created a kernel
> structure is allocated and the user gets a "handle" to it.  Jim
> Houston and I wrote an "id" allocator to handle this.  The code is now
> in 2.5 (as of 2.5.63, soon to be on a corner near you).  When the user
> passes back the "handle" the id allocator code looks up the pointer to
> the kernel structure and passes it back.  To close the loop and
> prevent spoofing, I put some "sort of random" bits in the high end of
> the "handle" and put the "handle" in the kernel structure.  If the
> returned address does not point to a structure that contains the same
> "handle" it is an error.  It is fast on allocation and even faster on
> lookup.

You could have used the PID allocator that WLI and Ingo did for 2.5; look in
kernel/pid.c. You can add different identifier spaces (right now they are
static at compile time), and it would have saved you the work ... hmmm, never
mind, too late, I am afraid :(
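
For anyone following along, the handle check described in the quoted text can
be sketched like this. It is a toy user-space model and not the 2.5 id
allocator: the fixed-size table, the linear scan, the counter used in place of
"sort of random" bits, and every name in it are assumptions for illustration
only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_BITS 8u
#define NR_SLOTS  (1u << SLOT_BITS)
#define SLOT_MASK (NR_SLOTS - 1u)

/* Hypothetical kernel-side object that the user refers to by handle. */
struct model_timer {
    uint32_t handle;   /* copy of the handle handed out to the user */
    int in_use;
};

static struct model_timer slots[NR_SLOTS];
static uint32_t salt_counter;  /* stand-in for "sort of random" bits */

/* Allocate a slot; the handle is (salt << SLOT_BITS) | index.
 * Returns 0 when the table is full. */
uint32_t handle_alloc(void)
{
    for (uint32_t i = 0; i < NR_SLOTS; i++) {
        if (!slots[i].in_use) {
            /* Salt is in 1..0xFFFFFF, so a valid handle is never 0. */
            uint32_t salt = (salt_counter++ % 0xFFFFFFu) + 1u;
            uint32_t handle = (salt << SLOT_BITS) | i;
            assert(handle != 0);
            slots[i].in_use = 1;
            slots[i].handle = handle;
            return handle;
        }
    }
    return 0;
}

/* Look up a handle; reject it unless the slot stores the same handle. */
struct model_timer *handle_lookup(uint32_t handle)
{
    struct model_timer *t = &slots[handle & SLOT_MASK];
    if (!t->in_use || t->handle != handle)
        return NULL;   /* stale or spoofed handle */
    return t;
}

/* Release a slot; unknown handles are ignored. */
void handle_free(uint32_t handle)
{
    struct model_timer *t = handle_lookup(handle);
    if (t != NULL) {
        t->in_use = 0;
        t->handle = 0;
    }
}
```

The low bits pick the slot and the high bits have to match what was stored at
allocation, so an old handle stops working once its slot is reused.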

Iñaky Pérez-Gonzalez --- Not speaking for Intel -- all opinions are my own
(and my fault)
