[cgl_discussion] Question for TEMs/ISVs/OEMs regarding pthread requirements

george anzinger george at mvista.com
Thu Feb 20 22:43:38 PST 2003


Perez-Gonzalez, Inaky wrote:
>>>>b) The lock is held by another user and is contended, i.e. this is a
>>>>second or higher contender (also normal)
>>>
>>>
>>>You need to make sure the WAITERS_PRESENT bit is set.
>>
>>If it is not, then you have a corrupt mutex.  Time to do something
>>drastic.  Actually I assumed that the WAITERS_PRESENT bit is what made
>>the kernel think it was case b.  I suppose there could be kernel
>>tables for the mutex which also say the same thing.  In case they
>>don't you have corruption :(
> 
> 
> Not necessarily, because if you are in the kernel, it means you are
> contending, so the WP bit should be set; if it is not, it is the case that
> you mentioned before: F was locked by A, B (me) sets the WP bit and goes
> down to the kernel; before I am able to queue up, A goes down to the kernel
> and unlocks, but sees nobody, so does nothing. 

I still contend that it is easier to NOT set the WP bit in user land, 
but to just let the kernel do it.  This eliminates the issue of unlock 
finding a WP bit and no waiters.  There would only be one visit to the 
kernel in this case, and, more important, WP would reliably indicate 
waiters.
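
Roughly, the transition I have in mind is the sketch below.  It is a 
user-space illustration with GCC builtins, not real kernel code, and the 
word layout (owner TID in the low bits, WAITERS_PRESENT in the top bit) 
is only an assumption for the example:

#include <stdint.h>

#define WAITERS_PRESENT 0x80000000u  /* assumed top bit of the futex word */

/* What the kernel would do on behalf of a contender, just before queueing
 * it: atomically turn on WAITERS_PRESENT, re-reading the word each time
 * the compare-and-swap fails.  If the word went to 0 (the owner released
 * it while we were on our way down), just take the lock instead.
 * Returns 1 if the caller now owns the lock, 0 if it was marked as a
 * waiter and should be queued. */
static int kernel_side_mark_waiter(volatile uint32_t *futex, uint32_t my_tid)
{
        uint32_t oldval, newval;

        for (;;) {
                oldval = *futex;
                if (oldval == 0)
                        newval = my_tid;                   /* case d */
                else
                        newval = oldval | WAITERS_PRESENT; /* case b */
                if (__sync_val_compare_and_swap(futex, oldval, newval)
                    == oldval)
                        return oldval == 0;
        }
}

Since the bit is only ever turned on on the way into the wait queue, 
unlock never finds it set with an empty queue.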

> Still before I (B) am able to
> actually reach the kernel, C locks (and as it is UNLOCKED, it sets no
> WAITERS_PRESENT bit). Now finally I get there (to the kernel) and re-examine
> the value; it has no present bit ... oops, I need to re-contend to set it
> and then queue up, so that when C unlocks it goes to the kernel and wakes me
> up.
> 
> 
>>>>c) The lock is held by the caller (this is a recursion error and
>>>>should cause termination of the caller with extreme prejudice)
>>>
>>>
>>>I'd return -EDEADLOCK or something like that, but probably termination
>>>makes sense too ...
>>
>>Too many folks don't check for errors  :(
> 
> 
> Yep, but this is a primitive, not meant to be used by one of those folks who
> don't check for errors :]; actually NPTL or NGPT are the ones who are going
> to call this from their mutex code; if they see that, they need to handle
> it somehow.

If the kernel bounces it, you will end up in gdb at the correct 
location...
> 
> And also, c) is the simplest case of a deadlock condition being detected; I
> will use something along the lines of Corey's algorithm for deadlock detection,
> although I still need to sharpen the rough edges, regarding performance
> impact and stuff [probably it makes sense to leave it as a runtime option].

I think that you will find the deadlocks while running down the 
priority boost chain, and also that you MUST detect them or loop (in 
the kernel) forever, or worse.
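
To make that concrete, the walk I have in mind is roughly the sketch 
below; the structures and fields are made up for illustration and are 
not existing kernel fields:

/* Sketch: cycle detection while walking the priority-boost chain.
 * These structs and fields are invented for the example only. */
struct toy_futex;

struct toy_task {
        struct toy_futex *blocked_on; /* futex this task waits on, or NULL */
        int prio;                     /* here: bigger number == higher prio */
};

struct toy_futex {
        struct toy_task *owner;       /* current holder */
};

/* Boost the owners along the chain; return nonzero if the chain loops
 * back to 'waiter', which is a deadlock. */
static int boost_chain_has_deadlock(struct toy_task *waiter,
                                    struct toy_task *owner)
{
        struct toy_task *t = owner;

        while (t) {
                if (t == waiter)
                        return 1;               /* found ourselves: deadlock */
                if (t->prio < waiter->prio)
                        t->prio = waiter->prio; /* propagate the boost */
                if (!t->blocked_on)
                        return 0;               /* chain ends cleanly */
                t = t->blocked_on->owner;
        }
        return 0;                               /* no owner: nothing to boost */
}
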
> 
> 
>>>>d) The lock is not held (i.e. the holder has gone away while the call
>>>>was making its way to the kernel, fine, give the lock to the caller
>>>>and continue)
>>>
>>>So you need to set your locker ID.
>>
>>Yes, or go back and retry from user space.  May as well do it in the
>>kernel as it avoids possible contention and you are already here.
> 
> 
> That's it. Now if I can only access a user page from kernel space using
> normal pointers, I am done - something like map 'struct page *foo' in the
> kernel virtual address space... 
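
FWIW, the usual trick for that is along the lines of the sketch below 
(written from memory against the 2.5 mm interfaces, with the error 
handling trimmed); if I remember right, futex.c itself already does 
something similar:

#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/sched.h>

/* Sketch: pin the page that holds the futex word and map it so the
 * kernel can get at it through a normal pointer. */
static u32 *map_futex_word(unsigned long uaddr, struct page **pagep)
{
        struct page *page;
        int ret;

        down_read(&current->mm->mmap_sem);
        ret = get_user_pages(current, current->mm, uaddr & PAGE_MASK,
                             1 /* one page */, 1 /* write */, 0 /* no force */,
                             &page, NULL);
        up_read(&current->mm->mmap_sem);
        if (ret != 1)
                return NULL;

        *pagep = page;
        /* kmap() gives a kernel virtual address even for a highmem page */
        return (u32 *)((char *)kmap(page) + (uaddr & ~PAGE_MASK));
}

/* ...use the word..., then kunmap(*pagep) and put_page(*pagep) when done. */
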
> 
> 
>>>This is taken care of by the robust mutex support. Any contender that
>>>jumps in the kernel and sees the futex_q has no registered owner needs
>>>to look it up; if it finds none, it will be assumed it is garbage and
>>>the recovery process will be started if so is requested.
>>>
>>>If it finds it, it sets it in the futex data and also sets a link from
>>>the task structure to the futex data, so that on process death, it can
>>>be accounted for.
>>
>>Great, I think we understand and agree at this point.  I assume that
>>the mapping you want is possible, but I don't know how to do it.
> 
> 
> Well, it is kind of tricky. We use the TID (that is supposedly different for
> every running thread in the application and different from any other PID in
> the system). Then we do a find_task_by_pid() on it and that gives us a
> 'struct task_struct *'.
> 
> However, that task might not be the original locker, as it could have
> miserably died and some other task reused the TID; we need to make sure
> it really is the owner.

The task and its threads must be connected in some way more robust and 
easier to use than find_task_by_pid().  I know that if the process 
dies or quits it takes all its threads with it.  In HRT I use the 
thread group id (tgid) which is the same for all tasks in a process. 
I think it would be sufficient that the given TID pointed to a task in 
the same "tgid".
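
In other words, something as simple as the sketch below (against the 
2.5-ish interfaces, and assuming the futex word carries the owner's TID):

#include <linux/sched.h>

/* Sketch: check that the TID stored in the futex word still names a
 * live task in the caller's own thread group. */
static struct task_struct *lookup_futex_owner(pid_t owner_tid)
{
        struct task_struct *p;

        read_lock(&tasklist_lock);
        p = find_task_by_pid(owner_tid);
        if (p && p->tgid != current->tgid)
                p = NULL;           /* TID was reused outside our process */
        if (p)
                get_task_struct(p); /* keep it alive past the lock */
        read_unlock(&tasklist_lock);

        return p;                   /* NULL => owner gone, start recovery */
}
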
> 
> So what I am planning is to use some kind of signature. Any task that will use
> owner identification needs to tell the kernel to generate a signature; that
> signature will be done by hashing the TID plus some values which are static
> throughout the lifetime of the task (namely, task->start_time, the pointers
> task, task->signature, task->proc_dentry, task->fs, task->files). Now, we
> generate that signature and store it into the space pointed to by
> task->signature.
> 
> Now, whenever I get the task pointer from find_task_by_pid(), I check it is
> who it should be by copying again, to a buffer, the TID from *futex, and then,
> in the same order, the same data and pointers I would use to generate the
> signature (task->start_time, task, task->signature ... etc). Hash that out
> and compare it with the contents of task->signature. If it is the same, I
> have my owner; if not, the original owner died, need to start recovery.
> 
> As a hash mechanism anything can be used, although we want a solid algorithm
> that does not give us false positives. I guess MD5 is going to be kind of
> heavyweight, but again, the thing can be done very modular, so it will be
> easy to replace it [as long as the hash space is wide enough].
> 
> Good thing is this is only done once for each thread that requires it (once
> you declare a prio protected/inheriting or robust mutex, you generate the
> signature for all the threads in the process and flag that all new threads
> get it generated at thread creation); then, you only need to check it when
> you are the first locker and you see that futex_q->owner is NULL [it is
> guaranteed that before the owner task dies, futex_q->owner will be cleaned].

I tend to like simple things (like pi-mutex ;).  Aren't the kernel 
futex structures owned by the process in such a way that the process 
can clean them up when it exits?  I would expect them to be in a 
linked list headed at the process task_struct.  Then the futex itself 
could have the process PID in it and you would have a circular list 
that would allow the needed checks.  Or does this fall apart when you 
allow tasks outside of the thread group to lock the mutex?  If a mutex 
is only thread group wide, it is sufficient that the futex be in the 
same thread group as the caller/user.  Process termination, for any 
reason, should free all futexes, so you should not be able to find any 
that are stale.
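
What I mean is roughly the sketch below; every name in it is made up 
for illustration, the real futex_q has none of these fields:

#include <linux/list.h>
#include <linux/sched.h>
#include <asm/uaccess.h>

#define OWNER_DEAD 0x40000000u        /* assumed flag bit in the futex word */

/* Sketch: chain every pi/robust futex a task owns off its task_struct,
 * so exit can walk the list and mark the user-visible words "owner dead". */
struct toy_futex_q {
        struct list_head owner_list;  /* linked into the owner's held list */
        pid_t owner_pid;              /* mirrors the PID in the user word */
        u32 *uaddr;                   /* the word, in user space */
};

/* Called from the exit path with the task's list of held futexes. */
static void release_held_futexes(struct task_struct *tsk,
                                 struct list_head *held)
{
        struct toy_futex_q *q;

        while (!list_empty(held)) {
                q = list_entry(held->next, struct toy_futex_q, owner_list);
                list_del(&q->owner_list);
                /* flag the word so the next contender starts recovery */
                put_user((u32)(OWNER_DEAD | tsk->pid), q->uaddr);
                /* ...and wake one waiter here, if any are queued... */
        }
}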

By the way, how are you passing the kernel address of the mutex back 
and forth to user space?

I had this issue with posix timers.  When a timer is created a kernel 
structure is allocated and the user gets a "handle" to it.  Jim 
Houston and I wrote an "id" allocator to handle this.  The code is now 
in 2.5 (as of 2.5.63, soon to be on a corner near you).  When the user 
passes back the "handle" the id allocator code looks up the pointer to 
the kernel structure and passes it back.  To close the loop and 
prevent spoofing, I put some "sort of random" bits in the high end of 
the "handle" and put the "handle" in the kernel structure.  If the 
returned address does not point to a structure that contains the same 
"handle" it is an error.  It is fast on allocation and even faster on 
lookup.
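
Schematically, the check looks like the sketch below (made-up names, 
not the actual posix-timer code); the low bits of the handle index the 
id allocator's table, and the "sort of random" high bits have to match 
what was stored at creation time:

#include <stdint.h>
#include <stddef.h>

#define ID_BITS  24
#define ID_MASK  ((1u << ID_BITS) - 1)

struct ktimer {                          /* stand-in for the kernel struct */
        uint32_t handle;                 /* full handle given to user space */
        /* ... timer state ... */
};

/* hypothetical: the id allocator's index -> pointer lookup */
extern struct ktimer *id_lookup(uint32_t index);

static struct ktimer *timer_from_handle(uint32_t handle)
{
        struct ktimer *t = id_lookup(handle & ID_MASK);

        /* stale or forged handle: the random high bits no longer match */
        if (t == NULL || t->handle != handle)
                return NULL;
        return t;
}
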
> 
> What do you think?
> 
> Iñaky Pérez-Gonzalez --- Not speaking for Intel -- all opinions are my own
> (and my fault)
> 
> 
> 

-- 
George Anzinger   george at mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml



