Testing lxc 0.6.5 in Fedora 13

Eric W. Biederman ebiederm at xmission.com
Tue Apr 6 08:13:13 PDT 2010


Matt Helsley <matthltc at us.ibm.com> writes:

> On Mon, Apr 05, 2010 at 08:44:43PM -0700, Roland McGrath wrote:
>> (I've been away for a couple of weeks.)
>> I concur with the things Oleg's said in this thread.
>> 
>> As to what's "correct" for the kernel in theory, it would certainly make
>> sense to clean up the ptrace cases to use the tracer (parent) pid_ns when
>> reporting any PID as such.  The wait and SIGCHLD code already does this, so
>> that would be consistent.  Off hand I don't see anything other than
>> tracehook_report_clone{,_complete}() that sees the wrong value now.
>
> Yup.
>
>> Fixing that requires a bit of hair.  The simple and clean approach is to
>> just have the tracehook calls (i.e. ptrace layer) extract the PID from the
>> task_struct using the desired pid_ns.  The trouble there is that the
>
> It's also possible to take an extra reference to the struct pid and pass
> that to the tracehook. That and the pid_ns of the tracer receiving the pid
> is all we'll ever need inside the tracehook layer. The only advantage, I
> think, is we wouldn't pin the task struct while holding the pid reference.
>
>> tracehook_report_clone_complete() call is made when that task_struct is no
>> longer guaranteed to be valid.  The contrary approach of extracting the
>> appropriate value for the tracer earlier breaks the clean layering because
>> the fork.c code really should not know at all that ->parent->nsproxy is the
>> place to look for what values to pass to tracehook calls.  I guess the
>> simple and clean fix is to get_task_struct() before wake_up_new_task()
>> and put_task_struct() after tracehook_report_clone_complete().  That does
>> add some gratuitous atomic incr/decr overhead, though.
>
> Also true.
>
> Of course my suggestion of holding the pid reference won't avoid adding
> atomic ops -- just changes which refcount they work on.
>
>> 
>> None of this has much of anything to do with strace, of course.  As I've
>> said, I don't see anything other than the PTRACE_GETEVENTMSG value for
>> PTRACE_EVENT_{CLONE,FORK,VFORK} reports that is wrong in the kernel.  As
>> Oleg said, strace doesn't use that at all.  (This is not the place to
>> discuss the details of strace further.)
>
> Also, looking at proposed changes (utrace and Eric Biederman's setns())
> storing a pid nr rather than a reference to a task struct or struct pid
> probably won't be correct.

My setns work has demonstrated that even for entering a namespace we
never ever need to change the struct pid of a task.  setns has no other
bearing on this problem then to say there is no foreseeable reason to
change the rules.

> In the case of Eric Biederman's setns(), if capable of changing pid namespace,
> we could have:
>
> 	Traced				Tracer
> 	fork()
> 					... (an arbitrary amount of time passes)
> 					setns()
> 					ptrace(GETEVENTMSG)

Forget that.  The pid namespace was architected so that we can ptrace a process
in another pid namespace.

> At which point returning a static pid number held in the message field
> produces the wrong pid.

No.  A processes always sees pids from the context of it's original pid
namespace.  All setns does is affect which pid namespace children will
be native in.


Eric


More information about the Containers mailing list