For review: pid_namespaces(7) man page

Eric W. Biederman ebiederm at xmission.com
Thu Mar 7 08:31:29 UTC 2013


"Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:

> On Wed, Mar 6, 2013 at 1:40 AM, Eric W. Biederman <ebiederm at xmission.com> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:
>>
>>> On Tue, Mar 5, 2013 at 7:41 AM, Eric W. Biederman <ebiederm at xmission.com> wrote:
>>>> "Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:
>>>>
>>>>> Eric,
>>>>>
>>>>> On Mon, Mar 4, 2013 at 6:52 PM, Eric W. Biederman
>>>>> <ebiederm at xmission.com> wrote:
>>>>>> "Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:
>>>>>>
>>>>>>> On Fri, Mar 1, 2013 at 4:35 PM, Eric W. Biederman <ebiederm at xmission.com> wrote:
>>>>>>>> "Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:
>>>>>>>>
>>>>>>>>> Hi Rob,
>>>>>>>>>
>>>>>>>>> On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley <rob at landley.net> wrote:
>>>>>>>>>> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
>>>>> [...]
>>>>>>>>>>> Because the above unshare(2) and setns(2) calls only change the
>>>>>>>>>>> PID namespace for created children, the clone(2) calls
>>>>>>>>>>> necessarily put the new thread in a different PID namespace from the
>>>>>>>>>>> calling thread.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Um, no they don't. They fail. That's the point.
>>>>>>>>>
>>>>>>>>> (Good catch.)
>>>>>>>>>
>>>>>>>>>> They _would_ put the new
>>>>>>>>>> thread in a different PID namespace, which breaks the definition of threads.
>>>>>>>>>>
>>>>>>>>>> How about:
>>>>>>>>>>
>>>>>>>>>> The above unshare(2) and setns(2) calls change the PID namespace of
>>>>>>>>>> children created by subsequent clone(2) calls, which is incompatible
>>>>>>>>>> with CLONE_VM.
>>>>>>>>>
>>>>>>>>> I decided on:
>>>>>>>>>
>>>>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>>>>> namespace for created children but not for the calling process,
>>>>>>>>> while clone(2) CLONE_VM specifies the creation of a new thread
>>>>>>>>> in the same process.
>>>>>>>>
>>>>>>>> Can we make that "for all new tasks created" instead of "created
>>>>>>>> children"?
>>>>>>>>
>>>>>>>> Otherwise someone might expect CLONE_THREAD to work, since
>>>>>>>> CLONE_THREAD creates a thread and not a child...
>>>>>>>
>>>>>>> The term "task" is kernel-space talk that rarely appears in man pages,
>>>>>>> so I am reluctant to use it.
>>>>>>
>>>>>> With respect to clone, and in this case, I am not certain we can properly
>>>>>> describe what happens without talking about tasks. But it is worth
>>>>>> a try.
>>>>>>
>>>>>>
>>>>>>> How about this:
>>>>>>>
>>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>>> namespace for processes subsequently created by the caller, but
>>>>>>> not for the calling process, while clone(2) CLONE_VM specifies
>>>>>>> the creation of a new thread in the same process.
>>>>>>
>>>>>> Hmm. How about this.
>>>>>>
>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>> namespace that will be used in all subsequent calls to clone
>>>>>> and fork by the caller, but not for the calling process, and
>>>>>> that all threads in a process must share the same PID
>>>>>> namespace. This makes a subsequent clone(2) CLONE_VM
>>>>>> specify the creation of a new thread in a different PID
>>>>>> namespace but in the same process, which is impossible.
>>>>>
>>>>> I did a little tidying:
>>>>>
>>>>> The point here is that unshare(2) and setns(2) change the
>>>>> PID namespace that will be used in all subsequent calls
>>>>> to clone(2) and fork(2), but do not change the PID namespace
>>>>> of the calling process. Because a subsequent
>>>>> clone(2) CLONE_VM would imply the creation of a new
>>>>> thread in a different PID namespace, the operation is not
>>>>> permitted.
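The rule in the tidied paragraph above can be modelled in a few lines. This is a minimal, hypothetical sketch, not kernel code: struct pidns_model, struct task_model, model_set_children_ns() and model_clone() are invented names. Each modelled task records the PID namespace it lives in plus the one that unshare(2)/setns(2) select for tasks it creates later. It models the CLONE_VM restriction as described in this 2013 thread; later kernels relaxed the CLONE_VM part and keep the restriction for CLONE_THREAD, so do not read it as current behaviour.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>   /* CLONE_VM */

/* Hypothetical model of a PID namespace: only its identity matters here. */
struct pidns_model {
    int id;
};

/* Hypothetical model of a task: the namespace it lives in, and the one
 * that unshare(2)/setns(2) have selected for tasks it creates later. */
struct task_model {
    const struct pidns_model *active_ns;
    const struct pidns_model *ns_for_children;
};

/* Models unshare(CLONE_NEWPID) or setns(2): only ns_for_children changes;
 * the caller's own PID namespace stays the same. */
static void model_set_children_ns(struct task_model *t,
                                  const struct pidns_model *ns)
{
    t->ns_for_children = ns;
}

/* Models the check described in this thread (not current kernels): a
 * CLONE_VM request that would put the new thread in a different PID
 * namespace from its creator is refused with -EINVAL. On success,
 * *child describes the new task. */
static int model_clone(const struct task_model *parent, unsigned long flags,
                       struct task_model *child)
{
    if ((flags & CLONE_VM) && parent->active_ns != parent->ns_for_children)
        return -EINVAL;
    child->active_ns = parent->ns_for_children;
    child->ns_for_children = parent->ns_for_children;
    return 0;
}
```

After model_set_children_ns(), an ordinary child (flags 0) lands in the new namespace, a CLONE_VM request fails with -EINVAL, and the caller's own active_ns never changes, which is the point the paragraph makes.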
>>>>>
>>>>> Okay?
>>>>
>>>> That seems reasonable.
>>>>
>>>> CLONE_THREAD might be better to talk about. The check is for CLONE_VM
>>>> because it is easier, and CLONE_THREAD implies CLONE_VM.
>>>>
>>>>> Having asked that, I realize that I'm still not quite comfortable with
>>>>> this text. I think the problem is really one of terminology. At the
>>>>> start of this passage in the page, there is the sentence:
>>>>>
>>>>> Every thread in a process must be in the
>>>>> same PID namespace.
>>>>>
>>>>> Can you define "thread" in this context?
>>>>
>>>> Most definitely a thread group created with CLONE_THREAD. It is pretty
>>>> ugly in the plain old-fashioned CLONE_VM case too, but that case might be
>>>> legal.
>>>>
>>>> In a few cases I think the implementation overshoots and tests for VM
>>>> sharing instead of thread group membership, because VM sharing is easier
>>>> to test for and we already have tests for that.
>>>
>>> So, in summary, the point is that CLONE_VM is being used as a proxy
>>> for CLONE_THREAD because the former is easier to test for, and
>>> CLONE_THREAD requires CLONE_VM, right?
>>
>> I am totally lost about what problem we are trying to resolve in
>> the text at this point. So I am taking this opportunity to review
>> what is actually happening and, hopefully, to give a clear and useful
>> explanation.
>
> The problem is that the existing text talks about multithreaded
> processes needing to be in the same PID namespace and then jumps to
> talking about restrictions with CLONE_VM (not CLONE_THREAD). The
> reader may not realize that CLONE_VM is a near synonym for
> "multithreaded process".
>
> However, the text you provide here is wonderful detail:
>
>> The clone flags have some dependencies.
>> CLONE_SIGHAND requires CLONE_VM.
>> CLONE_THREAD requires CLONE_SIGHAND.
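The two dependency rules above can be written as a single check. clone(2) documents EINVAL for both violations; the helper below, check_clone_flag_deps(), is a hypothetical name that only restates the two rules using the real CLONE_* constants from <sched.h>, not the kernel's own code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>   /* CLONE_VM, CLONE_SIGHAND, CLONE_THREAD */

/* Hypothetical helper: returns 0 if the flag combination satisfies the
 * two dependency rules, -EINVAL otherwise. */
static int check_clone_flag_deps(unsigned long flags)
{
    /* CLONE_SIGHAND requires CLONE_VM: a shared table of signal handlers
     * (function addresses) only makes sense in a shared address space. */
    if ((flags & CLONE_SIGHAND) && !(flags & CLONE_VM))
        return -EINVAL;
    /* CLONE_THREAD requires CLONE_SIGHAND: threads of one thread group
     * share their signal handlers. */
    if ((flags & CLONE_THREAD) && !(flags & CLONE_SIGHAND))
        return -EINVAL;
    return 0;
}
```

Chaining the rules gives CLONE_THREAD requires CLONE_SIGHAND, which requires CLONE_VM, so any accepted combination containing CLONE_THREAD also contains CLONE_VM.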
>>
>> Ultimately there are cases in here that are too strange to think about,
>> and that no one cares about (except, so far, to document what is going on).
>> The fundamental goal of these checks is simply to disallow the cases that
>> are too strange to think about.
>>
>> From a technical point of view, CLONE_THREAD requires being in the same
>> PID namespace so that you can send signals to the other threads in your
>> process, and so that you can see all of the threads of your process in /proc.
>>
>> From a technical point of view, CLONE_SIGHAND requires being in the same
>> PID namespace because we need to know how to encode the PID of the
>> sending process at the time a signal is enqueued in the destination
>> queue. A signal queue shared by processes in multiple PID namespaces
>> would defeat that.
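Why a shared queue defeats the encoding can be seen with a toy model. This is a hypothetical sketch, not the kernel's data structures: struct pid_model and model_pid_nr_for_reader() are invented names. A PID is visible in its own namespace and in every ancestor, with a different number at each nesting level, and the sender's PID has to be translated for one particular reader when the signal is queued.

```c
/* Hypothetical model of a PID: nr[i] is its number in the namespace at
 * nesting level i (0 = initial namespace), valid for i = 0..level. */
struct pid_model {
    int level;
    int nr[4];
};

/* Number seen by a reader whose namespace is at nesting level reader_level,
 * assuming both namespaces lie on one chain of nested namespaces.
 * Returns 0 when the PID is not visible there (the reader sits in a
 * namespace nested below the PID's own). */
static int model_pid_nr_for_reader(const struct pid_model *pid,
                                   int reader_level)
{
    if (reader_level < 0 || reader_level > pid->level)
        return 0;
    return pid->nr[reader_level];
}
```

A sender living one level down with numbers { 1234, 7 } is 1234 to a reader in the initial namespace and 7 to a reader in its own namespace. A single value stored in a queue shared by both readers can only be right for one of them.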
>>
>> From a technical point of view, CLONE_VM requires all of the threads to
>> be in the same PID namespace, because from the point of view of the coredump
>> code, if two processes share the same address space they are threads and will
>> be core dumped together. When a coredump is written, the pid of each
>> thread is written into the coredump. Writing the pids could not
>> meaningfully succeed if some of the pids were in a parent PID namespace.
>>
>> Therefore there is a technical requirement for each of CLONE_THREAD,
>> CLONE_SIGHAND, and CLONE_VM that the tasks share a PID namespace.
>>
>> In the kernel code, testing only for CLONE_VM is shorthand for
>> testing for any of CLONE_THREAD, CLONE_SIGHAND, or CLONE_VM.
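The shorthand works because of the dependency chain listed earlier in the thread (CLONE_THREAD requires CLONE_SIGHAND, which requires CLONE_VM). The hypothetical check below (deps_ok() and clone_vm_test_matches_any() are illustrative names, not kernel functions) enumerates all eight combinations of the three flags and confirms that, once inconsistent combinations are discarded, "flags & CLONE_VM" gives the same answer as testing for any of the three.

```c
#define _GNU_SOURCE
#include <sched.h>     /* CLONE_VM, CLONE_SIGHAND, CLONE_THREAD */
#include <stdbool.h>

/* The dependency rules quoted in the thread. */
static bool deps_ok(unsigned long flags)
{
    if ((flags & CLONE_SIGHAND) && !(flags & CLONE_VM))
        return false;
    if ((flags & CLONE_THREAD) && !(flags & CLONE_SIGHAND))
        return false;
    return true;
}

/* Returns true if testing CLONE_VM alone agrees with testing for any of
 * the three flags on every combination considered. With apply_deps set,
 * combinations that clone(2) would reject anyway are skipped. */
static bool clone_vm_test_matches_any(bool apply_deps)
{
    const unsigned long bits[3] = { CLONE_VM, CLONE_SIGHAND, CLONE_THREAD };
    const unsigned long any = CLONE_VM | CLONE_SIGHAND | CLONE_THREAD;

    for (unsigned mask = 0; mask < 8; mask++) {
        unsigned long flags = 0;
        for (int i = 0; i < 3; i++)
            if (mask & (1u << i))
                flags |= bits[i];
        if (apply_deps && !deps_ok(flags))
            continue;
        if (((flags & any) != 0) != ((flags & CLONE_VM) != 0))
            return false;
    }
    return true;
}
```

Without the dependency rules the shorthand would be wrong (CLONE_SIGHAND alone sets "any" but not CLONE_VM), which is why the cheaper test is only safe because those combinations are rejected first.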
>
> I will incorporate most of the above into the page.
>
>> On the flip side, the way unshare(CLONE_NEWPID) implicitly adds
>> unshare(CLONE_THREAD) actually appears to be bogus
>
> I agree that it seems strange.

Having looked at it a little more, I will be removing the unnecessary
CLONE_THREAD check in 3.10.

Eric


More information about the Containers mailing list