[PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.

Tue Jan 1 09:30:45 UTC 2013

Hi Eric,

On Thu, Dec 27, 2012 at 6:40 PM, Eric W. Biederman
<ebiederm at xmission.com> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:
>
>> Hi Eric,
>>
>> Some questions below.
>
> A quick note.  Getting the permission checks correct has been a little
> more interesting that I would have preferred.
>
> I had to add a nsown_capable(CAP_SYS_ADMIN) check to all of the setns()
> install methods except the user namespace.  Not a change in pre 3.8
> behavior but a change to my patch, and possibly a documentation change
> below.
>
>> On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman
>> <ebiederm at xmission.com> wrote:
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm at xmission.com>
>>> ---
>>>  man2/setns.2 |   41 +++++++++++++++++++++++++++++++++--------
>>>  1 files changed, 33 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/man2/setns.2 b/man2/setns.2
>>> index 6aa01e1..63b04dc 100644
>>> --- a/man2/setns.2
>>> +++ b/man2/setns.2
>>> @@ -48,6 +48,18 @@ must refer to a network namespace.
>>>  .BR CLONE_NEWUTS
>>>  .I fd
>>>  must refer to a UTS namespace.
>>> +.TP
>>> +.BR CLONE_NEWPID
>>> +.I fd
>>> +must refer to a PID namespace.
>>> +.TP
>>> +.BR CLONE_NEWUSER
>>> +.I fd
>>> +must refer to a user namespace.
>>> +.TP
>>> +.BR CLONE_NEWNS
>>> +.I fd
>>> +must refer to a mount namespace.
>>>  .PP
>>>  Specifying
>>>  .I nstype
>>> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
>>>  .IR fd
>>>  if the file descriptor was opened by another process and, for example,
>>>  passed to the caller via a UNIX domain socket.)
>>> +
>>> +The pid namespace is a little different.  Reassociating the calling
>>> +thread with a pid namespace only changes the pid namespace that the
>>> +child processes will be created in.
>>> +
>>> +Changing the pid namespace for child processes is only allowed if the
>>> +pid namespace specified by
>>> +.IR fd
>>> +is a child pid namespace of the pid namespace of the current thread.
>>
>> I assume "current thread" above should be "calling thread", right?
>
> What I mean in "current" from a kernel perspective.
>
> It should be just "caller".

Okay. Changed.

> Threads must share a pid namespace so mentioning threads seems wrong.
>
>>> +
>>> +A multi-threaded process may not change user namespace with setns.  A
>>> +process may not reassociate the thread with the current user
>>> +namespace.
>>
>> What do you mean by "the current user nsamesapce"?
>
> fd = open("/proc/self/ns/user");
> setns(fd) -> -EINVAL.
>
> So from a userspace perspective I mean "the callers user namespace".
>
>>> The process reassociating itself with a user namespace
>>> +must have CAP_SYS_ADMIN privileges in the target user namespace.
>>>
>>> +A process may not be reassociated with a new mount namespace if it is
>>> +multi-threaded
>>
>> I tried to verify the precdeing two lines from the kernel source, but
>> did not work out where this check is made. Where is it?
>
> kernel/user_namespace.c:userns_install()
> fs/namespace.c:mntns_install()

Thanks.

> A couple of the security checks have been pushed down into a per
> namespace context, because the exact check that makes sense depends on
> the namespace.
>
>>> or it does not possess both CAP_SYS_CHROOT privileges
>>> +and CAP_SYS_ADMIN rights over the target mount namespace.
>>
>> Could you please expand/clarify the preceding two lines. As they
>> stand, I don't really understand them.
>
> Ugh.  The text is slightly wrong.
>
> The code is:
>         if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
>             !nsown_capable(CAP_SYS_CHROOT) ||
>             !nsown_capable(CAP_SYS_ADMIN))
>                 return -EPERM;
>
> Basically you aren't allowed change your mount namespace into
> a mount namespace that doesn't see you as the all powerful root
> able to mount and unmount filesystems.
>
> You aren't allowed to change your mount namespace unless you possesses
> CAP_SYS_CHROOT and CAP_SYS_ADMIN.

Okay -- reworded.

So, I've done some more reworking of the text, which now reads as
folows. Could you please check this (and see my questions below).

       CLONE_NEWPID  behaves somewhat differently from the other
       nstype values: reassociating the calling  thread  with  a
       PID  namespace  only changes the PID namespace that child
       processes of the caller will be created in; it  does  not
       change the PID namespace of the caller itself.

I reworked the preceding piece a lot. Is it correct still?

       Reassoci‐
       ating with a PID namespace is only  allowed  if  the  PID
       namespace  specified by fd is a descendant (child, grand‐
       child, etc.)

Is the preceding sentence correct? (You talked only of children in
your original patch, but I believe it's more general than that.)

       PID namespace of the PID namespace  of  the
       caller.

       A  multi-threaded  process  may not change user namespace
       with setns().  A process may not reassociate  the  thread
       with  the caller's user namespace.

What does the last sentence above *mean*? I don't understand it.

       A process reassociat‐
       ing itself with a user namespace must have  CAP_SYS_ADMIN
       privileges in the target user namespace.

       A process may not be reassociated with a new mount names‐
       pace if it is multi-threaded.  Changing the mount  names‐
       pace requires that the caller possess both CAP_SYS_CHROOT
       and CAP_SYS_ADMIN capabilities.

Re the last sentence: are those capabilities required in (1) the
target namespace, or (2) the source namespace, or (3) both? I suspect
(1), but please confirm.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/