[PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
Eric W. Biederman
ebiederm at xmission.com
Tue Jan 1 09:58:19 UTC 2013
"Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:
> Hi Eric,
>
> On Thu, Dec 27, 2012 at 6:40 PM, Eric W. Biederman
> <ebiederm at xmission.com> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> Some questions below.
>>
>> A quick note. Getting the permission checks correct has been a little
>> more interesting that I would have preferred.
>>
>> I had to add a nsown_capable(CAP_SYS_ADMIN) check to all of the setns()
>> install methods except the user namespace. Not a change in pre 3.8
>> behavior but a change to my patch, and possibly a documentation change
>> below.
>>
>>> On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman
>>> <ebiederm at xmission.com> wrote:
>>>>
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm at xmission.com>
>>>> ---
>>>> man2/setns.2 | 41 +++++++++++++++++++++++++++++++++--------
>>>> 1 files changed, 33 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/man2/setns.2 b/man2/setns.2
>>>> index 6aa01e1..63b04dc 100644
>>>> --- a/man2/setns.2
>>>> +++ b/man2/setns.2
>>>> @@ -48,6 +48,18 @@ must refer to a network namespace.
>>>> .BR CLONE_NEWUTS
>>>> .I fd
>>>> must refer to a UTS namespace.
>>>> +.TP
>>>> +.BR CLONE_NEWPID
>>>> +.I fd
>>>> +must refer to a PID namespace.
>>>> +.TP
>>>> +.BR CLONE_NEWUSER
>>>> +.I fd
>>>> +must refer to a user namespace.
>>>> +.TP
>>>> +.BR CLONE_NEWNS
>>>> +.I fd
>>>> +must refer to a mount namespace.
>>>> .PP
>>>> Specifying
>>>> .I nstype
>>>> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
>>>> .IR fd
>>>> if the file descriptor was opened by another process and, for example,
>>>> passed to the caller via a UNIX domain socket.)
>>>> +
>>>> +The pid namespace is a little different. Reassociating the calling
>>>> +thread with a pid namespace only changes the pid namespace that the
>>>> +child processes will be created in.
>>>> +
>>>> +Changing the pid namespace for child processes is only allowed if the
>>>> +pid namespace specified by
>>>> +.IR fd
>>>> +is a child pid namespace of the pid namespace of the current thread.
>>>
>>> I assume "current thread" above should be "calling thread", right?
>>
>> What I mean in "current" from a kernel perspective.
>>
>> It should be just "caller".
>
> Okay. Changed.
>
>> Threads must share a pid namespace so mentioning threads seems wrong.
>>
>>>> +
>>>> +A multi-threaded process may not change user namespace with setns. A
>>>> +process may not reassociate the thread with the current user
>>>> +namespace.
>>>
>>> What do you mean by "the current user nsamesapce"?
>>
>> fd = open("/proc/self/ns/user");
>> setns(fd) -> -EINVAL.
>>
>> So from a userspace perspective I mean "the callers user namespace".
>>
>>>> The process reassociating itself with a user namespace
>>>> +must have CAP_SYS_ADMIN privileges in the target user namespace.
>>>>
>>>> +A process may not be reassociated with a new mount namespace if it is
>>>> +multi-threaded
>>>
>>> I tried to verify the precdeing two lines from the kernel source, but
>>> did not work out where this check is made. Where is it?
>>
>> kernel/user_namespace.c:userns_install()
>> fs/namespace.c:mntns_install()
>
> Thanks.
>
>> A couple of the security checks have been pushed down into a per
>> namespace context, because the exact check that makes sense depends on
>> the namespace.
>>
>>>> or it does not possess both CAP_SYS_CHROOT privileges
>>>> +and CAP_SYS_ADMIN rights over the target mount namespace.
>>>
>>> Could you please expand/clarify the preceding two lines. As they
>>> stand, I don't really understand them.
>>
>> Ugh. The text is slightly wrong.
>>
>> The code is:
>> if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
>> !nsown_capable(CAP_SYS_CHROOT) ||
>> !nsown_capable(CAP_SYS_ADMIN))
>> return -EPERM;
>>
>> Basically you aren't allowed change your mount namespace into
>> a mount namespace that doesn't see you as the all powerful root
>> able to mount and unmount filesystems.
>>
>> You aren't allowed to change your mount namespace unless you possesses
>> CAP_SYS_CHROOT and CAP_SYS_ADMIN.
>
> Okay -- reworded.
>
> So, I've done some more reworking of the text, which now reads as
> folows. Could you please check this (and see my questions below).
>
> CLONE_NEWPID behaves somewhat differently from the other
> nstype values: reassociating the calling thread with a
> PID namespace only changes the PID namespace that child
> processes of the caller will be created in; it does not
> change the PID namespace of the caller itself.
> I reworked the preceding piece a lot. Is it correct still?
>
> Reassoci‐
> ating with a PID namespace is only allowed if the PID
> namespace specified by fd is a descendant (child, grand‐
> child, etc.)
>
> Is the preceding sentence correct? (You talked only of children in
> your original patch, but I believe it's more general than that.)
Yes. That is correct.
> PID namespace of the PID namespace of the
> caller.
>
> A multi-threaded process may not change user namespace
> with setns(). A process may not reassociate the thread
> with the caller's user namespace.
>
> What does the last sentence above *mean*? I don't understand it.
So the set of checks are:
/* Don't allow gaining capabilities by reentering
* the same user namespace.
*/
if (user_ns == current_user_ns())
return -EINVAL;
/* Threaded processes may not enter a different user namespace */
if (atomic_read(¤t->mm->mm_users) > 1)
return -EINVAL;
if (!ns_capable(user_ns, CAP_SYS_ADMIN))
return -EPERM;
Rereading it looks like I was going fast and suffered from dropping
important words.
A multi-threaded process may not change it's user namespace
with setns().
aka if you have threads setns for a user namespace will fail.
A process may not change the user namespace to the caller's user
namespace via setns. This is important because changing to a
user namespace via setns implies gaining all caps, and you should
not be able to gain all caps over your current user namespace.
Hopefully that clears it up.
> A process reassociat‐
> ing itself with a user namespace must have CAP_SYS_ADMIN
> privileges in the target user namespace.
>
> A process may not be reassociated with a new mount names‐
> pace if it is multi-threaded. Changing the mount names‐
> pace requires that the caller possess both CAP_SYS_CHROOT
> and CAP_SYS_ADMIN capabilities.
>
> Re the last sentence: are those capabilities required in (1) the
> target namespace, or (2) the source namespace, or (3) both? I suspect
> (1), but please confirm.
CAP_SYS_ADMIN is required in the current user namespace.
CAP_SYS_ADMIN is required over the target mount namesapce.
CAP_SYS_CHROOT is required in the current user namespace.
Eric
More information about the Containers
mailing list