[PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
Michael Kerrisk (man-pages)
mtk.manpages at gmail.com
Tue Jan 1 09:30:45 UTC 2013
Hi Eric,
On Thu, Dec 27, 2012 at 6:40 PM, Eric W. Biederman
<ebiederm at xmission.com> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:
>
>> Hi Eric,
>>
>> Some questions below.
>
> A quick note. Getting the permission checks correct has been a little
> more interesting that I would have preferred.
>
> I had to add a nsown_capable(CAP_SYS_ADMIN) check to all of the setns()
> install methods except the user namespace. Not a change in pre 3.8
> behavior but a change to my patch, and possibly a documentation change
> below.
>
>> On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman
>> <ebiederm at xmission.com> wrote:
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm at xmission.com>
>>> ---
>>> man2/setns.2 | 41 +++++++++++++++++++++++++++++++++--------
>>> 1 files changed, 33 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/man2/setns.2 b/man2/setns.2
>>> index 6aa01e1..63b04dc 100644
>>> --- a/man2/setns.2
>>> +++ b/man2/setns.2
>>> @@ -48,6 +48,18 @@ must refer to a network namespace.
>>> .BR CLONE_NEWUTS
>>> .I fd
>>> must refer to a UTS namespace.
>>> +.TP
>>> +.BR CLONE_NEWPID
>>> +.I fd
>>> +must refer to a PID namespace.
>>> +.TP
>>> +.BR CLONE_NEWUSER
>>> +.I fd
>>> +must refer to a user namespace.
>>> +.TP
>>> +.BR CLONE_NEWNS
>>> +.I fd
>>> +must refer to a mount namespace.
>>> .PP
>>> Specifying
>>> .I nstype
>>> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
>>> .IR fd
>>> if the file descriptor was opened by another process and, for example,
>>> passed to the caller via a UNIX domain socket.)
>>> +
>>> +The pid namespace is a little different. Reassociating the calling
>>> +thread with a pid namespace only changes the pid namespace that the
>>> +child processes will be created in.
>>> +
>>> +Changing the pid namespace for child processes is only allowed if the
>>> +pid namespace specified by
>>> +.IR fd
>>> +is a child pid namespace of the pid namespace of the current thread.
>>
>> I assume "current thread" above should be "calling thread", right?
>
> What I mean in "current" from a kernel perspective.
>
> It should be just "caller".
Okay. Changed.
> Threads must share a pid namespace so mentioning threads seems wrong.
>
>>> +
>>> +A multi-threaded process may not change user namespace with setns. A
>>> +process may not reassociate the thread with the current user
>>> +namespace.
>>
>> What do you mean by "the current user nsamesapce"?
>
> fd = open("/proc/self/ns/user");
> setns(fd) -> -EINVAL.
>
> So from a userspace perspective I mean "the callers user namespace".
>
>>> The process reassociating itself with a user namespace
>>> +must have CAP_SYS_ADMIN privileges in the target user namespace.
>>>
>>> +A process may not be reassociated with a new mount namespace if it is
>>> +multi-threaded
>>
>> I tried to verify the precdeing two lines from the kernel source, but
>> did not work out where this check is made. Where is it?
>
> kernel/user_namespace.c:userns_install()
> fs/namespace.c:mntns_install()
Thanks.
> A couple of the security checks have been pushed down into a per
> namespace context, because the exact check that makes sense depends on
> the namespace.
>
>>> or it does not possess both CAP_SYS_CHROOT privileges
>>> +and CAP_SYS_ADMIN rights over the target mount namespace.
>>
>> Could you please expand/clarify the preceding two lines. As they
>> stand, I don't really understand them.
>
> Ugh. The text is slightly wrong.
>
> The code is:
> if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
> !nsown_capable(CAP_SYS_CHROOT) ||
> !nsown_capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> Basically you aren't allowed change your mount namespace into
> a mount namespace that doesn't see you as the all powerful root
> able to mount and unmount filesystems.
>
> You aren't allowed to change your mount namespace unless you possesses
> CAP_SYS_CHROOT and CAP_SYS_ADMIN.
Okay -- reworded.
So, I've done some more reworking of the text, which now reads as
folows. Could you please check this (and see my questions below).
CLONE_NEWPID behaves somewhat differently from the other
nstype values: reassociating the calling thread with a
PID namespace only changes the PID namespace that child
processes of the caller will be created in; it does not
change the PID namespace of the caller itself.
I reworked the preceding piece a lot. Is it correct still?
Reassoci‐
ating with a PID namespace is only allowed if the PID
namespace specified by fd is a descendant (child, grand‐
child, etc.)
Is the preceding sentence correct? (You talked only of children in
your original patch, but I believe it's more general than that.)
PID namespace of the PID namespace of the
caller.
A multi-threaded process may not change user namespace
with setns(). A process may not reassociate the thread
with the caller's user namespace.
What does the last sentence above *mean*? I don't understand it.
A process reassociat‐
ing itself with a user namespace must have CAP_SYS_ADMIN
privileges in the target user namespace.
A process may not be reassociated with a new mount names‐
pace if it is multi-threaded. Changing the mount names‐
pace requires that the caller possess both CAP_SYS_CHROOT
and CAP_SYS_ADMIN capabilities.
Re the last sentence: are those capabilities required in (1) the
target namespace, or (2) the source namespace, or (3) both? I suspect
(1), but please confirm.
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
More information about the Containers
mailing list