[GIT PULL] user namespace and namespace infrastructure changes for 3.8

Eric W. Biederman ebiederm at xmission.com
Thu Dec 13 22:01:41 UTC 2012


Andy Lutomirski <luto at amacapital.net> writes:

> On 12/11/2012 01:17 PM, Eric W. Biederman wrote:
>> 
>> Linus,
>> 
>> Please pull the for-linus git tree from:
>> 
>>    git://git.kernel.org:/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus
>> 
>>    HEAD: 98f842e675f96ffac96e6c50315790912b2812be proc: Usable inode numbers for the namespace file descriptors.
>> 
>>    This tree is against v3.7-rc3
>
> You've just allowed unprivileged users to create new pid namespaces,
> etc, by creating a new userns, then creating a new pid namespace inside
> that userns, then setns-ing from outside the userns into the pid ns.  Is
> this intentional?  (The mount ns is okay -- it checks for CAP_CHROOT on
> setns.)

Absolutely.  My commit messages talk about this.  I allow creating other
namespaces once inside a user namespace deliberately.  There is no
reason I know of to ban creation of pid and other namespaces once you
are inside of a user namespace.

But please also note the difference between capable() and ns_capable().
Any security check that uses capable() still requires privileges in the
initial user namespace.
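
To make the capable/ns_capable distinction concrete, here is a minimal
userspace sketch of the intended rule.  It is a toy model, not the kernel
code: userns_model, cred_model, init_ns_model, ns_capable_model and
capable_model are all hypothetical names invented for illustration, and
the capability set is reduced to a plain bitmask.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures; every name here is hypothetical. */
struct userns_model {
    const struct userns_model *parent; /* NULL for the initial namespace */
    unsigned int owner;                /* uid that created the namespace */
};

struct cred_model {
    const struct userns_model *user_ns; /* namespace the task lives in */
    unsigned int euid;
    unsigned long caps;                 /* effective capabilities as a bitmask */
};

#define CAP_SYS_ADMIN_BIT (1UL << 21)   /* CAP_SYS_ADMIN is capability 21 */

/* The initial user namespace: it has no parent. */
static const struct userns_model init_ns_model = { NULL, 0 };

/* Does cred hold cap with respect to the target namespace ns? */
static bool ns_capable_model(const struct cred_model *cred,
                             const struct userns_model *ns,
                             unsigned long cap)
{
    for (; ns != NULL; ns = ns->parent) {
        /* Reached the task's own namespace: its capability set decides. */
        if (ns == cred->user_ns)
            return (cred->caps & cap) != 0;
        /* The owner of a direct child namespace holds all caps in it
         * and in everything nested below it. */
        if (ns->parent == cred->user_ns && ns->owner == cred->euid)
            return true;
    }
    /* Walked past the initial namespace without meeting the task's own
     * namespace: the target is not the task's namespace or a descendant. */
    return false;
}

/* capable() modelled as ns_capable() against the initial namespace. */
static bool capable_model(const struct cred_model *cred, unsigned long cap)
{
    return ns_capable_model(cred, &init_ns_model, cap);
}
```

In this model an unprivileged uid 1000 that created a child namespace
passes ns_capable_model() for that child but still fails capable_model(),
and a task living inside the child fails capable_model() no matter what
its capability bitmask says.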

> In user_namespace.c:
>
>         /* Threaded many not enter a different user namespace */
>         if (atomic_read(&current->mm->mm_users) > 1)
>                 return -EINVAL;
>
> The comment has a typo.  Also, you're checking the wrong condition:
> that's whether the vm is shared, not whether the thread group has more
> than one member.

Yes, the comment should say:

Threaded processes may not enter a different user namespace.

As for the condition: mm_users will equal one for a non-threaded
process, and mm_users is the check we use in unshare to detect whether
a threaded process is calling unshare, so I think the check is perfectly
reasonable.  Especially since the vm must have more than one user if
there is more than one member in the thread group.

> In any case, why are threads special here?

You know, I don't think I stopped to think about it.  The combination
of CLONE_NEWUSER and CLONE_THREAD has been denied since the first user
namespace support was merged in 2008.

I do know that things can get really strange when you mix multiple
namespaces in a process.  tkill of your own threads will stop working.
And which access permissions should apply to files you mmap, file
handles you have open, the core dumper, etc.?

We do allow setresuid per thread, so we might be able to cope
with a process that mixes user namespaces in different threads,
but I would want a close review of things before we allow that kind of
sharing.

> I think, although I haven't verified it, that these changes allow
> CAP_SYS_ADMIN to bypass the bounding set (and, in particular, to gain
> CAP_MODULE): unshare the user namespace and then setfd yourself back.  I
> think that setns should only grant caps when changing to a descendent
> namespace.

(See the end.  A significant bug in cap_capable slipped in about
 3.5. cap_capable is only supposed to grant permissions to the owner
 of a user namespace if it is a child user namespace).

These changes do not allow CAP_SYS_ADMIN to bypass the bounding set.

The test:

	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
		return -EPERM;

verifies that the user namespace we are entering is a nested user
namespace, because we can only have CAP_SYS_ADMIN in our current
namespace and in nested user namespaces.

> Also in userns_install:
>
> 796         /* Don't allow gaining capabilities by reentering
> 797          * the same user namespace.
> 798          */
> 799         if (user_ns == current_user_ns())
> 800                 return -EINVAL;
>
> Why?

To keep processes that deliberately drop some capabilities from being
able to gain those capabilities back by reentering the current user
namespace.

Aka that test plus the ns_capable test are the combination that
prevents a process from gaining privileges in the current user namespace.
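
A rough userspace sketch of how those two tests combine is below.  It is
only a model under stated assumptions: userns_install_model and the
struct names are hypothetical, the ns_capable() result is passed in as a
plain boolean, the threaded-process check is left out, and it assumes
that a successful entry hands the task a full capability set in the
target namespace (which is why re-entering the current namespace has to
be refused).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; only namespace identity matters here. */
struct userns_model {
    int id;
};

struct cred_model {
    const struct userns_model *user_ns; /* namespace the task lives in */
    unsigned long caps;                 /* effective capabilities as a bitmask */
};

#define FULL_CAPS_MODEL (~0UL)

/*
 * Model of the two checks discussed above.  sys_admin_in_target stands in
 * for the result of ns_capable(target, CAP_SYS_ADMIN).
 */
static int userns_install_model(struct cred_model *cred,
                                const struct userns_model *target,
                                bool sys_admin_in_target)
{
    /* Re-entering the current namespace would hand back dropped caps. */
    if (target == cred->user_ns)
        return -EINVAL;

    /* The caller must hold CAP_SYS_ADMIN with respect to the target. */
    if (!sys_admin_in_target)
        return -EPERM;

    cred->user_ns = target;
    cred->caps = FULL_CAPS_MODEL; /* assumption: entry grants a full set */
    return 0;
}
```

With a task that has dropped all of its capabilities, asking to enter its
own namespace returns -EINVAL and leaves the capability mask untouched,
so the only way to a full set is a different namespace in which the task
already passes the CAP_SYS_ADMIN test.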

> You can trivially bypass this by creating a temporary user ns.
> (If you're the owner of your own ns, then you can create a subsidiary
> ns, map yourself into it, then setns back -- you'll still be the
> owner.)

Nope.   Once you have capabilities in a user namespace you do not have
any capabilities in the parent user namespace.  Entering a user
namespace is a one way operation.

> unshare has a bug.  This code:

Interesting...

Looking at it this is a very small misfeature.

What is happening is that commit_creds is making the task undumpable
because we changed the set of capabilities in struct cred.

This in turn results in pid_revalidate setting the owner
of /proc/self/uid_map to GLOBAL_ROOT_UID.
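
For anyone who wants to observe this from userspace, a small sketch of
the two things to look at is below: the task's dumpable flag and the
owner of a /proc file.  The helper names query_dumpable and file_owner
are made up for illustration; prctl(PR_GET_DUMPABLE) and stat() are the
standard interfaces.  Linux only.

```c
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns the dumpable flag of the calling process, or -1 on error. */
static int query_dumpable(void)
{
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/* Stores the owning uid of path in *uid; returns 0 on success, -1 on error. */
static int file_owner(const char *path, uid_t *uid)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *uid = st.st_uid;
    return 0;
}
```

Calling query_dumpable() and file_owner("/proc/self/uid_map", &uid)
before and after unshare(CLONE_NEWUSER) should show whether the flag was
cleared and whether the file's owner changed along with it.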