[PATCHv1 7/8] cgroup: cgroup namespace setns support

Andy Lutomirski luto at amacapital.net
Sun Oct 19 18:26:29 UTC 2014


On Sat, Oct 18, 2014 at 10:23 PM, Eric W. Biederman
<ebiederm at xmission.com> wrote:
> "Serge E. Hallyn" <serge at hallyn.com> writes:
>
>> Quoting Aditya Kali (adityakali at google.com):
>>> On Thu, Oct 16, 2014 at 2:12 PM, Serge E. Hallyn <serge at hallyn.com> wrote:
>>> > Quoting Aditya Kali (adityakali at google.com):
>>> >> setns on a cgroup namespace is allowed only if
>>> >> * task has CAP_SYS_ADMIN in its current user-namespace and
>>> >>   over the user-namespace associated with target cgroupns.
>>> >> * task's current cgroup is descendent of the target cgroupns-root
>>> >>   cgroup.
>>> >
>>> > What is the point of this?
>>> >
>>> > If I'm a user logged into
>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope and I start
>>> > a container which is in
>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope/x1
>>> > then I will want to be able to enter the container's cgroup.
>>> > The container's cgroup root is under my own (satisfying the
>>> > below condition0 but my cgroup is not a descendent of the
>>> > container's cgroup.
>>> >
>>> This condition is there because we don't want to do implicit cgroup
>>> changes when a process attaches to another cgroupns. cgroupns tries to
>>> preserve the invariant that at any point, your current cgroup is
>>> always under the cgroupns-root of your cgroup namespace. But in your
>>> example, if we allow a process in "session-c12.scope" container to
>>> attach to cgroupns root'ed at "session-c12.scope/x1" container
>>> (without implicitly moving its cgroup), then this invariant won't
>>> hold.
>>
>> Oh, I see.  Guess that should be workable.  Thanks.
>
> Which has me looking at what the rules are for moving through
> the cgroup hierarchy.
>
> As long as we have write access to cgroup.procs and are allowed
> to open the file for write, we can move any of our own tasks
> into the cgroup.  So the cgroup namespace rules don't seem
> to be a problem.
>
> Andy can you please take a look at the permission checks in
> __cgroup_procs_write.

The actual requirements for calling that function haven't changed,
right?  IOW, what does this have to do with cgroupns?  Is the idea
that you want a privileged user wrt a cgroupns's userns to be able to
use this?  If so:

Yes, that current_cred() thing is bogus.  (Actually, this is probably
exploitable right now if any cgroup.procs inode anywhere on the system
lets non-root write.)  (Can we have some kernel debugging option that
makes any use of current_cred() in write(2) warn?)

We really need a weaker version of may_ptrace for this kind of stuff.
Maybe the existing may_ptrace stuff is okay, actually.  But this is
completely missing group checks, cap checks, capabilities wrt the
userns, etc.

Also, I think that, if this version of the patchset allows non-init
userns to unshare cgroupns, then the issue of what permission is
needed to lock the cgroup hierarchy like that needs to be addressed,
because unshare(CLONE_NEWUSER|CLONE_NEWCGROUP) will effectively pin
the calling task with no permission required.  Bolting on a fix later
will be a mess.

--Andy

>
> As I read the code I see 3 security gaffaws in the permssion check.
> - Using current->cred instead of file->f_cred.
> - Not checking tcred->euid.
> - Checking GLOBAL_ROOT_UID instead of having a capable call.
>
> The file permission on cgroup.procs seem just sufficient to keep
> to keep those bugs from being easily exploitable.
>
> Eric



-- 
Andy Lutomirski
AMA Capital Management, LLC


More information about the Containers mailing list