[PATCHv1 7/8] cgroup: cgroup namespace setns support
luto at amacapital.net
Sun Oct 19 18:26:29 UTC 2014
On Sat, Oct 18, 2014 at 10:23 PM, Eric W. Biederman
<ebiederm at xmission.com> wrote:
> "Serge E. Hallyn" <serge at hallyn.com> writes:
>> Quoting Aditya Kali (adityakali at google.com):
>>> On Thu, Oct 16, 2014 at 2:12 PM, Serge E. Hallyn <serge at hallyn.com> wrote:
>>> > Quoting Aditya Kali (adityakali at google.com):
>>> >> setns on a cgroup namespace is allowed only if
>>> >> * task has CAP_SYS_ADMIN in its current user-namespace and
>>> >> over the user-namespace associated with target cgroupns.
>>> >> * task's current cgroup is descendent of the target cgroupns-root
>>> >> cgroup.
>>> > What is the point of this?
>>> > If I'm a user logged into
>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope and I start
>>> > a container which is in
>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope/x1
>>> > then I will want to be able to enter the container's cgroup.
>>> > The container's cgroup root is under my own (satisfying the
>>> > below condition0 but my cgroup is not a descendent of the
>>> > container's cgroup.
>>> This condition is there because we don't want to do implicit cgroup
>>> changes when a process attaches to another cgroupns. cgroupns tries to
>>> preserve the invariant that at any point, your current cgroup is
>>> always under the cgroupns-root of your cgroup namespace. But in your
>>> example, if we allow a process in "session-c12.scope" container to
>>> attach to cgroupns root'ed at "session-c12.scope/x1" container
>>> (without implicitly moving its cgroup), then this invariant won't
>> Oh, I see. Guess that should be workable. Thanks.
> Which has me looking at what the rules are for moving through
> the cgroup hierarchy.
> As long as we have write access to cgroup.procs and are allowed
> to open the file for write, we can move any of our own tasks
> into the cgroup. So the cgroup namespace rules don't seem
> to be a problem.
> Andy can you please take a look at the permission checks in
The actual requirements for calling that function haven't changed,
right? IOW, what does this have to do with cgroupns? Is the idea
that you want a privileged user wrt a cgroupns's userns to be able to
use this? If so:
Yes, that current_cred() thing is bogus. (Actually, this is probably
exploitable right now if any cgroup.procs inode anywhere on the system
lets non-root write.) (Can we have some kernel debugging option that
makes any use of current_cred() in write(2) warn?)
We really need a weaker version of may_ptrace for this kind of stuff.
Maybe the existing may_ptrace stuff is okay, actually. But this is
completely missing group checks, cap checks, capabilities wrt the
Also, I think that, if this version of the patchset allows non-init
userns to unshare cgroupns, then the issue of what permission is
needed to lock the cgroup hierarchy like that needs to be addressed,
because unshare(CLONE_NEWUSER|CLONE_NEWCGROUP) will effectively pin
the calling task with no permission required. Bolting on a fix later
will be a mess.
> As I read the code I see 3 security gaffaws in the permssion check.
> - Using current->cred instead of file->f_cred.
> - Not checking tcred->euid.
> - Checking GLOBAL_ROOT_UID instead of having a capable call.
> The file permission on cgroup.procs seem just sufficient to keep
> to keep those bugs from being easily exploitable.
AMA Capital Management, LLC
More information about the Containers