[PATCHv1 7/8] cgroup: cgroup namespace setns support

Eric W.Biederman ebiederm at xmission.com
Mon Oct 20 04:55:50 UTC 2014



On October 19, 2014 1:26:29 PM CDT, Andy Lutomirski <luto at amacapital.net> wrote:
>On Sat, Oct 18, 2014 at 10:23 PM, Eric W. Biederman
><ebiederm at xmission.com> wrote:
>> "Serge E. Hallyn" <serge at hallyn.com> writes:
>>
>>> Quoting Aditya Kali (adityakali at google.com):
>>>> On Thu, Oct 16, 2014 at 2:12 PM, Serge E. Hallyn <serge at hallyn.com>
>wrote:
>>>> > Quoting Aditya Kali (adityakali at google.com):
>>>> >> setns on a cgroup namespace is allowed only if
>>>> >> * task has CAP_SYS_ADMIN in its current user-namespace and
>>>> >>   over the user-namespace associated with target cgroupns.
>>>> >> * task's current cgroup is descendent of the target
>cgroupns-root
>>>> >>   cgroup.
>>>> >
>>>> > What is the point of this?
>>>> >
>>>> > If I'm a user logged into
>>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope and I start
>>>> > a container which is in
>>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope/x1
>>>> > then I will want to be able to enter the container's cgroup.
>>>> > The container's cgroup root is under my own (satisfying the
>>>> > below condition0 but my cgroup is not a descendent of the
>>>> > container's cgroup.
>>>> >
>>>> This condition is there because we don't want to do implicit cgroup
>>>> changes when a process attaches to another cgroupns. cgroupns tries
>to
>>>> preserve the invariant that at any point, your current cgroup is
>>>> always under the cgroupns-root of your cgroup namespace. But in
>your
>>>> example, if we allow a process in "session-c12.scope" container to
>>>> attach to cgroupns root'ed at "session-c12.scope/x1" container
>>>> (without implicitly moving its cgroup), then this invariant won't
>>>> hold.
>>>
>>> Oh, I see.  Guess that should be workable.  Thanks.
>>
>> Which has me looking at what the rules are for moving through
>> the cgroup hierarchy.
>>
>> As long as we have write access to cgroup.procs and are allowed
>> to open the file for write, we can move any of our own tasks
>> into the cgroup.  So the cgroup namespace rules don't seem
>> to be a problem.
>>
>> Andy can you please take a look at the permission checks in
>> __cgroup_procs_write.
>
>The actual requirements for calling that function haven't changed,
>right?  IOW, what does this have to do with cgroupns?

Excluding user namespaces the requirements have not changed.  

The immediate correlation is that to enter a cgroupns you must first put your process in one of it's cgroups.

So I was examining what it would take to enter the cgroup of cgroupns.

> Is the idea
>that you want a privileged user wrt a cgroupns's userns to be able to
>use this?  If so:
>
>Yes, that current_cred() thing is bogus.  (Actually, this is probably
>exploitable right now if any cgroup.procs inode anywhere on the system
>lets non-root write.)  (Can we have some kernel debugging option that
>makes any use of current_cred() in write(2) warn?)
>
>We really need a weaker version of may_ptrace for this kind of stuff.
>Maybe the existing may_ptrace stuff is okay, actually.  But this is
>completely missing group checks, cap checks, capabilities wrt the
>userns, etc.
>
>Also, I think that, if this version of the patchset allows non-init
>userns to unshare cgroupns, then the issue of what permission is
>needed to lock the cgroup hierarchy like that needs to be addressed,
>because unshare(CLONE_NEWUSER|CLONE_NEWCGROUP) will effectively pin
>the calling task with no permission required.  Bolting on a fix later
>will be a mess.

I imagine the pinning would be like the userns.

Ah but there is a potentially serious issue with the pinning.
With pinning we can make it impossible for root to move us to a different cgroup. 

I am not certain how serious that is but it bears thinking about.
If we don't implement pinning we should be able to implent everything with just filesystem mount options, and no new namespace required.

Sigh.

I am too tired tonight to see the end game in this.

Eric
>> As I read the code I see 3 security gaffaws in the permssion check.
>> - Using current->cred instead of file->f_cred.
>> - Not checking tcred->euid.
>> - Checking GLOBAL_ROOT_UID instead of having a capable call.
>>
>> The file permission on cgroup.procs seem just sufficient to keep
>> to keep those bugs from being easily exploitable.
>>
>> Eric



More information about the Containers mailing list