[PATCH 5/5] cgroup: introduce cgroup namespaces

Fri Jul 18 18:57:22 UTC 2014

On Fri, Jul 18, 2014 at 11:51 AM, Aditya Kali <adityakali at google.com> wrote:
> On Fri, Jul 18, 2014 at 9:51 AM, Andy Lutomirski <luto at amacapital.net> wrote:
>> On Jul 17, 2014 1:56 PM, "Aditya Kali" <adityakali at google.com> wrote:
>>>
>>> On Thu, Jul 17, 2014 at 12:57 PM, Andy Lutomirski <luto at amacapital.net> wrote:
>>> > What happens if someone moves a task in a cgroup namespace outside of
>>> > the namespace root cgroup?
>>> >
>>>
>>> An attempt to move a task outside of the cgroupns root fails with EPERM.
>>> This is true irrespective of the privileges of the process attempting
>>> this. Once cgroupns is created, the task will be confined to the
>>> cgroup hierarchy under its cgroupns root until it dies.
>>
>> Can a task in a non-init userns create a cgroupns?  If not, that's
>> unusual.  If so, is it problematic if they can prevent themselves from
>> being moved?
>>
>
> Currently, only a task with CAP_SYS_ADMIN in the init-userns can
> create cgroupns. It is stricter than for other namespaces, yes.

I'm slightly hesitant to have unshare(CLONE_NEWUSER |
CLONE_NEWCGROUPNS | ...) start having weird side effects that are
visible outside the namespace, especially when those side effects
don't happen (because the call fails entirely) if
unshare(CLONE_NEWUSER) happens first.  I don't see a real problem with
it, but it's weird.
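
To make the ordering concern concrete, here is a toy userspace model (not
kernel code). The struct and every model_* function are hypothetical names
invented for illustration. It assumes the cgroupns check is
"CAP_SYS_ADMIN in the init userns", as described above, and that a combined
call evaluates that check against the caller's pre-call credentials:

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of a task's namespace state; all names are hypothetical. */
struct task_model {
    bool in_init_userns;   /* still in the initial user namespace? */
    bool cap_sys_admin;    /* holds CAP_SYS_ADMIN in its current userns */
    bool in_new_cgroupns;  /* the side effect visible from outside */
};

/* Models unshare(CLONE_NEWUSER): the task leaves the init userns and
 * holds full capabilities only inside the new one. */
static int model_unshare_userns(struct task_model *t)
{
    t->in_init_userns = false;
    t->cap_sys_admin = true;
    return 0;
}

/* Models the check described in this thread: only CAP_SYS_ADMIN in the
 * init userns may create a cgroupns. */
static int model_unshare_cgroupns(struct task_model *t)
{
    if (!(t->in_init_userns && t->cap_sys_admin))
        return -EPERM;
    t->in_new_cgroupns = true;
    return 0;
}

/* Models the combined call. Assumption: the cgroupns check runs against
 * the credentials the task had before the call. */
static int model_unshare_both(struct task_model *t)
{
    int ret = model_unshare_cgroupns(t);

    if (ret)
        return ret;
    return model_unshare_userns(t);
}
```

Under those assumptions, a privileged task calling model_unshare_both() ends
up in a new cgroupns, while the same task calling model_unshare_userns() and
then model_unshare_cgroupns() gets -EPERM on the second call and no cgroupns
is created.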

>
>> I hate to say it, but it might be worth requiring explicit permission
>> from the cgroup manager for this.  For example, there could be a new
>> cgroup attribute may_unshare, and any attempt to unshare the cgroup ns
>> will fail with -EPERM unless the caller is in a may_unshare=1 cgroup.
>> may_unshare in a parent cgroup would not give child cgroups the
>> ability to unshare.
>>
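
A rough sketch of the may_unshare semantics proposed above, as a toy model
rather than a patch. struct cgroup_model and model_cgroupns_unshare_check()
are hypothetical names, not existing kernel interfaces. The point it shows
is that only the caller's own cgroup is consulted, so the flag is not
inherited from a parent:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cgroup node; hypothetical, for illustration only. */
struct cgroup_model {
    const struct cgroup_model *parent;  /* NULL for the root */
    bool may_unshare;                   /* set by the cgroup manager */
};

/* Returns 0 if a task in cg may unshare its cgroup ns, -EPERM otherwise.
 * Deliberately does not walk cg->parent: may_unshare on an ancestor
 * grants nothing to its children. */
static int model_cgroupns_unshare_check(const struct cgroup_model *cg)
{
    if (cg == NULL || !cg->may_unshare)
        return -EPERM;
    return 0;
}
```

With a parent that has may_unshare=1 and a child that has may_unshare=0, the
check succeeds for a task in the parent and returns -EPERM for a task in the
child.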
>
> What you suggest can be done. The current patch-set punts the problem
> of permission checking by only allowing unshare from a
> capable(CAP_SYS_ADMIN) process. This can be implemented as a follow-up
> improvement to the cgroupns feature if we want to open it to non-init
> userns.
>
> That being said, I would argue that even if we don't have this
> explicit permission and relax the check to non-init userns, it should
> be 'OK' to let ns_capable(current_user_ns(), CAP_SYS_ADMIN) tasks
> unshare cgroupns (basically, if you can "create" a cgroup hierarchy,
> you should probably be allowed to unshare() it).

But non-init-userns tasks can't create cgroup hierarchies, unless I
misunderstand the current code.  And, if they can, I bet I can find
three or four serious security issues in an hour or two. :)

> By unsharing
> cgroupns, the tasks can only confine themselves further under their
> cgroupns-root. As long as they cannot escape that hierarchy, it should
> be fine.

But they can also *lock* their hierarchy.

> In my experience, there is seldom a need to move tasks out of their
> cgroup. At most, we create a sub-cgroup and move the task there (which
> is allowed in their cgroupns). Even for a cgroup manager, I can't
> think of a case where it will be useful to move a task from one cgroup
> hierarchy to another. Such a move seems overly complicated (even without
> cgroup namespaces). The cgroup manager can just modify the settings of
> the task's cgroup as needed or simply kill & restart the task in a new
> container.
>

I do this all the time.  Maybe my new systemd overlords will make me
stop doing it, at which point my current production setup will blow
up.

--Andy