[RFC PATCH 1/2] devices cgroup: allow can_attach() if ns_capable

Tue Jul 23 19:28:01 UTC 2013

Quoting Tejun Heo (tj at kernel.org):
> Hello,
> 
> On Tue, Jul 23, 2013 at 02:04:26PM -0500, Serge Hallyn wrote:
> > If task A is uid 1000 on the host, and creates task B as uid X in a new
> > user namespace, then task A, still being uid 1000 on the host, is
> > privileged with respect to B and its namespace - i.e.
> > ns_capable(B->userns, CAP_SYS_ADMIN) is true.
> 
> Well, that also is the exact type of priv delegation we're moving away
> from, so....

I think that's unreasonable, but I guess I'll have to go reread the
old thread.

> > > Besides, I find the whole check rather bogus and would actually much
> > > prefer just nuking the check and just follow the standard permission
> > > checks.
> > 
> > I'd be ok with that - but there's one case I'm not sure about:  If PAM
> > sets me up with /sys/fs/cgroup/devices/serge owned by me, then if I'm
> > thinking right, removing can_attach would mean I could move init into
> > /sys/fs/cgroup/devices/serge...
> > 
> > Is there something else stopping that from happening?
> 
> If PAM is giving out perms on cgroup directory, the whole system is
> prone to DoS in various ways anyway.  It's already utterly broken, so

If we have decent enforcement of hierarchy for devices.{allow,deny},
which we now do, then I don't see why this has to be the case.

> kinda moot point.  If there are people actually doing that in the
> wild, we can conditionalize it on cgroup_sane_behavior().

Guess we'll stop using cgroups for now.

-serge