[PATCH 0/3][V2] remove the ns_cgroup

Serge E. Hallyn serge.hallyn at canonical.com
Mon Sep 27 13:36:58 PDT 2010

Quoting Andrew Morton (akpm at linux-foundation.org):
> On Mon, 27 Sep 2010 12:14:10 +0200
> Daniel Lezcano <daniel.lezcano at free.fr> wrote:
> > The ns_cgroup is a control group interacting with the namespaces.
> > When a new namespace is created, a corresponding cgroup is 
> > automatically created too. The cgroup name is the pid of the process
> > that called 'unshare' or of the child created by 'clone'.
> > 
> > This cgroup is tied to the namespace because it prevents a
> > process from escaping the control group, and it uses the post_clone
> > callback so that the child cgroup inherits the values of the parent cgroup.
> > 
> > Unfortunately, the more we use this cgroup, the more problems we face
> > with it:
> > 
> >  (1) when a process unshares, the cgroup name may conflict with a previous
> >  cgroup with the same pid, so unshare or clone returns -EEXIST
> > 
> >  (2) cgroup creation is out of control because an application may create
> >  several namespaces, in which case the system automatically creates
> >  several cgroups behind its back and leaves them on the cgroupfs (e.g. a vrf
> >  based on the network namespace).
> > 
> >  (3) the mix of (1) and (2) forces an administrator to regularly check and
> >  clean these cgroups.
> > 
> > This patchset removes the ns_cgroup by adding a new flag to the cgroup
> > and to the cgroupfs mount options. The flag enables copying the parent
> > cgroup's values when a child cgroup is created. We can then safely remove
> > the ns_cgroup, as this flag provides compatibility. We now have to manually
> > create a cgroup and add the task to it, which is consistent with the cgroup
> > framework.
> So this is a non-backward-compatible userspace-visible change?

Yes, it is.

Patch 1 is needed to let lxc and libvirt both control containers with
the same cgroup setup.  Patch 3 however isn't *necessary* for that.  Daniel,
what do you think about holding off on patch 3?

> What are the implications of this?

The ns cgroup does 2 things which no other cgroup does:  (1) it
moves tasks into a child cgroup any time they unshare or clone
a namespace.  And (2) it prevents them from moving up to a parent
cgroup.  The latter in particular makes it the only way, without
using an LSM, of locking root into a cgroup, until user namespaces
are further developed (*).


(*) - Maybe something to add to that new kernel todo list
