cgroup attach/fork hooks consistency with the ns_cgroup

Wed Jun 17 14:26:14 PDT 2009

Quoting Daniel Lezcano (daniel.lezcano at free.fr):
> Hi,
>
> I noticed two different behaviours; the second one looks weird to me:
>
>  1) when the cgroup is manually created:
> 	mkdir /cgroup/foo
> 	echo $$ > /cgroup/foo/tasks
>
>  only the "attach" callback is called as expected.
>
>  2) when the cgroup is automatically created via the ns_cgroup with the  
> clone function and the namespace flags,
>
>   the "attach" *and* the "fork" callbacks are called.
>
>
> IMHO, these two behaviours look inconsistent. Won't this lead to
> problems, or require specific code to handle both cases, if a cgroup
> subsystem uses both the fork and the attach hooks?
>
> For example, let's imagine we create a control group which shows the  
> number of tasks running. We have a global atomic and we display its  
> value in the cgroupfs.
>
> When a task attaches to the cgroup, we do atomic_inc in the attach
> callback. For each of its children, the fork hook will do atomic_inc and
> the exit hook will do atomic_dec.
>
> If we create the cgroup manually, as in case 1), that works. But if we
> use the control group with the ns_cgroup, the task counter will be set to
> 2 for the first task entering the cgroup, because the attach callback
> increments the counter and the fork callback increments it again.
>
> Attached is source code to illustrate the example.
>
> Shouldn't ns_cgroup_clone be called after cgroup_fork_callbacks in the
> copy_process function, so that we don't call the fork callback for the
> first task and we keep the behaviour consistent?
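
The double count described in the quoted message can be sketched in
userspace C. This is only a model under stated assumptions, not kernel
code: struct counter_cgroup and the counter_* and simulate_* functions are
hypothetical names that stand in for a subsystem's attach, fork and exit
callbacks, and the counter is kept per cgroup for simplicity.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a cgroup subsystem that counts its tasks. */
struct counter_cgroup {
	atomic_int nr_tasks;
};

/* Models the "attach" callback: a task is moved into the cgroup. */
static void counter_attach(struct counter_cgroup *cg)
{
	atomic_fetch_add(&cg->nr_tasks, 1);
}

/* Models the "fork" callback: a new task is created in the cgroup. */
static void counter_fork(struct counter_cgroup *cg)
{
	atomic_fetch_add(&cg->nr_tasks, 1);
}

/* Models the "exit" callback: a task in the cgroup terminates. */
static void counter_exit(struct counter_cgroup *cg)
{
	atomic_fetch_sub(&cg->nr_tasks, 1);
}

/* Case 1: mkdir + echo $$ > tasks. Only "attach" runs for the task. */
int simulate_manual_attach(void)
{
	struct counter_cgroup cg;

	atomic_init(&cg.nr_tasks, 0);
	counter_attach(&cg);
	return atomic_load(&cg.nr_tasks);
}

/* Case 2: ns_cgroup on clone. "attach" and "fork" both run for one task. */
int simulate_ns_cgroup_clone(void)
{
	struct counter_cgroup cg;

	atomic_init(&cg.nr_tasks, 0);
	counter_attach(&cg);	/* new task is attached to the fresh cgroup */
	counter_fork(&cg);	/* fork callback then counts the same task again */
	return atomic_load(&cg.nr_tasks);
}

/* Ordinary child lifetime inside a manually created cgroup. */
int simulate_child_lifetime(void)
{
	struct counter_cgroup cg;

	atomic_init(&cg.nr_tasks, 0);
	counter_attach(&cg);	/* parent attaches */
	counter_fork(&cg);	/* parent forks a child */
	counter_exit(&cg);	/* child exits */
	return atomic_load(&cg.nr_tasks);
}
```

In this model simulate_manual_attach() returns 1 while
simulate_ns_cgroup_clone() returns 2 for a single task, which is the
inconsistency being reported.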

The ns cgroup is really only good for preventing root in a container
from escaping its cgroup-imposed limits.  The same can be done today
using smack or selinux, and eventually will be possible using user
namespaces.  Would anyone object to removing ns_cgroup?

That would remove not just kernel/ns_cgroup.c, but also some subtle code
in fork.c, nsproxy.c, and of course cgroup.c.

There is admittedly a minute convenience gain in not having to
manually create a new cgroup and attach a cloned child to it, but
that wasn't the intent of the cgroup.

-serge