[PATCH 00/10] cgroups: Task counter subsystem v6

Fri Nov 4 13:17:26 UTC 2011

On 11/03/2011 03:56 PM, Paul Menage wrote:
> On Thu, Nov 3, 2011 at 10:35 AM, Glauber Costa<glommer at parallels.com>  wrote:
>>
>>> If multiple subsystems on the same hierarchy each need to
>>> walk up the pointer chain on the same event, then after the first
>>> subsystem has done so the chain will be in cache for any subsequent
>>> walks from other subsystems.
>>
>> No, it won't. Precisely because different subsystems have completely
>> independent pointer chains.
>
> Because they're following res_counter parent pointers, etc, rather
> than using the single cgroups parent pointer chain?

No. Because:

/sys/fs/cgroup/my_subsys/
/sys/fs/cgroup/my_subsys/foo1
/sys/fs/cgroup/my_subsys/foo2
/sys/fs/cgroup/my_subsys/foo1/bar1

and:

/sys/fs/cgroup/my_subsys2/
/sys/fs/cgroup/my_subsys2/foo1
/sys/fs/cgroup/my_subsys2/foo1/bar1
/sys/fs/cgroup/my_subsys2/foo1/bar2

are completely independent pointer chains. The only thing they share is
the pointer to the root, and that is irrelevant in the pointer dance.
Also note that I used cpu and cpuacct as an example, and they don't use
res_counters.
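To make the point concrete, here is a rough userspace sketch of the two
trees above. Everything in it is hypothetical: "struct node" is a
simplified stand-in, not the kernel's struct cgroup, and the single
common "top" node is only my assumption about how to model the shared
root pointer. Walking up from bar1 in each hierarchy touches different
memory at every level except that last one.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cgroup; not the kernel struct. */
struct node {
	const char *name;
	struct node *parent;	/* NULL at the very top */
};

/* Assumption: the one thing both hierarchies share is modeled as "top". */
struct node top = { "top", NULL };

/* Hierarchy mounted at /sys/fs/cgroup/my_subsys */
struct node h1_root = { "my_subsys", &top };
struct node h1_foo1 = { "foo1", &h1_root };
struct node h1_foo2 = { "foo2", &h1_root };
struct node h1_bar1 = { "bar1", &h1_foo1 };

/* Hierarchy mounted at /sys/fs/cgroup/my_subsys2 */
struct node h2_root = { "my_subsys2", &top };
struct node h2_foo1 = { "foo1", &h2_root };
struct node h2_bar1 = { "bar1", &h2_foo1 };
struct node h2_bar2 = { "bar2", &h2_foo1 };

/* Count the nodes that appear on both upward walks. */
int shared_nodes(const struct node *a, const struct node *b)
{
	const struct node *x, *y;
	int shared = 0;

	for (x = a; x; x = x->parent)
		for (y = b; y; y = y->parent)
			if (x == y)
				shared++;
	return shared;
}
```

With this model, shared_nodes(&h1_bar1, &h2_bar1) finds only the common
top node, so a walk done by one subsystem warms the cache for almost
nothing the other subsystem will touch.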

> So if that's the problem, rather than artificially constrain
> flexibility in order to improve micro-benchmarks, why not come up with
> approaches that keep both the flexibility and the performance?

Well, I am not opposed to that, even if you happen to agree with what I
said above. But at the end of the day, with many cgroups appearing, it
may not be just about micro-benchmarks.

It is hard to draw the line, but I believe that avoiding the creation of
new cgroup subsystems where possible works in our favor.

Specifically for this one, my arguments are:

* cgroups are a task-grouping entity
* therefore, all cgroups already do some task manipulation in attach/detach
* all cgroup subsystems can already register a fork handler

Adding a fork limit as a cgroup property seems a logical step to me 
based on that.
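As a rough illustration of what such a property could look like, here is
a userspace sketch of a hierarchical fork limit. It is not taken from the
patch set: "struct group", group_can_fork() and group_exit() are
hypothetical names, and locking is left out entirely. The fork-time check
charges every level up the parent chain and rolls back if any ancestor is
already at its limit.

```c
#include <stddef.h>

/* Hypothetical per-group state; names are illustrative only. */
struct group {
	struct group *parent;	/* NULL at the root */
	long nr_tasks;		/* tasks currently charged here */
	long max_tasks;		/* negative means unlimited */
};

/*
 * Called at fork time. Charges g and all its ancestors.
 * Returns 0 on success, or -1 with nothing charged if any level is full.
 */
int group_can_fork(struct group *g)
{
	struct group *it, *undo;

	for (it = g; it; it = it->parent) {
		if (it->max_tasks >= 0 && it->nr_tasks >= it->max_tasks)
			goto fail;
		it->nr_tasks++;
	}
	return 0;

fail:
	/* Undo the charges made below the level that refused. */
	for (undo = g; undo != it; undo = undo->parent)
		undo->nr_tasks--;
	return -1;
}

/* Called at task exit: uncharge g and all its ancestors. */
void group_exit(struct group *g)
{
	for (; g; g = g->parent)
		g->nr_tasks--;
}
```

Note that the walk here is exactly the kind of per-event parent chain
traversal discussed above, so whichever place this lives in, the cost of
that walk is the part worth measuring.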

If, however, we are really creating this, I think we'd be better off
referring to this as a "Task Controller" rather than a "Task Counter".

Then at least in the near future, when people start trying to limit other
task-related resources, this can serve as a natural place for them.
(See the syscall limiting that Lukasz is trying to achieve.)

>
> - make res_counter hierarchies be explicitly defined via the cgroup
> parent pointers, rather than an parent pointer hidden inside the
> res_counter. So the cgroup parent chain traversal would all be along
> the common parent pointers (and res_counter would be one pointer
> smaller).
>
>
> - allow subsystems to specify that they need a small amount of data
> that can be accessed efficiently up the cgroup chain. (Many subsystems
> wouldn't need this, and those that do would likely only need it for a
> subset of their per-cgroup data). Pack this data into as few
> cachelines as possible, allocated as a single lump of memory per
> cgroup. Each subsystem would know where in that allocation its private
> data lay (it would be the same offset for every cgroup, although
> dynamically determined at runtime based on the number of subsystems
> mounted on that hierarchy)
I thought about this second one myself.
I am not yet convinced this would be a win, but I believe there is a
chance it would be.
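
For reference, this is roughly how I picture that second proposal, as a
userspace sketch only. All names are hypothetical ("struct hierarchy",
hierarchy_add_subsys(), group_subsys_data()), and the fixed MAX_SUBSYS
bound is just an assumption to keep it short. Each subsystem is handed an
offset when it is bound to the hierarchy, and every cgroup then carries
one packed allocation in which that offset is the same.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_SUBSYS 8	/* arbitrary bound for this sketch */

/* Per-hierarchy layout of the packed "hot" data. */
struct hierarchy {
	size_t offset[MAX_SUBSYS];	/* offset of each subsystem's slot */
	size_t total;			/* bytes to allocate per group */
	int nr_subsys;
};

/*
 * Reserve a slot for one subsystem. align must be a power of two.
 * Returns the subsystem id, or -1 if the table is full.
 */
int hierarchy_add_subsys(struct hierarchy *h, size_t size, size_t align)
{
	size_t off;

	if (h->nr_subsys >= MAX_SUBSYS)
		return -1;
	off = (h->total + align - 1) & ~(align - 1);
	h->offset[h->nr_subsys] = off;
	h->total = off + size;
	return h->nr_subsys++;
}

struct group {
	struct group *parent;
	unsigned char *hot;	/* single lump shared by all subsystems */
};

/* Allocate the zeroed packed area for one group. Returns 0 or -1. */
int group_init(struct group *g, struct group *parent,
	       const struct hierarchy *h)
{
	g->parent = parent;
	g->hot = calloc(1, h->total ? h->total : 1);
	return g->hot ? 0 : -1;
}

/* Same offset in every group, decided at runtime per hierarchy. */
void *group_subsys_data(const struct hierarchy *h, const struct group *g,
			int id)
{
	return g->hot + h->offset[id];
}
```

A subsystem walking up the chain would then only dereference g->parent
and g->hot at each level. Whether that actually saves cache misses
compared to what we have today is the part I am not sure about.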