[PATCH] cgroup(fix critical bug): new handling for tasks file

Mon Aug 25 19:29:27 PDT 2008

On Mon, Aug 25, 2008 at 6:47 PM, Lai Jiangshan <laijs at cn.fujitsu.com> wrote:
> In original code, we kmalloc a lot of memory when we open cgroup.tasks.
> _This memory is allocated by kmalloc_, So this memory is not statistical
> for this process or this user. This is a critical bug.

It's true that this does represent a way for an untrusted user to
consume a big chunk of memory on a machine. It's been around since
cpusets was introduced. I'm not sure that it's more serious than say,
a fork bomb or allocating lots of memory via lots of sockets with huge
network buffers. I don't see how this can cause an oops - slab.c and
slub.c both appear to just return NULL in that case, from a quick
inspection.

But improving the current code has been on my todo list for a while.
My thought on the best way to fix it is to use a proper seq_file -
that would remove the issues that you have currently with the
"unconsumed" portion of the output.

Using a seq_file iterator with your single pid array per cgroup might
work quite nicely - but isn't there still a problem that with a really
large cgroup we can overflow the kmalloc() limit for a single pid
array?

My original thoughts for solving this involved using a prio_heap to
ensure that every task in the cgroup was seen once:

1) similar to cgroup_scan_tasks, fill a prio_heap with the first N (by
task start time) tasks in the cgroup
2) sort these tasks by pid
3) report these by seq_file
4) once all the pids in the heap have been reported, update the
minimum task start time to be greater than the last task start time in
the reported group, and goto 1

This way we never have to allocate more than a single prio_heap per
open file; the downside is that we have more overhead when reporting
on really large cgroups since we have to make multiple scans of the
cgroup's task list.

Paul