[PATCH 1/2] Adds a read-only "procs" file similar to "tasks" that shows only unique tgids

KAMEZAWA Hiroyuki kamezawa.hiroyu at jp.fujitsu.com
Thu Jul 2 22:54:41 PDT 2009


On Thu, 2 Jul 2009 18:30:04 -0700
Andrew Morton <akpm at linux-foundation.org> wrote:

> On Thu, 2 Jul 2009 18:08:29 -0700 Paul Menage <menage at google.com> wrote:
> 
> > On Thu, Jul 2, 2009 at 5:53 PM, Andrew Morton<akpm at linux-foundation.org> wrote:
> > >> In the first snippet, count will be at most equal to length. As length
> > >> is determined from cgroup_task_count, it can be no greater than the
> > >> total number of pids on the system.
> > >
> > > Well that's a problem, because there can be tens or hundreds of
> > > thousands of pids, and there's a fairly low maximum size for kmalloc()s
> > > (include/linux/kmalloc_sizes.h).
> > >
> > > And even if this allocation attempt doesn't exceed KMALLOC_MAX_SIZE,
> > > large allocations are less unreliable. __There is a large break point at
> > > 8*PAGE_SIZE (PAGE_ALLOC_COSTLY_ORDER).
> > 
> > This has been a long-standing problem with the tasks file, ever since
> > the cpusets days.
> > 
> > There are ways around it - Lai Jiangshan <laijs at cn.fujitsu.com> posted
> > a patch that allocated an array of pages to store pids in, with a
> > custom sorting function that let you specify indirection rather than
> > assuming everything was in one contiguous array. This was technically
> > the right approach in terms of not needing vmalloc and never doing
> > large allocations, but it was very complex; an alternative that was
> > mooted was to use kmalloc for small cgroups and vmalloc for large
> > ones, so the vmalloc penalty wouldn't be paid generally. The thread
> > fizzled AFAICS.
> 
> It's a problem which occurs fairly regularly.  Some sites are fairly
> busted.  Many gave up and used vmalloc().  Others use an open-coded
> array-of-pages thing.
> 
> This happens enough that I expect the kernel would benefit from a
> general dynamic-array library facility.  Something whose interface
> mimics the C-level array operations but which is internally implemented
> via some data structure which uses PAGE_SIZE allocations.  Probably a
> simple two-level thing would suffice.
> 
I think both of kmalloc usage here are very bad.

Why we can't do what readdir(/proc) does ? I'm sorry I misunderstand.
Following is an easy example.


0. at open, inilialize f_pos to 0. f_pos is used as "pid"
   remember "css_set with hole" as template in f_private?(or somewhere) at open
   ...like this.
--
   struct cgroupfs_root *root = cgrp->root;
   struct cgroup *template = kzalloc(sizeof(void*) * CGROUP_SUBSYS_COUNT);

   for (i = 0; i < CGROUP_SUBSYS_COUNT; i++)
	if (root->subsys_bits & (1UL << i))
		template[i] =  cgrp->subsys[i];
--


1. at read(), find task_struct of "pid" in f_pos.
2. look up task_struct of "pid" and compare with f_private
--
   struct cgroup *template = f_private;

   for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
	if (!template[i])
		contiue;
	if (template[i] != task_subsys_state(task, i))
		break;
   }
   if (i == CGROUP_SUBSYS_COUNT)
	print task;
 --
   
4. f_pos++ until filling seq_buffer.


Thanks,
-Kame







More information about the Containers mailing list