[PATCH 6/6] Makes procs file writable to move all threads by tgid at once

Matt Helsley matthltc at us.ibm.com
Tue Aug 4 14:40:38 PDT 2009


[ Cc'ing Rafael and linux-pm for more eyes on proposed freezer usage. ]

On Mon, Aug 03, 2009 at 12:55:33PM -0700, Paul Menage wrote:
> On Mon, Aug 3, 2009 at 12:45 PM, Serge E. Hallyn<serue at us.ibm.com> wrote:
> >
> > This is probably a stupid idea, but...  what about having zero
> > overhead at clone(), and instead, at cgroup_task_migrate(),
> > dequeue_task()ing all of the affected threads for the duration of
> > the migrate?
> 
> That doesn't sound too unreasonable, actually - it would certainly
> simplify things a fair bit. Is there a standard API for doing that?

I'm all for simplifying cgroup locking. I doubt anybody's against
it, given the "right" simplification.

I'm not sure if the freezer is actually the right thing to
use for this though. Perhaps CFS/scheduler folks could advise?

> dequeue_task() itself doesn't really look like a public API. I guess
> that the task freezer would be one way to accomplish this?
 
The freezer won't actually remove the task from the runqueue -- just
cause it to go into a schedule() loop until it's thawed.

[ Incidentally, sorry if this is a dumb question, but why don't frozen
tasks go onto a special wait queue rather than looping around schedule()?
At least for the cgroup freezer I can imagine keeping the wait queue
with the cgroup subsystem... ]

The freezer sends a fake signal to the task, which interrupts syscalls
and userspace execution so that the task handles the signal. So all of
the frozen tasks would be looping around schedule() just inside the
syscall entry layer, "handling" the fake signal until they are thawed.

This could interrupt a read of the cgroup pidlist for example.

I don't think it's 100% reliable -- vfork'ing tasks could delay freezing
the task "indefinitely" if the vfork'ing userspace tasks are
clueless/malicious.

However, the signaling code there uses kick_process(), which may be
needed for this idea.

So if I understand correctly it goes something like:

for each thread
	dequeue from runqueue onto ?what?
	kick thread (I think this should ensure that the thread is no longer
			"current" on any CPU since we dequeued..)

<seems we'd need something to ensure that the previous operations on each
thread have "completed" as far as all other cpus are concerned...>

for each thread
	cgroup migrate

for each thread
	enqueue back on runqueue from ?what? (is this still the right
						queue?)
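
To make the three phases above concrete, here is a minimal user-space C
model of the idea. It is only a sketch under stated assumptions: struct
model_task, park_threads(), migrate_threads(), unpark_threads() and
migrate_thread_group() are hypothetical names invented for illustration,
not kernel APIs. The "runqueue" and the "?what?" holding place are modelled
as plain flags, and the cross-CPU completion step is only marked with a
comment since the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a thread; not a kernel structure. */
struct model_task {
	int  cgroup_id;   /* cgroup the thread currently belongs to */
	bool on_runqueue; /* true while the thread is eligible to run */
	bool parked;      /* true while held off the runqueue by us */
};

/* Phase 1: take every runnable thread off the runqueue and park it. */
static void park_threads(struct model_task *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		t[i].parked = t[i].on_runqueue;
		t[i].on_runqueue = false;
	}
}

/*
 * Phase 2: move every thread to the destination cgroup.
 * Returns how many threads actually changed cgroup.
 */
static size_t migrate_threads(struct model_task *t, size_t n, int dst)
{
	size_t moved = 0;

	for (size_t i = 0; i < n; i++) {
		/* No thread may be runnable while it is being migrated. */
		assert(!t[i].on_runqueue);
		if (t[i].cgroup_id != dst) {
			t[i].cgroup_id = dst;
			moved++;
		}
	}
	return moved;
}

/* Phase 3: put the threads we parked back on the runqueue. */
static void unpark_threads(struct model_task *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		t[i].on_runqueue = t[i].parked;
		t[i].parked = false;
	}
}

/* All three phases in order for one thread group. */
size_t migrate_thread_group(struct model_task *t, size_t n, int dst)
{
	size_t moved;

	park_threads(t, n);
	/*
	 * A real implementation would need something here to ensure that
	 * the dequeue/kick of each thread has completed as far as all
	 * other CPUs are concerned. This model has nothing to wait for.
	 */
	moved = migrate_threads(t, n, dst);
	unpark_threads(t, n);
	return moved;
}
```

Calling migrate_thread_group() on an array of model tasks leaves every
entry in the destination cgroup and restores each thread's original
runnable state. The model does not answer the open questions above: where
the dequeued threads live in the meantime, and whether the original
runqueue is still the right one to return to.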

Cheers,
	-Matt Helsley

