[PATCH][cr]: Fix ghost task bug

Louis Rilling Louis.Rilling at kerlabs.com
Fri Mar 4 09:29:14 PST 2011


On 04/03/11 11:07 -0500, Oren Laadan wrote:
> On 03/03/2011 11:35 AM, Louis Rilling wrote:
> > On 03/03/11 10:38 -0500, Oren Laadan wrote:
> >> On 03/01/2011 10:31 AM, Louis Rilling wrote:
> >>> On 28/02/11 17:10 -0500, Oren Laadan wrote:
> >>>> So looking at the code again, we could add one condition in exit.c
> >>>> at wait_consider_task(), after the test of p->exit_state == EXIT_DEAD,
> >>>> to also test:
> >>>>
> >>>> inline static bool is_ghost_task(p)
> >>>> {
> >>>> 	return (p->flags & (PF_EXITING|PF_RESTARTING) ==
> >>>> 		PF_EXITING|PF_RESTARTING) && task_detached(p)
> >>>> }
> >>>>
> >>>> 	if (p->flags & is_ghost_task(p))
> >>>> 		return 0;
> >>>>
> >>>> Or something along the lines (e.g. used EXIT_ZOMBIE comparison instead
> >>>> of PF_EXITING). While requiring a kernel patch, it is relatively short,
> >>>> clean and easy to review.
> > 
> > EXIT_ZOMBIE comparison would not optimize much imho, since p->flags must be
> > checked anyway.
> > 
> > Nit1: I don't think that checking p->flags saves anything before calling
> > is_ghost_task().
> 
> Hmm.. right -
> That's a leftover from before I decided to introduce is_ghost_task()
> 
> > 
> > Nit2: why would you like to check that PF_EXITING and PF_RESTARTING come
> > together? Is it to make sure that no "real" restarted thread will be skipped
> > this way?
> 
> If wait() is called to get the state of stopped children, and for
> whatever reason the ghost is stopped or being ptraced (we should
> probably prevent that... but ok) - testing for the exiting/zombie 
> condition is an extra safety measure: only skip this task when it
> is actually exiting.

I don't see how a ghost task could be stopped or ptraced, since it calls
do_exit() right after becoming detached, and thus identifiable as a ghost.
Unless it gets ptraced right before calling sys_restart()? Even in that case,
it's not reapable by ptrace since it's not in stopped state. OTOH, it may still
be reaped in wait_task_continued() (see below).

> 
> Do you not think it's needed ?

Not sure. As far as I can see, other restarting (with PF_RESTARTING) and
detached tasks can only be sub-threads, and are mostly not reapable in any way
as long as PF_RESTARTING is set. They can surely be reaped neither by
wait_task_zombie(), nor by wait_task_stopped(). The only possibility I see is by
wait_task_continued(), because a previous "wakeup from stopped" has not been
consumed before the checkpoint.

But, and I think that this is a good reason to check PF_EXITING (or
->exit_state), if threads are skipped this way, then wait() might incorrectly
return -ECHILD instead of sleeping.

Wait. Even with this, after ->exit_signal is set to -1, and before PF_EXITING is
set, wait_consider_task() can still consider the ghost as potentially reapable
in the future. Deadlock again.

In fact, it's probably much saner to have something atomic, like:

	write_lock_irq(&tasklist_lock);
	p->flags |= PF_EXITING;
	p->exit_signal = -1;
	__wake_up_parent(p, p->parent);
	write_unlock_irq(&tasklist_lock);

Unfortunately this is not accepted by do_exit(). So two kinds of solutions:
either set a new flag à la PF_RESTART_GHOST, and only check for this flag in
wait_consider_task(),
or somewhere in do_exit() (latest in exit_notify()), have
another mean to recognize ghost tasks, and do the ->exit_signal = -1 +
__wake_up_parent() there.

What's your opinion?

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
Url : http://lists.linux-foundation.org/pipermail/containers/attachments/20110304/adf2c066/attachment.pgp 


More information about the Containers mailing list