[PATCH][cr]: Fix ghost task bug

Louis Rilling Louis.Rilling at kerlabs.com
Thu Mar 3 08:35:31 PST 2011


On 03/03/11 10:38 -0500, Oren Laadan wrote:
> 
> 
> On 03/01/2011 10:31 AM, Louis Rilling wrote:
> > On 28/02/11 17:10 -0500, Oren Laadan wrote:
> >>
> >>
> >> On 02/28/2011 10:09 AM, Louis Rilling wrote:
> >>> On 28/02/11  9:43 -0500, Oren Laadan wrote:
> >>>>
> >>>>
> >>>> On 02/26/2011 08:54 AM, Louis Rilling wrote:
> >>>>> On Fri, Feb 25, 2011 at 11:01:32AM -0800, Sukadev Bhattiprolu wrote:
> >>>>>> Louis Rilling [Louis.Rilling at kerlabs.com] wrote:
> >>>>>> | On 24/02/11 23:55 -0800, Sukadev Bhattiprolu wrote:
> >>>>>> | > 
> >>>>>> | > diff --git a/kernel/checkpoint/restart.c b/kernel/checkpoint/restart.c
> >>>>>> | > index b0ea8ec..8ecc052 100644
> >>>>>> | > --- a/kernel/checkpoint/restart.c
> >>>>>> | > +++ b/kernel/checkpoint/restart.c
> >>>>>> | > @@ -972,6 +972,7 @@ static int do_ghost_task(void)
> >>>>>> | >  	if (ret < 0)
> >>>>>> | >  		ckpt_err(ctx, ret, "ghost restart failed\n");
> >>>>>> | >  
> >>>>>> | > +	current->exit_signal = -1;
> >>>>>> | 
> >>>>>> | Setting ->exit_signal outside of tasklist_lock makes me nervous. All other
> >>>>>> | places that change ->exit_signal hold write_lock_irq(&tasklist_lock), and
> >>>>>> | eligibile_child() (for an instance of a reader being another task) holds
> >>>>>> | read_lock(&tasklist_lock). But maybe this does not matter for ghost tasks.
> >>>>>> | 
> >>>>>>
> >>>>>> Yes, an earlier version had the write_lock(&tasklist_lock). Will add it
> >>>>>> back.
> >>>>>>
> >>>>>> | >  	restore_debug_exit(ctx);
> >>>>>> | >  	ckpt_ctx_put(ctx);
> >>>>>> | >  	do_exit(0);
> >>>>>> | > @@ -1465,7 +1466,22 @@ void exit_checkpoint(struct task_struct *tsk)
> >>>>>> | >  	/* restarting zombies will activate next task in restart */
> >>>>>> | >  	if (tsk->flags & PF_RESTARTING) {
> >>>>>> | >  		BUG_ON(ctx->active_pid == -1);
> >>>>>> | > +
> >>>>>> | > +		/*
> >>>>>> | > +		 * if we are a "ghost" task, that was terminated by the
> >>>>>> | > +		 * container-init (from zap_pid_ns_processes()), we should
> >>>>>> | > +		 * wake up the parent since we are now a detached process.
> >>>>>> | > +		 */
> >>>>>> | > +		read_lock_irq(&tasklist_lock);
> >>>>>> | 
> >>>>>> | read_lock() is enough. tasklist_lock is never taken for write from IRQs or
> >>>>>> | softIRQs.
> >>>>>> | 
> >>>>>> | > +                if (tsk->exit_state == EXIT_DEAD && !tsk->parent->exit_state) {
> >>>>>> | > +                        ckpt_debug("[%d, %s]: exit_checkpoint(): notifying "
> >>>>>> | > +					"parent\n", tsk->pid, tsk->comm);
> >>>>>> | > +                        __wake_up_parent(tsk, tsk->parent);
> >>>>>> | > +                }
> >>>>>> | > +		read_unlock_irq(&tasklist_lock);
> >>>>>> | 
> >>>>>> | Looking at this closer, I wonder if this wakeup logic should be called from
> >>>>>> | do_ghost_task(), right after setting ->exit_signal. This way there would be no
> >>>>>> | need for a tricky condition to recognize ghost tasks, and (I think) this is
> >>>>>> | closer to the other cases changing ->exit_signal (reparent_leader() and
> >>>>>> | exit_notify()).
> >>>>>>
> >>>>>> Yes, we tried the following in the earlier version. 
> >>>>>>
> >>>>>> void ghost_auto_reapable()
> >>>>>> {
> >>>>>>         write_lock(&tasklist_lock);
> >>>>>>         current->exit_signal = -1;
> >>>>>>         __wake_up_parent(current, current->parent);
> >>>>>>         write_unlock(&tasklist_lock);
> >>>>>> }
> >>>>>>
> >>>>>> And called this from do_ghost_task(). But with this, the parent could
> >>>>>> wake up, find that it still has an eligible child (the ghost) to wait 
> >>>>>> for, and go back to waiting before the ghost enters the EXIT_DEAD state.
> >>>>>> And so we would lose the wake up.
> >>>>>>
> >>>>>> (zap_pid_ns_processes() passes the __WALL so the ghost would be considered
> >>>>>> an eligible child).
> >>>>>
> >>>>> I think I see now. The point is that ->exit_signal = -1 is only meant to work
> >>>>> correctly for sub-threads, which the parent does not need to wait for. IOW, the
> >>>>> notion of detached task is only implemented for sub-threads.
> >>>>>
> >>>>> IIUC, setting ->exit_signal to -1 is only used here to let exit_notify() set
> >>>>> ->exit_state to EXIT_DEAD, right? Otherwise, setting ->exit_signal to 0 and
> >>>>> letting do_notify_parent() proceed for ghost tasks would have be sufficient I
> >>>>> guess (provided that the confusion between ghost tasks and zombies could be
> >>>>> easily avoided in do_notify_parent()).
> >>>>>
> >>>>> Then I agree that the proposed patch looks like a reasonably simple approach.
> >>>>>
> >>>>> Thanks for the explanation,
> >>>>>
> >>>>
> >>>> Louis, Suka:
> >>>>
> >>>> One subtlety with the method is that if a process get reparented
> >>>> (for whatever reason) then the ->exit_signal field is reset to 
> >>>> SIGCHLD. Fortunately, that should not affect us because our ghost
> >>>> tasks never become orphaned.
> >>>
> >>> AFAICS, this is not true for detached tasks: neither in reparent_leader(), nor
> >>> in exit_notify(). Unless I missed another place?
> >>
> >> Doh.. you are totally right, I missed that.
> >> Sometime it feels really good to be wrong :)
> > 
> > ;)
> > 
> >>
> >>>
> >>>>
> >>>> However, I can't avoid thinking that maybe there is a better way 
> >>>> to do this altogether ?
> >>>>
> >>>> The requirement is simple: ghost tasks are temporary tasks whose
> >>>> role is to keep certain pid's alive for the duration of the restart
> >>>> and then exit without a trace before the restarted tasks resume
> >>>> execution.
> >>>>
> >>>> The reason I opted for the ->exit_signal = -1 is because it makes
> >>>> sure that the parent need not explicitly collect the child ghost.
> >>>>
> >>>> If I had set it to 0, then it would not send a signal, but still
> >>>> would require a wait() to collect it (right ?).
> >>>
> >>> Right.
> >>>
> >>>>
> >>>> Can you think of a way to achieve this functionality without the 
> >>>> subtleties that we have observed so far ?  even at the cost of a 
> >>>> minor change to, say, wait() logic or what not ?
> >>>
> >>> I wonder if things would be easier by providing a mean to distinguish ghost
> >>> tasks from truely restarted tasks in do_notify_parent().
> >>>
> >>> Assuming this, there might be a kernel-patch-free way, although I can't say if
> >>> this fits userspace restart constraints:
> >>>
> >>> Setting the signal handler of ghost tasks' parent to SIG_IGN, and ->exit_signal
> >>
> >> Yes, I thought about it. But those parent are part of the restart,
> >> and changing their signal handlers will is tricky and will require
> >> complex sync logic at the end of restart to ensure that all tasks
> >> restore their handlers after the ghosts are gone and released, but
> >> before any task gets to userspace ... I don't want to go there.
> > 
> > Yes, I can imagine well the added complexity in userspace.
> > 
> >>
> >>> of ghost tasks to SIGCHLD, will make them autoreap. This requires that the
> >>> parent does not need to synchronize with other children until all ghost tasks
> >>> have exited, and that the parent remains alive too.
> >>>
> >>> So, why not adding some flag PF_RESTART_GHOST or TIF_RESTART_GHOST?
> >>
> >> Actually, we already have a way to distinguish them:
> >>
> >> 	if ((tsk->flags & PF_RESTARTING) && task_detached(tsk))
> >>
> >> One problem with this is that we only set exit_signal to -1 after
> >> the ghost was born, so if the parent is already waiting I think 
> >> that will be racy, as per suka's comment above.
> > 
> > Yes.
> > 
> >>
> >> So looking at the code again, we could add one condition in exit.c
> >> at wait_consider_task(), after the test of p->exit_state == EXIT_DEAD,
> >> to also test:
> >>
> >> inline static bool is_ghost_task(p)
> >> {
> >> 	return (p->flags & (PF_EXITING|PF_RESTARTING) ==
> >> 		PF_EXITING|PF_RESTARTING) && task_detached(p)
> >> }
> >>
> >> 	if (p->flags & is_ghost_task(p))
> >> 		return 0;
> >>
> >> Or something along the lines (e.g. used EXIT_ZOMBIE comparison instead
> >> of PF_EXITING). While requiring a kernel patch, it is relatively short,
> >> clean and easy to review.

EXIT_ZOMBIE comparison would not optimize much imho, since p->flags must be
checked anyway.

Nit1: I don't think that checking p->flags saves anything before calling
is_ghost_task().

Nit2: why would you like to check that PF_EXITING and PF_RESTARTING come
together? Is it to make sure that no "real" restarted thread will be skipped
this way?

> > 
> > Adding the check in do_wait() is racy IMHO. What does prevent do_ghost_task()
> > from setting ->exit_signal = -1 after the parent sleeps in do_wait()?
> 
> If we do it per your suggestion (and I also had suggested the
> same originally):
> 
> >>>>>> void ghost_auto_reapable()
> >>>>>> {
> >>>>>>         write_lock(&tasklist_lock);
> >>>>>>         current->exit_signal = -1;
> >>>>>>         __wake_up_parent(current, current->parent);
> >>>>>>         write_unlock(&tasklist_lock);
> >>>>>> }
> 
> then it should be safe, no ?

s/write_lock/write_lock_irq/ and I'll call it safe :)

> 
> > 
> > Otherwise, why not making ghost tasks simply sub-threads of the "parent"? They
> > would autoreap and nobody would wait or have to wait for them. I would feel so
> > much better if we had not to change ->exit_signal in do_ghost_task()...
> 
> There are two types of ghosts:
> 
> 1) A placeholder needed to ensure that some processes are orphaned
> correctly (while retaining their sid)
> 
> 2) A placeholder needed to ensure that a certain pid exists during
> the restart (and can be, e.g. used by other processes as pgid).
> 
> in both cases, a thread of the parent won't do what I need: in the
> first case, the child will be re-parented to another thread (leader?).

any other thread, possibly the leader. Right.

> And in the second, the pgid will be be usable as a pgid.

You probably meant that the pid would not be usable as a pgid (because not group
leader pid, possible session mismatch, etc.).

> 
> But please keep throwing suggestions, we're making progress here...
> 

(Just had a closer look to how pgids are restored)

Well, given that it's useful to have a consistent pgid/session for ghost tasks,
and that it looks too difficult to rework the synchronization logic before
reaping ghost tasks (referring to the ->exit_signal = 0 + parent sighandler =
SIG_IGN suggestion), I'm falling short of cleaner suggestions :)

The ghost_auto_reapable() thing above + wait_consider_task() change looks like
the cleanest solution so far. My only fear is that people could be reluctant to
adding this (small?) overhead to do_wait() "only" for cr.

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
Url : http://lists.linux-foundation.org/pipermail/containers/attachments/20110303/a9bae263/attachment.pgp 


More information about the Containers mailing list