[PATCH][cr]: Fix ghost task bug
Louis Rilling
Louis.Rilling at kerlabs.com
Thu Mar 3 08:35:31 PST 2011
On 03/03/11 10:38 -0500, Oren Laadan wrote:
>
>
> On 03/01/2011 10:31 AM, Louis Rilling wrote:
> > On 28/02/11 17:10 -0500, Oren Laadan wrote:
> >>
> >>
> >> On 02/28/2011 10:09 AM, Louis Rilling wrote:
> >>> On 28/02/11 9:43 -0500, Oren Laadan wrote:
> >>>>
> >>>>
> >>>> On 02/26/2011 08:54 AM, Louis Rilling wrote:
> >>>>> On Fri, Feb 25, 2011 at 11:01:32AM -0800, Sukadev Bhattiprolu wrote:
> >>>>>> Louis Rilling [Louis.Rilling at kerlabs.com] wrote:
> >>>>>> | On 24/02/11 23:55 -0800, Sukadev Bhattiprolu wrote:
> >>>>>> | >
> >>>>>> | > diff --git a/kernel/checkpoint/restart.c b/kernel/checkpoint/restart.c
> >>>>>> | > index b0ea8ec..8ecc052 100644
> >>>>>> | > --- a/kernel/checkpoint/restart.c
> >>>>>> | > +++ b/kernel/checkpoint/restart.c
> >>>>>> | > @@ -972,6 +972,7 @@ static int do_ghost_task(void)
> >>>>>> | > if (ret < 0)
> >>>>>> | > ckpt_err(ctx, ret, "ghost restart failed\n");
> >>>>>> | >
> >>>>>> | > + current->exit_signal = -1;
> >>>>>> |
> >>>>>> | Setting ->exit_signal outside of tasklist_lock makes me nervous. All other
> >>>>>> | places that change ->exit_signal hold write_lock_irq(&tasklist_lock), and
> >>>>>> | eligibile_child() (for an instance of a reader being another task) holds
> >>>>>> | read_lock(&tasklist_lock). But maybe this does not matter for ghost tasks.
> >>>>>> |
> >>>>>>
> >>>>>> Yes, an earlier version had the write_lock(&tasklist_lock). Will add it
> >>>>>> back.
> >>>>>>
> >>>>>> | > restore_debug_exit(ctx);
> >>>>>> | > ckpt_ctx_put(ctx);
> >>>>>> | > do_exit(0);
> >>>>>> | > @@ -1465,7 +1466,22 @@ void exit_checkpoint(struct task_struct *tsk)
> >>>>>> | > /* restarting zombies will activate next task in restart */
> >>>>>> | > if (tsk->flags & PF_RESTARTING) {
> >>>>>> | > BUG_ON(ctx->active_pid == -1);
> >>>>>> | > +
> >>>>>> | > + /*
> >>>>>> | > + * if we are a "ghost" task, that was terminated by the
> >>>>>> | > + * container-init (from zap_pid_ns_processes()), we should
> >>>>>> | > + * wake up the parent since we are now a detached process.
> >>>>>> | > + */
> >>>>>> | > + read_lock_irq(&tasklist_lock);
> >>>>>> |
> >>>>>> | read_lock() is enough. tasklist_lock is never taken for write from IRQs or
> >>>>>> | softIRQs.
> >>>>>> |
> >>>>>> | > + if (tsk->exit_state == EXIT_DEAD && !tsk->parent->exit_state) {
> >>>>>> | > + ckpt_debug("[%d, %s]: exit_checkpoint(): notifying "
> >>>>>> | > + "parent\n", tsk->pid, tsk->comm);
> >>>>>> | > + __wake_up_parent(tsk, tsk->parent);
> >>>>>> | > + }
> >>>>>> | > + read_unlock_irq(&tasklist_lock);
> >>>>>> |
> >>>>>> | Looking at this closer, I wonder if this wakeup logic should be called from
> >>>>>> | do_ghost_task(), right after setting ->exit_signal. This way there would be no
> >>>>>> | need for a tricky condition to recognize ghost tasks, and (I think) this is
> >>>>>> | closer to the other cases changing ->exit_signal (reparent_leader() and
> >>>>>> | exit_notify()).
> >>>>>>
> >>>>>> Yes, we tried the following in the earlier version.
> >>>>>>
> >>>>>> void ghost_auto_reapable()
> >>>>>> {
> >>>>>> write_lock(&tasklist_lock);
> >>>>>> current->exit_signal = -1;
> >>>>>> __wake_up_parent(current, current->parent);
> >>>>>> write_unlock(&tasklist_lock);
> >>>>>> }
> >>>>>>
> >>>>>> And called this from do_ghost_task(). But with this, the parent could
> >>>>>> wake up, find that it still has an eligible child (the ghost) to wait
> >>>>>> for, and go back to waiting before the ghost enters the EXIT_DEAD state.
> >>>>>> And so we would lose the wake up.
> >>>>>>
> >>>>>> (zap_pid_ns_processes() passes the __WALL so the ghost would be considered
> >>>>>> an eligible child).
> >>>>>
> >>>>> I think I see now. The point is that ->exit_signal = -1 is only meant to work
> >>>>> correctly for sub-threads, which the parent does not need to wait for. IOW, the
> >>>>> notion of detached task is only implemented for sub-threads.
> >>>>>
> >>>>> IIUC, setting ->exit_signal to -1 is only used here to let exit_notify() set
> >>>>> ->exit_state to EXIT_DEAD, right? Otherwise, setting ->exit_signal to 0 and
> >>>>> letting do_notify_parent() proceed for ghost tasks would have be sufficient I
> >>>>> guess (provided that the confusion between ghost tasks and zombies could be
> >>>>> easily avoided in do_notify_parent()).
> >>>>>
> >>>>> Then I agree that the proposed patch looks like a reasonably simple approach.
> >>>>>
> >>>>> Thanks for the explanation,
> >>>>>
> >>>>
> >>>> Louis, Suka:
> >>>>
> >>>> One subtlety with the method is that if a process get reparented
> >>>> (for whatever reason) then the ->exit_signal field is reset to
> >>>> SIGCHLD. Fortunately, that should not affect us because our ghost
> >>>> tasks never become orphaned.
> >>>
> >>> AFAICS, this is not true for detached tasks: neither in reparent_leader(), nor
> >>> in exit_notify(). Unless I missed another place?
> >>
> >> Doh.. you are totally right, I missed that.
> >> Sometime it feels really good to be wrong :)
> >
> > ;)
> >
> >>
> >>>
> >>>>
> >>>> However, I can't avoid thinking that maybe there is a better way
> >>>> to do this altogether ?
> >>>>
> >>>> The requirement is simple: ghost tasks are temporary tasks whose
> >>>> role is to keep certain pid's alive for the duration of the restart
> >>>> and then exit without a trace before the restarted tasks resume
> >>>> execution.
> >>>>
> >>>> The reason I opted for the ->exit_signal = -1 is because it makes
> >>>> sure that the parent need not explicitly collect the child ghost.
> >>>>
> >>>> If I had set it to 0, then it would not send a signal, but still
> >>>> would require a wait() to collect it (right ?).
> >>>
> >>> Right.
> >>>
> >>>>
> >>>> Can you think of a way to achieve this functionality without the
> >>>> subtleties that we have observed so far ? even at the cost of a
> >>>> minor change to, say, wait() logic or what not ?
> >>>
> >>> I wonder if things would be easier by providing a mean to distinguish ghost
> >>> tasks from truely restarted tasks in do_notify_parent().
> >>>
> >>> Assuming this, there might be a kernel-patch-free way, although I can't say if
> >>> this fits userspace restart constraints:
> >>>
> >>> Setting the signal handler of ghost tasks' parent to SIG_IGN, and ->exit_signal
> >>
> >> Yes, I thought about it. But those parent are part of the restart,
> >> and changing their signal handlers will is tricky and will require
> >> complex sync logic at the end of restart to ensure that all tasks
> >> restore their handlers after the ghosts are gone and released, but
> >> before any task gets to userspace ... I don't want to go there.
> >
> > Yes, I can imagine well the added complexity in userspace.
> >
> >>
> >>> of ghost tasks to SIGCHLD, will make them autoreap. This requires that the
> >>> parent does not need to synchronize with other children until all ghost tasks
> >>> have exited, and that the parent remains alive too.
> >>>
> >>> So, why not adding some flag PF_RESTART_GHOST or TIF_RESTART_GHOST?
> >>
> >> Actually, we already have a way to distinguish them:
> >>
> >> if ((tsk->flags & PF_RESTARTING) && task_detached(tsk))
> >>
> >> One problem with this is that we only set exit_signal to -1 after
> >> the ghost was born, so if the parent is already waiting I think
> >> that will be racy, as per suka's comment above.
> >
> > Yes.
> >
> >>
> >> So looking at the code again, we could add one condition in exit.c
> >> at wait_consider_task(), after the test of p->exit_state == EXIT_DEAD,
> >> to also test:
> >>
> >> inline static bool is_ghost_task(p)
> >> {
> >> return (p->flags & (PF_EXITING|PF_RESTARTING) ==
> >> PF_EXITING|PF_RESTARTING) && task_detached(p)
> >> }
> >>
> >> if (p->flags & is_ghost_task(p))
> >> return 0;
> >>
> >> Or something along the lines (e.g. used EXIT_ZOMBIE comparison instead
> >> of PF_EXITING). While requiring a kernel patch, it is relatively short,
> >> clean and easy to review.
EXIT_ZOMBIE comparison would not optimize much imho, since p->flags must be
checked anyway.
Nit1: I don't think that checking p->flags saves anything before calling
is_ghost_task().
Nit2: why would you like to check that PF_EXITING and PF_RESTARTING come
together? Is it to make sure that no "real" restarted thread will be skipped
this way?
> >
> > Adding the check in do_wait() is racy IMHO. What does prevent do_ghost_task()
> > from setting ->exit_signal = -1 after the parent sleeps in do_wait()?
>
> If we do it per your suggestion (and I also had suggested the
> same originally):
>
> >>>>>> void ghost_auto_reapable()
> >>>>>> {
> >>>>>> write_lock(&tasklist_lock);
> >>>>>> current->exit_signal = -1;
> >>>>>> __wake_up_parent(current, current->parent);
> >>>>>> write_unlock(&tasklist_lock);
> >>>>>> }
>
> then it should be safe, no ?
s/write_lock/write_lock_irq/ and I'll call it safe :)
>
> >
> > Otherwise, why not making ghost tasks simply sub-threads of the "parent"? They
> > would autoreap and nobody would wait or have to wait for them. I would feel so
> > much better if we had not to change ->exit_signal in do_ghost_task()...
>
> There are two types of ghosts:
>
> 1) A placeholder needed to ensure that some processes are orphaned
> correctly (while retaining their sid)
>
> 2) A placeholder needed to ensure that a certain pid exists during
> the restart (and can be, e.g. used by other processes as pgid).
>
> in both cases, a thread of the parent won't do what I need: in the
> first case, the child will be re-parented to another thread (leader?).
any other thread, possibly the leader. Right.
> And in the second, the pgid will be be usable as a pgid.
You probably meant that the pid would not be usable as a pgid (because not group
leader pid, possible session mismatch, etc.).
>
> But please keep throwing suggestions, we're making progress here...
>
(Just had a closer look to how pgids are restored)
Well, given that it's useful to have a consistent pgid/session for ghost tasks,
and that it looks too difficult to rework the synchronization logic before
reaping ghost tasks (referring to the ->exit_signal = 0 + parent sighandler =
SIG_IGN suggestion), I'm falling short of cleaner suggestions :)
The ghost_auto_reapable() thing above + wait_consider_task() change looks like
the cleanest solution so far. My only fear is that people could be reluctant to
adding this (small?) overhead to do_wait() "only" for cr.
Thanks,
Louis
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
Url : http://lists.linux-foundation.org/pipermail/containers/attachments/20110303/a9bae263/attachment.pgp
More information about the Containers
mailing list