[RFC v6][PATCH 0/9] Kernel based checkpoint/restart

Thu Oct 16 05:35:59 PDT 2008

Oren Laadan wrote:
> Cedric Le Goater wrote:
>> Dave Hansen wrote:
>>> On Mon, 2008-10-13 at 10:13 +0200, Cedric Le Goater wrote:
>>>> hmm, that's rather complex, because we have to take into account the 
>>>> kernel stack, no ? This is what Andrey was trying to solve in his patchset 
>>>> back in September :
>>>>
>>>>         http://lkml.org/lkml/2008/9/3/96
>>>>
>>>> the restart phase simulates a clone and switch_to to (not) restore the kernel 
>>>> stack. right ? 
>>> Do we ever have to worry about the kernel stack if we simply say that
>>> tasks have to be *in* userspace when we checkpoint them?
>> at a syscall boundary for example. that would make our life easier 
>> definitely. 
>>
> 
> The ideal situation is to never worry about the kernel stack: either we
> catch the task in user space or at a syscall boundary. This is taken care
> of by freezing the tasks prior to checkpoint.
> 
> The one exception (and it is a tedious one!) is the set of states in which
> the task is already frozen by definition: for example, any ptrace blocking
> point where the tracee waits for the tracer to grant permission to proceed
> with its execution. Another example is in vfork(), waiting for completion.

I would say these are perfect places for "may be non-checkpointable" :)

> In both cases, there will be a kernel stack and we cannot avoid it.
> The bad news is that it may be a bit tedious to restart these cases.
> The good news, however, is that they are very well defined locations
> with well defined semantics. So upon restart all that is needed is
> to emulate the expected behavior had we not been checkpointed. This,
> luckily, does not require rebuilding the kernel stack, but instead
> some smart glue code for a finite set of special cases.
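
To make the "finite set of special cases" concrete, here is a rough sketch
of how such glue code might be organized. This is only an illustration, not
code from the patchset: the enum names, the function restart_action_for()
and the action values are all hypothetical, and the real restart path would
have to do the actual work behind each action.

```c
/*
 * Hypothetical sketch: map the point where a task was blocked at
 * checkpoint time to the behavior to emulate at restart. None of
 * these names exist in the kernel or in the patchset.
 */

/* Where the task was found when it was checkpointed. */
enum ckpt_point {
	CKPT_USERSPACE,         /* frozen while running user code */
	CKPT_SYSCALL_BOUNDARY,  /* frozen at a syscall boundary */
	CKPT_PTRACE_STOP,       /* tracee waiting for its tracer */
	CKPT_VFORK_WAIT,        /* parent waiting for vfork() completion */
};

/* What restart does instead of rebuilding the kernel stack. */
enum restart_action {
	RESTART_RESUME_USER,    /* restore registers, return to user space */
	RESTART_REENTER_PTRACE, /* block again until the tracer resumes us */
	RESTART_REWAIT_VFORK,   /* wait again for the vfork() child */
	RESTART_REFUSE,         /* unknown state: treat as non-checkpointable */
};

enum restart_action restart_action_for(enum ckpt_point point)
{
	switch (point) {
	case CKPT_USERSPACE:
	case CKPT_SYSCALL_BOUNDARY:
		/* No kernel stack to worry about in these two cases. */
		return RESTART_RESUME_USER;
	case CKPT_PTRACE_STOP:
		return RESTART_REENTER_PTRACE;
	case CKPT_VFORK_WAIT:
		return RESTART_REWAIT_VFORK;
	}
	/* Anything outside the known set is refused. */
	return RESTART_REFUSE;
}
```

The point of the sketch is that each special case is a well-defined
location, so a small dispatch table is enough to pick the behavior to
emulate, and anything outside the table can be refused.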