kernel summit topic - 'containers end-game'
Serge E. Hallyn
serue at us.ibm.com
Mon Jul 6 07:51:37 PDT 2009
Quoting Daniel Lezcano (dlezcano at fr.ibm.com):
> Serge E. Hallyn wrote:
> - The initiator of the checkpoint initializes the barrier and sends a
> SIGCKPT signal to all the checkpointable tasks, which jump into the
> handler and block on the barrier.
> - When all these tasks reach the barrier, the initiator of the
> checkpoint dumps the system-wide resources (memory, SysV IPC, struct
> files, etc.).
> - When this is done, the tasks are released; each one stores its
> process-wide resources (semundo, file descriptors, etc.) into its
> current->ckpt_restart buffer, then sets the status of the operation
> and blocks on the barrier.
> - The initiator of the checkpoint then collects all this information
> and dumps it.
Do you envision all of the dumping being done in the kernel or by userspace?
> - Finally, the initiator of the checkpoint releases the tasks.
> - The user executes the statefile, which spawns the process tree; all
> the processes then block on the barrier.
> - The initiator of the restart restores the system-wide resources
> and fills the restarted processes' current->ckpt_restart buffers.
Same question about restore...
> - The initiator sends a SIGRESTART to all the tasks and unblocks them.
> - All the tasks restore their process-wide resources from their
> current->ckpt_restart buffers.
> - All the tasks write their status and block on the barrier.
> - The initiator of the restart releases the tasks, which return to
> the execution context they were in when they were checkpointed.
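The rendezvous sequence above can be modelled in userspace to check that the barrier phases line up. This is a minimal sketch, assuming threads stand in for tasks and a single POSIX barrier sized for all tasks plus the initiator; `struct ckpt_buf`, `statefile[]` and `run_checkpoint_restart()` are hypothetical names, and no real signal, kernel state or statefile is involved. The same threads are reused for the restart half only to keep it short.

```c
#define _POSIX_C_SOURCE 200112L
#include <pthread.h>
#include <string.h>

#define NTASKS 4

/* Hypothetical stand-in for current->ckpt_restart: one buffer per task. */
struct ckpt_buf {
    int saved_state; /* a pretend "process-wide resource" */
    int status;      /* 0 = operation succeeded */
};

/* Shared by NTASKS tasks plus the initiator, hence NTASKS + 1. */
static pthread_barrier_t barrier;
static struct ckpt_buf bufs[NTASKS];
static int statefile[NTASKS]; /* pretend dumped image */

static void *task(void *arg)
{
    int id = (int)(long)arg;
    int state = 100 + id; /* this task's private state */

    /* Checkpoint */
    pthread_barrier_wait(&barrier); /* C1: stopped in the "SIGCKPT handler" */
    pthread_barrier_wait(&barrier); /* C2: released after system-wide dump */
    bufs[id].saved_state = state;   /* save own state, as seen from current */
    bufs[id].status = 0;
    pthread_barrier_wait(&barrier); /* C3: state stored */
    pthread_barrier_wait(&barrier); /* C4: released after collection */

    /* Restart */
    state = -1;                     /* pretend the state was lost */
    pthread_barrier_wait(&barrier); /* R1: restarted task is blocked */
    pthread_barrier_wait(&barrier); /* R2: "SIGRESTART", buffers are filled */
    state = bufs[id].saved_state;   /* restore own state from the buffer */
    bufs[id].status = (state == 100 + id) ? 0 : 1;
    pthread_barrier_wait(&barrier); /* R3: status written */
    pthread_barrier_wait(&barrier); /* R4: released, resume execution */
    return NULL;
}

/* Plays the initiator; returns how many tasks reported a failed restore.
 * Error checks on pthread calls are omitted for brevity. */
int run_checkpoint_restart(void)
{
    pthread_t th[NTASKS];
    int failed = 0;

    pthread_barrier_init(&barrier, NULL, NTASKS + 1);
    for (long i = 0; i < NTASKS; i++)
        pthread_create(&th[i], NULL, task, (void *)i);

    /* Checkpoint */
    pthread_barrier_wait(&barrier); /* C1: all tasks stopped */
    /* system-wide resources (memory, SysV IPC, files) would be dumped here */
    pthread_barrier_wait(&barrier); /* C2: let tasks save their own state */
    pthread_barrier_wait(&barrier); /* C3: every task has saved */
    for (int i = 0; i < NTASKS; i++)
        statefile[i] = bufs[i].saved_state; /* collect and "dump" */
    pthread_barrier_wait(&barrier); /* C4: release the tasks */

    /* Restart */
    pthread_barrier_wait(&barrier); /* R1: all restarted tasks blocked */
    memset(bufs, 0, sizeof bufs);
    for (int i = 0; i < NTASKS; i++)
        bufs[i].saved_state = statefile[i]; /* fill per-task buffers */
    pthread_barrier_wait(&barrier); /* R2: unblock the tasks */
    pthread_barrier_wait(&barrier); /* R3: all statuses written */
    for (int i = 0; i < NTASKS; i++)
        failed += bufs[i].status;
    pthread_barrier_wait(&barrier); /* R4: release the tasks */

    for (int i = 0; i < NTASKS; i++)
        pthread_join(th[i], NULL);
    pthread_barrier_destroy(&barrier);
    return failed;
}
```

A driver calls `run_checkpoint_restart()` (built with `-pthread` on Linux) and expects 0. Each step of the protocol becomes one barrier crossing, and the initiator only touches the per-task buffers while every task is parked on the next barrier, which `pthread_barrier_wait` makes safe because it synchronizes memory between the participants.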
> This approach is different from what you are doing, but I am pretty sure
> most of the code is reusable. I see several advantages to this approach:
> - Because the process resources are checkpointed / restarted from
> current, it would be easy to reuse some syscall code (from the kernel
> POV), which would reduce code duplication and maintenance overhead.
> - The approach is more fine-grained, as we can implement
> checkpoint / restart piece by piece.
> - As the statefile is in ELF format, gdb could be used to debug a
> statefile as a core file.
Note btw that Dave has found that a checkpoint is faster than a core-dump
at the moment :) That's not entirely an aside - I need to reread your
email a few times and really process your suggestion, but given that some
users want to dump hundreds of gigabytes of memory, not slowing down the
checkpoint is a big consideration.
> - As each process checkpoints / restarts itself, most of the
> execution context is stored on the stack, which is checkpointed /
> restarted along with the memory, so when returning from the signal
> handler, the process returns to the right context. That is less
> complicated and more generic than externally checkpointing the
> execution context of a frozen task, which could differ at restart.
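The last advantage leans on ordinary signal delivery: on Linux the kernel saves the interrupted context in a frame on the task's own stack, and returning from the handler restores it. A minimal userspace illustration, assuming SIGUSR1 as a stand-in for the proposed SIGCKPT (which does not exist today); `self_checkpoint_demo()` and the `saved*` variables are hypothetical and only record a trivial piece of state.

```c
#define _POSIX_C_SOURCE 200112L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical per-task save area standing in for current->ckpt_restart. */
static volatile sig_atomic_t saved;
static volatile pid_t saved_pid;

/* Runs on the interrupted task's own stack, so it can read the task's
 * state directly; getpid() is async-signal-safe. */
static void ckpt_handler(int sig)
{
    (void)sig;
    saved_pid = getpid();
    saved = 1;
    /* Returning lets the kernel restore the interrupted context from the
     * signal frame it pushed on this stack. */
}

/* Uses SIGUSR1 as a stand-in for SIGCKPT. Returns 0 if the handler saved
 * the state and execution resumed right after raise(). */
int self_checkpoint_demo(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = ckpt_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    saved = 0;
    raise(SIGUSR1); /* the handler runs before raise() returns */

    /* Back in the original context, right after the call above. */
    return (saved && saved_pid == getpid()) ? 0 : 1;
}
```

The task itself does the saving, so nothing outside has to reconstruct its register state; the cost is that every checkpointable task must be able to take the signal.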
> Serge, I hope you can present this approach as an alternative to the
> current patchset __if__ the latter is not acceptable.
I'll try to understand it better than I do right now. I don't think
it's something to discuss at ksummit, but definitely if we have a
mini-summit or during the next round of discussions (during or
immediately after the ckpt-v17 release).