checkpoint/restart ABI

Serge E. Hallyn serue at us.ibm.com
Tue Aug 12 07:49:05 PDT 2008


Quoting Peter Chubb (peterc at gelato.unsw.edu.au):
> >>>>> "Jeremy" == Jeremy Fitzhardinge <jeremy at goop.org> writes:
> 
> Jeremy> Dave Hansen wrote:
> >> Arnd, Jeremy and Oren,
> >> 
> 
> 
> Jeremy>    * multiple processes
> Jeremy>    * pipes
> Jeremy>    * UNIX domain sockets
> Jeremy>    * INET sockets (both inter and intra machine)
> Jeremy>    * unlinked open files
> Jeremy>    * checkpointing file content
> Jeremy>    * closed files (ie, files which aren't currently open,
> Jeremy>      but will be soon, esp tmp files)
> Jeremy>    * shared memory
> Jeremy>    * (Peter, what have I forgotten?)
> 
> File sharing; multiple threads with weird sharing arrangements (think:
> clone with various parameters, followed by exec in some of the threads
> but not others); MERT/system-V shared memory, semaphores and message
> queues; devices (audio, framebuffer, etc), HugeTLBFS, NUMA issues
> (pinning, memory layout), processes being debugged (so,
> checkpoint/restart a gdb/target pair), futexes, etc., etc.  Linux
> process state keeps expanding.
> 
> Jeremy> Having gone through this before, I don't think an all-kernel
> Jeremy> solution can work except for the most simple cases.
> 
> I agree ... it's better to put mechanisms into the kernel that can
> then be used by a user-space programme to actually do the
> checkpointing and restarting.
> 
> Beefing up ptrace or fixing /proc to be a real debugging interface
> would be a start ... when you can get at *all* the info you need,

Except we don't really want to export all the info you need for a
complete restartable checkpoint, and we especially don't want to make
it generally writable.
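
As a rough illustration of the read-only side that userspace can already
get at, here is a minimal Python sketch that parses the text format of
/proc/<pid>/maps into memory-region records.  The names Mapping,
parse_maps and read_maps are made up for this example; it is not taken
from cryo or any other checkpoint tool, and it only reads state, it does
nothing towards restoring it.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """One memory region as listed in /proc/<pid>/maps (hypothetical type)."""
    start: int
    end: int
    perms: str
    offset: int
    path: str


def parse_maps(text):
    """Parse the contents of a /proc/<pid>/maps file into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].rstrip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], int(fields[2], 16), path))
    return mappings


def read_maps(pid="self"):
    """Read and parse the maps file of a process (needs a Linux /proc)."""
    with open(f"/proc/{pid}/maps") as f:
        return parse_maps(f.read())
```

On a Linux box, read_maps("self") returns the regions of the calling
process.  A real checkpoint would need far more than this (file
descriptors, signals, IPC, and so on).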

We have also started down that path using ptrace (see cryo, at
git://git.sr71.net/~hallyn/cryodev.git).

Right before the containers mini-summit, where the general agreement was
that a complete in-kernel solution ought to be pursued, I had tried
a restart using a binary format that read a checkpoint file and used
cryo (userspace using ptrace) for the rest of the restart, only
because there was no other reasonable way to set tsk->did_exec on
restart.

> quickly and easily, the userspace checkpoint falls out fairly
> naturally.  You still have to work out an extensible file format to
> store stuff, and how to restore all that state you've so lovingly
> collected.
> 
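
One common way to get an extensible file format is a stream of
type/length/value records, where a reader skips any record type it does
not know.  The sketch below shows that idea in Python.  The header
layout and the REC_* tags are hypothetical, chosen only for
illustration; this is not the format used by cryo or proposed anywhere
in this thread.

```python
import io
import struct

# Hypothetical record header: 4-byte type tag, then 4-byte payload length,
# both little-endian unsigned integers.
HEADER = struct.Struct("<II")

# Hypothetical type tags, for illustration only.
REC_REGS = 1
REC_MEM = 2


def write_record(out, rec_type, payload):
    """Append one type/length/value record to a binary stream."""
    out.write(HEADER.pack(rec_type, len(payload)))
    out.write(payload)


def read_records(inp, known_types):
    """Yield (type, payload) for known record types; skip unknown ones."""
    while True:
        header = inp.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        rec_type, length = HEADER.unpack(header)
        payload = inp.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        if rec_type in known_types:
            yield rec_type, payload
        # Unknown types are silently skipped: the length field alone is
        # enough to step over them, which is what makes the format extensible.


buf = io.BytesIO()
write_record(buf, REC_REGS, b"abc")
write_record(buf, 99, b"from a newer writer")
write_record(buf, REC_MEM, b"")
buf.seek(0)
records = list(read_records(buf, {REC_REGS, REC_MEM}))
```

Here an older reader that only knows REC_REGS and REC_MEM steps over
record type 99 without failing.  Whether silently skipping unknown
state is acceptable on restart is a separate policy question.
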
> Jeremy> Lightweight filesystem checkpointing, such as btrfs provides,
> Jeremy> would seem like a powerful mechanism for handling a lot of the
> Jeremy> filesystem state problems.  It would have been useful when we
> Jeremy> did this...
> 
> And how!  Saving bits of files was very time-consuming.

Yes, we're looking forward to using btrfs' snapshots :)

-serge

