Checkpoint/Restart mini-summit

Eric W. Biederman ebiederm at xmission.com
Tue Jul 15 11:44:40 PDT 2008


Daniel Lezcano <dlezcano at fr.ibm.com> writes:

> Hi all,
>
> Here is a proposal for a more detailed agenda for the checkpoint/restart 
> mini-summit. If everybody is ok with it, I will update the wiki.
>
> Comments are welcome :)

A reading list is useful, even to help get some ideas circulating
before we get there.

Ultimately the technical details will need to be resolved by
people discussing things and sending patches back and forth
on the mailing lists.

I don't think a detailed agenda is going to get us anywhere.
Especially not one focused on the implementation details.

I think we need to start by seeing what we can agree on.  Certainly we
agree that checkpoint/restart needs to be part of the picture.  What
are the problems that the Linux community can solve with
checkpoint/restart?

Then we need to talk about what kind of implementation we want to
merge into mainline.  How do we sell it, and how do we implement
it without affecting long-term maintainability?

I think the granularity of our operations, and what state we
save is important.  I don't think how we save it is important
unless it affects one of our requirements.

As for the POSIX draft and the historical Cray & SGI implementations,
they were on the wrong track.  They did not have namespace support,
so they could not in general restore their checkpoints.

There are also a lot of things you have failed to touch on, that
I'm not going to go into now.

With any luck the mini-summit before OLS will be the start of a
conversation that will go on all week, and continue on the mailing
lists.

The real question is how we coordinate our efforts to build a good
Linux checkpoint/restart implementation.

> * Documentation
>    * Zap : www.ncl.cs.columbia.edu/publications/usenix2007_fordist.pdf
>    * Metacluster : lxc.sourceforge.net/doc/ols2006/lxc-ols2006.pdf
>    * OpenVZ : http://wiki.openvz.org/Checkpointing_and_live_migration
>    * Checkpoint/Restart technology : 
> http://en.wikipedia.org/wiki/Application_checkpointing
>    * Virtual Servers and Checkpoint/Restart in Mainstream Linux : Sigops 
> document

There is also the classic emacs undump, and the very simple
vmadump from bproc.

Eric

