[RFC v13][PATCH 00/14] Kernel based checkpoint/restart

Dave Hansen dave at linux.vnet.ibm.com
Thu Feb 12 14:57:37 PST 2009


On Thu, 2009-02-12 at 13:30 -0600, Matt Mackall wrote:
> On Thu, 2009-02-12 at 10:11 -0800, Dave Hansen wrote:
...
> >  * Filesystem state
> >   * contents of files
> >   * mount tree for individual processes
> >  * flock
> >  * threads and sessions
> >  * CPU and NUMA affinity
> >  * sys_remap_file_pages()
> 
> I think the real question is: where are the dragons hiding? Some of
> these are known to be hard. And some of them are critical to checkpointing
> typical applications. If you have plans or theories for implementing all
> of the above, then great. But this list doesn't really give any sense of
> whether we should be scared of what lurks behind those doors.

This is probably a better question for people like Pavel, Alexey and
Cedric to answer.  

> Some of these things we probably don't have to care too much about. For
> instance, contents of files - these can legitimately change for a
> running process. Open TCP/IP sockets can legitimately get reset as well.
> But others are a bigger deal.

Legitimately, yes.  But, practically, these are things that we need to
handle because we want to make any checkpoint/restart as transparent as
possible.  Resetting people's network connections is not exactly illegal,
but it is not very nice or transparent either.

> Also, what happens if I checkpoint a process in 2.6.30 and restore it in
> 2.6.31 which has an expanded idea of what should be restored? Do your
> file formats handle this sort of forward compatibility or am I
> restricted to one kernel?

In general, you're restricted to one kernel.  But, people have mentioned
that, if the formats change, we should be able to write in-userspace
converters for the checkpoint files.  
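
To make the converter idea a bit more concrete, here is a rough sketch of
what such a userspace tool might look like.  Everything in it is made up
for illustration: the header layout (magic, version, pad), the names
convert() and v1_to_v2(), and the pretend v1-to-v2 change are assumptions,
not the image format this patch set actually writes.

```python
import struct

# Hypothetical image header: 4-byte magic, 2-byte format version, 2-byte pad.
# This layout is an illustration only, not the format used by the patches.
HEADER = struct.Struct("<4sHH")
MAGIC = b"CKPT"


def v1_to_v2(payload: bytes) -> bytes:
    """Hypothetical step: pretend v2 prepends a zeroed 4-byte flags field."""
    return struct.pack("<I", 0) + payload


# One upgrade step per format revision, keyed by the version it upgrades from.
UPGRADES = {1: v1_to_v2}


def convert(image: bytes, target: int) -> bytes:
    """Rewrite a checkpoint image so that its format version equals target."""
    magic, version, _pad = HEADER.unpack_from(image)
    if magic != MAGIC:
        raise ValueError("not a checkpoint image")
    if version > target:
        raise ValueError("downgrading an image is not supported")
    payload = image[HEADER.size:]
    # Chain the single-revision steps until the target version is reached.
    while version < target:
        step = UPGRADES.get(version)
        if step is None:
            raise ValueError(f"no converter from format version {version}")
        payload = step(payload)
        version += 1
    return HEADER.pack(MAGIC, version, 0) + payload
```

Chaining one small step per revision would mean that only the newest
step has to be written when the format changes; older images would walk
through the existing steps in order.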

-- Dave


