[Ksummit-2010-discuss] checkpoint-restart: naked patch

Oren Laadan orenl at cs.columbia.edu
Mon Nov 22 09:34:54 PST 2010

On Sun, 21 Nov 2010, Gene Cooperman wrote:

> Below, we'll summarize the four major questions that we've understood from
> this discussion so far.  But before doing so, I want to point out that a single
> process or process tree will always have many possible interactions with
> the rest of the world.  Within our own group, we have an internal slogan:
>   "You can't checkpoint the world."
> A virtual machine can have a relatively closed world, which makes it more
> robust, but checkpointing will always have some fragile parts.

That depends on what your definition of "world" is. One definition
is "world := VM", as you state above. Another is "world := container"
which I stated in my post(s). You can checkpoint both.

For those cases where the "world" cannot be fully checkpointed,
I explicitly pointed out that we should focus on the core c/r
functionality, because the "glue" can be done either way.

> We give four examples below: 
> a.  time virtualization

IMHO, irrelevant to the current discussion. And btw, this is done in
linux-cr for live migration of TCP connections.

> b.  external database
> c.  NSCD daemon

This falls within the category of "glue", and is - as I try once
again to remind - entirely orthogonal to the topic of where
to do c/r.

> d.  screen and other full-screen text programs
> These are not the only examples of difficult interactions with the
> rest of the world.

This actually never required a userspace "component" with Zap
or linux-cr (to the best of my knowledge).

Even if it did - the question is not how to deal with "glue"
(you demonstrated quite well how to do that with DMTCP), but 
how should the basic, core c/r functionality work - which is
below, and orthogonal to the "glue".

Let us please focus on the base c/r engine functionality...

(gotta disconnect now .. more later)

