[BIG RFC] Filesystem-based checkpoint

Eric W. Biederman ebiederm at xmission.com
Thu Oct 30 16:33:16 PDT 2008


Dave Hansen <dave at linux.vnet.ibm.com> writes:

> I hate the syscall.  It's a very un-Linux-y way of doing things.  There,
> I said it.  Here's an alternative.  It still uses the syscall to
> initiate things, but it uses debugfs to transport the data instead.
> This is just a concept demonstration.  It doesn't actually work, and I
> wouldn't be using debugfs in practice.

A syscall is a very Linux-y way to do it.

If you called it a core dump instead of a checkpoint, you would have
exactly the same set of issues.

Why we are doing vfs_write instead of file->f_op->write I don't understand.

> System calls in Linux are fast.  Doing lots of them is not a problem.
> If it becomes one, we can always export a condensed version of this
> format next to the expanded one, kinda like ftrace does.  Atomicity with
> this approach is also not a problem.  The system call in this approach
> doesn't return until the checkpoint is completely written out.

Extra copies of something (memory) that you want to transfer quickly
and efficiently are a problem.

Reading the memory of another process is a problem, to the point
that the /proc/<pid>/mem interface has been removed from the kernel.
  
> This lets userspace pick and choose what parts of the checkpoint it
> cares about.  It enables us to do all the I/O from userspace: no
> in-kernel sys_read/write().  I think this interface is much more
> flexible than a plain syscall.

Then get together with Roland McGrath and build the next-generation
user space debugging interface.

> Want to do a fast checkpoint?  Fine, copy all data, use a lot of memory,
> store it in-kernel.  Dump that out when the filesystem is accessed.
> Destroy it when userspace asks.

> So, why not?

Besides the part about creating a bunch of questionable interfaces
that we would need to support forever?

Ultimately the question is how you do checkpoint/restore, and I just
don't see that happening with a filesystem interface.  Way way way too many
dangerous syscalls that are only needed for one thing.

Checkpoint/restore is an atomic operation, and filesystems suck at building
high-level atomic primitives.

Eric

