C/R minisummit notes
dlezcano at fr.ibm.com
Wed Jul 23 04:30:07 PDT 2008
* What are the problems that the linux community can solve with the
Eric Biederman reminds at the previous OLS nobody complained about the
Pavel Emelyanov : Oracle takes some minutes to start up; if we
checkpoint just after the startup, Oracle can later be restarted from
this point, which gives a fast startup
Oren Laadan : Time travel, we can take monotonic snapshots and go back
to one of these snapshots.
Eric Biederman : Priority running, checkpoint/kill an application and
run another application with a higher priority
Denis Lunev : Task migration, move an application from one host to another host
Daniel Lezcano : SSI (task migration)
* Preparing the kernel internals
OL : Can we implement a kernel module and move CR functionality into
the kernel itself later ?
EB : Better to add a little CR functionality into the kernel itself
and add more afterwards.
DLu : Problem with kernel version
OL : Compatibility with intermediate kernel versions should be possible
with userspace conversion tools
DLu : A non-sequential file for the checkpoint statefile is a challenge
OL : Yes, but it is possible and useful for compression/encryption
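A minimal sketch of the userspace conversion tools OL mentions above,
assuming the statefile carries a version field and that each converter
upgrades by exactly one version. The field names ("regs", "cpu_state",
"files"), the version numbers and the function names are invented for
illustration; the notes do not define any statefile format.

```python
# Hypothetical statefile model: a dict with a "version" field.
# Every field name and version number here is an assumption.

def v1_to_v2(image):
    # Assumed change: version 2 renames "regs" to "cpu_state".
    out = dict(image)
    out["cpu_state"] = out.pop("regs")
    out["version"] = 2
    return out


def v2_to_v3(image):
    # Assumed change: version 3 adds an explicit list of open files.
    out = dict(image)
    out.setdefault("files", [])
    out["version"] = 3
    return out


# One converter per source version; each one upgrades by a single step.
CONVERTERS = {1: v1_to_v2, 2: v2_to_v3}


def upgrade(image, target_version):
    """Apply single-step converters until the image reaches target_version."""
    current = dict(image)
    while current["version"] < target_version:
        step = CONVERTERS.get(current["version"])
        if step is None:
            raise ValueError("no converter from version %d" % current["version"])
        current = step(current)
    if current["version"] != target_version:
        raise ValueError("downgrade is not supported in this sketch")
    return current
```

Chaining single-step converters keeps each tool small: an image taken
on an old kernel is walked forward one format version at a time.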
We showed that there are five steps to perform a checkpoint:
1 - Pre-dump
2 - Freeze
3 - Dump
4 - Resume/kill
5 - Post-dump
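The five steps above can be sketched as a fixed sequence in a toy
userspace model. The Container class and the checkpoint() function are
hypothetical, not a kernel interface, and since the notes do not say
what pre-dump and post-dump actually do, those two steps are left as
placeholders.

```python
from enum import Enum


class Step(Enum):
    PRE_DUMP = 1
    FREEZE = 2
    DUMP = 3
    RESUME_OR_KILL = 4
    POST_DUMP = 5


class Container:
    """Toy stand-in for a container; not a kernel object."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.frozen = False
        self.alive = True


def checkpoint(container, kill_after=False):
    """Run the five steps in order; return the image and the steps taken."""
    trace = []

    # 1 - Pre-dump: placeholder, the tasks are still running here.
    trace.append(Step.PRE_DUMP)

    # 2 - Freeze: stop every task so the state cannot change.
    trace.append(Step.FREEZE)
    container.frozen = True

    # 3 - Dump: only valid while the container is frozen.
    trace.append(Step.DUMP)
    if not container.frozen:
        raise RuntimeError("dump requires a frozen container")
    image = {"tasks": list(container.tasks)}

    # 4 - Resume/kill: either let the tasks run again or discard them.
    trace.append(Step.RESUME_OR_KILL)
    if kill_after:
        container.alive = False
    else:
        container.frozen = False

    # 5 - Post-dump: placeholder for work after the tasks are released.
    trace.append(Step.POST_DUMP)
    return image, trace
```

The only ordering constraint the sketch enforces is that the dump
happens between freeze and resume/kill.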
At this point we stated that we want to create a proof of concept and
checkpoint/restart the simplest application.
We will iteratively add more and more kernel resources.
Should the process hierarchy be created from the kernel or from userspace ?
OL : It seems better to send a chunk of data to the kernel, which then
restores the process hierarchy
PE : Agreed
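To illustrate the "chunk of data" idea, here is a userspace sketch
that packs a process tree into one flat buffer and rebuilds the
hierarchy from it. The record layout (two little-endian 32-bit
integers, pid and parent pid, with parent 0 marking the root) and all
function names are assumptions made for this example; the notes do not
specify any format.

```python
import struct

# Assumed record layout: (pid, parent pid) as two little-endian int32.
RECORD = struct.Struct("<ii")


def pack_hierarchy(pairs):
    """Serialize (pid, ppid) pairs into a single chunk of bytes."""
    return b"".join(RECORD.pack(pid, ppid) for pid, ppid in pairs)


def rebuild_hierarchy(chunk):
    """Return (root pid, {pid: [child pids]}) from a packed chunk."""
    if len(chunk) % RECORD.size:
        raise ValueError("truncated chunk")
    children = {}
    root = None
    for pid, ppid in RECORD.iter_unpack(chunk):
        children.setdefault(pid, [])
        if ppid == 0:
            if root is not None:
                raise ValueError("more than one root")
            root = pid
        else:
            children.setdefault(ppid, []).append(pid)
    if root is None:
        raise ValueError("no root process in chunk")
    return root, children


def creation_order(root, children):
    """Order in which tasks would be recreated: parents before children."""
    order, stack = [], [root]
    while stack:
        pid = stack.pop()
        order.append(pid)
        stack.extend(reversed(children[pid]))
    return order
```

For example, packing the pairs (1, 0), (2, 1), (3, 1), (4, 2) and
rebuilding gives root 1 and the creation order [1, 2, 4, 3].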
OL : We should be able to checkpoint from inside the container, keep
that in mind for later.
=> we need a syscall or an ioctl
The first items to address before implementing the Checkpoint are:
1 - Make a container object (the context)
2 - Freeze the container (extend cgroup freezer ?)
3 - syscall | ioctl
* simplest application : a single process, without any files, no
checkpoint of the text file (same file system for restart), no signals,
no syscalls in the application, no IPC/no msgq, no network
* multiple processes + zombie state
* files, pipe, signals, socketpair ?
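The staged scope above could be tracked with a small table that tells
which resources a given stage of the proof of concept handles. This is
only a sketch: the stage numbers and resource labels are chosen for
this example, and socketpair is listed in stage 3 even though the
notes mark it with a question mark.

```python
# Resources handled at each stage; labels are assumptions for this sketch.
STAGE_RESOURCES = {
    1: {"single_process"},
    2: {"multiple_processes", "zombies"},
    3: {"files", "pipes", "signals", "socketpairs"},
}


def supported(stage):
    """Union of the resources handled up to and including this stage."""
    result = set()
    for number, resources in STAGE_RESOURCES.items():
        if number <= stage:
            result |= resources
    return result


def unsupported(used, stage):
    """Sorted list of the resources in use that this stage cannot handle."""
    return sorted(set(used) - supported(stage))
```

A checkpoint attempt would be refused whenever unsupported() returns a
non-empty list, for instance ["network"] at any of the three stages.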
This proof of concept must come with documentation describing what is
supported, what is not supported and what we plan to do.