[PATCH 0/6] /proc/pid/checkpointable

Serge E. Hallyn serue at us.ibm.com
Wed Mar 25 10:29:38 PDT 2009


Quoting Eric W. Biederman (ebiederm at xmission.com):
> Dave Hansen <dave at linux.vnet.ibm.com> writes:
> 
> > On Wed, 2009-03-18 at 13:03 -0700, Mike Waychison wrote:
> >> Polluting the dmesg buffer with messages from common failures (consider 
> >> a multi-user cluster where checkpoints may or may not succeed) isn't 
> >> very useful.
> >
> > Yeah, I've already gotten an earful from Serge and Dan S. about this. :)
> >
> > Serge suggested that, perhaps, the audit framework could be used.  We
> > might also use an ftrace buffer if we want to keep a whole ton of
> > messages around, too.
> >
> > dmesg is definitely not workable long-term at all.
> 
> How about having placeholder objects in the generated checkpoint?
> Then instead of having a failure you have a non-restorable checkpoint.
> But you know which fd, or which mmapped region, or which other thing
> is causing the problem and if you want more information you can
> look at that resource.
> 
> That gives user space the freedom to scrub out the non-checkpointable
> bits and replace them with something like /dev/null so that we can
> continue on and restore the checkpoint anyway, if we think our
> app can cope with some things going away.
> 
> Eric

I like this idea.

Subsystems which are temporarily entirely unsupported (like sysvipc)
would need at least a dummy section in the format wherein we can
say 'unsupported', otherwise we'll still just get a meaningless
-EINVAL.
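
To make the placeholder idea concrete, here is a rough userspace sketch
of what such a record could look like.  Every name in it (struct
ckpt_rec, the CKPT_REC_* constants, the helper functions) is
hypothetical and is not taken from the patchset or from the actual
checkpoint image format; it only illustrates "record the object as
unsupported instead of failing, and let userspace scrub it later".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record kinds; these names are NOT from the patchset. */
enum ckpt_rec_kind {
	CKPT_REC_FD = 1,
	CKPT_REC_VMA,
	CKPT_REC_SYSVIPC,
};

enum ckpt_rec_state {
	CKPT_REC_OK = 0,		/* payload can be restored */
	CKPT_REC_UNSUPPORTED = 1,	/* placeholder: object was not saved */
};

/* Hypothetical fixed-size record as it might appear in an image. */
struct ckpt_rec {
	uint32_t kind;		/* enum ckpt_rec_kind */
	uint32_t state;		/* enum ckpt_rec_state */
	uint64_t id;		/* fd number, vma start, ipc id, ... */
	char reason[48];	/* short human-readable explanation */
};

/* Write a placeholder instead of aborting the whole checkpoint. */
static void ckpt_mark_unsupported(struct ckpt_rec *rec, uint32_t kind,
				  uint64_t id, const char *reason)
{
	assert(rec != NULL);
	memset(rec, 0, sizeof(*rec));
	rec->kind = kind;
	rec->state = CKPT_REC_UNSUPPORTED;
	rec->id = id;
	snprintf(rec->reason, sizeof(rec->reason), "%s", reason);
}

/* The image is restorable only if this returns 0. */
static size_t ckpt_count_unsupported(const struct ckpt_rec *recs, size_t n)
{
	size_t i, bad = 0;

	for (i = 0; i < n; i++)
		if (recs[i].state == CKPT_REC_UNSUPPORTED)
			bad++;
	return bad;
}

/*
 * Userspace scrub step: turn an unsupported fd placeholder into a
 * stand-in (think /dev/null).  Returns 0 on success, -1 if the record
 * is not an unsupported fd.
 */
static int ckpt_scrub_fd(struct ckpt_rec *rec)
{
	if (rec->kind != CKPT_REC_FD || rec->state != CKPT_REC_UNSUPPORTED)
		return -1;
	rec->state = CKPT_REC_OK;
	snprintf(rec->reason, sizeof(rec->reason), "replaced with /dev/null");
	return 0;
}
```

With something along these lines, a wholly unsupported subsystem such
as sysvipc would still produce one record saying 'unsupported', and a
restore tool could list the offending records by kind and id rather
than report a bare -EINVAL.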

I actually got bitten yesterday by trying to checkpoint a task that
wasn't frozen.  I forgot v14 had that check, and the failure I got (a
segfault, actually) told me nothing useful.

-serge

