[PATCH 0/6] /proc/pid/checkpointable

Wed Mar 25 05:25:24 PDT 2009

Dave Hansen <dave at linux.vnet.ibm.com> writes:

> On Wed, 2009-03-18 at 13:03 -0700, Mike Waychison wrote:
>> Polluting the dmesg buffer with messages from common failures (consider 
>> a multi-user cluster where checkpoints may or may not succeed) isn't 
>> very useful.
>
> Yeah, I've already gotten an earful from Serge and Dan S. about this. :)
>
> Serge suggested that, perhaps, the audit framework could be used.  We
> might also use an ftrace buffer if we want to keep a whole ton of
> messages around, too.
>
> dmesg is definitely not workable long-term at all.

How about having placeholder objects in the generated checkpoint?
Then instead of a failure you get a non-restorable checkpoint,
but you know which fd, which mmapped region, or which other resource
is causing the problem, and if you want more information you can
look at that resource.

That gives user space the freedom to scrub out the non-checkpointable
bits and replace them with something like /dev/null, so that we can
continue and restore the checkpoint anyway if we think our
app can cope with some things going away.
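
A rough sketch of what the userspace scrub pass could look like, in C.
Everything here is hypothetical: the record layout, the enum and the
function names are made up for illustration and are not the format the
checkpoint patches actually write. It only shows the idea: a placeholder
records which fd could not be checkpointed and why, and a scrub pass
reports it and rewrites it to reopen /dev/null at the same fd number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record kinds in a checkpoint image (illustration only). */
enum ckpt_kind {
	CKPT_FILE,        /* a regular, restorable open file */
	CKPT_PLACEHOLDER, /* a resource the kernel could not checkpoint */
};

/* Hypothetical per-fd record; not the real checkpoint image layout. */
struct ckpt_fd_record {
	enum ckpt_kind kind;
	int fd;          /* descriptor number in the checkpointed task */
	char path[64];   /* path to reopen on restart (CKPT_FILE) */
	char reason[64]; /* why it was skipped (CKPT_PLACEHOLDER) */
};

/* Count placeholders: nonzero means the image is not restorable as-is. */
static size_t ckpt_count_placeholders(const struct ckpt_fd_record *recs,
				      size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (recs[i].kind == CKPT_PLACEHOLDER)
			count++;
	return count;
}

/*
 * Scrub pass: report each placeholder, then turn it into a file record
 * that reopens /dev/null at the same fd number.  Returns how many
 * records were rewritten.
 */
static size_t ckpt_scrub_to_devnull(struct ckpt_fd_record *recs, size_t n)
{
	size_t scrubbed = 0;

	assert(recs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (recs[i].kind != CKPT_PLACEHOLDER)
			continue;
		fprintf(stderr, "fd %d not checkpointable: %s\n",
			recs[i].fd, recs[i].reason);
		recs[i].kind = CKPT_FILE;
		strcpy(recs[i].path, "/dev/null");
		memset(recs[i].reason, 0, sizeof(recs[i].reason));
		scrubbed++;
	}
	return scrubbed;
}
```

Whether replacing with /dev/null is acceptable is the application's call;
the point is that the decision moves to user space, which can see exactly
which resource was skipped instead of grepping dmesg.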

Eric