[RFC][PATCH 00/11] track files for checkpointability
Serge E. Hallyn
serue at us.ibm.com
Fri Mar 6 10:24:51 PST 2009
Quoting Dave Hansen (dave at linux.vnet.ibm.com):
> On Fri, 2009-03-06 at 10:23 -0600, Serge E. Hallyn wrote:
> > Which imo is fine, but my question is whether that leaves any actual
> > value in the persistent per-resource uncheckpointable flag.
> OK, let's take a look back at this discussion a little bit and how we
> got here.
> Ingo quotes:
> > Yeah, per resource it should be. That's per task in the normal
> > case - except for threaded workloads where it's shared by
> > threads.
> > Uncheckpointable should be a one-way flag anyway. We want this
> > to become usable, so uncheckpointable functionality should be as
> > painful as possible, to make sure it's getting fixed ...
> > Is there any automated test that could discover C/R breakage via
> > brute force? All that matters in such cases is to get the "you
> > broke stuff" information as soon as possible. If it comes at an
> > early stage developers can generally just fix stuff.
> You add these things together and you get what I posted. My patch is:
> 1. per resource
> 2. has a one way flag
> 3. Gives messages to developers at an early stage (dmesg) and lets them
> explore it more thoroughly (/proc)
> But, these "early stage" messages are completely opposed to an approach
> that uses sys_checkpoint() in some form (like with a -1 fd as an
Well, I disagree with that. The 'early stage' messages could be seen as
  1. a short-term way to prioritize which resources to support
  2. a long-term way to catch new resources introduced
     without checkpoint/restart support
I don't believe (2) would work. I think (1) would work, but we would
risk imposing permanent code changes to support a temporary goal.
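
For reference, the persistent per-resource flag under discussion can be modelled in a few lines of userspace Python. This is only a toy sketch, not the code from the patch series: the class name TrackedFile, the method mark_uncheckpointable, and the reason strings are all made up for illustration. It shows the two properties listed above: the flag is one-way, and a message is emitted once, at the moment the resource first becomes uncheckpointable.

```python
class TrackedFile:
    """Hypothetical stand-in for a per-resource object such as an open file."""

    def __init__(self, name):
        self.name = name
        self.uncheckpointable = False  # one-way flag: never cleared once set
        self.reason = None

    def mark_uncheckpointable(self, reason, log):
        # Report only on the first transition, much like a single dmesg line.
        if not self.uncheckpointable:
            self.uncheckpointable = True
            self.reason = reason
            log.append(f"{self.name}: uncheckpointable ({reason})")


log = []  # stands in for the kernel log
f = TrackedFile("fd 3")
f.mark_uncheckpointable("unsupported file type", log)   # made-up reason
f.mark_uncheckpointable("some later operation", log)    # flag already set, no new message
```

The cost being debated is that every code path touching an unsupported resource needs a call like mark_uncheckpointable(), and those calls stay in the tree after the resource gains support elsewhere.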
In contrast, the sys_checkpoint() check will always be needed to
check whether a particular application is checkpointable. For
instance, a task will never be checkpointable if it shares an mm_struct
with a task that is not being checkpointed.
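
A minimal sketch of that kind of checkpoint-time check, again as a userspace toy model rather than kernel code. The Task tuple, the mm_id field, and the function shares_mm_outside are assumptions made for the example. The point is that the answer depends on which set of tasks is being checkpointed, so no flag stored on the resource alone can give it.

```python
from collections import namedtuple

# Hypothetical model: each task has a pid and an identifier for its address space.
Task = namedtuple("Task", ["pid", "mm_id"])


def shares_mm_outside(all_tasks, checkpoint_pids):
    """Return pids in the checkpoint set whose mm is also used by a task outside it."""
    outside_mms = {t.mm_id for t in all_tasks if t.pid not in checkpoint_pids}
    return sorted(
        t.pid
        for t in all_tasks
        if t.pid in checkpoint_pids and t.mm_id in outside_mms
    )


# Tasks 100 and 101 share one address space; task 102 has its own.
tasks = [Task(100, "mmA"), Task(101, "mmA"), Task(102, "mmB")]

blocked = shares_mm_outside(tasks, {100, 102})        # 101 is left out of the set
clean = shares_mm_outside(tasks, {100, 101, 102})     # everything is included
```

Here task 100 is uncheckpointable in the first call and checkpointable in the second, although nothing about task 100 itself has changed.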
> Think of it like lockdep. We *could* have designed lockdep to simply
> give us a nice message whenever we do an a/b b/a deadlock. That would
> be helpful. Or, we could design it to record all lock acquisitions that
> didn't deadlock to see if they ever possibly deadlock. (We did the
> second one, btw). That gave an early, useful, warning that developers
> could fix before we encounter an actual problem. I'm advocating such a
> mechanism for c/r.
If you can convince me that it'll do that you'll have me on board :)
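
To make the lockdep analogy concrete, here is a deliberately simplified toy model of "record the orderings that did not deadlock". It is not how lockdep is implemented (lockdep tracks lock classes and full dependency chains); this sketch only checks direct pairs, and the names OrderRecorder and acquire are invented for the example.

```python
class OrderRecorder:
    """Toy model: remembers which lock was held when another was taken."""

    def __init__(self):
        self.seen = set()     # (held, acquired) pairs observed so far
        self.warnings = []

    def acquire(self, held, new):
        # held: locks currently held by the caller; new: the lock being taken
        for h in held:
            if (new, h) in self.seen:
                # The opposite order was observed earlier: warn now, even
                # though no deadlock has actually occurred.
                self.warnings.append(
                    f"possible inversion: {h} -> {new} vs {new} -> {h}"
                )
            self.seen.add((h, new))


rec = OrderRecorder()
rec.acquire(["a"], "b")   # one code path takes a, then b
rec.acquire(["b"], "a")   # another path takes b, then a; nothing deadlocked
```

The second call produces a warning from ordering history alone. The open question for c/r is whether a per-resource flag can give a comparably early signal.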