[RFC v14][PATCH 00/54] Kernel based checkpoint/restart

Louis Rilling Louis.Rilling at kerlabs.com
Tue May 5 01:20:57 PDT 2009


On 04/05/09  8:01 -0500, Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl at cs.columbia.edu):
> > > I see one drawback with this approach if you allow checkpointing of an
> > > application that is not isolated in a container. In that case, you may
> > > want to select which IPC objects to dump, rather than dumping all the IPC
> > > objects living in the system. Indeed, this is why we have chosen in
> > > Kerrighed to checkpoint IPC objects independently of tasks, since we
> > > currently have no container/namespace support.
> > 
> > I assume that in this case it will be the application itself that
> > will somehow tell the system which specific sysvipc objects (ids) it
> > cares about.
> > 
> > (I'm not sure how the system would otherwise know what to dump and
> > what to leave out).
> > 
> > I originally proposed the construct of cradvise() syscall to handle
> > exactly those cases where the application would like to advise the
> > kernel about certain resources. So, extending the previous example,
> > a task may call something like:
> > 
> >    cradvise(CHECKPOINT_SYSVIPC_SHM, false);  /* generally skip shm */
> >    cradvise(CHECKPOINT_SYSVIPC_SHMID, id, true);  /* but include this */
> > 
> > or:
> >    cradvise(CHECKPOINT_SYSVIPC_SHM, true);  /* generally include shm */
> >    cradvise(CHECKPOINT_SYSVIPC_SHMID, id, false);  /* but skip this */
> > 
> > Anyway, these are just examples of the concept and what sort of generic
> > interface can be used to implement it; don't pick on the details...
> > 
> > Oren.
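
The selection semantics implied by the cradvise() examples above (a general
default for shm, plus per-id exceptions) can be modelled in plain user-space C.
This is only a sketch of the concept: cradvise() was a proposal, and every name
below (struct shm_advice, advise_default, advise_id, should_checkpoint) is
hypothetical and not part of any kernel interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one default for all shm segments plus per-id
 * overrides. None of these names exist in the kernel. */
#define MAX_OVERRIDES 16

struct shm_override {
	int id;        /* SysV shm id the advice applies to */
	bool include;  /* true: checkpoint it, false: skip it */
};

struct shm_advice {
	bool include_by_default;
	size_t n_overrides;
	struct shm_override overrides[MAX_OVERRIDES];
};

/* Models cradvise(CHECKPOINT_SYSVIPC_SHM, include). */
static void advise_default(struct shm_advice *a, bool include)
{
	a->include_by_default = include;
}

/* Models cradvise(CHECKPOINT_SYSVIPC_SHMID, id, include).
 * Returns 0 on success, -1 if the override table is full. */
static int advise_id(struct shm_advice *a, int id, bool include)
{
	for (size_t i = 0; i < a->n_overrides; i++) {
		if (a->overrides[i].id == id) {
			a->overrides[i].include = include;
			return 0;
		}
	}
	if (a->n_overrides == MAX_OVERRIDES)
		return -1;
	a->overrides[a->n_overrides].id = id;
	a->overrides[a->n_overrides].include = include;
	a->n_overrides++;
	return 0;
}

/* A per-id override wins; otherwise the general default applies. */
static bool should_checkpoint(const struct shm_advice *a, int id)
{
	for (size_t i = 0; i < a->n_overrides; i++) {
		if (a->overrides[i].id == id)
			return a->overrides[i].include;
	}
	return a->include_by_default;
}
```

With the default set to false and an override for one id set to true, only that
segment would be selected for the checkpoint; swapping the two values gives the
second example.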
> 
> Oren, I have to be honest:  I could of course be wrong, but imo there
> is 0 chance of such a bigger-and-uglier-than-ioctl syscall as cradvise
> being accepted upstream.  There may be good uses for it, but I think
> it's worthwhile thinking of ways around it whenever possible.
> 
> In this particular case, wouldn't it be better to do something like:
> 
> 	1. freeze + checkpoint full application + container (== C1)
> 	2. continue application, which does a clone(CLONE_COPYIPC) (*1)
> 	3. application removes all shms except the one to be
> 	checkpointed
> 	4. freeze + checkpoint application again ( == C2)
> 	5. restart application from C1
> 

Besides the COW issues mentioned by Oren in his reply, this approach does not
seem to provide the required flexibility. The point is to avoid checkpointing
some IPC objects together with the application, even though those IPC objects
must remain in place and the application keeps using them. Moreover, on restart
the administrator should be able to first install the required IPC objects,
e.g. re-create them from scratch or restore them from another checkpoint, and
then restart the application, linking it to the SHMs that were previously
re-created or restored (by whatever means).
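
To illustrate the two-step restart in user-space terms, here is a minimal
sketch using the standard SysV calls shmget(), shmat() and shmctl(). It assumes
the segment is located by a well-known key; a real restart mechanism would
more likely have to map the ids recorded in the checkpoint onto the
re-installed objects. The helper names install_shm, link_shm and remove_shm
are hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Step 1 (administrator): install the segment before the restart,
 * creating it if it does not exist yet. Returns the shm id or -1. */
static int install_shm(key_t key, size_t size)
{
	return shmget(key, size, IPC_CREAT | 0600);
}

/* Step 2 (restarted application): look up the already installed
 * segment by key and attach it. Returns NULL on failure. */
static void *link_shm(key_t key)
{
	int id = shmget(key, 0, 0);  /* size 0: segment must already exist */
	void *addr;

	if (id < 0)
		return NULL;
	addr = shmat(id, NULL, 0);
	return addr == (void *)-1 ? NULL : addr;
}

/* Cleanup helper: mark the segment for destruction once it is no
 * longer needed. Returns 0 on success, -1 on failure. */
static int remove_shm(key_t key)
{
	int id = shmget(key, 0, 0);

	if (id < 0)
		return -1;
	return shmctl(id, IPC_RMID, NULL);
}
```

The segment's lifetime is decoupled from the application here: install_shm()
can run long before the application is restarted, and the contents can come
from scratch or from a separate checkpoint.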

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes