[RFC][PATCH 2/2] CR: handle a single task with private memory maps

Serge E. Hallyn serue at us.ibm.com
Thu Jul 31 14:25:35 PDT 2008


Quoting Oren Laadan (orenl at cs.columbia.edu):
>
>
> Serge E. Hallyn wrote:
>> Quoting Oren Laadan (orenl at cs.columbia.edu):
>>> +int do_checkpoint(struct cr_ctx *ctx)
>>> +{
>>> +	int ret;
>>> +
>>> +	/* FIX: need to test whether container is checkpointable */
>>> +
>>> +	ret = cr_write_hdr(ctx);
>>> +	if (!ret)
>>> +		ret = cr_write_task(ctx, current);
>>> +	if (!ret)
>>> +		ret = cr_write_tail(ctx);
>>> +
>>> +	/* on success, return (unique) checkpoint identifier */
>>> +	if (!ret)
>>> +		ret = ctx->crid;
>>
>> Does this crid have a purpose?
>
> yes, at least three; all are for the future, but it is important to fix
> the meaning of the syscall's return value now. The "crid" is the
> CR-identifier that identifies the checkpoint. Every checkpoint is
> assigned a unique number (using an atomic counter).
>
> 1) if a checkpoint is taken and kept in memory (instead of to a file) then
> this will be the identifier with which the restart (or cleanup) would refer
> to the (in memory) checkpoint image
>
> 2) to reduce downtime of the checkpoint, data will be aggregated on the
> checkpoint context, along with references to (COW-ed) pages. This data can
> persist between calls to sys_checkpoint(), and the 'crid', again, will be
> used to identify the (in-memory-to-be-dumped-to-storage) context.
>
> 3) for incremental checkpoints (where a successive checkpoint saves only
> what has changed since the previous checkpoint) there will be a need
> to identify the previous checkpoints (to know where to take data from
> during restart). Again, a 'crid' is handy.
>
> [in fact, for the 3rd use, it will make sense to write that number as
> part of the checkpoint image header]
>
> Note that by doing so, a process that checkpoints itself (in its own
> context), can use code that is similar to the logic of fork():
>
> 	...
> 	crid = checkpoint(...);
> 	switch (crid) {
> 	case -1:
> 		perror("checkpoint failed");
> 		break;
> 	default:
> 		fprintf(stderr, "checkpoint succeeded, CRID=%d\n", crid);
> 		/* proceed with execution after checkpoint */
> 		...
> 		break;
> 	case 0:
> 		fprintf(stderr, "returned after restart\n");
> 		/* proceed with action required following a restart */
> 		...
> 		break;
> 	}
> 	...
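
For readers following along, the "unique number (using an atomic counter)"
mentioned above could look roughly like the sketch below. This is a userspace
illustration only, not code from the patch: the names cr_next_id and
cr_alloc_crid() are hypothetical, and C11 atomics stand in for whatever
primitive the kernel code actually uses. The counter starts at 1 on the
assumption that 0 stays reserved for "returned after restart" and negative
values for errors, matching the switch in the quoted example.

```c
#include <stdatomic.h>

/* Hypothetical global counter; the name is illustrative, not from the patch.
 * It starts at 1 so that 0 can keep meaning "returned after restart". */
static atomic_int cr_next_id = 1;

/* Hand out a fresh, strictly positive checkpoint identifier.
 * atomic_fetch_add() returns the value held before the increment, so
 * concurrent callers each receive a distinct number. */
static int cr_alloc_crid(void)
{
	return atomic_fetch_add(&cr_next_id, 1);
}
```

With this in place, two successive calls to cr_alloc_crid() return distinct,
increasing positive values, which is all the fork()-like return convention
needs from the identifier.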

Thanks - for this and the later explanations in replies to Louis.

Really I had no doubt it had a purpose :)  but wasn't sure what it was.
Quite clear now.  Thanks.

-serge

