[RFC][PATCH 2/2] CR: handle a single task with private memory maps

Oren Laadan orenl at cs.columbia.edu
Wed Jul 30 15:20:32 PDT 2008



Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl at cs.columbia.edu):
>> +int do_checkpoint(struct cr_ctx *ctx)
>> +{
>> +	int ret;
>> +
>> +	/* FIX: need to test whether container is checkpointable */
>> +
>> +	ret = cr_write_hdr(ctx);
>> +	if (!ret)
>> +		ret = cr_write_task(ctx, current);
>> +	if (!ret)
>> +		ret = cr_write_tail(ctx);
>> +
>> +	/* on success, return (unique) checkpoint identifier */
>> +	if (!ret)
>> +		ret = ctx->crid;
> 
> Does this crid have a purpose?

Yes, at least three. All of them are for the future, but it is important
to fix the meaning of the syscall's return value now. The "crid" is the
CR-identifier that identifies the checkpoint. Every checkpoint is assigned
a unique number (using an atomic counter).
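
A rough userspace sketch of such a counter, using C11 atomics rather than
the kernel's atomic_t. The names cr_crid_counter and cr_alloc_crid are
made up for illustration and are not taken from the patch. It assumes
identifiers start at 1, so that 0 stays free to mean "returned after
restart" (see the switch statement further down):

```c
#include <stdatomic.h>

/* Hypothetical global counter; the patch's real variable is not shown here. */
static atomic_int cr_crid_counter = 0;

/*
 * Hand out the next checkpoint identifier: 1, 2, 3, ...
 * atomic_fetch_add() returns the value before the addition, so add 1
 * to get the freshly allocated identifier. Two concurrent callers can
 * never receive the same value.
 */
int cr_alloc_crid(void)
{
	return atomic_fetch_add(&cr_crid_counter, 1) + 1;
}
```

In the kernel the equivalent would presumably be an atomic_t bumped with
atomic_inc_return().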

1) if a checkpoint is taken and kept in memory (instead of written to a
file), then this is the identifier that restart (or cleanup) will use to
refer to the in-memory checkpoint image.

2) to reduce the downtime of a checkpoint, data will be aggregated on the
checkpoint context, along with references to (COW-ed) pages. This data can
persist between calls to sys_checkpoint(), and the 'crid', again, will be
used to identify the (in-memory, to-be-dumped-to-storage) context.

3) for incremental checkpoints (where a successive checkpoint saves only
what has changed since the previous checkpoint) there will be a need to
identify the previous checkpoints (to know where to take data from during
restart). Again, a 'crid' is handy.

[in fact, for the 3rd use, it will make sense to write that number as
part of the checkpoint image header]
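
To illustrate the incremental case, here is a minimal sketch of a header
that records both the image's own crid and the crid of the checkpoint it
builds on. Everything in it is hypothetical: the struct cr_hdr_sketch, the
parent_crid field, and the convention that parent_crid == 0 marks a full
(non-incremental) checkpoint are assumptions, not part of the posted patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header fields; the real image header is defined elsewhere. */
struct cr_hdr_sketch {
	uint32_t crid;        /* identifier of this checkpoint */
	uint32_t parent_crid; /* previous checkpoint, 0 if this one is full */
};

/* Linear search for the header with the given crid; NULL if absent. */
static const struct cr_hdr_sketch *
cr_find(const struct cr_hdr_sketch *images, size_t n, uint32_t crid)
{
	for (size_t i = 0; i < n; i++)
		if (images[i].crid == crid)
			return &images[i];
	return NULL;
}

/*
 * Count how many images restart would have to read for 'crid', walking
 * parent links back to a full checkpoint. Returns -1 if a link is
 * missing or the chain loops.
 */
int cr_chain_length(const struct cr_hdr_sketch *images, size_t n,
		    uint32_t crid)
{
	int len = 0;

	while (crid != 0) {
		const struct cr_hdr_sketch *h = cr_find(images, n, crid);

		if (!h || (size_t)len >= n)
			return -1;
		len++;
		crid = h->parent_crid;
	}
	return len;
}
```

With images 1 (full), 2 (based on 1) and 3 (based on 2), restarting from
crid 3 would need all three images.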

Note that by doing so, a process that checkpoints itself (in its own
context), can use code that is similar to the logic of fork():

	...
	crid = checkpoint(...);
	switch (crid) {
	case -1:
		perror("checkpoint failed");
		break;
	default:
		fprintf(stderr, "checkpoint succeeded, CRID=%d\n", crid);
		/* proceed with execution after checkpoint */
		...
		break;
	case 0:
		fprintf(stderr, "returned after restart\n");
		/* proceed with action required following a restart */
		...
		break;
	}
	...

Oren.


