C/R without "leaks" (was: Re: Creating tasks on restart: userspace vs kernel)

Alexey Dobriyan adobriyan at gmail.com
Wed Apr 15 12:56:29 PDT 2009


> Again, so to checkpoint one task in the topmost pid-ns you need to
> checkpoint (if at all possible) the entire system ?!

One more argument for not allowing "leaks" and for checkpointing the whole
container, no ifs, buts and woulditbenices.

Just to clarify, C/R with a "leak" is, for example, when a process has a
separate pidns but shares, say, its netns with another process not involved
in the checkpoint.
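
To make the definition concrete, here is a minimal userspace sketch of a
leak check. Everything in it (struct task_model, netns_leaks(), the integer
namespace ids) is made up for illustration and is not taken from any C/R
patch; it only models "a netns used inside the checkpoint is also used
outside of it".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each task records which namespace instances it uses. */
struct task_model {
	int pidns_id;
	int netns_id;
	bool in_checkpoint;	/* true if the task is part of the checkpoint */
};

/*
 * Return true if some netns used by a checkpointed task is also used by a
 * task outside the checkpoint, i.e. the netns "leaks".
 */
bool netns_leaks(const struct task_model *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].in_checkpoint)
			continue;
		for (size_t j = 0; j < n; j++) {
			if (!tasks[j].in_checkpoint &&
			    tasks[j].netns_id == tasks[i].netns_id)
				return true;
		}
	}
	return false;
}
```

A task with its own pidns that still shares netns 0 with an outside task
makes netns_leaks() return true; give it a private netns and it returns
false.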

If you allow this, you lose one important property of the checkpoint part,
namely that almost everything is frozen. Losing this property means that
suddenly much more stuff is alive during the dump and you have to account
for more stuff when checkpointing. You are effectively checkpointing live
data structures and there is no guarantee you'll get it right.

Example 1: utsns is shared with the rest of the world.

utsns content is modifiable only by tasks, through their own
current->nsproxy->uts_ns. Consequently, a task outside the checkpoint that
shares the utsns can modify its content while you're dumping it if you
allow "leaks".

Did you take precautions? Where?

	static int cr_write_utsns(struct cr_ctx *ctx, struct uts_namespace *uts_ns)
	{
	        struct cr_hdr h;
	        struct cr_hdr_utsns *hh;
	        int domainname_len;
	        int nodename_len;
	        int ret;

	        h.type = CR_HDR_UTSNS;
	        h.len = sizeof(*hh);

	        hh = cr_hbuf_get(ctx, sizeof(*hh));
	        if (!hh)
	                return -ENOMEM;

	        nodename_len = strlen(uts_ns->name.nodename) + 1;
	        domainname_len = strlen(uts_ns->name.domainname) + 1;

	        hh->nodename_len = nodename_len;
	        hh->domainname_len = domainname_len;

	        ret = cr_write_obj(ctx, &h, hh);
	        cr_hbuf_put(ctx, sizeof(*hh));
	        if (ret < 0)
	                return ret;

	        ret = cr_write_string(ctx, uts_ns->name.nodename, nodename_len);
	        if (ret < 0)
	                return ret;

	        ret = cr_write_string(ctx, uts_ns->name.domainname, domainname_len);
	        return ret;
	}

You should take uts_sem.


Example 2: ipcns is shared with the rest of the world

Consequently, a shm segment is visible outside and live. Someone outside
may already have shmat()ed it. What will end up in the shm segment content?
Anything.

You should check the struct file refcount or something, and disable
attaching while dumping or something.
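
One possible shape of such a check, as a single-threaded userspace model
only. struct shm_model and the shm_model_*() functions are hypothetical and
do not correspond to the kernel's shm code; a real version would also need
both paths serialized by a lock.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a shm segment. */
struct shm_model {
	int nattch_inside;	/* attachments from tasks in the checkpoint */
	int nattch_total;	/* all attachments, inside or outside */
	bool dumping;		/* set while the segment is being dumped */
};

/* Attach path: refuse new attachments while a dump is in progress. */
int shm_model_attach(struct shm_model *shm, bool from_inside)
{
	if (shm->dumping)
		return -EBUSY;
	shm->nattch_total++;
	if (from_inside)
		shm->nattch_inside++;
	return 0;
}

/* Dump path: refuse if anyone outside the checkpoint is attached. */
int shm_model_dump_begin(struct shm_model *shm)
{
	if (shm->nattch_total != shm->nattch_inside)
		return -EBUSY;
	shm->dumping = true;
	return 0;
}

/* End of dump: attaching is allowed again. */
void shm_model_dump_end(struct shm_model *shm)
{
	shm->dumping = false;
}
```

Even in this toy form you need two extra pieces of state and two extra
failure paths just for one object type.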


Moral: Every time you dump something live, you get complications.
Every single time.


Sockets and a live netns are the most complex example. I'm not
prepared to describe it exactly, but people wishing to do C/R with
"leaks" should be very careful with their wishes.

