[RFC][PATCH 2/2] CR: handle a single task with private memory maps

Louis Rilling Louis.Rilling at kerlabs.com
Thu Jul 31 07:08:44 PDT 2008


On Wed, Jul 30, 2008 at 02:27:52PM -0400, Oren Laadan wrote:
> 
> 
> Louis Rilling wrote:
> >> +/**
> >> + * cr_vma_fill_pgarr - fill a page-array with addr/page tuples for a vma
> >> + * @ctx - checkpoint context
> >> + * @pgarr - page-array to fill
> >> + * @vma - vma to scan
> >> + * @start - start address (updated)
> >> + */
> >> +static int cr_vma_fill_pgarr(struct cr_ctx *ctx, struct cr_pgarr *pgarr,
> >> +			     struct vm_area_struct *vma, unsigned long *start)
> >> +{
> >> +	unsigned long end = vma->vm_end;
> >> +	unsigned long addr = *start;
> >> +	struct page **pagep;
> >> +	unsigned long *addrp;
> >> +	int cow, nr, ret = 0;
> >> +
> >> +	nr = pgarr->nleft;
> >> +	pagep = &pgarr->pages[pgarr->nused];
> >> +	addrp = &pgarr->addrs[pgarr->nused];
> >> +	cow = !!vma->vm_file;
> >> +
> >> +	while (addr < end) {
> >> +		struct page *page;
> >> +
> >> +		/* simplified version of get_user_pages(): already have vma,
> >> +		* only need FOLL_TOUCH, and (for now) ignore fault stats */
> >> +
> >> +		cond_resched();
> >> +		while (!(page = follow_page(vma, addr, FOLL_TOUCH))) {
> >> +			ret = handle_mm_fault(vma->vm_mm, vma, addr, 0);
> >> +			if (ret & VM_FAULT_ERROR) {
> >> +				if (ret & VM_FAULT_OOM)
> >> +					ret = -ENOMEM;
> >> +				else if (ret & VM_FAULT_SIGBUS)
> >> +					ret = -EFAULT;
> >> +				else
> >> +					BUG();
> >> +				break;
> >> +			}
> >> +			cond_resched();
> >> +		}
> > 
> > I guess that 'ret' should be checked somewhere after this loop.
> 
> yes; this is where a "break(2)" construct in C would come handy :)

Alternatively, putting the inner loop in a separate function often helps to
handle errors in a cleaner way.

> 
> > 
> >> +
> >> +		if (IS_ERR(page)) {
> >> +			ret = PTR_ERR(page);
> >> +			break;
> >> +		}
> >> +
> >> +		if (page == ZERO_PAGE(0))
> >> +			page = NULL;	/* zero page: ignore */
> >> +		else if (cow && page_mapping(page) != NULL)
> >> +			page = NULL;	/* clean cow: ignore */
> >> +		else {
> >> +			get_page(page);
> >> +			*(addrp++) = addr;
> >> +			*(pagep++) = page;
> >> +			if (--nr == 0) {
> >> +				addr += PAGE_SIZE;
> >> +				break;
> >> +			}
> >> +		}
> >> +
> >> +		addr += PAGE_SIZE;
> >> +	}
> >> +
> >> +	if (unlikely(ret < 0)) {
> >> +		nr = pgarr->nleft - nr;
> >> +		while (nr--)
> >> +			page_cache_release(*(--pagep));
> >> +		return ret;
> >> +	}
> >> +
> >> +	*start = addr;
> >> +	return (pgarr->nleft - nr);
> >> +}
> >> +


> >> +int cr_write_mm(struct cr_ctx *ctx, struct task_struct *t)
> >> +{
> >> +	struct cr_hdr h;
> >> +	struct cr_hdr_mm *hh = ctx->tbuf;
> >> +	struct mm_struct *mm;
> >> +	struct vm_area_struct *vma;
> >> +	int ret;
> >> +
> >> +	h.type = CR_HDR_MM;
> >> +	h.len = sizeof(*hh);
> >> +	h.id = ctx->pid;
> >> +
> >> +	mm = get_task_mm(t);
> >> +
> >> +	hh->tag = 1;	/* non-zero will mean first time encounter */
> >> +
> >> +	hh->start_code = mm->start_code;
> >> +	hh->end_code = mm->end_code;
> >> +	hh->start_data = mm->start_data;
> >> +	hh->end_data = mm->end_data;
> >> +	hh->start_brk = mm->start_brk;
> >> +	hh->brk = mm->brk;
> >> +	hh->start_stack = mm->start_stack;
> >> +	hh->arg_start = mm->arg_start;
> >> +	hh->arg_end = mm->arg_end;
> >> +	hh->env_start = mm->env_start;
> >> +	hh->env_end = mm->env_end;
> >> +
> >> +	hh->map_count = mm->map_count;
> > 
> > Some fields above should also be protected with mmap_sem, like ->brk,
> > ->map_count, and possibly others (I'm not a memory expert though).
> 
> true; keep in mind, though, that the container will be frozen during
> this time, so nothing should change at all. The only exception would
> be if, for instance, someone is killing the container while we save
> its state.

Sure. So you think that taking mm->mmap_sem below is useless? I tend to believe
so, since no other task should share this mm_struct at this time, and we could
state that ptrace should not interfere during restart. However, I'm never
confident when ptrace considerations come in...

> 
> > 
> >> +
> >> +	/* FIX: need also mm->flags */
> >> +
> >> +	ret = cr_write_obj(ctx, &h, hh);
> >> +	if (ret < 0)
> >> +		goto out;
> >> +
> >> +	/* write the vma's */
> >> +	down_read(&mm->mmap_sem);
> >> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> >> +		if ((ret = cr_write_vma(ctx, vma)) < 0)
> >> +			break;
> >> +	}
> >> +	up_read(&mm->mmap_sem);
> >> +
> >> +	if (ret < 0)
> >> +		goto out;
> >> +
> >> +	ret = cr_write_mm_context(ctx, mm);
> >> +
> >> + out:
> >> +	mmput(mm);
> >> +	return ret;
> >> +}
> > 

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes


More information about the Containers mailing list