[RFC][v3][PATCH 7/7] Define clone_with_pids syscall

Serge E. Hallyn serue at us.ibm.com
Mon Jun 1 09:42:32 PDT 2009


Quoting Oren Laadan (orenl at cs.columbia.edu):
> The two issues are related, and this is intentional. The idea
> was that when you use CLONE_NEWPID, you imply a new nesting level
> and a pid==1 there. So you are not supposed to specify the pid
> of that new level.
> 
> IOW, the parent specify the pid's of the child from the _parent's_
> level and up (of the desired depth). CLONE_NEWPID creates a new
> pidns level below the parent's, and that is not covered in the
> array of pids.
> 
> By allocation an extra slot and forcing it to be 0, we ensure that
> the case of CLONE_NEWPID is covered correctly. Clearly, if this
> flag isn't set, then the extra slot is redundant (but doesn't hurt).

Ok, I see - I guess I don't mind those semantics.  So:

> >> +	j = knum_pids - unum_pids;
> >         j = 1, so we copy the 3 pids in the right place.
> > 
> >> +	rc = copy_from_user(&target_pids[j], pid_set.target_pids, size);
> >> +	if (rc) {
> >> +		rc = -EFAULT;
> >> +		goto out_free;
> >> +	}
> >> +
> >> +	return target_pids;
> > 
> > 
> > For the second one, we have a parent task
> > 
> > level no       |     pid
> > 0                   5009
> > 1                   1000
> > 2                    49
> > 
> > calling clone_with_pid with CLONE_NEWPID and {1001,50,1} to produce:
> > 
> > level no       |     pid
> > 0                   5010
> > 1                   1001
> > 2                    50
> > 3                    1
> > 
> > So the numbers in your code become:
> > 
> >> +	unum_pids = pid_set.num_pids;
> >         unum_pids = 3
> 
> This is a "bug" of the parent. The parent should specify the pids
> from the parent's level only and up, and not include the new level
> below that will be created. (After all, it will have to be 1).
> 
> So unum_pids = 3 will not do what you want; instead it will try to
> create the process:
> 
> 0	1001
> 1	50
> 2	1
> 3	1
> 
> And will fail, of course, because pid==1 at level 2 is already
> taken.
> 
> Instead, parent should say use {1001, 50}.

Ok, but then we have the task:

level no       |     pid
0                   5009
1                   1000
2                    49

calling clone(CLONE_NEWPID) with unum_pids = 2, so

> 
> >> +	knum_pids = task_pid(current)->level + 1;
> >         knum_pids = 2 + 1 = 3
> > 
> >> +	target_pids = kzalloc((knum_pids + 1) * sizeof(pid_t), GFP_KERNEL);
> > 
> >         target_pids gets room for 4 pids
> > 
> >> +	j = knum_pids - unum_pids;

j = 3 - 2 = 1, so we copy 1001 into pid[1] and
50 into pid[2], with 0 in pid[0] and pid[3]

Looks good.  Thanks for indulging me :)

Acked-by: Serge Hallyn <serue at us.ibm.com>

One last thought - should there be an explicit check to make sure that
if CLONE_NEWPID, then at the end pid[knum_pids+1] = 0?  Or is that
there and I just missed it?

-serge


More information about the Containers mailing list