[PATCH] [RFC] c/r: Add UTS support

Eric W. Biederman ebiederm at xmission.com
Fri Mar 20 20:39:41 PDT 2009


"Serge E. Hallyn" <serge at hallyn.com> writes:

> Quoting Eric W. Biederman (ebiederm at xmission.com):
>> > What is wrong with Alexey's patch, which simply passes in the values
>> > themselves?  Do you have another use in mind for the min/max pid
>> > values?
>> 
>> At an implementation level (and I need to look at Alexey's specific patch)
>> every patch I have seen to date creates their own version of alloc_pidmap.
>
> You're right, Alexey's patch creates a new one.
>
>> alloc_pidmap already implicitly takes min/max and first value to try
>> as parameters.  RESERVED_PIDS, pid_max, and pid_ns->last_pid.  So
>> instead of rewriting alloc_pidmap we should just be able to refactor
>> alloc_pidmap to take the requisite values.  That should be less code
>> and easier to maintain.
>
> Yeah, that sounds good actually.  Thanks.
>
>> Looking at the current implementation we also have the issue that
>> pid_max is not per pid namespace, which is where it seems to belong.
>
> Eh.  It does seem to, but otoh why give userspace knobs it has no use
> for...  Or, can you think of a case where it'd be useful?

In general the number of usable pid numbers should be larger in the outer
pid namespace than in the child pid namespace.  Otherwise it is possible
for the child to eat all of the possible pid numbers.

So I think it would be advantageous to give containers designed to migrate
a small pid_max by default, so we know they won't overwhelm others.

Furthermore, since pid_max is a limit on the identifiers allocated, not on the
number of processes, it is very much a pid namespace property.
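
To make the refactoring idea concrete, here is a rough userspace sketch, not
kernel code. It assumes a toy namespace structure with its own pid_max, and
a single allocator that takes min, max, and the first value to try as explicit
parameters. All names (pid_ns_sketch, alloc_pid_range, alloc_pid_next,
alloc_pid_exact, SKETCH_PID_LIMIT) are hypothetical, and the search is
simplified compared to the real alloc_pidmap: it uses a plain array rather
than bitmap pages and never hands out ids below the reserved floor.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PID_LIMIT 64	/* hypothetical, tiny on purpose */

/* Toy stand-in for a pid namespace; pid_max lives in the namespace. */
struct pid_ns_sketch {
	bool used[SKETCH_PID_LIMIT];
	int pid_max;		/* exclusive upper bound for this namespace */
	int last_pid;		/* most recently allocated id */
};

/*
 * Allocate the first free id in [min, max), starting the search at
 * 'first' and wrapping around to 'min'.  Returns -1 if the range is full.
 */
static int alloc_pid_range(struct pid_ns_sketch *ns, int min, int max,
			   int first)
{
	int span = max - min;
	int i;

	assert(min >= 0 && min < max && max <= SKETCH_PID_LIMIT);
	if (first < min || first >= max)
		first = min;

	for (i = 0; i < span; i++) {
		int pid = min + (first - min + i) % span;

		if (!ns->used[pid]) {
			ns->used[pid] = true;
			ns->last_pid = pid;
			return pid;
		}
	}
	return -1;
}

/* Ordinary fork path: continue after last_pid, skip the reserved ids. */
static int alloc_pid_next(struct pid_ns_sketch *ns, int reserved)
{
	return alloc_pid_range(ns, reserved, ns->pid_max, ns->last_pid + 1);
}

/* Restart path: request one specific id by collapsing the range to it. */
static int alloc_pid_exact(struct pid_ns_sketch *ns, int pid)
{
	if (pid < 0 || pid >= ns->pid_max)
		return -1;
	return alloc_pid_range(ns, pid, pid + 1, pid);
}
```

The point of the sketch is only that the ordinary path and the restart path
can share one search routine once the bounds are arguments, and that a
namespace with a small pid_max cannot hand out ids beyond its own limit.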

>> > I think that's a good guideline, bad rule.  Certainly possible
>> > that you're right that this is just pointing to in-kernel
>> > recreation of process tree as the way to go.  I was getting
>> > that feeling myself, but then there are still very good reasons
>> > not to do that, as there are things which each task should do
>> > before completing sys_restart() which are best done in userspace.
>> > These include for instance creating virtual nics, and calling
>> > Oren's suggested 'cr_advise()' system calls.
>> 
>> You might be right.   I am behind on that part of the conversation.
>> 
>> My general concern is that dividing up the responsibilities between user space
>> and kernel space seems harder to maintain, and refactor if we don't get something
>> right the first time.
>
> So far we're actually still at the point where the code (Oren's set)
> could go either way.  A small patch from Alexey can make it swing toward
> kernel, while Oren's mktree.c userspace restart program swings the other
> way.
>
> And since we're punting on any nested namespaces it actually may stay that way
> for awhile.

Interesting.  That sounds fairly fundamental.  If I have some free time I will
have to take a look.  I'm in favor of kernel/user space cooperation, but I don't
currently see the benefit of forking processes in user space.

Eric

