[PATCH] [RFC] c/r: Add UTS support
Eric W. Biederman
ebiederm at xmission.com
Fri Mar 20 16:26:42 PDT 2009
"Serge E. Hallyn" <serue at us.ibm.com> writes:
> Quoting Eric W. Biederman (ebiederm at xmission.com):
>> Ok. I see what you are trying to accomplish with this and honestly I
>> think it is silly.
>> We should start the threads we need in the kernel, and if we need to
>> run clone_pid, fine. I am not comfortable exporting clone_with_pid to
>> user space.
> Even if we create the task tree in userspace, I don't see why we
> can't have the parent of each nested pid_ns pass CLONE_NEWPID to
> clone_with_pid() instead of doing clone first and then unsharing
> the pidns?
> As for clone_with_pid(), I don't particularly like the semantics,
> but as was discussed over IRC, we could have clone_with_pid()
> return -EINVAL unless it is called from a task
> inside a restarting container. (and -EPERM if setting a pid in
> a pid_ns which was not created as part of the container) Eric
> do you dislike that any less?
>> As for the implementation of allocating a struct pid with a certain
>> set of pid values, I expect we can do that easily enough by
>> refactoring the pid allocator to be passed in the min/max pid to
>> allocate from, and have a special case that passes in a different set
>> of min/max values so we can allocate just the pid we need.
> What is wrong with Alexey's patch, which simply passes in the values
> themselves? Do you have another use in mind for the min/max pid
At an implementation level (and I need to look at Alexey's specific patch)
every patch I have seen to date creates its own version of alloc_pidmap.
alloc_pidmap already implicitly takes a min, a max, and a first value to try
as parameters: RESERVED_PIDS, pid_max, and pid_ns->last_pid. So
instead of rewriting alloc_pidmap we should just be able to refactor
alloc_pidmap to take the requisite values. That should be less code
and easier to maintain.
Looking at the current implementation, we also have the issue that
pid_max is not per pid namespace, which is where it seems to belong.
>> If the primary use for a userspace interface is restart I feel we are
>> doing it wrong.
> I think that's a good guideline, but a bad rule. Certainly possible
> that you're right that this is just pointing to in-kernel
> recreation of process tree as the way to go. I was getting
> that feeling myself, but then there are still very good reasons
> not to do that, as there are things which each task should do
> before completing sys_restart() which are best done in userspace.
> These include for instance creating virtual nics, and calling
> Oren's suggested 'cr_advise()' system calls.
You might be right. I am behind on that part of the conversation.
My general concern is that dividing up the responsibilities between user space
and kernel space seems harder to maintain, and harder to refactor if we don't
get something right the first time.