[RFC][PATCH] ns: Syscalls for better namespace sharing control.

Daniel Lezcano daniel.lezcano at free.fr
Thu Feb 25 14:13:00 PST 2010


Eric W. Biederman wrote:
> Daniel Lezcano <daniel.lezcano at free.fr> writes:
>
>   
>> Eric W. Biederman wrote:
>>     
>>> Introduce two new system calls:
>>> int nsfd(pid_t pid, unsigned long nstype);
>>> int setns(unsigned long nstype, int fd);
>>>
>>> These two new system calls address three specific problems that can
>>> make namespaces hard to work with.
>>> - Namespaces require a dedicated process to pin them in memory.
>>> - It is not possible to use a namespace unless you are the
>>>   child of the original creator.
>>> - Namespaces don't have names that userspace can use to talk
>>>   about them.
>>>
>>> The nsfd() system call returns a file descriptor that can
>>> be used to talk about a specific namespace, and to keep
>>> the specified namespace alive.
>>>
>>> The fd returned by nsfd() can be bind mounted as:
>>> mount --bind /proc/self/fd/N /some/filesystem/path
>>> to keep the namespace alive indefinitely as long as
>>> it is mounted.
>>>
>>> open works on the fd returned by nsfd() so another
>>> process can get a hold of it and do interesting things.
>>>
>>> Overall that allows for persistent naming of namespaces
>>> according to userspace policy.
>>>
>>> setns() allows changing the namespace of the current process
>>> to a namespace that originates with nsfd().
>>>
>>> Signed-off-by: Eric W. Biederman <ebiederm at xmission.com>
>>> ---
>>>   
>>>       
>> Is it planned to support all the namespaces for 'nsfd' ?
>> I mean will it be possible to specify an Or'ed combination of nstype to grab a
>> reference for several namespaces at a time of the targeted process ?
>>
>> for example : nsfd(1234, NSTYPE_NET | NSTYPE_IPC | NSTYPE_MNT)
>>     
>
> No, the plan is only one namespace at a time.
>
> It would not be much of a change to support multiple namespaces,
> but I don't think I want to go there.  Bitmaps filling up are
> ugly and I don't see what would be gained.
>   
The idea I had in mind when I asked this question was whether we could
"move" a process inside a container, i.e. a set of namespaces :)
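
Even with one namespace per call, entering a container could probably be
done as a loop in userspace. Below is a rough sketch in C, under several
assumptions: the NSTYPE_* enumerators and their values are made up for
illustration, enter_container() and struct ns_ops are hypothetical helpers,
and since nsfd() and setns() are only proposed in this patch, they are passed
in as function pointers using the prototypes from the patch description.
The fake_* functions are stand-ins so the sketch runs without the new
syscalls.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical enumeration: the RFC fixes neither names nor values. */
enum nstype {
    NSTYPE_NET,
    NSTYPE_IPC,
    NSTYPE_MNT,
};

/* Prototypes as given in the patch description. The syscalls are only
 * proposed, so they are injected rather than called directly. */
struct ns_ops {
    int (*nsfd)(pid_t pid, unsigned long nstype);
    int (*setns)(unsigned long nstype, int fd);
    int (*close_fd)(int fd);
};

/*
 * Join each listed namespace of process `pid`, one call per namespace.
 * Returns how many namespaces were joined; stops at the first failure.
 */
static size_t enter_container(pid_t pid, const enum nstype *types, size_t n,
                              const struct ns_ops *ops)
{
    size_t joined = 0;

    assert(ops != NULL && types != NULL);

    for (size_t i = 0; i < n; i++) {
        unsigned long type = (unsigned long)types[i];
        int fd = ops->nsfd(pid, type);
        if (fd < 0)
            break;

        int rc = ops->setns(type, fd);
        ops->close_fd(fd);      /* the fd is only needed for the switch */
        if (rc < 0)
            break;
        joined++;
    }
    return joined;
}

/* Stand-ins for the proposed syscalls: they only record what was asked. */
static unsigned long joined_log[8];
static size_t joined_count;

static int fake_nsfd(pid_t pid, unsigned long nstype)
{
    (void)pid;
    return 1000 + (int)nstype;  /* fake descriptor encoding the type */
}

static int fake_setns(unsigned long nstype, int fd)
{
    if (fd != 1000 + (int)nstype)
        return -1;              /* fd does not refer to that namespace type */
    if (joined_count < 8)
        joined_log[joined_count++] = nstype;
    return 0;
}

static int fake_close(int fd)
{
    (void)fd;
    return 0;
}
```

One thing the loop makes visible: if a later setns() fails, the process has
already moved into the earlier namespaces, so the caller would have to cope
with a partial move.
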
> It does make sense to support all of the namespaces we can support
> with unshare, but with nstype as an enumeration not as a bitmap.
>   
I suppose when you say "to support all of the namespaces we can support 
with *unshare*", you exclude the pid namespace which is created only 
with clone, right ? Do you think we can extend the concept to all the 
namespaces including the pid_namespace ?

> This is slightly better than the earlier version that used a netlink
> socket as the reference as I can give it the semantics of a deleted
> file and only when that file goes away drop the reference on the
> namespace.  It is also better in that this interface can support all
> of the namespaces, without adding yet another syscall.
>   
I like the idea :)


