[PATCH 0/5 RFC] Add an interface to discover relationships between namespaces

Eric W. Biederman ebiederm at xmission.com
Mon Jul 25 14:59:43 UTC 2016


"Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:

> Hi Eric,
>
> On 07/25/2016 03:18 PM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages at gmail.com> writes:
>>
>>> Hi Andrey,
>>>
>>> On 07/22/2016 08:25 PM, Andrey Vagin wrote:
>>>> On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages)
>>>> <mtk.manpages at gmail.com> wrote:
>>>>> Hi Andrey,
>>>>>
>>>>>
>>>>> On 07/21/2016 11:06 PM, Andrew Vagin wrote:
>>>>>>
[snip]
>>>>>> where ioctl_type is one of the following:
>>>>>>
>>>>>> NS_GET_USERNS
>>>>>>       Returns a file descriptor that refers to an owning  user  names‐
>>>>>>       pace.
>>>>>>
>>>>>> NS_GET_PARENT
>>>>>>       Returns  a  file  descriptor  that refers to a parent namespace.
>>>>>>       This ioctl(2) can be used for pid and user namespaces. For  user
>>>>>>       namespaces,  NS_GET_PARENT and NS_GET_USERNS have the same mean‐
>>>>>>       ing.
>>>
>>> For each of the above, I think it is worth mentioning that the
>>> close-on-exec flag is set for the returned file descriptor.
>>
>> Hmm.  That is an odd default.
>
> Why do you say that? It's pretty common as the default for various
> APIs that create new FDs these days. (There's of course a strong argument
> that the original UNIX default was a design blunder...)

Interesting.  I haven't kept up on that, but it seems reasonable.
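
For anyone who wants to check that behaviour from userspace, here is a
small sketch using fcntl(2) with F_GETFD. The helper name fd_is_cloexec()
is made up for illustration, and fd is assumed to be a descriptor
returned by one of the ioctls quoted above.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: returns 1 if close-on-exec is set on fd,
 * 0 if it is clear, and -1 if fcntl() failed (errno is set).
 */
static int fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;

	return (flags & FD_CLOEXEC) != 0;
}
```

A caller that does want the descriptor to survive exec can clear the
flag again with fcntl(fd, F_SETFD, 0).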

[snip]
>>> So, from my point of view, the important piece that was missing from
>>> your commit message was the note to use readlink("/proc/self/fd/%d")
>>> on the returned FDs. I think that detail needs to be part of the
>>> commit message (and also the man page text). I think it would even be
>>> helpful to include the above program as part of the commit message:
>>> it helps people more quickly grasp the API.
>>
>> Please, please make the standard way to compare these things fstat.
>> That is much less magic than a symlink, and a little more future proof.
>> Possibly even kcmp.
>
> As in fstat() to get the st_ino field, right?

Both the st_ino and st_dev fields.

The most likely change to support checkpoint/restart in the future is to
preserve st_ino across migrations and instantiate a different instance
of nsfs to hold the inode numbers from the previous machine.

We would need to handle the preservation carefully or else there is
a chance that two namespace file descriptors (collected from different
sources) with different st_dev and st_ino fields may actually refer to
the same object.

Which is a long way of saying: we have the st_dev field, so please use it.
It may matter at some point.

Eric

