[REVIEW][PATCH 3/3] vfs: Fix a regression in mounting proc

Andy Lutomirski luto at amacapital.net
Wed Nov 27 20:41:06 UTC 2013


On Wed, Nov 27, 2013 at 12:07 PM, Eric W. Biederman
<ebiederm at xmission.com> wrote:
> ebiederm at xmission.com (Eric W. Biederman) writes:
>
>> Oleg Nesterov <oleg at redhat.com> writes:
>>
>>> Just to avoid the possible confusion, let me repeat that the fix itself
>>> looks "obviously fine" to me; "i_nlink != 2" looks obviously wrong.
>>>
>>> I am not arguing with this patch, I am just trying to understand this
>>> logic.
>>>
>>> On 11/27, Eric W. Biederman wrote:
>>>>
>>>> [... snip ...]
>>>
>>> Thanks a lot.
>>>
>>>> For the real concern about jail environments where proc and sysfs are
>>>> not mounted at all a fs_visible check is all that is really required,
>>>
>>> this is what I can't understand...
>>>
>>> Let's ignore the implementation details. Suppose that proc was never
>>> mounted. Then "mount -t proc" should fail after CLONE_NEWUSER | NEWNS?
>>
>> Yes.
>
> Well, strictly speaking, it should fail after CLONE_NEWUSER | NEWNS | NEWPID
> if proc was never mounted.
>
> Fresh mounts of proc are not allowed unless you have also created the
> pid namespace.  With just CLONE_NEWUSER | NEWNS you are limited to bind
> mounts.
>
> Has this cleared up the confusion?
>
> Eric
>
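A simplified model of the rule Eric describes may make the three cases easier to keep apart. This is a hypothetical C sketch, not the kernel's code: the NS_* bits are illustrative stand-ins for CLONE_NEWUSER, CLONE_NEWNS and CLONE_NEWPID, and proc_visible abstracts the fs_visible-style check that proc is already mounted and visible in the caller's mount namespace.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for CLONE_NEWUSER, CLONE_NEWNS and CLONE_NEWPID.
 * The real flags live in <sched.h>; these are only for the model. */
enum {
    NS_USER  = 1 << 0,
    NS_MOUNT = 1 << 1,
    NS_PID   = 1 << 2,
};

enum proc_mount_kind { PROC_BIND_MOUNT, PROC_FRESH_MOUNT };

/* Simplified model of the rule from this thread (not kernel code).
 * unshared:     which namespaces the unprivileged caller created
 * proc_visible: proc is already mounted and visible in the mount ns
 * kind:         bind mount of an existing proc vs. "mount -t proc" */
static bool proc_mount_allowed(unsigned unshared, bool proc_visible,
                               enum proc_mount_kind kind)
{
    /* Without a new user and mount namespace, no mount is possible. */
    if ((unshared & (NS_USER | NS_MOUNT)) != (NS_USER | NS_MOUNT))
        return false;

    /* Jail case: proc was never mounted, so there is nothing to bind
     * and the visibility check rejects a fresh mount. */
    if (!proc_visible)
        return false;

    /* CLONE_NEWUSER | NEWNS alone is enough for bind mounts. */
    if (kind == PROC_BIND_MOUNT)
        return true;

    /* A fresh proc mount also needs a new pid namespace. */
    return (unshared & NS_PID) != 0;
}
```

In this model, proc_mount_allowed(NS_USER | NS_MOUNT, true, PROC_FRESH_MOUNT) is false while the same call with NS_PID added is true, matching "with just CLONE_NEWUSER | NEWNS you are limited to bind mounts".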

This is all obnoxiously complicated.  I wonder if we can do (a lot)
better by allowing a "pid-only" variant of proc to be mounted.  It
should contain:

 - All the pid directories
 - /proc/self, /proc/net, and /proc/mounts (but possibly not
/proc/PID/net -- that's a weird interface IMO and isn't really related
to the pid)
 - keys and key-users (wtf is up with that interface, though -- those
files are way too magical)
 - cpuinfo, version, and maybe other informational things (crypto?)
 - loadavg, perhaps
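One way to picture the proposed pid-only proc is as a filter over top-level /proc entry names. This is a hypothetical C sketch: pidonly_proc_shows() and its whitelist are illustrative only, include loadavg, leave out the tentative "crypto" and other informational files, and say nothing about per-pid subentries such as /proc/PID/net.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Numeric top-level names are the per-pid directories. */
static bool is_pid_dir(const char *name)
{
    if (*name == '\0')
        return false;
    for (const char *p = name; *p; p++)
        if (!isdigit((unsigned char)*p))
            return false;
    return true;
}

/* Hypothetical whitelist for a "pid-only" proc: returns true if a
 * top-level entry with this name would be shown. */
static bool pidonly_proc_shows(const char *name)
{
    static const char *const allowed[] = {
        "self", "net", "mounts",        /* pid-related views */
        "keys", "key-users",            /* key accounting files */
        "cpuinfo", "version", "loadavg" /* informational */
    };

    if (is_pid_dir(name))
        return true;
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
        if (strcmp(name, allowed[i]) == 0)
            return true;
    return false;
}
```

With this filter, entries such as "1", "self" and "cpuinfo" stay visible, while global knobs like "sys" or "kcore" would simply not exist in the pid-only instance.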

I wonder whether it would be possible to boot a reasonable container with a
heavily limited /proc like that.

--Andy

