[REVIEW][PATCH 3/3] vfs: Fix a regression in mounting proc

Eric W. Biederman ebiederm at xmission.com
Wed Nov 27 18:51:40 UTC 2013


Oleg Nesterov <oleg at redhat.com> writes:

> To all: sorry for noise, I can't comment this patch.
>
>
> But Eric, could you please help me to understand? I am totally confused.
>
> So, afaics, initially (even after MS_KERNMOUNT) fs_fully_visible("proc")
> should return false.
>
> After the normal "mout -t proc none /proc/" it becomes true.
>
> And it is still true after, say, "mount -t ramfs none /proc/sys" because
> "ls -ld /proc/sys" shows ->i_nlink == 1.
>
> However, say, "mount -t ramfs none /proc/tty/" should make
> fs_fully_visible() == F, because in this case ->i_nlink == 4.
>
> Correct?
>
> If yes, could you explain what this "!CAP_SYS_ADMIN && !fs_fully_visible"
> check actually tries to prevent and why?

Right now a more accurate name would be fs_visible.

The primary goal is to restrict the user namespace root to being able to
mount proc and sysfs if nothing new is revealed, preserving the
restrictions in jail type environments where proc and sysfs are not
mounted at all.

I would have dropped the ability for the userns root to create a fresh
mount of proc and syfs, and required a bind mount instead except
because of the namespace intertwining we need fresh mounts of proc and
sys after new namespaces are created.

What I really would love to have implemented would be a check that
nothing is mounted on top of proc of sysfs but that fails because we
have /sys/fs/.. and /proc/sys/fs_binfmt_misc.  So I had to implement
an empty directory check.

And when I first implemented that empty directory check I had a complete
and total brain fart.

So what is really implemented at the moment would be more accurately called
fs_visible().

For the real concern about jail environments where proc and sysfs are
not mounted at all a fs_visible check is all that is really required,
and if you look at 3.11? I think it was that is what was shipped.

Unfortunately I cleaned things up and totally had a massive brain fart
and failed to implement the is this an empty directory check properly
and somehow failed to test mounting /proc in a real world environment
and so caused a significant regression in 3.12 leading to a case where
proc is not mountable.  This patch just fixes that stupid regression.
As the overmount protection in general is not needed as to my knolwedge
no one overmounts proc or sysfs in a meaningful way.

Now it would be very good to go back and fix fs_fully_visible to either
be fs_visible and not care or to read overmounted directories and verify
that they are actually empty.  Reading directories at this point
requires opening a file and probably playing with locks.  Something I am
not prepared to do for a trivial bug fix.  Especially since getting it
right only closes theoretical holes at the moment.

Does that clarify things?

Eric


More information about the Containers mailing list