[CFT][PATCH 0/10] Making new mounts of proc and sysfs as safe as bind mounts

Andy Lutomirski luto at amacapital.net
Fri May 15 06:26:48 UTC 2015

On Thu, May 14, 2015 at 2:10 PM, Eric W. Biederman
<ebiederm at xmission.com> wrote:
> Greg Kroah-Hartman <gregkh at linuxfoundation.org> writes:
>> On Thu, May 14, 2015 at 12:30:45PM -0500, Eric W. Biederman wrote:
>>> The code is currently available at:
>>>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing
>>>    HEAD: a524faf520600968e58bbc732063fccf2fdf9199 mnt: Update fs_fully_visible to test for permanently empty directories
>>> The problem:  Mounting a new instance of proc of sysfs can allow things
>>> that a bind mount of those filesystems would not.
>>> That is the cases I am dealing with are:
>>>      unshare --user --net --mount ; mount -t sysfs ...
>>>      unshare --user --pid --mount ; mount -t proc ...
>>> The big change is that this set of changes enforces the preservation of
>>> locked mount flags, from the existing mount to the current mount.  Which
>>> means that if proc was mounted read-only the current current will allow
>>> a new instance of proc to be mounted read-write, and this set of changes
>>> enforces that proc remain read-only.
>>> The other gotcha is that the current code does not properly detect empty
>>> directories so to prevent things slipping through the cracks this set of
>>> changes annotates all mount points where nothing will be revealed if
>>> the filesystem mounted on top is removed.
>>> Enforcing the administrators policy can actually matter in the real
>>> world as has been shown by the recent docker issue.
>>> With this patchset I have two concerns:
>>> - The enforcement of mount flag preservation on proc and sysfs may break
>>>   things.  (I am especially worried about the implicit adding of nodev).
>> What do you mean by this?  What got added?
> In a user namespace mounting a filesystem implicitly adds nodev.
> When I started enforcing not clearing bits that root had set on a
> filesystem in mount -o remount the implicit nodev wound up being
> an issue that broke userspace for no good reason.  The fix was
> to implicitly add nodev in remount as well.
> Taking a second look at this nodev is implicitly added before the
> fs_fully_visible check so even for applications that are know how the
> original proc was mounted (and don't see an implicit nodev) and that
> don't add nodev (because they ''know'' the mount flags) this change
> should not be a problem.  Hooray!  One less scary thing.

Can we please just get rid of this implicit nodev thing once and for all?  If it
breaks some really weird /proc use case, then I think the right fix is to
stop enforcing the nodev lock for the proc fully visible check.  After
all, /proc doesn't contain useful device nodes anyway.

Other than that, the code here looks okay to me on brief inspection.


More information about the Containers mailing list