[CFT][PATCH 00/10] Making new mounts of proc and sysfs as safe as bind mounts (take 2)

Kenton Varda kenton at sandstorm.io
Thu May 28 18:20:07 UTC 2015

On Thu, May 28, 2015 at 10:33 AM, Andy Lutomirski <luto at amacapital.net> wrote:
> On Thu, May 28, 2015 at 8:03 AM, Eric W. Biederman
> <ebiederm at xmission.com> wrote:
>> Serge Hallyn <serge.hallyn at ubuntu.com> writes:
>>> Quoting Andy Lutomirski (luto at amacapital.net):
>>>> On Fri, May 22, 2015 at 10:39 AM, Eric W. Biederman
>>>> <ebiederm at xmission.com> wrote:
>>>> > I had hoped to get some Tested-By's on that patch series.
>>>> Sorry, I've been totally swamped.
>>>> I suspect that Sandstorm is okay, but I haven't had a chance to test
>>>> it for real.  Sandstorm makes only limited use of proc and sysfs in
>>>> containers, but I'll see if I can test it for real this weekend.
>>> Testing this with unprivileged containers, I get
>>> lxc-start: conf.c: lxc_mount_auto_mounts: 808 Operation not permitted
>>> - error mounting sysfs on
>>> /usr/lib/x86_64-linux-gnu/lxc/sys/devices/virtual/net flags 0
>> Grr..  I was afraid this would break something. :(
>> Looking at my system I see that sysfs is currently mounted
>> "nosuid,nodev,noexec"
>> Looking at the lxc-start code I don't see it as including any of those
>> mount options.  In practice for sysfs I think those options are
>> meaningless (as there should be no devices and nothing executable in
>> sysfs) but I can understand the past concerns with chmod on virtual
>> filesystems that would incline people to use them, so I think the
>> failure is reporting a legitimate security issue in the lxc userspace
>> code where the the unprivileged code is currently attempting to give
>> greater access to sysfs than was given by the original mount of sysfs.
>> As nosuid,nodev,noexec should not impair the operation of sysfs
>> operation it looks like you can always specify those options and just
>> make this concern go away.
> Linus is pretty strict about not breaking the ABI, and this definitely
> counts as breaking the ABI.  There's an exception for security issues,
> but is there really a security issue here?  That is, do we lose
> anything important if we just drop the offending part of the patch
> set?  As you've said, there shouldn't be sensitive device nodes,
> executables, or setuid files in proc or sysfs in the first place.

Speaking as a user of the mount() interfaces, I really think it would
be less confusing overall if mount() simply ignored the requested
flags when the caller doesn't have a choice. That is, in cases where
mount() currently fails with EPERM when not given, say, MS_NOSUID, it
should instead just pretend the caller actually set MS_NOSUID and go
ahead with a nosuid mount. Or put another way, the absence of
MS_NOSUID should not be interpreted as "remove the nosuid bit" but
rather "don't set the nosuid bit if not required".


- This approach will actually cause lxc to have the correct behavior,
without any changes to lxc. I suspect that this generalizes: In the
vast majority of cases, when users have failed to set MS_NOSUID, it's
not because they are explicitly requesting that the flag be turned
off, but rather that they didn't know it mattered.

- If a user actually *does* expect not passing MS_NOSUID to remove the
nosuid bit, and they find instead that the nosuid bit is silently
kept, I don't think they'll be confused: it's pretty obvious in
context that this must be for security reasons.

- On the other hand, the current behavior *is* very confusing: mount()
returns EPERM because of rules the caller probably doesn't know
anything about. I've spent a fair amount of time frustrated by this
sort of thing.


More information about the Containers mailing list