[PATCH] fs: Remove implicit nodev for new mounts in non-root userns
luto at amacapital.net
Fri Aug 15 19:16:47 UTC 2014
On Fri, Aug 15, 2014 at 12:05 PM, Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> Quoting Andy Lutomirski (luto at amacapital.net):
>> Currently, creating a new mount (as opposed to bindmount) in a
>> non-root userns will implicitly set nodev unless the fs is devpts.
>> Something like this will be necessary for file systems that allow
>> the mounter to create device nodes without using mknod (e.g. FUSE
>> if/when that is allowed), but none of the currently allowed
>> filesystems do this.
> Sorry, I'm probably thinking stupidly, but I don't see this restriction
> being the case:
> serge@sl:~$ mount | grep tmp
> tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
> serge@sl:~$ sudo mknod /run/kvm c 10 232
> [sudo] password for serge:
> serge@sl:~$ echo $?
> serge@sl:~$ ls -l /run/kvm
> crw-r--r-- 1 root root 10, 232 Aug 15 14:04 /run/kvm
> But you seem to be saying I shouldn't be allowed to create a device inside
> a tmpfs. What am I overlooking?
I assume you're in the root userns. This patch is unnecessary, and
has no effect, if you're in the root userns.
The code in Sandstorm that's currently broken in Linus' tree runs in a
new userns with a matching mount ns. It does (copied verbatim):
    KJ_SYSCALL(mount("sandstorm-dev", "dev", "tmpfs", MS_NOSUID | MS_NOEXEC,
    makeCharDeviceNode("null", "null", 1, 3);
    makeCharDeviceNode("zero", "zero", 1, 5);
    makeCharDeviceNode("random", "urandom", 1, 9);
    makeCharDeviceNode("urandom", "urandom", 1, 9);
    KJ_SYSCALL(mount("dev", "dev", nullptr,
                     MS_REMOUNT | MS_BIND | MS_NOSUID | MS_NOEXEC |
makeCharDeviceNode is a helper that creates an empty file and mounts a
device node over it. This code needs the fs to be read/write, but
Sandstorm wants to make /dev read-only when it's done.
In Linus' tree, the remount fails with -EPERM because the mount is
secretly nodev. It was always secretly nodev, but no one noticed
because of CVE-2014-5207, which caused that remount to succeed.
(Yay for programs that inadvertently exploited a serious security
vulnerability for their normal function.)
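
For illustration only, here is a minimal sketch of one possible userspace workaround for the failing remount. It is not what the patch does and not what Sandstorm does. It assumes the -EPERM comes from the remount not restating nodev, so it reads the flags the kernel reports for the mount via statvfs() and carries them over. The helper names (carriedFlags, readOnlyRemountFlags, remountReadOnly) are hypothetical:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // glibc exposes ST_NODEV / ST_NOEXEC only with this
#endif
#include <sys/mount.h>
#include <sys/statvfs.h>

// Hypothetical helper: translate the statvfs() flags of an existing mount
// into the MS_* flags that a remount should restate rather than clear.
unsigned long carriedFlags(unsigned long statvfsFlags) {
  unsigned long flags = 0;
  if (statvfsFlags & ST_NODEV)  flags |= MS_NODEV;
  if (statvfsFlags & ST_NOSUID) flags |= MS_NOSUID;
  if (statvfsFlags & ST_NOEXEC) flags |= MS_NOEXEC;
  return flags;
}

// Hypothetical helper: flags for a read-only bind remount that keeps
// whatever the mount already has (including an implicit nodev).
unsigned long readOnlyRemountFlags(unsigned long statvfsFlags,
                                   unsigned long wanted) {
  return MS_REMOUNT | MS_BIND | MS_RDONLY | wanted |
         carriedFlags(statvfsFlags);
}

// Hypothetical helper: returns 0 on success, -1 with errno set on failure.
int remountReadOnly(const char* path, unsigned long wanted) {
  struct statvfs st;
  if (statvfs(path, &st) != 0) return -1;
  return mount(path, path, nullptr,
               readOnlyRemountFlags(st.f_flag, wanted), nullptr);
}
```

Under that assumption, a call such as remountReadOnly("dev", MS_NOSUID | MS_NOEXEC) would restate nodev whenever the kernel reports it, so the remount would never ask to clear it.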