[CFT][PATCH 0/10] Making new mounts of proc and sysfs as safe as bind mounts

Greg Kroah-Hartman gregkh at linuxfoundation.org
Thu May 14 20:29:51 UTC 2015

On Thu, May 14, 2015 at 12:30:45PM -0500, Eric W. Biederman wrote:
> The code is currently available at:
>    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing
>    HEAD: a524faf520600968e58bbc732063fccf2fdf9199 mnt: Update fs_fully_visible to test for permanently empty directories
> The problem:  Mounting a new instance of proc or sysfs can allow things
> that a bind mount of those filesystems would not.
> The cases I am dealing with are:
>      unshare --user --net --mount ; mount -t sysfs ...
>      unshare --user --pid --mount ; mount -t proc ...
> The big change is that this set of changes enforces the preservation of
> locked mount flags from the existing mount to the new mount.  This
> means that if proc was mounted read-only, the current code will allow
> a new instance of proc to be mounted read-write, whereas this set of
> changes enforces that proc remains read-only.
> The other gotcha is that the current code does not properly detect empty
> directories, so to prevent things from slipping through the cracks this
> set of changes annotates all mount points where nothing will be revealed
> if the filesystem mounted on top is removed.
> Enforcing the administrator's policy can actually matter in the real
> world, as the recent docker issue has shown.
> With this patchset I have two concerns:
> - The enforcement of mount flag preservation on proc and sysfs may break
>   things.  (I am especially worried about the implicit adding of nodev).

What do you mean by this?  What got added?

> - I may have missed a filesystem mountpoint on proc or sysfs, which would
>   make a fresh copy unmountable for no good reason.
> I don't want to break userspace if I can help it, and the code has been
> this way for a while so I figure there is time to find any pitfalls and
> address them before this code gets merged.
> So if this works for you, please give me your Tested-By.
> The well-known mountpoints for pseudo filesystems that I could find are:
> /dev/ffs*/                 functionfs
> /dev/gadget/               gadgetfs
> /dev/mqueue                mqueue
> /dev/oprofile/             oprofilefs
> /dev/pts/                  devpts

/dev/shm gets a tmpfs, right?  Or do those not matter here?

> /dlm/                      ocfs2_dlmfs
> /ipath/                    ipathfs
> /proc/fs/nfsd/             nfsd
> /proc/openprom/            openpromfs
> /proc/sys/fs/binfmt_misc/  binfmt_misc
> /spu/                      spufs

> /sys/firmware/efi/efivars/ efivarfs
> /sys/fs/cgroup/            cgroup
> /sys/fs/fuse/connections/  fusectl

I thought fuse mounted a few more things in here, but I don't know for sure.

> /sys/fs/pstore/            pstore
> /sys/fs/selinux/           selinuxfs
> /sys/fs/smackfs/           smackfs
> /sys/hypervisor/s390/      s390_hypfs
> /sys/kernel/config/        configfs
> /sys/kernel/debug/         debugfs
> /sys/kernel/security/      securityfs
> /sys/kernel/tracing/       tracefs

I think these are all correct for sysfs.  I have a minor comment on the
sysfs patch, which I'll make in reply to it.


greg k-h
