[PATCH review 0/7] Bind mount escape fixes

Eric W. Biederman ebiederm at xmission.com
Sat Aug 15 18:35:19 UTC 2015


It is possible in some situations to rename a file or directory through
one mount point such that it can start out inside of a bind mount and
after the rename wind up outside of the bind mount.  Unfortunately with
user namespaces these conditions can be trivially created by creating a
bind mount under an existing bind mount.

I have identified four situations in which this may be a problem.
- __d_path and d_absolute_path need to error on disconnected paths
  that can not reach some root directory or lsm path based security
  checks can incorrectly succeed.

- Normal path name resolution following .. can lead to a directory
  that is outside of the original loopback mount.

- file handle reconsititution aka exportfs_decode_fh can yield a dentry
  from which d_parent can be followed up to mnt->sb->s_root, but
  d_parent can not be followed up to mnt->mnt_root.

- Mounts on a path that has been renamed outside of a loopback mount
  become unreachable, as there is no possible path that can be passed
  to umount to unmount them.

My strategy:

o File handle reconsitituion problems can be prevented by enabling
  the nfsd subtree checks for nfs exports, and open_by_handle_at
  requires capable(CAP_DAC_READ_SEARCH) so is only usable by the global
  root.  This makes any problems difficult if not impossible to exploit
  in practice so I have not yet written code to address that issue.

o The functions __d_path and d_absolute_path are agumented so that the
  security modules will not be fed a problematic path to work with.

o Following of .. has been agumented to test that after d_parent has
  been resolved the original  directory is connected, and if not
  an error of -ENOENT is returned.

o I do not worry about mounts that are disconnected from their bind
  mount as these mounts can always be freed by either umount -l on
  the bind mount they have escaped from, or by freeing the mount
  namespace.  So I do not believe there is an actual problem.

Pathname resolution is a common fast path and most of the code in this
patchset to support keeping .. from becoming expensive in the common
case.

After hearing the Al's feedback and running some numbers I have given
up attempting to keeping the number of d_ancestor calls during pathname
resolution to an absolute minimum.  It appears that simply preventing
calls d_ancestor unless a directory has escaped is good enough.  This
change in approach has significantly simplified the code.

The implementation change this round is I have dropped my patch cleaning
up d_splice_alias.  Al Viro found a race that makes the technique I was
using fundamentally racy.  I now have d_splice_alias taking mount_lock
around rename_lock.  Since I don't have to sleep in d_splice_alias
change is minimal and sufficient for this purpose.

Barring some other idiocy I think this will be the final version of this
patchset.

These changes are all against v4.2-rc1. 

For those who like to see everything in a single tree the code is at:

     git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-testing

Eric W. Biederman (7):
      dcache: Handle escaped paths in prepend_path
      dcache: Reduce the scope of i_lock in d_splice_alias
      mnt: Track which mounts use a dentry as root.
      dcache: Implement d_common_ancestor
      dcache: Only read d_flags once in d_is_dir
      mnt: Track when a directory escapes a bind mount
      vfs: Test for and handle paths that are unreachable from their mnt_root


 fs/dcache.c            |  91 +++++++++++++++++++++++++++--
 fs/mount.h             |   9 +++
 fs/namei.c             |  26 ++++++++-
 fs/namespace.c         | 152 +++++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/dcache.h |  11 +++-
 include/linux/mount.h  |   1 +
 6 files changed, 279 insertions(+), 11 deletions(-)


More information about the Containers mailing list