[PATCH] vfs: Test for and handle paths that are unreachable from their mnt_root

NeilBrown neilb at suse.com
Mon Aug 17 03:56:49 UTC 2015


On Sat, 15 Aug 2015 20:27:13 -0500 ebiederm at xmission.com (Eric W.
Biederman) wrote:

> 
> In rare cases a directory can be renamed out from under a bind mount.
> In those cases without special handling it becomes possible to walk up
> the directory tree to the root dentry of the filesystem and down
> from the root dentry to every other file or directory on the filesystem.
> 
> Like division by zero .. from an unconnected path can not be given
> a useful semantic as there is no predicting at which path component
> the code will realize it is unconnected.  We certainly can not match
> the current behavior as the current behavior is a security hole.
> 
> Therefore when encounting .. when following an unconnected path
> return -ENOENT.
> 
> - Add a function path_connected to verify path->dentry is reachable
>   from path->mnt.mnt_root.  AKA to validate that rename did not do
>   something nasty to the bind mount.
> 
>   To avoid races path_connected must be called after following a path
>   component to it's next path component.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm at xmission.com>
> ---
> 
> This is the simple version that needs no extra vfs support.
> 
> My availability is likely to be a bit spotty for the next while
> as I am travelling to and then attending Linux Plumbers Conference.
> 
>  fs/namei.c | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index ae4e4c18b2ac..5303e994f8d6 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -560,6 +560,24 @@ static int __nd_alloc_stack(struct nameidata *nd)
>  	return 0;
>  }
>  
> +/**
> + * path_connected - Verify that a path->dentry is below path->mnt.mnt_root
> + * @path: nameidate to verify
> + *
> + * Rename can sometimes move a file or directory outside of a bind
> + * mount, path_connected allows those cases to be detected.

While it is obviously true that a rename can move a file outside of a
bind mount, it doesn't seem relevant and so could be confusing.
This is only ever used for directories, and a file could already be
outside a bind mount even while it is inside.

I would stick with "Rename can sometimes move a directory outside..."



> + */
> +static bool path_connected(const struct path *path)
> +{
> +	struct vfsmount *mnt = path->mnt;
> +
> +	/* Only bind mounts can have disconnected paths */
> +	if (mnt->mnt_root == mnt->mnt_sb->s_root)
> +		return true;
> +
> +	return is_subdir(path->dentry, mnt->mnt_root);
> +}
> +
>  static inline int nd_alloc_stack(struct nameidata *nd)
>  {
>  	if (likely(nd->depth != EMBEDDED_LEVELS))
> @@ -1296,6 +1314,8 @@ static int follow_dotdot_rcu(struct nameidata *nd)
>  				return -ECHILD;
>  			nd->path.dentry = parent;
>  			nd->seq = seq;
> +			if (unlikely(!path_connected(&nd->path)))
> +				return -ENOENT;
>  			break;
>  		} else {
>  			struct mount *mnt = real_mount(nd->path.mnt);
> @@ -1396,7 +1416,7 @@ static void follow_mount(struct path *path)
>  	}
>  }
>  
> -static void follow_dotdot(struct nameidata *nd)
> +static int follow_dotdot(struct nameidata *nd)
>  {
>  	if (!nd->root.mnt)
>  		set_root(nd);
> @@ -1412,6 +1432,8 @@ static void follow_dotdot(struct nameidata *nd)
>  			/* rare case of legitimate dget_parent()... */
>  			nd->path.dentry = dget_parent(nd->path.dentry);
>  			dput(old);
> +			if (unlikely(!path_connected(&nd->path)))
> +				return -ENOENT;
>  			break;
>  		}
>  		if (!follow_up(&nd->path))
> @@ -1419,6 +1441,7 @@ static void follow_dotdot(struct nameidata *nd)
>  	}
>  	follow_mount(&nd->path);
>  	nd->inode = nd->path.dentry->d_inode;
> +	return 0;
>  }
>  
>  /*
> @@ -1634,7 +1657,7 @@ static inline int handle_dots(struct nameidata *nd, int type)
>  		if (nd->flags & LOOKUP_RCU) {
>  			return follow_dotdot_rcu(nd);
>  		} else
> -			follow_dotdot(nd);
> +			return follow_dotdot(nd);
>  	}
>  	return 0;
>  }


I really like this patch, particularly from the standpoint of
backporting to -stable and enterprise kernels.  I suspect that all the
tracking of which mounts might have been escaped from should be
classified as premature optimisation, until measurements show otherwise.

path_connected() adds no locks or even atomics.  The path that it walks
will be well-trodden and so very likely most of it will be in the CPU
cache.

And I particularly like that follow_dotdot() and follow_dotdot_rcu()
now both that the same signature :-)  What's not to like?

Reviewed-by: NeilBrown <neilb at suse.com>

in case it helps.


NeilBrown


More information about the Containers mailing list