[PATCH review 4/6] mnt: Track when a directory escapes a bind mount

Al Viro viro at ZenIV.linux.org.uk
Mon Aug 10 04:36:37 UTC 2015

On Mon, Aug 03, 2015 at 04:27:34PM -0500, Eric W. Biederman wrote:

> - The escape count on struct mount must be incremented both before the
>   rename and after.  If the count is not incremented before the rename
>   it is possible to hit a scenario where the rename happens the code
>   walks up the directory tree to somewhere outside of the bind mount
>   before the count is touched.  Similary without a count after the
>   rename it is possible for the code to look at the escape count
>   validate a path is connected before the rename and assume cache the
>   escape count, leading to not retesting the path is ok.

Umm...  I wonder if you are overcomplicating the things here.  Sure,
I understand wanting to reduce the checks on "..", but...  It costs you
considerable complexity (especially when it comes to 64bit counts),
it's really brittle (you need to be very careful about the places where
you zero the cached values in fs/namei.c and missing one will lead to
really unpleasant effects there) _and_ it's all for the benefit of 
a very rare case.  With check you are optimizing away not being all that
costly anyway.
> - The largest change is in d_unalias, where the two cases are split
>   apart so they can be handled separately.  In the easy case of a
>   rename within the same directory all that is needed is __d_move
>   (escaping a mount is impossible in that case).  In the more involved
>   case mutexes need to be acquired, and now the spin locks need to be
>   dropped so that proper lock aquisition order around __d_move can be
>   arranged.

I _really_ hate that part.  Could you explain WTF is wrong with simply
taking mount_lock in that case of __d_splice_alias() just outside of

> +	unlock = lock_namespace_rename(dentry, target, false);
> +
>  	write_seqlock(&rename_lock);
>  	__d_move(dentry, target, false);
>  	write_sequnlock(&rename_lock);
> +
> +	if (unlock)
> +		unlock_namespace_rename(unlock, dentry, target, false);
> +

Your unlock_namespace_rename() should've been a static inline.
With the check of unlock != NULL done in there.  Two such inlines,
actually, and to hell with the boolean argument.  Same split for the
lock counterpart, of course.

More information about the Containers mailing list