[PATCH review 0/7] Bind mount escape fixes

Al Viro viro at ZenIV.linux.org.uk
Sun Aug 16 02:12:09 UTC 2015

On Sat, Aug 15, 2015 at 03:47:50PM -0700, Linus Torvalds wrote:
> So the cost I worry about is not the CPU cost, but the complexity and
> correctness. If anything goes subtly wrong, the end result is going to
> be some very very subtle bugs.
> And personally, I'd be much happier with something that is a bit more
> straightforward, even if it makes ".." lookup slower. Especially since
> I think we can limit the costs to fairly obvious cases (ie only for
> partial bind mounts). Keep the code more straightforward, and *if* we
> ever see the cost of dentry traversal
> But it's up to Al, I think.
> Al, comments?

I think you are underestimating the frequency of .. traversals.  Any build
process that creates relative symlinks will be hitting it all the time,
for one thing.  And less-than-entire-fs mounts are not something pathological -
I've got quite a few here, things like container setups often create such
beasts, etc.  Not to mention that things like NFSv4 will often look like such
partial mounts, BTW.

I really don't understand why the hell do we need *anything* complicated
around __d_move() callers - just take sodding spinlock of mount_lock in
the (few) callers around rename_lock, and that's it.  No need for anything
subtle and brittle.

Redoing the locking in d_splice_alias() simply doesn't belong anywhere near
that work.

Basically, all places where we change tree topology go through __d_move()
(and take rename_lock around it).  There are very few of those.  And we
can find and taint affected mounts quite easily - I think we all agree that
beginning of that series looks sane (locating mounts by mnt_root), right?

Let's just add "mount_lock should be held read-exclusive by all callers of
__d_move()" and do the find-and-taint logics from
dentry_lock_for_move().  Move these
        BUG_ON(d_ancestor(dentry, target));
        BUG_ON(d_ancestor(target, dentry));
into dentry_lock_for_move(), while we are at it.  And have it begin with
finding the last common ancestor of dentry and target.  Which turns those
checks into ancestor == dentry and ancestor == target...  Then we need to
taint everything with root being an ancestor of dentry, but not an ancestor
of target.  Which is trivial with LCA already found.  For exchange case we
need to do that both for dentry and target, of course.  That's it.  After that
we just use that taint instead of "is it a partial?" in .. handling.

More information about the Containers mailing list