[PATCH review 0/7] Bind mount escape fixes

Eric W. Biederman ebiederm at xmission.com
Sun Aug 16 00:59:39 UTC 2015

Linus Torvalds <torvalds at linux-foundation.org> writes:

> On Sat, Aug 15, 2015 at 2:07 PM, Eric W. Biederman
> <ebiederm at xmission.com> wrote:
>> Yes we can compare s_root and mnt_root and only call is_subir  if they don't match.
> Not even "is_subdir()" - for the RCU traversal case, just d_ancestor()
> should be sufficient since we'd already be in an RCU read-locked
> region and the RCU lookup checks the rename sequence number around it
> all.

We check the dentry sequence number and the mount sequence number, which
may be enough to catch a local rename but is certainly not enough to
catch what d_ancestor cares about.

Further we have the partial rcu to non-rcu walk case represented by
unlazy_walk that means we can't blithely do something that might be
wrong and only check the sequence numbers at each step.

> And d_ancestor() should really be pretty low-cost - even *if* we have
> to call it, which wouldn't even be the case for the normal situation.

>> At this point it is a matter of trade offs.
>> If there is not an escape I do not expect my current implementation will have a measurable cost.
>> And I don't expect there will be any escapes.
> So the cost I worry about is not the CPU cost, but the complexity and
> correctness. If anything goes subtly wrong, the end result is going to
> be some very very subtle bugs.

Fair enough.  I like simple low complexity code, but I don't want to
mess up the pathname lookup fastpath.

> And personally, I'd be much happier with something that is a bit more
> straightforward, even if it makes ".." lookup slower. Especially since
> I think we can limit the costs to fairly obvious cases (ie only for
> partial bind mounts). Keep the code more straightforward, and *if* we
> ever see the cost of dentry traversal
> But it's up to Al, I think.
> Al, comments?

At the very beginning of this I got shot down by Al Viro for a simple
implementation that essentially had everything except the check for
being a bind mount.  Knowing what I know now I realize it was a bit
buggy, calling d_ancestor in the rcu walk instead of d_subdir, but it
was shot down for the cpu cost.  Then Al suggested the basic approach I
have taken in these patches.

As soon as I am done testing I am going to post the revised version of
my final patch that only performs is_subdir checks on bind mounts.

Then we can decide to merge whichever version of the code you and Al are
happy with.


More information about the Containers mailing list