[PATCH review 0/4] Loopback mount escape fixes

Eric W. Biederman ebiederm at xmission.com
Fri Jul 24 20:39:02 UTC 2015

Miklos Szeredi <miklos at szeredi.hu> writes:

> On Thu, Apr 9, 2015 at 1:31 AM, Eric W. Biederman <ebiederm at xmission.com> wrote:
>> After the last round of feedback I sat down and played with my fix
>> for the fact that, after a strategically placed rename, ".." on a bind
>> mount can go up past the root of the bind mount.
>> The code better handles the escaped directory returning into its bind
>> mount, and now costs roughly a constant factor more in all cases than
>> the code does without the fix.
>> So I think I have found a better tradeoff between fixing this bug and
>> not slowing down path name lookups in the common case.
> Maybe I'm missing something, but I see a much simpler fix:
>  - When following ".." first just check against the dentry being equal
> to the root dentry.
>  - If so, then check mount being equal to root mount.
>  - If so, then we are fine, found the root.
>  - If mount is not root mount, then we either have a bind mount or the
> escape scenario. So have a peek at the mount tree to see if we have a
> chance of reaching root or not.
>   - If yes, then we are fine, continue upward.
>   - Otherwise stop here and act like we found root.

In concrete terms I think you are suggesting something like this patch
to follow_dotdot.

diff --git a/fs/namei.c b/fs/namei.c
index ae4e4c18b2ac..56a8562899a1 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1409,6 +1409,11 @@ static void follow_dotdot(struct nameidata *nd)
                if (nd->path.dentry != nd->path.mnt->mnt_root) {
+                       /* Escaped path? */
+                       if ((nd->path.mnt->mnt_root != nd->path.mnt->mnt_sb->s_root) &&
+                           !d_ancestor(nd->path.mnt->mnt_root, nd->path.dentry)) {
+                               break;
+                       }
                        /* rare case of legitimate dget_parent()... */
                        nd->path.dentry = dget_parent(nd->path.dentry);

> This doesn't have to hook into d_move() and will only trigger the
> "violated" mode on a very specific and rare case.

Am I misunderstanding you?  I don't think .. on a bind mount is a very
specific rare case.

Operations such as following ../../../../../../../../../.. would go from
a cost of O(10) to a cost of O((10*(10 + P + 1))/2), that is, from O(N)
to O(N^2+N*P), where N is the number of ".." components (10 here) and P
is the depth of the directory reached after going those 10 levels up.
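
To make the cost argument concrete, here is a minimal userspace sketch,
not kernel code. It assumes a toy dentry that has only a parent pointer,
and the names toy_dentry, toy_chain, toy_is_below and toy_dotdot_cost are
made up for illustration. It counts how many parent links an ancestor
check in the style of d_ancestor() would follow if it ran before every
".." step.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a dentry: only the parent link matters here. */
struct toy_dentry {
	struct toy_dentry *parent;	/* NULL at the top of the tree */
};

/* Link nodes[0..len-1] into one chain; nodes[0] is the top. */
static struct toy_dentry *toy_chain(struct toy_dentry *nodes, size_t len)
{
	assert(len > 0);
	nodes[0].parent = NULL;
	for (size_t i = 1; i < len; i++)
		nodes[i].parent = &nodes[i - 1];
	return &nodes[len - 1];		/* deepest entry */
}

/*
 * Walk upward from d looking for root. Returns 1 if root is a proper
 * ancestor of d, 0 otherwise. Every parent link followed is added to
 * *hops.
 */
static int toy_is_below(const struct toy_dentry *root,
			const struct toy_dentry *d, unsigned long *hops)
{
	for (; d->parent != NULL; d = d->parent) {
		(*hops)++;
		if (d->parent == root)
			return 1;
	}
	return 0;
}

/*
 * Follow ".." up to n times starting at d, running the ancestor check
 * before each step. Returns the parent links followed by the checks
 * alone; the ".." steps themselves are not counted.
 */
static unsigned long toy_dotdot_cost(const struct toy_dentry *mnt_root,
				     const struct toy_dentry *d, unsigned n)
{
	unsigned long hops = 0;

	while (n-- > 0 && d != mnt_root) {
		if (!toy_is_below(mnt_root, d, &hops))
			break;		/* escaped: act as if this were the root */
		d = d->parent;
	}
	return hops;
}
```

With an 11-entry chain (the mount root plus ten directories below it),
toy_dotdot_cost(&nodes[0], &nodes[10], 10) counts 10 + 9 + ... + 1 = 55
parent links for the checks, where the unchecked walk follows only 10.
Starting deeper only adds to each check, which is where the N*P term
comes from in this model.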

Given that in cases like containers bind mounts are frequently the root
mount point of a filesystem, I don't think we want that expense if we
can possibly avoid it. It opens the door to a DoS attack and hurts
performance for cases that are not afflicted with an escape.

