[PATCH v4 2/4] namei: O_BENEATH-style path resolution flags

Aleksa Sarai asarai at suse.de
Fri Nov 23 16:58:43 UTC 2018


On 2018-11-23, Jürg Billeter <j at bitron.ch> wrote:
> On Tue, 2018-11-13 at 01:26 +1100, Aleksa Sarai wrote:
> > * O_BENEATH: Disallow "escapes" from the starting point of the
> >   filesystem tree during resolution (you must stay "beneath" the
> >   starting point at all times). Currently this is done by disallowing
> >   ".." and absolute paths (either in the given path or found during
> >   symlink resolution) entirely, as well as all "magic link" jumping.
> 
> With open_tree(2) and OPEN_TREE_CLONE, will O_BENEATH still be
> necessary? As I understand it, O_BENEATH could be replaced by a much
> simpler flag that only disallows absolute paths (incl. absolute
> symlinks). And it would have the benefit that you can actually pass the
> tree/directory fd to another process and escaping would not be possible
> even if that other process doesn't use O_BENEATH (after calling
> mount_setattr(2) to make sure it's locked down).
> 
> This approach would also make it easy to restrict writes via a cloned
> tree/directory fd by marking it read-only via mount_setattr(2) (and
> locking down the read-only flag). This would again be especially useful
> when passing tree/directory fds across processes, or for voluntary
> self-lockdown within a process for robustness against security bugs.
> 
> This wouldn't affect any of the other flags in this patch. And for full
> equivalence to O_BENEATH you'd have to use O_NOMAGICLINKS in addition
> to O_NOABSOLUTE, or whatever that new flag would be called.
> 
> Or is OPEN_TREE_CLONE too expensive for this use case? Or is there
> anything else I'm missing?

OPEN_TREE_CLONE currently requires CAP_SYS_ADMIN in mnt_ns->user_ns,
which requires a fork and unshare -- or at least a vfork and some other
magic -- at which point we're back to just doing a pivot_root(2) for
most operations.

I think open_tree(2) -- which I really should sit down and play around
with -- would be an interesting way of doing O_BENEATH, but I think
you'd still need to have the same race protections we have in the
current O_BENEATH proposal to handle "..".

So really you'd be using open_tree(OPEN_TREE_CLONE) just so that you can
use the "path.mnt" setting code, which I'm not sure is the best way of
doing it (plus the other interesting ideas which you get with the other
mount API changes).

But I am quite hopeful for what cool things we'll be able to make using
the new mount API.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.linuxfoundation.org/pipermail/containers/attachments/20181124/c2a181d6/attachment-0001.sig>


More information about the Containers mailing list