[PATCH v2 2/3] fs: introduce uid/gid shifting bind mount

Fri Jan 17 22:52:52 UTC 2020

On Fri, 2020-01-17 at 13:19 -0800, Tycho Andersen wrote:
> On Fri, Jan 17, 2020 at 08:25:42AM -0800, James Bottomley wrote:
> > On Fri, 2020-01-17 at 09:44 -0600, Serge E. Hallyn wrote:
> > > On Thu, Jan 16, 2020 at 08:29:33AM -0800, James Bottomley wrote:
> > > I guess I figured we would have a privileged task in the owning
> > > namespace (presumably init_user_ns) mark a bind mount as
> > > shiftable
> > 
> > Yes, that's what I've got today in the prototype.  It mirrors the
> > original shiftfs mechanism.  However, I have also heard people say
> > they want a permanent mark, like an xattr for this.
> 
> Please, no. mount() failures are already hard to reason about; I
> would rather not add another temporary (or worse, permanent)
> non-obvious failure mode.

I'm not particularly bothered either way ... although using xattrs
always seems to end up biting us for nesting, so I wasn't wildly
enthusiastic about it.

> What if we make shifted bind mounts always readonly? That will force
> people to use an overlay (or something else) on top, but they
> probably want to do that anyway so they can avoid tainting the
> original container image with writes.

That really causes problems for the mutable (non-Docker) container use
case, which is pretty much the way I always use containers.  Who wants
to bother with overlayfs when their image is expected to mutate?  It's
just a huge hassle.

> > > Oh - I consider the detail of whether we pass a userid or userns
> > > nsfd as more of an implementation detail which we can hash out
> > > after the more general shift-mount api is decided upon.  Anyway,
> > > passing nsfds just has a cool factor :)
> > 
> > Well, yes, won't argue on the cool factor-ness.
> 
> It's not just the cool factor: if you're doing this, it's presumably
> because you want to use it with a container in a user namespace.
> Specifying the same parameters twice leaves room for error, causing
> CVEs and more work.

It depends.  For the offset, we agreed there's no extant user_ns, so
you have to create one specifically.  That leads to a more error-prone
setup with no actual checking benefit.

For the shift_ns, it depends whether you want one mount point per
tenant, in which case the tenant user_ns might be a useful check, or
one mount point with an ACL in which case you just backshift along the
binding tenant user_ns.
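
To illustrate the backshift in the second case, here is a minimal
sketch (not code from the patch).  It assumes the tenant user_ns maps
in-namespace ids 0-65535 onto kernel ids starting at 100000, and that
backshifting means translating a kernel id back through that map and
using the result against the underlying filesystem.  The names Extent,
from_kid and backshift, and the id ranges, are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    """One uid_map-style line: ids [inside, inside+count) in the
    namespace correspond to [outside, outside+count) outside it."""
    inside: int
    outside: int
    count: int


def from_kid(extents: List[Extent], kid: int) -> Optional[int]:
    """Translate a kernel-side id to the id seen inside the namespace.

    Returns None when the id has no mapping (hypothetical helper,
    loosely modelled on what from_kuid() does in the kernel).
    """
    for e in extents:
        if e.outside <= kid < e.outside + e.count:
            return e.inside + (kid - e.outside)
    return None


def backshift(tenant_map: List[Extent], kid: int) -> Optional[int]:
    """Assumed semantics: the id presented to the underlying
    filesystem is the in-namespace view of the caller's kernel id."""
    return from_kid(tenant_map, kid)


# Assumed tenant mapping: container root (0) is kernel id 100000.
tenant_map = [Extent(inside=0, outside=100000, count=65536)]

# Container root ends up acting as id 0 on the underlying image.
shifted_root = backshift(tenant_map, 100000)

# An id outside the tenant's range has no backshifted identity.
unmapped = backshift(tenant_map, 5)
```

With one mount point per tenant, the same map could also serve as the
check: an id for which backshift() returns None would simply be
refused.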

James