shiftfs status and future development

Sat Jun 16 03:03:05 UTC 2018

On Fri, 2018-06-15 at 09:59 -0500, Seth Forshee wrote:
> On Fri, Jun 15, 2018 at 08:56:38AM -0500, Serge E. Hallyn wrote:
> > Quoting Seth Forshee (seth.forshee at canonical.com):
> > > I wanted to inquire about the current status of shiftfs and the
> > > plans for it moving forward. We'd like to have this functionality
> > > available for use in lxd, and I'm interested in helping with
> > > development (or picking up development if it's stalled).
> > > 
> > > To start, is anyone still working on shiftfs or similar
> > > functionality? I haven't found it in any git tree on kernel.org,
> > > and as far as mailing list activity the last submission I can
> > > find is [1]. Is there anything newer than this?
> > > 
> > > Based on past mailing list discussions, it seems like there was
> > > still debate as to whether this feature should be an overlay
> > > filesystem or something supported at the vfs level. Was this ever
> > > resolved?
> > > 
> > > Thanks,
> > > Seth
> > > 
> > > [1] http://lkml.kernel.org/r/1487638025.2337.49.camel@HansenPartnership.com
> > 
> > Hey Seth,
> > 
> > I haven't heard anything in a long time.  But if this is going to
> > pick back up, can we come up with a detailed set of goals and
> > requirements?

That would actually help.

> I was planning to follow up later with some discussion of
> requirements.
> Here are some of ours:
> 
>  - Supports any id maps possible for a user namespace

Could you clarify: right at the moment, it basically reverses the
namespace ID mapping when it goes on to the filesystem using the
superblock user namespace, so in theory you can have an arbitrary
mapping simply by changing the s_user_ns.  The problem here is that you
don't have a lot of tools for manipulating the s_userns.

>  - Does not break inotify

I don't expect it does, but I haven't checked.

>  - Passes accurate disk usage and source information from the
> "underlay"

Mounts of this type don't currently show up in df.

>  - Works with a variety of filesystems (ext4, xfs, btrfs, etc.)

yes

>  - Works with nested containers

yes

> I'm also interested in collecting any requirements others might have.
> 
> > I don't recall whether the last version still worked like this, but
> > I'm still not comfortable with the idea of a system where after a
> > reboot, container-created root-owned files are owned by host root
> > until a path is specially marked.  Enforcing that the "source"
> > directory is itself uid-shifted would greatly ease my mind.

And I believe we're discussing everything below in a different
subthread.

James

> I understand the concern and share the discomfort to some degree, but
> I'm not convinced that requiring the source subtree be shifted is the
> right approach.
> 
> First, let's address the marking question. As you stated, an approach
> that leaves the subtree unmarked for a period of time is problematic,
> and imo this is a fatal flaw with marking as a protection for e.g.
> execing some suid root file written by a container. Writing some such
> mark to the filesystem would make it persistent, but it could also
> restrict support to a small set of filesystems.
> 
> However, I do think it's necessary for a user with sufficient
> capabilities to "bless" a subtree for mounting in a less privileged
> context, so this is a feature of marking that I would like to keep. I
> think the new mount apis in David Howells' filesystem context patches
> [1] might give us a nicer way to do this. For example, root in
> init_user_ns could set up a mount fd which specifies the source
> subtree for the id shift. At that time the kernel could check for
> ns_capable(sb->s_user_ns, CAP_SYS_ADMIN) for the filesystem
> containing the source subtree. Then the fd could be passed to a
> container in a user namespace, who could use it to attach the mount
> to its filesystem tree.  The same concept could be extended to nested
> containers, as long as the user setting the source subtree has
> CAP_SYS_ADMIN towards sb->s_user_ns for the subtree.
> 
> Now back to requiring the source subtree be id shifted. I understand
> the motivation for wanting this, but I'm not sure I'm in favor of it.
> To start, there are other ways to ensure that id shifted mounts don't
> lead to problems, such as putting the subtree under a directory
> accessible only by root or putting it in a nosuid or noexec mount.
> For some implementations those sorts of protections are going to make
> sense.
> 
> Having this requirement may also add significant time to mounting, as
> I assume it would involve iterating through all filesystem objects.
> 
> Additionally, that requirement is likely to significantly complicate
> the implementation. The simplest implementation would just translate
> the k[ug]ids in the inodes to a target user ns. A slightly more
> complicated approach might translate them based on a source and
> destination user ns. If it's implemented based on passing in an
> arbitrary id map at mount time it will be more complex and duplicate
> functionality that user namespaces already give us.
> 
> Thanks,
> Seth
> 
> [1] http://lkml.kernel.org/r/152720672288.9073.9868393448836301272.stgit@warthog.procyon.org.uk
> _______________________________________________
> Containers mailing list
> Containers at lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
>