[PATCH 00/34] fs: idmapped mounts

Tycho Andersen tycho at tycho.pizza
Thu Oct 29 21:03:44 UTC 2020


Hi Eric,

On Thu, Oct 29, 2020 at 11:44:33AM -0500, Eric W. Biederman wrote:
> Tycho Andersen <tycho at tycho.pizza> writes:
> 
> > Hi Eric,
> >
> > On Thu, Oct 29, 2020 at 10:47:49AM -0500, Eric W. Biederman wrote:
> >> Christian Brauner <christian.brauner at ubuntu.com> writes:
> >> 
> >> > Hey everyone,
> >> >
> >> > I vanished for a little while to focus on this work here so sorry for
> >> > not being available by mail for a while.
> >> >
> >> > For quite a long time we have had issues with sharing mounts between
> >> > multiple unprivileged containers with different id mappings, sharing a
> >> > rootfs between multiple containers with different id mappings, and also
> >> > sharing regular directories and filesystems between users with different
> >> > uids and gids. The latter use-cases have become even more important with
> >> > the availability and adoption of systemd-homed (cf. [1]) to implement
> >> > portable home directories.
> >> 
> >> Can you walk us through the motivating use case?
> >> 
> >> As of this year's LPC I had the distinct impression that the primary use
> >> case for such a feature was due to the RLIMIT_NPROC problem where two
> >> containers with the same users still wanted different uid mappings to
> >> the disk because the users were conflicting with each other because of
> >> the per user rlimits.
> >> 
> >> Fixing rlimits is straightforward to implement, and easier to manage
> >> for implementations and administrators.
> >
> > Our use case is to have the same directory exposed to several
> > different containers which each have disjoint ID mappings.
> 
> Why do you have disjoint ID mappings for the users that are writing
> to disk with the same ID?

We don't today; right now we have a service that runs as root, since
that's the only thing that can be made to work. See below...

> >> Reading up on systemd-homed it appears to be a way to have encrypted
> >> home directories.  Those home directories can either be encrypted at the
> >> fs or at the block level.  Those home directories appear to have the
> >> goal of being luggable between systems.  If the systems in question
> >> don't have common administration of uids and gids after lugging your
> >> encrypted home directory to another system chowning the files is
> >> required.
> >> 
> >> Is that the use case you are looking at removing the need for
> >> systemd-homed to avoid chowning after lugging encrypted home directories
> >> from one system to another?  Why would it be desirable to avoid the
> >> chown?
> >
> > Not just systemd-homed, but LXD has to do this,
> 
> I asked why the same disk users are assigned different kuids and the
> only reason I have heard that LXD does this is the RLIMIT_NPROC problem.
> 
> Perhaps there is another reason.
> 
> In part this is why I am eager to hear people's use cases, and why I was
> trying very hard to make certain we get the requirements.
> 
> I want the real requirements though and some thought, not just we did
> this and it hurts.  Changing the uids on write is a very hard problem,
> and not just in implementing it but also in maintaining and
> understanding what is going on.

We have N services which we don't want to be able to talk to each
other, so we run them in containers with isolated uid maps. However,
we do want them to be able to share a single directory of metadata.
We'd like to bind mount this directory into each container with the
correct user ns mapping to accomplish this; this is the primary driver
for putting the parameterization on the struct vfsmount.
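To make the idea concrete, here is a simplified userspace model of how a per-mount mapping would translate file ownership. It is a sketch under stated assumptions, not kernel code and not the interface proposed in this series: it assumes two hypothetical containers with disjoint uid maps `0 100000 65536` and `0 200000 65536`, and a shared directory on a filesystem mounted in the initial user namespace (so on-disk uids equal host uids). The names `IdMap`, `owner_seen_by`, and `disk_uid_written_by` are illustrative only.

```python
from dataclasses import dataclass

# Assumption: unmapped ids are reported as the overflow uid (65534 by default).
OVERFLOW_UID = 65534


@dataclass(frozen=True)
class IdMap:
    """One uid_map extent: ids [inside, inside+count) <-> [outside, outside+count)."""
    inside: int
    outside: int
    count: int

    def to_kernel(self, uid):
        """Namespace uid -> host (kernel) uid, or None if unmapped."""
        if self.inside <= uid < self.inside + self.count:
            return self.outside + (uid - self.inside)
        return None

    def from_kernel(self, kuid):
        """Host (kernel) uid -> namespace uid, or None if unmapped."""
        if self.outside <= kuid < self.outside + self.count:
            return self.inside + (kuid - self.outside)
        return None


# Hypothetical disjoint mappings for two isolated containers.
CONTAINER_A = IdMap(inside=0, outside=100000, count=65536)
CONTAINER_B = IdMap(inside=0, outside=200000, count=65536)


def owner_seen_by(container, disk_uid, mount_map=None):
    """uid a process in `container` sees as the owner of a file stored as `disk_uid`.

    Without mount_map this is a plain bind mount: disk_uid is used as the host uid.
    With mount_map (an idmapped mount), disk_uid is first mapped through it.
    """
    kuid = disk_uid if mount_map is None else mount_map.to_kernel(disk_uid)
    if kuid is None:
        return OVERFLOW_UID
    uid = container.from_kernel(kuid)
    return OVERFLOW_UID if uid is None else uid


def disk_uid_written_by(container, uid, mount_map=None):
    """uid stored on disk when `uid` in `container` creates a file (None if unmappable)."""
    kuid = container.to_kernel(uid)
    if kuid is None or mount_map is None:
        return kuid
    return mount_map.from_kernel(kuid)
```

With a plain bind mount, container A sees a file stored as uid 1000 as owned by 65534; when each container's mount is mapped through its own uid map, both A and B see that file as uid 1000, and a file either one creates there as uid 1000 is stored on disk as 1000. Writes outside the idmapped mount (e.g. to each rootfs) still land at the disjoint host uids 101000 and 201000, so the containers stay isolated except through the shared directory.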

We could map some shared uid into all the containers. But then we have
to have some service running as root in the containers to shuttle the
data back and forth between the shared uid and the uid of the actual
service.

So the goals are:

* directory shared between containers
* containers can access this directory as their "normal" uids, so there's
  no data shuttling
* each container's rootfs is still disjoint, so writes by "normal" uids
  in one container are disjoint from those in the others, and they can't
  talk except via the shared directory

Tycho

