[PATCH v2 00/28] user_namespace: introduce fsid mappings
James.Bottomley at HansenPartnership.com
Mon Feb 17 21:11:59 UTC 2020
On Fri, 2020-02-14 at 19:35 +0100, Christian Brauner wrote:
> With this patch series we simply introduce the ability to create fsid
> mappings that are different from the id mappings of a user namespace.
> The whole feature set is placed under a config option that defaults
> to false.
> In the usual case of running an unprivileged container we will have
> set up an id mapping, e.g. 0 100000 100000. The on-disk mapping will
> correspond to this id mapping, i.e. all files which we want to appear
> as 0:0 inside the user namespace will be chowned to 100000:100000 on
> the host. This works because whenever the kernel needs to do a
> filesystem access it will look up the corresponding uid and gid in the
> idmapping tables of the container.
> Now think about the case where we want to have an id mapping of 0
> 100000 100000 but an on-disk mapping of 0 300000 100000 which is
> needed to e.g. share a single on-disk mapping with multiple
> containers that all have different id mappings.
> This will be problematic. Whenever a filesystem access is requested,
> the kernel will now try to look up a mapping for 300000 in the id
> mapping tables of the user namespace, but since there is none the
> files will appear to be owned by the overflow id, i.e. usually
> 65534:65534 or nobody:nogroup.
> With fsid mappings we can solve this by writing an id mapping of 0
> 100000 100000 and an fsid mapping of 0 300000 100000. On filesystem
> access the kernel will now look up the mapping for 300000 in the fsid
> mapping tables of the user namespace. And since such a mapping
> exists, the corresponding files will have correct ownership.
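To make the quoted example concrete, here is a rough user-space sketch of
the lookup it describes. It is not the kernel code: map_up() and the tuple
layout (first id in the namespace, first id on the host, count) are
illustrative assumptions that just mirror the "0 100000 100000" notation
used above.

```python
OVERFLOW_ID = 65534  # usual overflow uid/gid, as mentioned in the quoted text


def map_up(extents, host_id):
    """Translate a host (on-disk) id into the id seen inside the namespace.

    Each extent is (ns_first, host_first, count), mirroring a map line
    such as "0 100000 100000". Unmapped ids fall back to the overflow id.
    """
    for ns_first, host_first, count in extents:
        if host_first <= host_id < host_first + count:
            return ns_first + (host_id - host_first)
    return OVERFLOW_ID


id_map = [(0, 100000, 100000)]    # id mapping:   0 100000 100000
fsid_map = [(0, 300000, 100000)]  # fsid mapping: 0 300000 100000

# A file chowned to 300000 on disk:
print(map_up(id_map, 300000))     # no extent covers it, so the overflow id
print(map_up(fsid_map, 300000))   # covered by the fsid extent, so it maps to 0
```

With only the id mapping the file shows up as 65534; consulting the fsid
mapping instead makes it appear as 0 inside the container.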
How do we parametrise this new fsid shift for the unprivileged use
case? For newuidmap/newgidmap, it's easy because each user gets a
dedicated range and everything "just works (tm)". However, for the
fsid mapping, assuming some newfsuid/newfsgid tool to help, that tool
has to know not only your allocated uid/gid chunk, but also the offset
map of the image. The former is easy, but the latter is going to vary
by the actual image ... well unless we standardise some accepted shift
for images and it simply becomes a known static offset.
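To illustrate the question, a minimal sketch of what such a helper would
have to compute. Everything here is hypothetical: no newfsuid/newfsgid tool
exists, and build_maps() and STANDARD_IMAGE_OFFSET are names invented for
this example, with 300000 borrowed from the cover letter.

```python
# Hypothetical "accepted shift" for images; not a real standard.
STANDARD_IMAGE_OFFSET = 300000


def build_maps(alloc_start, alloc_count, image_offset=STANDARD_IMAGE_OFFSET):
    """Return (id_map, fsid_map) lines for one unprivileged user.

    alloc_start/alloc_count: the user's dedicated range, which the tool
    can already learn the same way newuidmap/newgidmap do.
    image_offset: the shift the image was written with. This is the part
    the tool cannot know unless it is passed in or standardised.
    """
    id_map = f"0 {alloc_start} {alloc_count}"
    fsid_map = f"0 {image_offset} {alloc_count}"
    return id_map, fsid_map


# Two users with different allocations sharing one image:
print(build_maps(100000, 100000))
print(build_maps(200000, 100000))
```

With a static offset both users get the same fsid map line and only the id
map differs; without one, image_offset has to come from per-image metadata.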