[PATCH v2 2/3] fs: introduce uid/gid shifting bind mount

Mon Jan 13 03:41:49 UTC 2020

On Sat, Jan 04, 2020 at 12:39:45PM -0800, James Bottomley wrote:
> This implementation reverse shifts according to the user_ns belonging
> to the mnt_ns.  So if the vfsmount has the newly introduced flag
> MNT_SHIFT and the current user_ns is the same as the mount_ns->user_ns
> then we shift back using the user_ns before committing to the
> underlying filesystem.
> 
> For example, if a user_ns is created where interior (fake root, uid 0)
> is mapped to kernel uid 100000 then writes from interior root normally
> go to the filesystem at the kernel uid.  However, if MNT_SHIFT is set,
> they will be shifted back to write at uid 0, meaning we can bind mount
> real image filesystems to user_ns protected faker root.

Thanks, James, I definately would like to see shifting in the VFS
api.

I have a few practical concerns about this implementation, but my biggest
concern is more fundemental:  this again by design leaves littered about
the filesystem uid-0 owned files which were written by an untrusted user.

I would feel much better if you institutionalized having the origin
shifted.  For instance, take a squashfs for a container fs, shift it
so that fsuid 0 == hostuid 100000.  Mount that, with a marker saying
how it is shifted, then set 'shiftable'.  Now use that as a base for
allowing an unpriv user to shift.  If that user has subuid 200000 as
container uid 0, then its root will write files as uid 100000 in the
fs.  This isn't perfect, but I think something along these lines would
be far safer.

Two namespaces with different uid maps can share the filesystem as though
they both had the same uidmap.  (This currently is to me the most
interesting use case for shifing bind mounts)

If the user wants to tar up the result, they can do do in their own
namespace, resulting in uid 0 shown as uid 0.  If host root wants to
do so, they can umount it everywhere and use something like fuidshift.
Or, they can also create a namespace to do the shifting to uid 0 in tar.

My more practical concerns include: (1) once a userns has set a shiftable
bind mount to shift, if it then creates a new child userns, that ns
will not see (iiuc) see the fs as shifted.  (2) there seems to be no
good reason to stick to caching the cred for only one mnt, versus
having a per-userns hashtable of creds for shifted mnts.  Was that just
a temporary shortcut or did you intend it to stay that way?