[PATCH v3 2/3] fs: introduce uid/gid shifting bind mount

Amir Goldstein amir73il at gmail.com
Tue Feb 18 07:38:07 UTC 2020

On Mon, Feb 17, 2020 at 10:58 PM James Bottomley
<James.Bottomley at hansenpartnership.com> wrote:
> This implementation reverse shifts according to the user_ns belonging
> to the mnt_ns.  So if the vfsmount has the newly introduced flag
> MNT_SHIFT and the current user_ns is the same as the mount_ns->user_ns
> then we shift back using the user_ns and an optional mnt_userns (which
> belongs to the struct mount) before committing to the underlying
> filesystem.
>
> For example, if a user_ns is created where interior (fake root, uid 0)
> is mapped to kernel uid 100000 then writes from interior root normally
> go to the filesystem at the kernel uid.  However, if MNT_SHIFT is set,
> they will be shifted back to write at uid 0, meaning we can bind mount
> real image filesystems to a user_ns protected fake root.
>
> In essence there are several things which have to be done for this to
> occur safely.  Firstly for all operations on the filesystem, new
> credentials have to be installed where fsuid and fsgid are set to the
> *interior* values. Next all inodes used from the filesystem have to
> have i_uid and i_gid shifted back to the kernel values and attributes
> set from user space have to have ia_uid and ia_gid shifted from the
> kernel values to the interior values.  The capability checks have to
> be done using ns_capable against the kernel values, but the inode
> capability checks have to be done against the shifted ids.
>
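To make the uid 0 / uid 100000 example concrete, here is a userspace C model of the id arithmetic. It is not kernel code, and all names are hypothetical. It assumes a single uid_map extent "0 100000 65536" and ignores the optional mnt_userns. It shows the two shifts the description calls for: an inode's on-disk i_uid is shifted to a kernel value, and an ia_uid from setattr is shifted back to the interior value.

```c
#include <stdint.h>

/*
 * Hypothetical model of one uid_map extent such as "0 100000 65536":
 * interior ids [0, 65536) correspond to kernel ids [100000, 165536).
 */
struct id_extent {
	uint32_t first;       /* first interior (namespace) id */
	uint32_t lower_first; /* first kernel id */
	uint32_t count;       /* length of the range */
};

#define INVALID_ID ((uint32_t)-1)

/* Interior id -> kernel id (the direction make_kuid() goes). */
static uint32_t to_kernel(const struct id_extent *e, uint32_t id)
{
	/* Unsigned wraparound makes ids below 'first' fail the check too. */
	if (id - e->first < e->count)
		return e->lower_first + (id - e->first);
	return INVALID_ID;
}

/* Kernel id -> interior id (the direction from_kuid() goes). */
static uint32_t to_interior(const struct id_extent *e, uint32_t kid)
{
	if (kid - e->lower_first < e->count)
		return e->first + (kid - e->lower_first);
	return INVALID_ID;
}

/* Inode read through a shifted mount: on-disk i_uid -> kernel value. */
static uint32_t shift_inode_uid(const struct id_extent *e, uint32_t disk_uid)
{
	return to_kernel(e, disk_uid);
}

/* setattr through a shifted mount: kernel ia_uid -> interior value. */
static uint32_t shift_attr_uid(const struct id_extent *e, uint32_t ia_uid)
{
	return to_interior(e, ia_uid);
}
```

With this extent, interior root's kernel fsuid 100000 maps back to 0, so a file created through an MNT_SHIFT mount would land on disk owned by uid 0 rather than 100000.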
> Since creating a new credential is a reasonably expensive proposition
> and we have to shift and unshift many times during path walking, a
> cached copy of the shifted credential is saved to a newly created
> place in the task structure.  This serves the dual purpose of allowing
> us to use a pre-prepared copy of the shifted credentials and also
> allows us to recognise whenever the shift is actually in effect (the
> cached shifted credential pointer being equal to the current_cred()
> pointer).
>
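The cached-credential trick can be modelled in userspace C as a sketch under stated assumptions. model_cred and model_task are hypothetical stand-ins for struct cred and the task structure. The cache is assumed to stay valid because the interior ids do not change between calls. The pointer comparison in shift_in_effect() plays the role of the "shift is actually in effect" test the description mentions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for struct cred and the task structure. */
struct model_cred {
	uint32_t fsuid;
	uint32_t fsgid;
};

struct model_task {
	const struct model_cred *cred;   /* what current_cred() would return */
	struct model_cred *shifted_cred; /* hypothetical cached shifted copy */
};

/* The shift is in effect exactly when the active creds are the cached copy. */
static bool shift_in_effect(const struct model_task *t)
{
	return t->shifted_cred != NULL && t->cred == t->shifted_cred;
}

/*
 * Install shifted creds, building the cached copy only on first use.
 * Assumes the interior ids are the same on every call.
 * Returns the previous creds for a later revert, or NULL if allocation fails.
 */
static const struct model_cred *
model_shift_creds(struct model_task *t, uint32_t fsuid, uint32_t fsgid)
{
	const struct model_cred *old = t->cred;

	if (!t->shifted_cred) {
		t->shifted_cred = malloc(sizeof(*t->shifted_cred));
		if (!t->shifted_cred)
			return NULL;
		t->shifted_cred->fsuid = fsuid;
		t->shifted_cred->fsgid = fsgid;
	}
	t->cred = t->shifted_cred;
	return old;
}

/* Put back the creds saved by model_shift_creds(); the cache is kept. */
static void model_revert_creds(struct model_task *t, const struct model_cred *old)
{
	t->cred = old;
}
```

Reverting restores the original pointer, so shift_in_effect() becomes false again while the cached copy is reused on the next entry instead of being rebuilt.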
> To get this all to work, we have a check for the vfsmount flag and the
> user_ns gating a shifting of the credentials over all user space
> entries to filesystem functions.  In theory the path has to be present
> everywhere we do this, so we can check the vfsmount flags.  However,
> for lower level functions we can cheat this path check of vfsmount
> simply to check whether a shifted credential is in effect or not to
> gate things like the inode permission check, which means the path
> doesn't have to be threaded all the way through the permission
> checking functions.  If the "credential is shifted" check passes, we can
> also be sure that the current user_ns is the same as the mnt->user_ns,
> so we can use it and thus have no need of the struct mount at the
> point of the shift.
>
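The entry-point gate described above reduces to a small predicate. In this hypothetical userspace model, MODEL_MNT_SHIFT stands in for the new MNT_SHIFT flag and model_vfsmount carries the user_ns that owns the mount namespace. The real patch's structures and field names may differ.

```c
#include <stdbool.h>

#define MODEL_MNT_SHIFT 0x1u /* hypothetical stand-in for MNT_SHIFT */

/* Hypothetical stand-in for struct user_namespace; only its address matters. */
struct model_userns {
	int unused;
};

struct model_vfsmount {
	unsigned int mnt_flags;
	const struct model_userns *mnt_ns_userns; /* user_ns owning the mnt_ns */
};

/*
 * Entry-point gate: shift only when the mount carries the flag AND the
 * caller is in the user_ns that owns the mount namespace. It needs the
 * vfsmount, so it can only run where the path is available.
 */
static bool model_should_shift(const struct model_vfsmount *mnt,
			       const struct model_userns *cur)
{
	return (mnt->mnt_flags & MODEL_MNT_SHIFT) && mnt->mnt_ns_userns == cur;
}
```

Deeper in the call chain, where no path is available, the description instead compares current_cred() against the cached shifted pointer. That test can only be true if this predicate already held at the entry point.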
> Although the shift can be effected simply by executing
> do_reconfigure_mnt with MNT_SHIFT in the flags, this patch only
> contains the shifting mechanisms.  The follow on patch wires up the
> user visible API for turning the flag on.
>
> Signed-off-by: James Bottomley <James.Bottomley at HansenPartnership.com>
> ---

> @@ -3828,6 +3884,7 @@ long do_mknodat(int dfd, const char __user *filename, umode_t mode,
>         if (IS_ERR(dentry))
>                 return PTR_ERR(dentry);
> +       cred = change_userns_creds(&path);
>         if (!IS_POSIXACL(path.dentry->d_inode))
>                 mode &= ~current_umask();
>         error = security_path_mknod(&path, dentry, mode, dev);

> +       cred = change_userns_creds(&path);
>         if (!IS_POSIXACL(path.dentry->d_inode))
>                 mode &= ~current_umask();
>         error = security_path_mkdir(&path, dentry, mode);

> +       cred = change_userns_creds(&path);
>         error = security_path_symlink(&path, dentry, from->name);

I see a pattern above.

Perhaps change_userns_creds() should be inside security_path_XXX hooks?
Perhaps auto-shifting bind mount should be implemented by an LSM?
After all, "gating" access to the filesystem is part of what LSMs do, and
uid (or fsid) shifting is a sort of "gating".
Heck, there should already be a way to attach a security context to a mount,
right? So you don't even need a new UAPI in order to configure the auto-shifting
LSM. And you could use standard security.* xattr for persistent configuration
of the auto-shifting filesystem sections, which is something that you wanted
to solve anyway, right?
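The first suggestion, moving the shift into the security_path_* hooks, might look roughly like the userspace model below. Everything here is hypothetical. The existing security_path_* hooks only return an error code and do not install credentials. model_op stands in for the caller's credentials, and the shift uses an assumed single extent "0 100000 65536". The point is only that one hook-list walk would replace the repeated change_userns_creds() calls at each call site.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-operation state that hooks may inspect or modify. */
struct model_op {
	uint32_t fsuid;       /* fsuid the filesystem will see */
	uint32_t orig_fsuid;  /* saved so the caller can restore it afterwards */
	int is_shifted_mount; /* stands in for the MNT_SHIFT + user_ns test */
};

typedef int (*model_hook_fn)(struct model_op *op);

/* Hypothetical shifting module: reverse-maps the fsuid on shifted mounts. */
static int shift_hook(struct model_op *op)
{
	const uint32_t lower_first = 100000, count = 65536; /* assumed extent */

	op->orig_fsuid = op->fsuid;
	if (op->is_shifted_mount && op->fsuid - lower_first < count)
		op->fsuid -= lower_first; /* interior range starts at 0 */
	return 0;
}

/* Other modules' hooks would follow the shifting hook in this list. */
static model_hook_fn model_hooks[] = { shift_hook };

/* One hook-list walk in place of a change_userns_creds() at every call site. */
static int model_security_path_mknod(struct model_op *op)
{
	for (size_t i = 0; i < sizeof(model_hooks) / sizeof(model_hooks[0]); i++) {
		int err = model_hooks[i](op);
		if (err)
			return err;
	}
	return 0;
}
```

The caller would still need orig_fsuid (or the saved creds) to undo the shift after the operation completes.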

Apologies if my suggestions are flawed by a misunderstanding of the feature.


More information about the Containers mailing list