[PATCH v4] Introduce v3 namespaced file capabilities

Tue Jun 13 23:50:25 UTC 2017

Quoting Serge E. Hallyn (serge at hallyn.com):
> Quoting Stefan Berger (stefanb at linux.vnet.ibm.com):
> > On 05/08/2017 02:11 PM, Serge E. Hallyn wrote:
> > >Root in a non-initial user ns cannot be trusted to write a traditional
> > >security.capability xattr.  If it were allowed to do so, then any
> > >unprivileged user on the host could map his own uid to root in a private
> > >namespace, write the xattr, and execute the file with privilege on the
> > >host.
> > >
> > >However supporting file capabilities in a user namespace is very
> > >desirable.  Not doing so means that any programs designed to run with
> > >limited privilege must continue to support other methods of gaining and
> > >dropping privilege.  For instance a program installer must detect
> > >whether file capabilities can be assigned, and assign them if so but set
> > >setuid-root otherwise.  The program in turn must know how to drop
> > >partial capabilities, and do so only if setuid-root.
> > 
> > Hi Serge,
> > 
> > 
> >   I have been looking at the patch below primarily to learn how we could
> > apply a similar technique to security.ima and security.evm for a
> > namespaced IMA. From the paragraphs above I thought that you solved
> > the problem of a shared filesystem where one now can write different
> > security.capability xattrs by effectively supporting for example
> > security.capability[uid=1000] and security.capability[uid=2000]
> 
> Interesting idea.  Worth considering.
> 
> > written into the filesystem. Each would then become visible as
> > security.capability if the userns mapping is set appropriately.
> > However, this doesn't seem to be how it is implemented. There seems
> 
> Indeed, when I was considering supporting multiple simultaneous
> xattrs, I did it as something like:
> 
> struct vfs_ns_cap_data {
> 	struct {
> 		__le32 permitted;
> 		__le32 inheritable;
> 	} data[VFS_CAP_U32];
> 	__le32 rootid;
> };
> 
> struct vfs_ns_cap {
> 	__le32 magic_etc;
> 	__le32 n_entries;
> 	struct vfs_ns_cap_data data[0];
> }; // followed by n_entries of struct vfs_ns_cap_data
> 
> You're instead suggesting encoding the rootuid in the name,
> which is interesting.
> 
> > to be only a single such entry with uid appended to it and, if it
> > was a shared filesystem, the first one to set this attribute blocks
> > everyone else from writing the xattr. Is that how it works? Would
> 
> Approximately - indeed there is only a single xattr.  But it can be
> overwritten, so long as the writer has CAP_SETFCAP over the user_ns
> which mounted the filesystem.

Hang on.  I misspoke.  That's the requirement for writing a
v2 xattr.  To write a v3 xattr you only need to be privileged
(with CAP_SETFCAP) against the inode.
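
For illustration only (this is not what the patch does): a rough
userspace sketch of how the multi-entry layout quoted above might be
walked to find the entry for a given rootid.  It assumes the blob is
an 8-byte header (magic_etc, n_entries) followed by n_entries 20-byte
entries, all little-endian, with VFS_CAP_U32 == 2.  The names
ns_cap_entry, ns_cap_find, le32_get and le32_put are made up for the
example, and magic_etc is not validated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VFS_CAP_U32        2   /* assumed: two 32-bit words per set */
#define NS_CAP_HDR_SIZE    8   /* magic_etc + n_entries */
#define NS_CAP_ENTRY_SIZE  (VFS_CAP_U32 * 8 + 4)  /* data[] + rootid */

/* Hypothetical decoded (host-endian) form of one per-namespace entry. */
struct ns_cap_entry {
	struct {
		uint32_t permitted;
		uint32_t inheritable;
	} data[VFS_CAP_U32];
	uint32_t rootid;
};

/* Read a little-endian 32-bit value from an unaligned byte buffer. */
uint32_t le32_get(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value as little-endian bytes (handy for building blobs). */
void le32_put(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/*
 * Find the entry whose rootid matches.  Returns 0 and fills *out on
 * success, -1 if the blob is malformed or no entry matches.
 */
int ns_cap_find(const uint8_t *blob, size_t len, uint32_t rootid,
		struct ns_cap_entry *out)
{
	assert(blob != NULL && out != NULL);

	if (len < NS_CAP_HDR_SIZE)
		return -1;

	uint32_t n = le32_get(blob + 4);

	/* Reject an n_entries that claims more data than we were given. */
	if (n > (len - NS_CAP_HDR_SIZE) / NS_CAP_ENTRY_SIZE)
		return -1;

	for (uint32_t i = 0; i < n; i++) {
		const uint8_t *e = blob + NS_CAP_HDR_SIZE +
				   (size_t)i * NS_CAP_ENTRY_SIZE;

		if (le32_get(e + VFS_CAP_U32 * 8) != rootid)
			continue;

		for (int j = 0; j < VFS_CAP_U32; j++) {
			out->data[j].permitted = le32_get(e + 8 * j);
			out->data[j].inheritable = le32_get(e + 8 * j + 4);
		}
		out->rootid = rootid;
		return 0;
	}
	return -1;
}
```

Encoding the rootid in the xattr name instead, as Stefan suggests,
would presumably replace this linear scan with a lookup by name.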