[PATCH 1/1] simplified security.nscapability xattr

Serge E. Hallyn serge at hallyn.com
Mon May 16 21:48:04 UTC 2016

On Mon, May 16, 2016 at 04:15:23PM -0500, Serge E. Hallyn wrote:
> Quoting Serge E. Hallyn (serge at hallyn.com):
> ...
> > There's a problem though.  The above suffices to prevent an unprivileged user
> > in a user_ns from unsharing a user_ns to write a file capability and exploit
> > that capability in the ns where he is unprivileged.  With one exception, which
> > is the case where the unprivileged user is mapped to the same kuid which
> > created the namespace.  So if uid 1000 on the host creates a namespace
> > where uid 1000 maps to 1000 in the namespace, then 1000 in the namespace
> > can create a new user_ns, write the xattr, and exploit it from the
> > parent namespace.  This is not an uncommon case.  I'm not sure what to do about
> > it.
> Ok I think I've convinced myself that requiring a kuid 0 in the container
> and storing that in the security.nscapability is the best solution.  The DAC
> objection is imo not really valid - we don't have to give uid 0 in the
> container any special privilege, we just require that the ns have a uid 0
> mapping.  I have not been able to think of any other reliable way to verify
> that the writer of the capability is authorized to grant privilege to the
> file when executed by current.
> I'm going to proceed with another POC based on the following design:
> 1. no new syscalls at the moment.  You can choose to set/query
> security.nscapability, but can also just set security.capability from
> a user_ns and have the kernel transparently set a security.nscapability
> entry for you.
> 2. For now just a single security.nscapability entry, but in a format
> that makes turning it into an array a trivial change.
> 3. When running file foo which has a security.nscapability for kuid 100000,
> then any namespace where kuid 100000 is root - or which has an ancestor ns where
> that is the case - will run the file with the listed capabilities.
> 4. When doing getxattr of security.capability from a user_ns, if there is a
> security.capability entry, that will be returned;  else if there is a valid
> security.nscapability for your ns, that will be returned.
> 5. when doing a setxattr of security.capability from a user_ns, if there is
> a security.nscapability entry, you get EBUSY;  else a security.nscapability 
> with your root kuid will be written provided that (a) you are privileged
> over your namespace, (b) you are privileged over your root uid, (c) the
> file owner maps into your namespace.

Stéphane pointed out that item 5 isn't quite right.  The EBUSY will happen
only if a security.nscapability is defined with a kuid over which the writer
is not privileged; otherwise the write overwrites the existing entry.  EBUSY
will also happen if security.capability is set.
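
To make the corrected item 5 concrete, here is a rough userspace sketch of
the decision order.  It is not the patch: the struct and function names are
made up for illustration, the privilege checks are reduced to booleans, and
returning -EPERM when (a), (b) or (c) fails is an assumption borrowed from
item 7, since item 5 does not name an error for that case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the xattrs already on the file. */
struct toy_file_xattrs {
	bool has_capability;	/* security.capability is set */
	bool has_nscapability;	/* security.nscapability is set */
};

/* Hypothetical summary of what the writing task is privileged over. */
struct toy_writer {
	bool priv_over_ns;		/* (a) privileged over its namespace */
	bool priv_over_root_uid;	/* (b) privileged over its root uid */
	bool owner_maps_into_ns;	/* (c) file owner maps into its ns */
	bool priv_over_existing_kuid;	/* privileged over the kuid stored in
					 * an existing nscapability entry */
};

/*
 * Decide whether a setxattr of security.capability from a user_ns may
 * write a security.nscapability entry.  Returns 0 or a negative errno.
 */
int toy_nscap_write_check(const struct toy_file_xattrs *f,
			  const struct toy_writer *w)
{
	assert(f != NULL && w != NULL);

	/* An existing security.capability always blocks the write. */
	if (f->has_capability)
		return -EBUSY;

	/* An existing entry blocks only if the writer lacks privilege
	 * over its kuid; otherwise it will be overwritten. */
	if (f->has_nscapability && !w->priv_over_existing_kuid)
		return -EBUSY;

	/* Assumption: failing (a), (b) or (c) yields -EPERM as in item 7. */
	if (!w->priv_over_ns || !w->priv_over_root_uid ||
	    !w->owner_maps_into_ns)
		return -EPERM;

	return 0;
}
```

The EBUSY checks come first here only because items 5 and 7 list them
first; the real ordering is up to the patch.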

> 6. when doing a getxattr of security.nscapability, the entry will be shown
> with kuid mapped into your namespace or -1 if the uid does not map into
> your ns.
> 7. when doing a setxattr of security.nscapability, if an entry exists, you
> get -EBUSY;  if you are not privileged over your ns, your root uid, and
> the file owner, then you get -EPERM;  the xattr includes a uid field, which
> must be either 0 or a value valid in your ns.  The value will be converted
> to a kuid and stored on disk.  (Seth, I'm not sure offhand how that should
> mesh with your patches, we can talk about it after I send the next patch,
> which I'm quite certain will handle it wrongly)
> 8. If a security.capability exists, it will override any security.nscapability
> at execve() (so, inverse of my previous two patches).
> -serge
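
Items 3 and 8 together describe which capabilities apply at execve().  The
following is a toy userspace model of that lookup, not kernel code: the
types and function names are hypothetical, each namespace is assumed to
have a uid 0 mapping (recorded as root_kuid), and capability sets are
flattened to a single bitmask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a user namespace. */
struct toy_userns {
	const struct toy_userns *parent;	/* NULL for the initial ns */
	uint32_t root_kuid;			/* kuid that uid 0 in this ns maps to */
};

/* Hypothetical stand-in for the two xattrs on a file. */
struct toy_file_caps {
	bool has_capability;		/* security.capability present */
	uint64_t capability;
	bool has_nscapability;		/* security.nscapability present */
	uint32_t nscap_rootid;		/* kuid stored in the entry */
	uint64_t nscapability;
};

/* Item 3: true if rootid is root in ns or in any ancestor of ns. */
bool toy_rootid_covers_ns(const struct toy_userns *ns, uint32_t rootid)
{
	for (; ns != NULL; ns = ns->parent)
		if (ns->root_kuid == rootid)
			return true;
	return false;
}

/* Capabilities the file would get when executed from ns. */
uint64_t toy_caps_at_exec(const struct toy_file_caps *f,
			  const struct toy_userns *ns)
{
	assert(f != NULL);

	/* Item 8: security.capability overrides security.nscapability. */
	if (f->has_capability)
		return f->capability;

	if (f->has_nscapability &&
	    toy_rootid_covers_ns(ns, f->nscap_rootid))
		return f->nscapability;

	return 0;
}
```

With an entry for kuid 100000, a namespace whose root is kuid 100000 and
any namespace nested below it get the listed capabilities, while a sibling
namespace rooted at a different kuid gets none.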
