[PATCH 1/1] simplified security.nscapability xattr

Mon May 16 21:15:23 UTC 2016

Quoting Serge E. Hallyn (serge at hallyn.com):
...
> There's a problem though.  The above suffices to prevent an unprivileged user
> in a user_ns from unsharing a user_ns to write a file capability and exploit
> that capability in the ns where he is unprivileged.  With one exception, which
> is the case where the unprivileged user is mapped to the same kuid which
> created the namespace.  So if uid 1000 on the host creates a namespace
> where uid 1000 maps to 1000 in the namespace, then 1000 in the namespace
> can create a new user_ns, write the xattr, and exploit it from the
> parent namespace.  This is not an uncommon case.  I'm not sure what to do about
> it.

Ok I think I've convinced myself that requiring a kuid 0 in the container
and storing that in the security.nscapability is best solution.  The DAC
objection is imo not really valid - we don't have to give uid 0 in the
container any special privilege, we just require that the ns have a uid 0
mapping.  I have not been able to think of any other reliable way to verify
that the writer of the capability is authorized to grant privilege to the
file when executed by current.

I'm going to proceed with another POC based on the following design:

1. no new syscalls at the moment.  You can choose to set/query
security.nscapability, but can also just set security.capability from
a user_ns and have the kernel transparently set a security.nscapability
entry for you.

2. For now just a single security.nscapability entry, but in a format
that turning it into an array will be a trivial change

3. When running file foo which has a security.nscapability for kuid 100000,
then any namespace where kuid 100000 is root - or which has an ancestor ns where
that is the case - will run the file with the listed capabilities.

4. When doing getxattr of security.capability from a user_ns, if there is a
security.capability entry, that will be returned;  else if there is a valid
security.nscapability for your ns, that will be returned.

5. when doing a setxattr of security.capability from a user_ns, if there is
a security.nscapability entry, you get EBUSY;  else a security.nscapability 
with your root kuid will be written provided that (a) you are privileged
over your namespace, (b) you are privileged over your root uid, (c) the
file owner maps into your namespace.

6. when doing a getxattr of security.nscapability, the entry will be shown
with kuid mapped into your namespace or -1 if the uid does not map into
your ns.

7. when doing a setxattr of security.nscapability, if an entry exists, you
get -EBUSY;  if you are not privileged over your ns, your root uid, and
the file owner, then you get -EPERM;  the xattr includes a uid field, which
must be either 0 or a value valid in your ns.  The value will be converted
to a kuid and stored on disk.  (Seth, I'm not sure offhand how that should
mesh with your patches, we can talk about it after I send the next patch,
which I'm quite certain will handle it wrongly)

8. If a security.capability exists, it will override any security.nscapability
at execve() (so, inverse of my previous two patches).

-serge