[PATCH] userns: honour no_new_privs for cap_bset during user ns creation/switch

Maciej Żenczykowski zenczykowski at gmail.com
Fri Dec 22 01:51:49 UTC 2017


> Good point about CAP_DAC_OVERRIDE on files you own.
>
> I think there is an argument that you are playing dangerous games with
> the permission system there, as it isn't effectively a file you own if
> you can't read it, and you can't change its permissions.

Append-only files are useful - particularly for logging.
It could also simply be a non-readable file on a R/O filesystem.

> Given little things like that I can completely see no_new_privs meaning
> you can't create a user namespace.  That seems consistent with the
> meaning and philosophy of no_new_privs.  So simple it is hard to get
> wrong.

Yes, I could totally buy the argument that no_new_privs should prevent
creating a user ns.

However, there's also setns() and that's a fair bit harder to reason about.
Entirely deny it?  But that actually seems potentially useful...
Allow it but cap it?  That's what this does...
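
To make the three options concrete, here is a toy model in Python. It is
not kernel code; the names Cred, Policy and enter_user_ns are made up for
illustration, and the capability list is a small subset. It assumes that
"cap it" means the bounding set is carried into the new user ns and that
the permitted/effective sets are limited to it, which is one reading of
what the patch does.

```python
from dataclasses import dataclass, replace
from enum import Enum

# Illustrative subset of capability names; the real kernel has many more.
ALL_CAPS = frozenset({
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_NET_ADMIN",
    "CAP_NET_RAW", "CAP_SYS_ADMIN",
})


class Policy(Enum):
    ALLOW = "allow"  # new ns always starts with full capability sets
    DENY = "deny"    # no_new_privs blocks the creation/transition outright
    CAP = "cap"      # allowed, but the bounding set is carried across


@dataclass(frozen=True)
class Cred:
    """Hypothetical, heavily simplified stand-in for a task's credentials."""
    no_new_privs: bool
    bset: frozenset
    permitted: frozenset = frozenset()
    effective: frozenset = frozenset()


def enter_user_ns(cred: Cred, policy: Policy) -> Cred:
    """Return the credentials a task would hold inside the new user ns."""
    if cred.no_new_privs and policy is Policy.DENY:
        raise PermissionError("no_new_privs: user ns creation/transition denied")
    bset = ALL_CAPS
    if cred.no_new_privs and policy is Policy.CAP:
        bset = cred.bset  # honour the pre-existing bounding set
    # Inside the new ns the task holds everything its bounding set permits.
    return replace(cred, bset=bset, permitted=bset, effective=bset)


jailed = Cred(no_new_privs=True, bset=frozenset({"CAP_NET_RAW"}))
print(sorted(enter_user_ns(jailed, Policy.CAP).effective))  # ['CAP_NET_RAW']
```

With Policy.ALLOW the same jailed task comes out holding every capability
in the new ns, which is the case this thread is worried about.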

> We could do more clever things like plug this hole in user namespaces,
> and that would not hurt my feelings.

Sure, this particular one wouldn't be all that easy I think... and how
many such holes are there?
I found this particular one *after* your first reply in this thread.

> However unless that is our only
> choice to avoid badly breaking userspace I would hate to have to depend
> on user namespaces being perfect for no_new_privs to be a proper jail.

This stuff is ridiculously complex to get right from userspace. :-(

> As a general rule user namespaces are where we tackle the subtle scary
> things that should work, and no_new_privs is where we implement a simple
> hard to get wrong jail.  Most of the time the effect is the same to an
> outside observer (bounded permissions), but there is a real difference
> in difficulty of implementation.

So, where to now...

Would you accept patches that:

- make no_new_privs block user ns creation?

- make no_new_privs block user ns transition?

  Or perhaps we can assume that lack of create privs is sufficient, and
  if there's a pre-existing user ns for you to enter, then that's
  acceptable...
  Although this implies you probably always want to combine no_new_privs
  with a leaf user ns, or no_new_privs isn't all that useful for root in
  root ns...
  This added complexity probably means it should be blocked...

- inherit bset across user ns creation/transition based on X?
  [this is the one we care about, because there are simply too many bugs
  in the kernel wrt. certain caps]
  X could be:
  - a new flag similar to no_new_privs
  - a new securebit flag (w/lockbit)  [provided securebits survive a
    userns transition, haven't checked]
  - or perhaps a new capability
  - something else?
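
For the securebit option, a rough Python model of how a flag with a lock
bit would behave. The four existing bit values are the ones from
<linux/securebits.h>; SECBIT_INHERIT_BSET and its position (bit 8) are
purely hypothetical, as are the helper names. The checks mirror, in
simplified form, the rules prctl(PR_SET_SECUREBITS) applies: a locked flag
cannot change, a lock cannot be cleared, unknown bits are rejected, and
CAP_SETPCAP is required. It says nothing about whether the bit would
survive a userns transition.

```python
# Existing securebits flags (values from <linux/securebits.h>).
SECBIT_NOROOT = 1 << 0
SECBIT_NO_SETUID_FIXUP = 1 << 2
SECBIT_KEEP_CAPS = 1 << 4
SECBIT_NO_CAP_AMBIENT_RAISE = 1 << 6

# HYPOTHETICAL new flag: "keep bset across user ns creation/transition".
# Neither the name nor the bit position exists in any kernel.
SECBIT_INHERIT_BSET = 1 << 8

ALL_BITS = (SECBIT_NOROOT | SECBIT_NO_SETUID_FIXUP | SECBIT_KEEP_CAPS
            | SECBIT_NO_CAP_AMBIENT_RAISE | SECBIT_INHERIT_BSET)
ALL_LOCKS = ALL_BITS << 1  # each flag's lock bit sits one position above it


def lock_of(flag: int) -> int:
    """Return the lock bit that pins `flag` at its current value."""
    return flag << 1


def set_securebits(old: int, new: int, has_setpcap: bool = True) -> int:
    """Model of the checks made when a task asks to change its securebits."""
    if ((old & ALL_LOCKS) >> 1) & (old ^ new):
        raise PermissionError("attempt to change a locked flag")
    if old & ALL_LOCKS & ~new:
        raise PermissionError("attempt to clear a lock bit")
    if new & ~(ALL_BITS | ALL_LOCKS):
        raise PermissionError("unsupported bit")
    if not has_setpcap:
        raise PermissionError("CAP_SETPCAP required")
    return new


# Set the hypothetical flag and lock it; from here on it cannot be undone.
locked = set_securebits(0, SECBIT_INHERIT_BSET | lock_of(SECBIT_INHERIT_BSET))
```

The appeal of this variant is that once the flag and its lock are set, the
task and its descendants have no way to drop it again, much like
no_new_privs.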

How do we make forward progress?

