[PATCH] userns: honour no_new_privs for cap_bset during user ns creation/switch
zenczykowski at gmail.com
Fri Dec 22 01:51:49 UTC 2017
> Good point about CAP_DAC_OVERRIDE on files you own.
> I think there is an argument that you are playing dangerous games with
> the permission system there, as it isn't effectively a file you own if
> you can't read it, and you can't change its permissions.
Append-only files are useful - particularly for logging.
It could also simply be a non-readable file on a R/O filesystem.
> Given little things like that I can completely see no_new_privs meaning
> you can't create a user namespace. That seems consistent with the
> meaning and philosophy of no_new_privs. So simple it is hard to get wrong.
Yes, I could totally buy the argument that no_new_privs should prevent
creating a user ns.
However, there's also setns() and that's a fair bit harder to reason about.
Entirely deny it? But that actually seems potentially useful...
Allow it but cap it? That's what this does...
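
For reference, the creation case can be probed from userspace with something
like the sketch below. It forks a child, sets no_new_privs there, and then
attempts unshare(CLONE_NEWUSER). The helper name probe_userns_after_nnp() and
its return-code convention are made up for this illustration; whether the
unshare succeeds depends on the kernel and its configuration, so no particular
outcome is assumed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for unshare() and CLONE_NEWUSER */
#endif
#include <assert.h>
#include <sched.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): in a forked child, set
 * no_new_privs and then try to create a user namespace.
 *
 * Returns 0 if the user ns was created, 1 if unshare() was refused,
 * 2 if no_new_privs could not be set, -1 on fork/wait failure or if
 * the child died abnormally.
 */
static int probe_userns_after_nnp(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the flag is one-way, so confine it to this process. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL) != 0)
            _exit(2);
        /* Sanity check: the flag should now read back as set. */
        assert(prctl(PR_GET_NO_NEW_PRIVS, 0UL, 0UL, 0UL, 0UL) == 1);
        _exit(unshare(CLONE_NEWUSER) == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running the probe in a child keeps the caller's own no_new_privs state and user
ns untouched. A similar probe for the setns() case would need a file descriptor
for a pre-existing user ns, which is left out here.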
> We could do more clever things like plug this hole in user namespaces,
> and that would not hurt my feelings.
Sure, this particular one wouldn't be all that easy I think... and how
many such holes are there?
I found this particular one *after* your first reply in this thread.
> However unless that is our only
> choice to avoid badly breaking userspace I would hate to have to depend
> on user namespaces being perfect for no_new_privs to be a proper jail.
This stuff is ridiculously complex to get right from userspace. :-(
> As a general rule user namespaces are where we tackle the subtle scary
> things that should work, and no_new_privs is where we implement a simple
> hard to get wrong jail. Most of the time the effect is the same to an
> outside observer (bounded permissions), but there is a real difference
> in difficulty of implementation.
So, where to now...
Would you accept patches that:
- make no_new_privs block user ns creation?
- make no_new_privs block user ns transition?
Or perhaps we can assume that lack of create privs is sufficient, and
if there's a pre-existing user ns for you to enter, then that's
Although this implies you probably always want to combine no_new_privs
with a leaf user ns, or no_new_privs isn't all that useful for root in
This added complexity probably means it should be blocked...
- inherit bset across user ns creation/transition based on X?
[this is the one we care about, because there are simply too many bugs
in the kernel wrt. certain caps]
X could be:
- a new flag similar to no_new_privs
- a new securebit flag (w/lockbit) [provided securebits survive a
userns transition, haven't checked]
- or perhaps a new capability
- something else?
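
For anyone wanting to check the current behaviour of the options above, a
minimal sketch of reading the bounding set and securebits from userspace
follows. The helper names are invented for this example, and SECBIT_NOROOT is
used only as a sample of a securebit with a lock bit. Calling the helpers
before and after a user ns transition is one way to see what survives; no
result is claimed here.

```c
#include <assert.h>
#include <linux/capability.h>   /* CAP_LAST_CAP, CAP_* constants */
#include <linux/securebits.h>   /* SECBIT_* constants */
#include <sys/prctl.h>

/* Returns 1 if cap is in the bounding set, 0 if not, -1 if the
 * running kernel does not know this capability number. */
static int cap_in_bset(int cap)
{
    assert(cap >= 0);
    return prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL);
}

/* Counts the capabilities currently present in the bounding set.
 * CAP_LAST_CAP comes from the build headers; numbers the running
 * kernel rejects are simply not counted. */
static int count_bset_caps(void)
{
    int n = 0;
    for (int cap = 0; cap <= CAP_LAST_CAP; cap++)
        if (cap_in_bset(cap) == 1)
            n++;
    return n;
}

/* Returns the calling thread's securebits, or -1 on error. */
static int get_securebits(void)
{
    return prctl(PR_GET_SECUREBITS, 0UL, 0UL, 0UL, 0UL);
}

/* Returns 1 if both SECBIT_NOROOT and its lock bit are set in the
 * given securebits value, 0 otherwise. */
static int noroot_is_locked(int securebits)
{
    return (securebits & SECBIT_NOROOT) &&
           (securebits & SECBIT_NOROOT_LOCKED);
}
```

A new securebit for "keep bset across user ns" would presumably follow the same
flag-plus-lock pattern as SECBIT_NOROOT / SECBIT_NOROOT_LOCKED, but that is
only a guess at the shape, not something this patch defines.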
How do we make forward progress?