userns: targeted capabilities v4

Serge E. Hallyn serge at
Mon Jan 10 22:43:42 PST 2011

This version addresses feedback from, and bugs pointed out by,
Bastian.  It also adds user namespace checks in fs/namei.c.
If a task reads a file owned by another user_ns, it gets the
world access rights to that file.  Since inodes don't yet have
a user namespace, we just declare that init_user_ns owns them all.
So if you are root in a child user namespace, you are effectively
roaming the system as user nobody.  See
for prior discussions.
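A minimal userspace sketch of the fallback described above. It is not the fs/namei.c code: struct user_ns, struct task, struct inode_model and may_access() are hypothetical names for a toy model. Every inode is treated as owned by the initial namespace, so a task from any other namespace sees only the "other" bits, whatever its uid. Capability overrides such as CAP_DAC_OVERRIDE are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a user namespace; only identity matters here. */
struct user_ns {
	const struct user_ns *parent;
};

static const struct user_ns init_ns = { NULL };

/* Requested access bits, in the same rwx order as each mode triplet. */
#define MAY_READ  4u
#define MAY_WRITE 2u
#define MAY_EXEC  1u

struct task {
	const struct user_ns *ns;   /* namespace the task lives in */
	unsigned uid, gid;
};

/* Inodes carry no namespace; they are all implicitly owned by init_ns. */
struct inode_model {
	unsigned uid, gid;
	unsigned mode;              /* e.g. 0640 */
};

/* Return true if every bit in mask is granted to t on i. */
static bool may_access(const struct task *t, const struct inode_model *i,
		       unsigned mask)
{
	unsigned bits;

	assert((mask & ~7u) == 0);
	if (t->ns != &init_ns)
		bits = i->mode & 7;          /* foreign namespace: "other" bits only */
	else if (t->uid == i->uid)
		bits = (i->mode >> 6) & 7;   /* owner bits */
	else if (t->gid == i->gid)
		bits = (i->mode >> 3) & 7;   /* group bits */
	else
		bits = i->mode & 7;          /* other bits */
	return (bits & mask) == mask;
}
```

With a 0640 file owned by uid 1000, uid 1000 in the initial namespace can read and write it, while uid 0 (or even uid 1000) in a child namespace cannot read it at all; a 0644 file stays readable, but not writable, from the child.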

[ Intro to v3 follows ]

The core of the set is patch 2, originally conceived and
implemented by Eric Biederman.  The concept is to target
capabilities at user namespaces.  A task's capabilities are
now contextualized as follows (previously, capabilities had
no context):

1. For a task in the initial user namespace, the calculated
capabilities (pi, pe, pp) are available to act upon any
user namespace.

2. For a task in a child user namespace, the calculated
capabilities are available to act only on its own or any
descendant user namespace.  It has no capabilities in any
parent or unrelated user namespace.

3. If a user A creates a new user namespace, that user has
all capabilities to that new user namespace and any of its
descendants.  (Contrast this with a user namespace created
by another user B in the same user namespace, in which user
A has only his calculated capabilities.)
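The three rules above can be modelled as a walk from the target namespace up towards the initial one. This is a userspace sketch under stated assumptions, not the code from patch 2: struct user_ns, struct cred and has_cap() are hypothetical, and a namespace's creator is recorded simply as a uid in the parent namespace. The capability numbers match Linux's CAP_KILL and CAP_SETUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_KILL   5
#define CAP_SETUID 7

/* Hypothetical model: each namespace knows its parent and its creator. */
struct user_ns {
	const struct user_ns *parent;   /* NULL for the initial namespace */
	unsigned creator_uid;           /* uid, in parent, that created it */
};

static const struct user_ns init_ns = { NULL, 0 };

struct cred {
	const struct user_ns *ns;       /* namespace the task lives in */
	unsigned uid;
	uint64_t cap_effective;         /* one bit per capability */
};

static bool has_cap(const struct cred *c, const struct user_ns *target, int cap)
{
	assert(cap >= 0 && cap < 64);
	for (const struct user_ns *ns = target; ns != NULL; ns = ns->parent) {
		/* Rule 3: the creator has every capability in ns and below. */
		if (ns->parent == c->ns && ns->creator_uid == c->uid)
			return true;
		/* Rules 1 and 2: target is our own ns or one of its descendants. */
		if (ns == c->ns)
			return (c->cap_effective >> cap) & 1;
	}
	/* Target is a parent or unrelated namespace: no capabilities. */
	return false;
}
```

For example, uid 1000 with no capabilities still has CAP_SETUID in a namespace it created (and in that namespace's children), but not in a sibling namespace created by uid 1001. Root in the initial namespace has it everywhere.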

All existing 'capable' checks are automatically converted to
checks against the initial user namespace.  The rest of the
patches begin to enable capabilities in child user namespaces
to setuid, setgid, set hostnames, kill tasks, and do ptrace.
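A simplified sketch of what that conversion means, assuming a toy model: the rule-3 creator check is omitted for brevity, cur_cred stands in hypothetically for the current task's credentials, and may_setuid() is an illustrative converted check, not a function from the patches. A legacy capable() is just a check against the initial namespace, so root in a child namespace fails it but passes a check aimed at its own namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_SETUID 7

struct user_ns {
	const struct user_ns *parent;   /* NULL for the initial namespace */
};

static const struct user_ns init_ns = { NULL };

struct cred {
	const struct user_ns *ns;
	uint64_t cap_effective;
};

/* Hypothetical stand-in for the current task's credentials. */
static const struct cred *cur_cred;

/* Caps apply to our own namespace and its descendants (creator rule omitted). */
static bool ns_capable(const struct user_ns *target, int cap)
{
	assert(cap >= 0 && cap < 64);
	for (const struct user_ns *ns = target; ns != NULL; ns = ns->parent)
		if (ns == cur_cred->ns)
			return (cur_cred->cap_effective >> cap) & 1;
	return false;
}

/* Existing checks keep their old meaning: they target the initial namespace. */
static bool capable(int cap)
{
	return ns_capable(&init_ns, cap);
}

/* Illustrative converted check: aim setuid at the caller's own namespace. */
static bool may_setuid(void)
{
	return ns_capable(cur_cred->ns, CAP_SETUID);
}
```

Keeping capable() pinned to the initial namespace means unconverted call sites stay safe by default; each call site has to be audited and opted in explicitly.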

My next step would be to re-introduce part of a several-year-old
patchset which assigns a userns to a superblock (and hence
to inodes), and grants 'user other' permissions to any task
whose uid does not map to the target userns.  (By default, this
will be tasks in all but the initial userns.)


More information about the Containers mailing list