For review (v2): user_namespaces(7) man page

richard -rw- weinberger richard.weinberger at
Fri Apr 26 05:48:08 UTC 2013

On Fri, Apr 26, 2013 at 2:54 AM, Eric W. Biederman
<ebiederm at> wrote:
> richard -rw- weinberger <richard.weinberger at> writes:
>> On Wed, Mar 27, 2013 at 10:26 PM, Michael Kerrisk (man-pages)
>> <mtk.manpages at> wrote:
>>>        Inside the user namespace, the shell has user and group  ID  0,
>>>        and a full set of permitted and effective capabilities:
>>>            bash$ cat /proc/$$/status | egrep '^[UG]id'
>>>            Uid: 0    0    0    0
>>>            Gid: 0    0    0    0
>>>            bash$ cat /proc/$$/status | egrep '^Cap(Prm|Inh|Eff)'
>>>            CapInh:   0000000000000000
>>>            CapPrm:   0000001fffffffff
>>>            CapEff:   0000001fffffffff
>> I've tried your demo program, but inside the new ns I'm automatically nobody.
>> As Eric said, setuid(0)/setgid(0) are missing.
> Is it the setuid/setgid or not setting up the uid/gid map?

uid/gid mappings are set up.
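
For reference, a minimal sketch of what "setting up the mapping" amounts to: the parent writes one line of the form "inside outside length" to /proc/<pid>/uid_map and /proc/<pid>/gid_map of the child. The helper names format_id_map() and write_id_map() are hypothetical and not taken from the demo program. On Linux 3.19 and later an unprivileged writer must also write "deny" to /proc/<pid>/setgroups before gid_map; that step postdates this thread and is left out here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Format one mapping line: "<ID inside ns> <ID outside ns> <length>\n".
 * Returns the number of bytes written, or -1 if buf is too small. */
int format_id_map(char *buf, size_t len, unsigned long inside,
                  unsigned long outside, unsigned long count)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%lu %lu %lu\n", inside, outside, count);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write a single-line mapping to /proc/<pid>/<file>, where file is
 * "uid_map" or "gid_map". Returns 0 on success, -1 on failure
 * (errno is set by open(2)/write(2) in the failing case). */
int write_id_map(pid_t pid, const char *file, unsigned long inside,
                 unsigned long outside, unsigned long count)
{
    char path[64], line[96];

    int n = format_id_map(line, sizeof(line), inside, outside, count);
    if (n < 0)
        return -1;

    int m = snprintf(path, sizeof(path), "/proc/%ld/%s", (long)pid, file);
    if (m < 0 || (size_t)m >= sizeof(path))
        return -1;

    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;

    /* The whole map is passed to the kernel in one write(2) call. */
    ssize_t w = write(fd, line, (size_t)n);
    close(fd);
    return (w == (ssize_t)n) ? 0 : -1;
}
```

A parent would call, for example, write_id_map(child_pid, "uid_map", 0, getuid(), 1) so that its own uid appears as uid 0 inside the child's namespace.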

>> Eric, maybe you can help me. How can I drop capabilities within a user
>> namespace?
>> In childFunc() I added prctl(PR_CAPBSET_DROP, CAP_NET_ADMIN), but it always
>> fails with EPERM.
>> Why is that? I thought I would get a completely fresh set of caps that I can modify.
>> I don't want uid 0 inside the container to have all caps.
> There are weird things that happen with exec and the user namespace.  If
> you have exec'd as an unmapped user, all of your capabilities have
> already been dropped.

I've set up the mappings. If I look into /proc/*/status I see that my process has
all caps.
So, in general, is it possible to drop caps within a user namespace?
I really want to drop CAP_NET_ADMIN and some others.
root within my container must not change any networking settings.
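
A minimal sketch of the call under discussion, with the error reported explicitly. Per prctl(2), PR_CAPBSET_DROP fails with EPERM when the caller lacks CAP_SETPCAP in its effective set. The helper names are hypothetical. Note also that the bounding set only limits what a later execve() can grant; dropping a capability from it does not by itself remove that capability from the caller's current permitted or effective sets.

```c
#include <assert.h>
#include <errno.h>
#include <linux/capability.h>
#include <sys/prctl.h>

/* Returns 1 if cap is in the calling thread's bounding set,
 * 0 if it is not, and -1 on error (for example an invalid cap). */
int cap_in_bounding_set(int cap)
{
    return prctl(PR_CAPBSET_READ, cap, 0, 0, 0);
}

/* Remove cap from the calling thread's bounding set.
 * Returns 0 on success or a negative errno value, typically
 * -EPERM when CAP_SETPCAP is not in the effective set. */
int drop_from_bounding_set(int cap)
{
    assert(cap >= 0);
    if (prctl(PR_CAPBSET_DROP, cap, 0, 0, 0) == -1)
        return -errno;
    return 0;
}
```

Calling drop_from_bounding_set(CAP_NET_ADMIN) in childFunc() before exec'ing the container's init, and checking the return value against -EPERM, would show whether the failure comes from a missing CAP_SETPCAP.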

>> And why does /proc/*/loginuid always contain 4294967295 in a new user namespace?
>> Writing to it also fails. (Noticed that because does not work).
> Almost certainly because the loginuid has already been set.  Yes. It
> looks like I am simply using from_kuid instead of from_kuid_munged on
> the read.  So an unmapped loginuid will be reported as 4294967295.
> In some circumstances 65534 (nobody) is definitely better, in some it is
> a toss-up, and most of the time no one really cares.  So I have tried to
> do something but in this case I don't know which was the best policy.

Hmm, I had hoped that loginuid would be reset upon entering a user namespace.
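
As a small illustration of the value in question: 4294967295 is (uint32_t)-1, the invalid uid, which is what the file shows both when no loginuid has been set and, per the explanation above, when the loginuid has no mapping in the reader's namespace. The sketch below parses the text of /proc/<pid>/loginuid and flags that value; the names parse_loginuid(), loginuid_is_invalid() and LOGINUID_INVALID are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* (uint32_t)-1, printed by the kernel as 4294967295. */
#define LOGINUID_INVALID ((uint32_t)-1)

/* Parse the contents of /proc/<pid>/loginuid (a decimal number,
 * optionally followed by a newline). Returns 0 and stores the value
 * in *out on success, or -1 if the text is malformed or out of range. */
int parse_loginuid(const char *text, uint32_t *out)
{
    assert(text != NULL && out != NULL);

    char *end;
    errno = 0;
    unsigned long long v = strtoull(text, &end, 10);
    if (errno != 0 || end == text || v > UINT32_MAX)
        return -1;
    if (*end != '\0' && *end != '\n')
        return -1;

    *out = (uint32_t)v;
    return 0;
}

/* True if the value is the invalid uid: either never set, or set to
 * a uid that cannot be expressed in the reader's user namespace. */
int loginuid_is_invalid(uint32_t uid)
{
    return uid == LOGINUID_INVALID;
}
```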

>> Final question, is it by design that uid 0 within a namespace is not
>> allowed to write to /proc/*/oom_score_adj?
> Essentially.  It is by design that uid 0 within a namespace be mapped to
> some other uid outside the namespace, and that the permissions on writes
> should use the permission needed outside of the user namespace.

Okay, I asked because systemd is a heavy user of this file and fails
within a user namespace because of this.
Luckily it is possible to remove all the score changes from the .service files.
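
For a program that has to cope with this, a rough sketch of a tolerant write is shown below. It assumes the failure surfaces as EACCES or EPERM from open(2) or write(2); I have not checked which of the two the kernel returns in this case. The function names are hypothetical and this is not how systemd does it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write value to an oom_score_adj file such as
 * "/proc/self/oom_score_adj". The valid range is -1000..1000.
 * Returns 0 on success or a negative errno value. */
int write_oom_score_adj(const char *path, int value)
{
    assert(path != NULL);
    if (value < -1000 || value > 1000)
        return -EINVAL;

    char buf[16];
    int n = snprintf(buf, sizeof(buf), "%d\n", value);

    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -errno;

    ssize_t w = write(fd, buf, (size_t)n);
    int err = (w == -1) ? errno : 0;
    close(fd);

    if (w == -1)
        return -err;
    return (w == n) ? 0 : -EIO;
}

/* Assumption: a permission failure inside a user namespace can be
 * treated as non-fatal, leaving the inherited score in place. */
int oom_adjust_failure_is_ignorable(int rc)
{
    return rc == -EACCES || rc == -EPERM;
}
```

A caller would log and continue when oom_adjust_failure_is_ignorable() returns true, rather than aborting the unit start.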

