Interaction user namespace, /proc/1 ownership & cap_set

Tue Jul 2 17:12:34 UTC 2013

"Daniel P. Berrange" <berrange at redhat.com> writes:

> On Tue, Jul 02, 2013 at 09:35:39AM -0700, Eric W. Biederman wrote:
>> Gao feng <gaofeng at cn.fujitsu.com> writes:
>> 
>> > On 07/02/2013 05:57 PM, Eric W. Biederman wrote:
>> >> "Daniel P. Berrange" <berrange at redhat.com> writes:
>> >> 
>> >>> On Tue, Jul 02, 2013 at 10:56:37AM +0200, Richard Weinberger wrote:
>> >>>> Am 02.07.2013 10:44, schrieb Eric W. Biederman:
>> >>>>> Gao feng <gaofeng at cn.fujitsu.com> writes:
>> >>>>>
>> >>>>>> On 07/02/2013 12:16 AM, Daniel P. Berrange wrote:
>> >>>>>>> I'm struggling debugging a strange problem with interaction between user
>> >>>>>>> namespaces, cap_set and ownership of files in /proc/1/
>> >>>>>>>
>> >>>>>>
>> >>>>>> This problem occurs after we call setuid/gid.
>> >>>>>>
>> >>>>>> for example, a task whose pid is 1234 calls
>> >>>>>> setregid(10,10);
>> >>>>>> setreuid(10,10);
>> >>>
>> >>> It seems to get reset to the right values (0:0) when we execve()
>> >>> the init binary though.  This doesn't happen if we have invoked
>> >>> the capset() syscall in between the setregid & the execve() calls.
>> >> 
>> >> Yes, execve() should reset the dumpable state.
>> >> 
>> >> I took a quick look and I don't see a way around set_dumpable calls in
>> >> setup_new_exec.  Why the process remains undumpable after exec is worth
>> >> investigating.  That logic should not be user namespace specific
>> >> however.
>> >> 
>> >
>> > I think it's install_exec_creds: it calls commit_creds, which sets the process undumpable
>> >
>> >         /* dumpability changes */
>> >         if (!uid_eq(old->euid, new->euid) ||
>> >             !gid_eq(old->egid, new->egid) ||
>> >             !uid_eq(old->fsuid, new->fsuid) ||
>> >             !gid_eq(old->fsgid, new->fsgid) ||
>> >             !cred_cap_issubset(old, new)) {
>> >                 if (task->mm)
>> >                         set_dumpable(task->mm, suid_dumpable);
>> >                 task->pdeath_signal = 0;
>> >                 smp_wmb();
>> >         }
>> 
>> That looks like it could do it.  Especially if exec is increasing your
>> capabilities.
>
> Ah, yes, that would explain it. My demo is removing the SYS_MODULE
> capability, and then exec'ing the shell binary. Since we are uid==0,
> and prctl(PR_CAPBSET_DROP) is not available inside the user namespace,
> the rules for capabilities vs execve() call will cause the shell
> binary to regain SYS_MODULE capability bit.
>
> So the problem I'm seeing in libvirt is all a result of the fact
> that we can't use PR_CAPBSET_DROP inside the user namespace. Given
> that, there's no point trying to drop any capabilities inside the
> user namespace.
>
> The only slight problem here is that we want to drop CAP_MKNOD so
> that systemd can detect that it shouldn't attempt to run any units
> which would rely on mknod.

I just looked at that and I don't see a justification for the
restriction.

Could you try the patch below and see if it fixes things for you?

Eric
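
The dumpability effect discussed in the quoted text can be observed from
userspace. What follows is only a minimal sketch, assuming Linux with /proc
mounted. The helper names (get_dumpable, set_dumpable_flag, proc_self_owner)
are made up for illustration. It clears the flag directly with
prctl(PR_SET_DUMPABLE) instead of going through setregid()/setreuid(), so it
runs unprivileged; with the default suid_dumpable of 0 the credential change
in commit_creds should have the same effect on the flag.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the current dumpable flag of this process, or -1 on error. */
int get_dumpable(void)
{
	return prctl(PR_GET_DUMPABLE, 0UL, 0UL, 0UL, 0UL);
}

/* Set the dumpable flag (0 or 1). Returns 0 on success, -1 on error. */
int set_dumpable_flag(int value)
{
	return prctl(PR_SET_DUMPABLE, (unsigned long)value, 0UL, 0UL, 0UL);
}

/*
 * Store the owning uid of /proc/self/status in *uid.
 * Returns 0 on success, -1 if /proc is not available.
 */
int proc_self_owner(uid_t *uid)
{
	struct stat st;

	if (stat("/proc/self/status", &st) != 0)
		return -1;
	*uid = st.st_uid;
	return 0;
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN to run the demonstration standalone. */
int main(void)
{
	uid_t owner;

	printf("euid=%ld dumpable=%d\n", (long)geteuid(), get_dumpable());
	if (proc_self_owner(&owner) == 0)
		printf("/proc/self/status owner: %ld\n", (long)owner);

	/* Clear the flag, as a uid/gid change would, and look again. */
	assert(set_dumpable_flag(0) == 0);
	printf("dumpable=%d\n", get_dumpable());
	if (proc_self_owner(&owner) == 0)
		printf("/proc/self/status owner: %ld\n", (long)owner);
	return 0;
}
#endif
```

When run as a non-root user, the second owner line would typically report a
different uid (root) than the first, which is the ownership change seen on
/proc/1 in the container.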


From: "Eric W. Biederman" <ebiederm at xmission.com>
Date: Tue, 2 Jul 2013 10:04:54 -0700
Subject: [PATCH] userns: Allow PR_CAPBSET_DROP in a user namespace.

As the capabilities and capability bounding set are per user namespace
properties it is safe to allow changing them with just CAP_SETPCAP
permission in the user namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm at xmission.com>
---
 security/commoncap.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index 4d787e6..fd9b08f 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -843,7 +843,7 @@ int cap_task_setnice(struct task_struct *p, int nice)
  */
 static long cap_prctl_drop(struct cred *new, unsigned long cap)
 {
-	if (!capable(CAP_SETPCAP))
+	if (!ns_capable(current_user_ns(), CAP_SETPCAP))
 		return -EPERM;
 	if (!cap_valid(cap))
 		return -EINVAL;
-- 
1.7.5.4
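
To check the patch from inside the container, something along these lines
could be run as uid 0 in the user namespace. It is a rough sketch, not part
of the patch: the helper names bounding_set_has and try_capbset_drop are
hypothetical, and CAP_MKNOD is picked only because it is the capability
mentioned in the thread. Without the patch the drop is expected to fail with
EPERM inside a user namespace; with it, the drop should succeed when the
caller holds CAP_SETPCAP in that namespace.

```c
#include <assert.h>
#include <errno.h>
#include <linux/capability.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/*
 * Return 1 if cap is in the bounding set, 0 if it is not,
 * or -1 on error (for example an invalid capability number).
 */
int bounding_set_has(unsigned long cap)
{
	return prctl(PR_CAPBSET_READ, cap, 0UL, 0UL, 0UL);
}

/*
 * Try to drop cap from the bounding set.
 * Returns 0 on success, otherwise the errno value from prctl().
 */
int try_capbset_drop(unsigned long cap)
{
	if (prctl(PR_CAPBSET_DROP, cap, 0UL, 0UL, 0UL) == 0)
		return 0;
	return errno;
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN to run the check standalone. */
int main(void)
{
	int rc;

	printf("CAP_MKNOD in bounding set before: %d\n",
	       bounding_set_has(CAP_MKNOD));

	rc = try_capbset_drop(CAP_MKNOD);
	if (rc == 0)
		printf("PR_CAPBSET_DROP succeeded\n");
	else
		printf("PR_CAPBSET_DROP failed: %s\n", strerror(rc));

	printf("CAP_MKNOD in bounding set after: %d\n",
	       bounding_set_has(CAP_MKNOD));
	/* A successful drop must be visible through PR_CAPBSET_READ. */
	assert(rc != 0 || bounding_set_has(CAP_MKNOD) == 0);
	return rc == 0 ? 0 : 1;
}
#endif
```

Reading the set back with PR_CAPBSET_READ after the drop confirms the change
actually took effect rather than relying on the return value alone.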