Interaction user namespace, /proc/1 ownership & cap_set

Tue Jul 2 20:24:47 UTC 2013

On 02.07.2013 19:12, Eric W. Biederman wrote:
> "Daniel P. Berrange" <berrange at redhat.com> writes:
> 
>> On Tue, Jul 02, 2013 at 09:35:39AM -0700, Eric W. Biederman wrote:
>>> Gao feng <gaofeng at cn.fujitsu.com> writes:
>>>
>>>> On 07/02/2013 05:57 PM, Eric W. Biederman wrote:
>>>>> "Daniel P. Berrange" <berrange at redhat.com> writes:
>>>>>
>>>>>> On Tue, Jul 02, 2013 at 10:56:37AM +0200, Richard Weinberger wrote:
>>>>>>> On 02.07.2013 10:44, Eric W. Biederman wrote:
>>>>>>>> Gao feng <gaofeng at cn.fujitsu.com> writes:
>>>>>>>>
>>>>>>>>> On 07/02/2013 12:16 AM, Daniel P. Berrange wrote:
>>>>>>>>>> I'm struggling debugging a strange problem with interaction between user
>>>>>>>>>> namespaces, cap_set and ownership of files in /proc/1/
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This problem occurs after we call setuid/gid.
>>>>>>>>>
>>>>>>>>> for example, a task whose pid is 1234 calls
>>>>>>>>> setregid(10,10);
>>>>>>>>> setreuid(10,10);
>>>>>>
>>>>>> It seems to get reset to the right values (0:0) when we execve()
>>>>>> the init binary though.  This doesn't happen if we have invoked
>>>>>> the capset() syscall in between the setregid & the execve() calls.
>>>>>
>>>>> Yes, execve() should reset the dumpable state.
>>>>>
>>>>> I took a quick look and I don't see a way around set_dumpable calls in
>>>>> setup_new_exec.  Why the process remains undumpable after exec is worth
>>>>> investigating.  That logic should not be user namespace specific
>>>>> however.
>>>>>
>>>>
>>>> I think it's install_exec_creds: it calls commit_creds, which sets the process undumpable
>>>>
>>>>         /* dumpability changes */
>>>>         if (!uid_eq(old->euid, new->euid) ||
>>>>             !gid_eq(old->egid, new->egid) ||
>>>>             !uid_eq(old->fsuid, new->fsuid) ||
>>>>             !gid_eq(old->fsgid, new->fsgid) ||
>>>>             !cred_cap_issubset(old, new)) {
>>>>                 if (task->mm)
>>>>                         set_dumpable(task->mm, suid_dumpable);
>>>>                 task->pdeath_signal = 0;
>>>>                 smp_wmb();
>>>>         }
>>>
>>> That looks like it could do it.  Especially if exec is increasing your
>>> capabilities.
>>
>> Ah, yes, that would explain it. My demo is removing the SYS_MODULE
>> capability, and then exec'ing the shell binary. Since we are uid==0,
>> and prctl(PR_CAPBSET_DROP) is not available inside the user namespace,
>> the rules for capabilities vs execve() call will cause the shell
>> binary to regain SYS_MODULE capability bit.
>>
>> So the problem I'm seeing in libvirt is all a result of the fact
>> that we can't use PR_CAPBSET_DROP inside the user namespace. Given
>> that, there's no point trying to drop any capabilities inside the
>> user namespace.
>>
>> The only slight problem here is that we want to drop CAP_MKNOD so
>> that systemd can detect that it shouldn't attempt to run any units
>> which would rely on mknod.
> 
> I just looked at that and I don't see a justification for the
> restriction.
> 
> Could you try the patch below and see if it fixes things for you?

With the patch applied, my test program is able to drop its caps
(using libcap-ng) and does not regain them upon execve.
Also reading from /proc/1/environ works. :)
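
For anyone who wants to watch the dumpable flag that is behind the
/proc ownership change, here is a minimal sketch (not the test program
mentioned above). It relies on the behaviour documented in prctl(2):
when the flag is anything other than 1, the files in /proc/<pid>/ are
owned by root:root instead of the task's effective IDs. The helper
names get_dumpable, set_dumpable_flag and proc_environ_owner are made
up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Return the dumpable flag of the calling process, or -1 on error. */
int get_dumpable(void)
{
    return prctl(PR_GET_DUMPABLE, 0, 0, 0, 0);
}

/* Set the dumpable flag to 0 or 1; returns 0 on success, -1 on error. */
int set_dumpable_flag(int value)
{
    return prctl(PR_SET_DUMPABLE, value, 0, 0, 0);
}

/* Store the owner uid of /proc/self/environ in *uid.
 * Returns 0 on success, -1 if the file cannot be stat'ed
 * (for example when /proc is not mounted). */
int proc_environ_owner(uid_t *uid)
{
    struct stat st;

    if (stat("/proc/self/environ", &st) != 0)
        return -1;
    *uid = st.st_uid;
    return 0;
}

#ifdef DUMPABLE_DEMO_MAIN
/* Hypothetical demo entry point, built only with -DDUMPABLE_DEMO_MAIN. */
int main(void)
{
    uid_t owner;

    printf("dumpable: %d\n", get_dumpable());
    if (proc_environ_owner(&owner) == 0)
        printf("owner of /proc/self/environ: %u\n", (unsigned)owner);

    /* Clear the flag and look at the owner again. */
    assert(set_dumpable_flag(0) == 0);
    printf("dumpable: %d\n", get_dumpable());
    if (proc_environ_owner(&owner) == 0)
        printf("owner of /proc/self/environ: %u\n", (unsigned)owner);
    return 0;
}
#endif
```

Build with -DDUMPABLE_DEMO_MAIN to get a small program that prints the
flag and the owner before and after clearing it. Run as a non-root
user, the owner would be expected to differ between the two printouts.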

> Eric
> 
> 
> From: "Eric W. Biederman" <ebiederm at xmission.com>
> Date: Tue, 2 Jul 2013 10:04:54 -0700
> Subject: [PATCH] userns: Allow PR_CAPBSET_DROP in a user namespace.
> 
> As the capabilities and capability bounding set are per user namespace
> properties it is safe to allow changing them with just CAP_SETPCAP
> permission in the user namespace.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm at xmission.com>

Tested-by: Richard Weinberger <richard at nod.at>

> ---
>  security/commoncap.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 4d787e6..fd9b08f 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -843,7 +843,7 @@ int cap_task_setnice(struct task_struct *p, int nice)
>   */
>  static long cap_prctl_drop(struct cred *new, unsigned long cap)
>  {
> -	if (!capable(CAP_SETPCAP))
> +	if (!ns_capable(current_user_ns(), CAP_SETPCAP))
>  		return -EPERM;
>  	if (!cap_valid(cap))
>  		return -EINVAL;
> 
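
A rough way to exercise the changed check from userspace, for anyone
who does not want to pull in libcap-ng: the sketch below calls prctl()
directly to read and drop CAP_MKNOD in the bounding set. The helper
names capbset_has and capbset_drop are invented for this example. The
drop is expected to fail with EPERM when the caller lacks CAP_SETPCAP,
which on an unpatched kernel includes root inside a user namespace.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <linux/capability.h>

/* Return 1 if cap is in the bounding set of the calling thread,
 * 0 if it is not, -1 (errno EINVAL) if cap is not a valid capability. */
int capbset_has(int cap)
{
    return prctl(PR_CAPBSET_READ, cap, 0, 0, 0);
}

/* Drop cap from the bounding set of the calling thread.
 * Returns 0 on success, -1 with errno set on failure
 * (EPERM without CAP_SETPCAP, EINVAL for an invalid cap). */
int capbset_drop(int cap)
{
    return prctl(PR_CAPBSET_DROP, cap, 0, 0, 0);
}

#ifdef CAPBSET_DEMO_MAIN
/* Hypothetical demo entry point, built only with -DCAPBSET_DEMO_MAIN. */
int main(void)
{
    int before = capbset_has(CAP_MKNOD);

    assert(before == 0 || before == 1);
    printf("CAP_MKNOD in bounding set before: %d\n", before);

    if (capbset_drop(CAP_MKNOD) != 0)
        printf("PR_CAPBSET_DROP failed: %s\n", strerror(errno));

    printf("CAP_MKNOD in bounding set after: %d\n",
           capbset_has(CAP_MKNOD));
    return 0;
}
#endif
```

Build with -DCAPBSET_DEMO_MAIN and run it as root inside the user
namespace. A dropped bounding-set bit cannot be regained across
execve(), so exec'ing a shell afterwards should show whether the drop
stuck.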

Thanks,
//richard