[PATCH] Forbid invocation of kexec_load() outside initial PID namespace

Eric W. Biederman ebiederm at xmission.com
Mon Aug 6 19:16:29 UTC 2012


"Serge E. Hallyn" <serge at hallyn.com> writes:

> Quoting Daniel P. Berrange (berrange at redhat.com):
>> From: "Daniel P. Berrange" <berrange at redhat.com>
>> 
>> The following commit
>> 
>>     commit cf3f89214ef6a33fad60856bc5ffd7bb2fc4709b
>>     Author: Daniel Lezcano <daniel.lezcano at free.fr>
>>     Date:   Wed Mar 28 14:42:51 2012 -0700
>> 
>>     pidns: add reboot_pid_ns() to handle the reboot syscall
>> 
>> introduced custom handling of the reboot() syscall when invoked
>> from a non-initial PID namespace. The intent was that a process
>> in a container can be allowed to keep CAP_SYS_BOOT and execute
>> reboot() to shutdown/reboot just their private container, rather
>> than the host.
>> 
>> Unfortunately the kexec_load() syscall also relies on the
>> CAP_SYS_BOOT capability. So by allowing a container to keep
>> this capability to safely invoke reboot(), they mistakenly
>> also gain the ability to use kexec_load(). The solution is
>> to make kexec_load() return -EPERM if invoked from a PID
>> namespace that is not the initial namespace
>> 
>> Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
>> Cc: Serge Hallyn <serge.hallyn at canonical.com>
>
> Acked-by: Serge Hallyn <serge.hallyn at canonical.com>
>
> (Please see my previous email explaining why I believe the pidns
> is an appropriate check)

Serge as to your objects.

If we define kexec_load in terms of the pid namespace then something
makes sense, but the error should be EINVAL, or something of that sort.

That is what we did with reboot.  We defined reboot in terms of the pid
namespace.

Not defining kexec_load in terms of the pid namespace and then returning
EPERM because having we happen to have CAP_SYS_BOOT for other reasons is
semantically horrible.

At the end of the day the effect is the same, but I think it matters a
great deal in how we think about things.

We have CAP_SYS_BOOT in the initial user namespace.  We do have
permission to make the system call.

So I continue to see this patch the way it is current constructed as
broken.

Nacked-by: "Eric W. Biederman" <ebiederm at xmission.com>

Eric


More information about the Containers mailing list