[PATCH 2/2] Notify container-init parent a 'reboot' occurred

Serge Hallyn serge.hallyn at canonical.com
Thu Aug 11 14:50:05 PDT 2011


Quoting Daniel Lezcano (daniel.lezcano at free.fr):
> On 08/11/2011 11:09 PM, Serge Hallyn wrote:
> > Quoting Daniel Lezcano (daniel.lezcano at free.fr):
> >> When the reboot syscall is called and the calling process's pid namespace
> >> is not the init pid namespace, we send a SIGCHLD with CLD_REBOOTED
> >> to the parent of this pid namespace.
> >>
> >> Signed-off-by: Daniel Lezcano <daniel.lezcano at free.fr>
> > ...
> >
> >> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
> >> +{
> >> +	struct siginfo info = { };
> >> +	struct task_struct *parent;
> >> +	struct sighand_struct *sighand;
> >> +	unsigned long flags;
> >> +
> >> +	if (tsk->ptrace)
> >> +		parent = tsk->parent;
> >> +	else {
> >> +		tsk = tsk->group_leader;
> >> +		parent = tsk->real_parent;
> >> +	}
> >> +
> >> +	info.si_signo = SIGCHLD;
> >> +	info.si_errno = 0;
> >> +	info.si_status = why;
> >> +
> >> +	rcu_read_lock();
> >> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
> >> +	info.si_uid = __task_cred(tsk)->uid;
> > 	
> > 	This eventually should become:
> >
> > 	info.si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
> > 	                              current_cred(), current_uid());
> >
> > 	I've got a first-stab patch at converting the rest of
> > 	kernel/signal.c in http://kernel.ubuntu.com/git?p=serge/userns-2.6.git
> 
> Ok, thanks.
> 
> >> +	rcu_read_unlock();
> >> +
> >> +	info.si_utime = cputime_to_clock_t(tsk->utime);
> >> +	info.si_stime = cputime_to_clock_t(tsk->stime);
> >> +
> >> +	info.si_code = CLD_REBOOTED;
> >> +
> >> +	sighand = parent->sighand;
> >> +	spin_lock_irqsave(&sighand->siglock, flags);
> >> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
> >> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
> >> +		__group_send_sig_info(SIGCHLD, &info, parent);
> >> +	/*
> >> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
> >> +	 */
> >> +	__wake_up_parent(tsk, parent);
> >> +	spin_unlock_irqrestore(&sighand->siglock, flags);
> >> +}
> > ...
> >
> >> @@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
> >>  {
> >>  	char buffer[256];
> >>  	int ret = 0;
> >> +	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
> >> +
> >> +        /* We only trust the superuser with rebooting the system. */
> >> +	if (!capable(CAP_SYS_BOOT)) {
> > Doesn't this mean that an unprivileged task in a container can shut
> > down the container?
> 
> Ha ha ! Right, good catch :)
> 
> Yes, thinking about it again, we can do what Bruno initially proposed and
> simply prevent the reboot when we are not in the init_pid_ns. Actually,
> sys_reboot occurs after the services shutdown and "kill -1 SIGTERM"
> and "kill -1 SIGKILL", and it would not make sense to do that in a child
> pid namespace, except if we are in a container where we don't want to
> reboot :)
> 
> So IMO, it is safe to do:
> 
> 	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
>  		return -EPERM;

That sounds good.  Until the pid_ns->user_ns patch goes in, just
capable(CAP_SYS_BOOT) works too.

Actually, if this is the only thing CAP_SYS_BOOT grants you, and
if it is always fully namespaced, then I'm not sure there'll ever
be a reason to switch this to ns_capable().
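
To spell out the check order being discussed, here is a rough userspace
model, not kernel code. The names model_pid_ns, model_reboot and the
MODEL_* return values are made up for illustration; the only thing taken
from the snippet above is the ordering: the capability check comes first,
and only then the pid namespace branch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pid_namespace; only its identity matters. */
struct model_pid_ns {
	int level;
};

/* Hypothetical stand-in for init_pid_ns. */
struct model_pid_ns model_init_pid_ns = { 0 };

/* Illustrative outcomes, not real kernel return values. */
#define MODEL_HOST_REBOOT	0	/* fall through to the real reboot path */
#define MODEL_NOTIFY_PARENT	1	/* stands in for pid_namespace_reboot() */

/*
 * Model of the proposed ordering in sys_reboot:
 *   1. without CAP_SYS_BOOT, refuse with -EPERM, whatever the namespace;
 *   2. in a child pid namespace, notify the namespace's parent instead;
 *   3. otherwise continue with the normal reboot handling.
 */
int model_reboot(const struct model_pid_ns *ns, bool has_cap_sys_boot)
{
	if (!has_cap_sys_boot)
		return -EPERM;

	if (ns != &model_init_pid_ns)
		return MODEL_NOTIFY_PARENT;

	return MODEL_HOST_REBOOT;
}

/* Walks the four combinations of privilege and namespace. */
void model_self_check(void)
{
	struct model_pid_ns container = { 1 };

	/* An unprivileged task can no longer trigger the notification. */
	assert(model_reboot(&container, false) == -EPERM);
	assert(model_reboot(&model_init_pid_ns, false) == -EPERM);

	/* A privileged task in a container only notifies the parent. */
	assert(model_reboot(&container, true) == MODEL_NOTIFY_PARENT);

	/* A privileged task in the initial namespace reboots the host. */
	assert(model_reboot(&model_init_pid_ns, true) == MODEL_HOST_REBOOT);
}
```

With this ordering the unprivileged-in-container case gets -EPERM before
the namespace is even looked at, which is the case I was worried about.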

thanks,
-serge

> 	if (pid_ns != &init_pid_ns)
> 		return pid_namespace_reboot(pid_ns, cmd, buffer);
> 
> 
> > The pidns->user_ns patch I sent earlier today gives you what you need
> > so that you can add
> >
> > 		if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
> > 			return -EPERM;
> >
> > right here to prevent that.
> >
> >> +		/* If we are not in the initial pid namespace, we send a signal
> >> +		 * to the parent of this init pid namespace, notifying a shutdown
> >> +		 * occurred */
> >> +		if (pid_ns != &init_pid_ns)
> >> +			pid_namespace_reboot(pid_ns, cmd, buffer);
> >>  
> >> -	/* We only trust the superuser with rebooting the system. */
> >> -	if (!capable(CAP_SYS_BOOT))
> >>  		return -EPERM;
> >> +	}
> >>  
> >>  	/* For safety, we require "magic" arguments. */
> >>  	if (magic1 != LINUX_REBOOT_MAGIC1 ||
> >> -- 
> >> 1.7.4.1
> >>
> 