[RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT

Oleg Nesterov oleg at redhat.com
Thu Jun 18 08:35:01 PDT 2009


On 06/17, Sukadev Bhattiprolu wrote:
>
> Prevent container-inits from using CLONE_PARENT
>
> If a container-init creates a sibling (using CLONE_PARENT), pid namespace
> semantics become complicated:
>
> 	- the "active pid namespace" of the sibling will be the descendant
> 	  container, but it's not obvious if that is correct.
>
> 	- if container-init exits, it will terminate the sibling, but again
> 	  it's not clear if that is the correct behavior.
>
> 	- the sibling exists in both parent and child containers while current
> 	  pid namespace semantics assume that only container-init can exist
> 	  in both parent/child containers.
>
> 	- the parent of the sibling is not a descendant of container-init
> 	  (while pid namespaces assume that all processes in the container
> 	  are descendants of the container-init)

I agree, this is all a bit strange and perhaps should be fixed. But afaics,
nothing bad can happen? I mean, if the sub-namespace does stupid things,
it can't do any harm to the parent namespace? Or did I miss something?

> 	- When the sibling dies, the SIGCHLD is sent to its parent (if
> 	  alive), i.e. the signal escapes the container to a parent container.

The same happens if container-init exits: we send SIGCHLD up. But yes,
I agree, this is a bit strange.

> 	  (if the parent of the sibling exits, the container-init then becomes
> 	  the reaper of the sibling).

Again, strange but harmless.
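
For readers who want to see the sibling relationship under discussion, here
is a minimal userspace sketch. It assumes Linux with glibc's clone() wrapper,
and all demo_* names are hypothetical, made up for this illustration. A
forked helper calls clone(CLONE_PARENT), and the new process reports which
parent it sees. It is only a demonstration of CLONE_PARENT itself, not of
the pid namespace case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define DEMO_STACK_SIZE (64 * 1024)

/* What the CLONE_PARENT child reports back about itself. */
struct demo_report {
	pid_t self;
	pid_t parent;
};

static int demo_report_fd;	/* write end of the pipe, set before clone() */

/* Body of the CLONE_PARENT child: report own pid and parent pid. */
static int demo_sibling_main(void *unused)
{
	struct demo_report r = { getpid(), getppid() };

	(void)unused;
	if (write(demo_report_fd, &r, sizeof(r)) != (ssize_t)sizeof(r))
		return 1;
	return 0;
}

/*
 * Fork a helper, let the helper create a CLONE_PARENT child, and return
 * the parent pid that child observes (-1 on failure).
 */
pid_t demo_clone_parent_ppid(void)
{
	struct demo_report r = { -1, -1 };
	int pipefd[2];
	pid_t helper;

	if (pipe(pipefd) < 0)
		return -1;

	helper = fork();
	if (helper < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}
	if (helper == 0) {
		char *stack = malloc(DEMO_STACK_SIZE);

		close(pipefd[0]);
		demo_report_fd = pipefd[1];
		if (!stack)
			_exit(1);
		/* The stack grows down on common architectures: pass its top. */
		if (clone(demo_sibling_main, stack + DEMO_STACK_SIZE,
			  CLONE_PARENT | SIGCHLD, NULL) < 0)
			_exit(1);
		_exit(0);
	}

	close(pipefd[1]);
	if (read(pipefd[0], &r, sizeof(r)) != (ssize_t)sizeof(r)) {
		r.self = -1;
		r.parent = -1;
	}
	close(pipefd[0]);

	waitpid(helper, NULL, 0);
	if (r.self > 0)
		waitpid(r.self, NULL, 0);	/* the sibling is our child too */
	return r.parent;
}

/* Usage: the CLONE_PARENT child should name the caller as its parent. */
void demo_check(void)
{
	assert(demo_clone_parent_ppid() == getpid());
}
```

The new process is a sibling of the helper rather than its child, so the
caller of demo_clone_parent_ppid() is the one that has to reap it. If the
helper were a container-init, that caller would live in the parent namespace,
which is the situation the quoted description worries about.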

> To keep pid namespace semantics simple, prevent container-inits from using
> CLONE_PARENT at least until we have a better understanding of CLONE_PARENT
> and pid-namespace interactions.

Yes, perhaps this makes sense.

> --- linux-mmotm.orig/kernel/fork.c	2009-06-17 18:23:23.000000000 -0700
> +++ linux-mmotm/kernel/fork.c	2009-06-17 19:17:54.000000000 -0700
> @@ -974,6 +974,14 @@ static struct task_struct *copy_process(
>  	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
>  		return ERR_PTR(-EINVAL);
>
> +	/*
> +	 * To keep pid namespace semantics simple, prevent container-inits
> +	 * from creating siblings.
> +	 */
> +	if ((clone_flags & CLONE_PARENT) &&
> +			is_container_init(current) && !is_global_init(current))

Both is_ checks are not right afaics. They are per-thread. This means
that container-init can do clone(CLONE_THREAD), and then the new thread
can do clone(CLONE_PARENT) and fool copy_process().
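
The per-thread problem can be illustrated with a small toy model. This is
not kernel code: struct toy_task and every toy_* name are hypothetical, and
the model assumes, as said above, that the is_ helpers look only at the
calling thread's own pid. One possible per-process alternative, shown purely
as an assumption, is to test the thread-group leader instead.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CLONE_PARENT 0x00008000UL	/* same value as the real CLONE_PARENT */

/* Toy stand-in for task_struct; all names here are hypothetical. */
struct toy_task {
	int pid_nr;				/* this thread's pid in its namespace */
	const struct toy_task *group_leader;	/* first thread of the process */
};

/* Per-thread test: only the thread whose own pid is 1 matches. */
bool toy_thread_is_init(const struct toy_task *t)
{
	return t->pid_nr == 1;
}

/* Per-process test: every thread of the init process matches. */
bool toy_process_is_init(const struct toy_task *t)
{
	return t->group_leader->pid_nr == 1;
}

/* Would CLONE_PARENT be refused with the per-thread test? */
bool toy_reject_per_thread(const struct toy_task *t, unsigned long flags)
{
	return (flags & TOY_CLONE_PARENT) && toy_thread_is_init(t);
}

/* Would CLONE_PARENT be refused with the per-process test? */
bool toy_reject_per_process(const struct toy_task *t, unsigned long flags)
{
	return (flags & TOY_CLONE_PARENT) && toy_process_is_init(t);
}

/* Usage: container-init plus one sub-thread made with CLONE_THREAD. */
void toy_demo(void)
{
	struct toy_task init = { 1, &init };
	struct toy_task thread = { 2, &init };

	/* Init itself is caught by both variants. */
	assert(toy_reject_per_thread(&init, TOY_CLONE_PARENT));
	assert(toy_reject_per_process(&init, TOY_CLONE_PARENT));

	/* The sub-thread slips past the per-thread test... */
	assert(!toy_reject_per_thread(&thread, TOY_CLONE_PARENT));
	/* ...but not past the per-process one. */
	assert(toy_reject_per_process(&thread, TOY_CLONE_PARENT));
}
```

The model only shows why the check has to be a property of the whole
process; which field the real check should use is left open here.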

As for !is_global_init(): I never understood what we should do if the
global init does CLONE_PARENT. This attaches another process to swapper,
which is not good.

Oleg.


