[RFC][PATCH 2/2] Prevent container-inits from using CLONE_PARENT

Eric W. Biederman ebiederm at xmission.com
Wed Jun 17 20:20:21 PDT 2009


Sukadev Bhattiprolu <sukadev at linux.vnet.ibm.com> writes:

> Prevent container-inits from using CLONE_PARENT
>
> If a container-init creates a sibling (using CLONE_PARENT), pid namespace
> semantics become complicated:
>
> 	- the "active pid namespace" of the sibling will be the descendant
> 	  container, but it's not obvious if that is correct.

It is correct: the sibling must not change pid namespaces. You are not
allowed to escape out of a pid namespace.

> 	- if container-init exits, it will terminate the sibling, but again
> 	  it's not clear if that is the correct behavior.

Again correct, because the container-init is the child reaper for the pid
namespace. No reaper, no namespace.

> 	- the sibling exists in both parent and child containers while current
> 	  pid namespace semantics assume that only container-init can exist
> 	  in both parent/child containers.

All tasks in the container also exist in the parent container.
What assumption are you talking about?

> 	- the parent of the sibling is not a descendant of container-init
> 	  (while pid namespaces assume that all processes in the container
> 	  are descendants of the container-init)

User space certainly assumes that. What part of the pid namespace
code makes such an assumption?

> 	- When the sibling dies, the SIGCHLD is sent to its parent (if
> 	  alive), i.e. the signal escapes the container to a parent container.
> 	  (if the parent of the sibling exits, the container-init then becomes
> 	  the reaper of the sibling).

Yes.
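The behaviour confirmed here (a CLONE_PARENT sibling's parent, and therefore the recipient of its SIGCHLD, is the caller's parent) can be observed from user space without creating any namespace. The sketch below is a minimal illustration assuming Linux and glibc's clone() wrapper; clone_parent_demo(), sibling_main() and struct sibling_report are hypothetical names. A middle process clones a task with CLONE_PARENT | SIGCHLD, that task reports its getppid() through a pipe, and the top-level process, not the middle one, reaps it.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SIBLING_STACK_SIZE (64 * 1024)

/* Hypothetical: what the sibling reports back through the pipe. */
struct sibling_report {
    pid_t self;
    pid_t parent;
};

/* Entry point of the task created with CLONE_PARENT; arg points to the
   pipe's write end (valid in the child's copy of the address space). */
static int sibling_main(void *arg)
{
    int fd = *(int *)arg;
    struct sibling_report r = { getpid(), getppid() };
    return write(fd, &r, sizeof(r)) == (ssize_t)sizeof(r) ? 0 : 1;
}

/* Hypothetical helper. Returns 0 and stores the parent pid the sibling
   observed in *observed_parent, or returns -1 on failure. */
int clone_parent_demo(pid_t *observed_parent)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (middle == 0) {
        /* Middle process: create a sibling rather than a child. */
        int wfd = fds[1];
        close(fds[0]);
        char *stack = malloc(SIBLING_STACK_SIZE);
        if (stack == NULL)
            _exit(1);
        /* Stacks grow down on the common architectures, so pass the top. */
        pid_t sib = clone(sibling_main, stack + SIBLING_STACK_SIZE,
                          CLONE_PARENT | SIGCHLD, &wfd);
        _exit(sib < 0 ? 1 : 0);
    }

    close(fds[1]);
    struct sibling_report r;
    ssize_t n = read(fds[0], &r, sizeof(r));  /* 0 if no sibling was made */
    close(fds[0]);

    int status;
    /* The middle process is our ordinary child. */
    if (waitpid(middle, &status, 0) != middle)
        return -1;
    if (n != (ssize_t)sizeof(r))
        return -1;
    /* The sibling is our child too, so this succeeds instead of ECHILD. */
    if (waitpid(r.self, &status, 0) != r.self)
        return -1;
    *observed_parent = r.parent;
    return 0;
}
```

Calling clone_parent_demo(&p) from some process should leave p equal to that process's getpid(), even though the middle process made the clone() call.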

> To keep pid namespace semantics simple, prevent container-inits from using
> CLONE_PARENT at least until we have a better understanding of CLONE_PARENT
> and pid-namespace interactions.

The only argument I can see that carries any weight is that Unix
semantics fundamentally assume a process tree. Allowing init to use
CLONE_PARENT creates a multi-rooted process tree.
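To make the multi-rooted-tree argument concrete, here is a toy model, assuming each task lives in exactly one pid namespace and has a single parent pointer; struct toy_task, toy_is_root() and toy_count_roots() are hypothetical and are not kernel code. A task counts as a root of a namespace when it lives in that namespace but its parent does not. With only the container-init and its ordinary children there is one root; a sibling created by the container-init with CLONE_PARENT stays in the container's namespace but gets the init's parent, which makes a second root.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model for illustration; not the kernel's task_struct. */
struct toy_task {
    int pid;                        /* pid as seen in the task's own namespace */
    const struct toy_task *parent;  /* NULL for the first task on the system */
    int ns;                         /* id of the pid namespace the task lives in */
};

/* A task is a root of namespace ns if it lives in ns and its parent does not. */
static bool toy_is_root(const struct toy_task *t, int ns)
{
    return t->ns == ns && (t->parent == NULL || t->parent->ns != ns);
}

/* Counts the roots, i.e. the separate trees, formed by the tasks of ns. */
size_t toy_count_roots(const struct toy_task *tasks, size_t n, int ns)
{
    size_t roots = 0;
    for (size_t i = 0; i < n; i++)
        if (toy_is_root(&tasks[i], ns))
            roots++;
    return roots;
}
```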

At which point the !is_global_init() exemption is foolish: the global
init creating a sibling makes a multi-rooted tree just the same.
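A minimal user-space model of the check in the patch quoted below, next to the variant this argument points to (no global-init exemption). struct toy_caller, check_rfc() and check_no_exemption() are hypothetical; the two booleans stand in for is_container_init(current) and is_global_init(current). As the patch's extra !is_global_init() test implies, container_init is assumed to be true for the global init as well, since it is pid 1 in its own namespace.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two predicates the patch evaluates. */
struct toy_caller {
    bool container_init;  /* pid 1 in its own namespace (root one included) */
    bool global_init;     /* pid 1 in the root pid namespace */
};

/* Mirrors the RFC patch: container-inits are refused, global init is exempt. */
int check_rfc(unsigned long clone_flags, const struct toy_caller *c)
{
    if ((clone_flags & CLONE_PARENT) && c->container_init && !c->global_init)
        return -EINVAL;
    return 0;
}

/* Variant without the exemption: no init of any namespace may make a sibling. */
int check_no_exemption(unsigned long clone_flags, const struct toy_caller *c)
{
    if ((clone_flags & CLONE_PARENT) && c->container_init)
        return -EINVAL;
    return 0;
}
```

The two functions differ only for the global init: check_rfc() lets it create a sibling, while check_no_exemption() refuses it with -EINVAL like any other init.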

Eric


> Untested, RFC patch :-)
>
> Signed-off-by: Sukadev Bhattiprolu <sukadev at us.ibm.com>
> ---
>  kernel/fork.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
>
> Index: linux-mmotm/kernel/fork.c
> ===================================================================
> --- linux-mmotm.orig/kernel/fork.c	2009-06-17 18:23:23.000000000 -0700
> +++ linux-mmotm/kernel/fork.c	2009-06-17 19:17:54.000000000 -0700
> @@ -974,6 +974,14 @@ static struct task_struct *copy_process(
>  	if ((clone_flags & CLONE_SIGHAND) && !(clone_flags & CLONE_VM))
>  		return ERR_PTR(-EINVAL);
>  
> +	/*
> +	 * To keep pid namespace semantics simple, prevent container-inits
> +	 * from creating siblings.
> +	 */
> +	if ((clone_flags & CLONE_PARENT) &&
> +			is_container_init(current) && !is_global_init(current))
> +		return ERR_PTR(-EINVAL);
> +
>  	retval = security_task_create(clone_flags);
>  	if (retval)
>  		goto fork_out;


More information about the Containers mailing list