[RFC][v4][PATCH 7/7]: Define clone_extended() syscall

Serge E. Hallyn serue at us.ibm.com
Thu Aug 6 08:55:20 PDT 2009


Quoting Oren Laadan (orenl at librato.com):
> 
> 
> Serge E. Hallyn wrote:
> > Quoting Sukadev Bhattiprolu (sukadev at linux.vnet.ibm.com):
> >> Subject: [RFC][v4][PATCH 7/7]: Define clone_extended() syscall
> >>
> >> Container restart requires that a task have the same pid it had when it was
> >> checkpointed. When containers are nested the tasks within the containers
> >> exist in multiple pid namespaces and hence have multiple pids to specify
> >> during restart.
> >>
> >> This patch defines a new system call, clone_extended(), which is like clone(),
> >> but takes a new 'pid_set' parameter.  This parameter lets the caller choose
> >> specific pid numbers for the child process, in the process's active and
> >> ancestor pid namespaces. (Descendant pid namespaces in general don't matter
> >> since processes don't have pids in them anyway, but see comments in
> >> copy_target_pids() regarding CLONE_NEWPID).
> >>
> >> Unlike clone(), however, clone_extended() needs CAP_SYS_ADMIN, at least for
> >> now, to prevent unprivileged processes from misusing this interface.
> > 
> > It only needs that when specifying pids.
> > 
> >> While the main motivation for this interface is the need to let a process
> >> choose its 'pid numbers', the clone_extended() interface uses 64-bit clone
> >> flags.  The 'higher' portion of the clone flags is unused and is only
> >> included to preclude yet another version of clone when a new clone flag is
> >> needed. 
> >>
> >> ===== Interface:
> >>
> >> Compared to clone(), clone_extended() needs to pass in three more pieces
> >> of information:
> >>
> >> 	- an additional 32 bits of clone_flags
> >> 	- number of pids in the set
> >> 	- user buffer containing the list of pids.
> >>
> >> But since clone() already takes 5 parameters and some (all ?) architectures
> >> are restricted to 6 parameters to a system-call, additional data-structures
> >> (and copy_from_user()) are needed.
> >>
> >> The proposed interface for clone_extended() is:
> >>
> >> 	struct clone_tid_info {
> >> 		void *parent_tid; 	/* parent_tid_ptr parameter */
> >> 		void *child_tid; 	/* child_tid_ptr parameter */
> >> 	};
> >>
> >> 	struct pid_set {
> >> 		int num_pids;
> >> 		pid_t *pids;
> >> 	};
> >>
> >> 	int clone_extended(int flags_low, int flags_high, void *child_stack,
> >> 			void *unused, struct clone_tid_info *tid_ptrs,
> >> 			struct pid_set *pid_setp);
> > 
> > I was thinking additional flags would be passed in the (renamed)
> > struct pid_set.
> 
> Yes.
> 
> But maybe in (renamed) 'struct clone_info' instead of 'struct pid_set' ?
> 
> I vaguely recall a strong preference to not require copy-from-user
> during a fast-path clone, because it may hurt performance.
> 
> *If* this is the case, then maybe place extra flags among the
> "base" args, or at least a CLONE_EXTRA would indicate that more
> arguments need to be pulled from user-space ?

Wouldn't passing NULL for struct clone_info suffice?
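
Roughly, as a userspace sketch of what I mean. The layout of struct
clone_info, the helper parse_clone_args() and the fake_copy_from_user()
stand-in are all made up for illustration; none of them come from the
patch, and a real version would also have to copy the pids array itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical layout: the thread only agrees that the extra flags move
 * into a renamed struct; these field names are assumptions. */
struct clone_info {
	unsigned int flags_high;	/* upper 32 bits of the clone flags */
	int num_pids;			/* number of entries in pids[] */
	const pid_t *pids;		/* one pid per pid namespace level */
};

/* Counts calls to the copy helper so the fast path can be checked. */
int copy_calls;

/* Userspace stand-in for copy_from_user(): plain memcpy, never fails. */
int fake_copy_from_user(void *dst, const void *src, size_t len)
{
	copy_calls++;
	memcpy(dst, src, len);
	return 0;
}

/*
 * Build the 64-bit flags and fill *info.
 * uinfo == NULL is the plain clone() fast path: no copy is done.
 * Returns 0 on success, -1 on a failed copy or a negative pid count.
 */
int parse_clone_args(unsigned int flags_low, const struct clone_info *uinfo,
		     uint64_t *flags, struct clone_info *info)
{
	memset(info, 0, sizeof(*info));
	*flags = flags_low;

	if (uinfo == NULL)
		return 0;		/* nothing extra to pull in */

	if (fake_copy_from_user(info, uinfo, sizeof(*info)))
		return -1;
	if (info->num_pids < 0)
		return -1;

	*flags |= (uint64_t)info->flags_high << 32;
	return 0;
}
```

With uinfo == NULL nothing is copied and the flags are just flags_low,
so a separate CLONE_EXTRA bit would not be needed to tell the two cases
apart.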

> Do you intend to get feedback from LKML too ?
> 
> Oren.

