[PATCH] c/r: Add UTS support (v4)

Serge E. Hallyn serue at us.ibm.com
Fri Mar 20 11:05:06 PDT 2009

Quoting Oren Laadan (orenl at cs.columbia.edu):
> Dan Smith wrote:
> > OL> So what happens in the following scenario:
> > 
> > OL> * task A is the container init(1)
> > OL> * A calls fork() to create task B
> > OL> * B calls unshare(CLONE_NEWUTS)
> > OL> * B calls clone(CLONE_PARENT) to create task C
> > 
> > In the previous version of the patch, I failed the checkpoint if this
> > was the case by making sure that all tasks in the set had the same
> > nsproxy.  You said in IRC that this was already done elsewhere in the
> > infrastructure, but now that I look I don't see that anywhere.
> > 
> in cr_may_checkpoint_task():
>         /* FIXME: change this for nested containers */
>         if (task_nsproxy(t) != ctx->root_nsproxy)
>                 return -EPERM;
> > The check I had was in response to Daniel's comments about avoiding
> > the situation for the time being by making sure that all the tasks had
> > the same set of namespaces (i.e. the same nsproxy at the time of
> > checkpoint).
> > 
> > OL> Two approaches to solve this are:
> > 
> > OL> a) Identify, in mktree, that this was the case, and impose an
> > OL> order on the forks/clones to recreate the same dependency (an
> > OL> algorithm for this is described in [1])
> > 
> > OL> b) Do it in the kernel: for each nsproxy (identified by an objref)
> > OL> the first task that has it will create it during restart, in or
> > OL> out of the kernel, and the next task will simply attach to the
> > OL> existing one that will be deposited in the objhash.
> > 
> > I think that prior discussion led to the conclusion that simplicity
> > wins for the moment, but if you want to solve it now I can cook up
> > some changes.
> > 
> If we keep the assumption, for simplicity, that all tasks share the
> same namespace, then the checkpoint code should check, once, how that
> nsproxy differs from the container's parent (except for the obvious
> pidns).

I disagree.  Whether the container had its own utsns doesn't
affect whether it should have a private utsns on restart.

> If it does differ, e.g. in uts, then the checkpoint should save the
> uts state _once_ - as in global data. Restart will restore the state
> also _once_, for the init of the container (the first task restored),
> _before_ it forks the rest of the tree.
> Otherwise, we don't get the same outcome.

Again I disagree.  If we were planning on never supporting nested
uts namespaces it would be fine, but what you are describing
guarantees that we will have to break the checkpoint format later
to support nested namespaces.

Rather, we should do:

1. Record the hostname for the container in global data.
2. The restart program can decide whether to honor the global
   checkpoint image hostname or not.  It can either use a
   command line option, or check whether the recorded hostname
   differs from that of the restart host.  I prefer the former.
3. For each task, leave an optional spot for a hostname.  If
   there is a hostname, then the task will unshare(CLONE_NEWUTS)
   and set its hostname before calling sys_restart() or
   cloning any child tasks.
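
As a rough userspace sketch of step 3 (not code from the patch): the
helper name restore_task_uts() and its "hostname" argument are
hypothetical, standing in for the optional per-task hostname slot in
the image.  Only unshare(2) and sethostname(2) are real interfaces
here, and both need CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * "hostname" is the optional per-task hostname read from the
 * checkpoint image; NULL means the slot was left empty.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int restore_task_uts(const char *hostname)
{
	/* Empty slot: the task stays in its parent's uts namespace. */
	if (hostname == NULL)
		return 0;

	/* Give this task a private uts namespace (needs CAP_SYS_ADMIN). */
	if (unshare(CLONE_NEWUTS) < 0)
		return -1;

	/* Set the recorded hostname inside the new namespace. */
	return sethostname(hostname, strlen(hostname));
}
```

The restarting task would call this before sys_restart() and before
cloning any children, so that children created afterwards inherit the
new uts namespace.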
