[PATCH 1/1] namespaces: introduce sys_hijack (v11)
Serge E. Hallyn
serue at us.ibm.com
Tue Aug 12 10:06:58 PDT 2008
Quoting Serge E. Hallyn (serue at us.ibm.com):
> Quoting Bastian Blank (bastian at waldi.eu.org):
> > On Fri, Aug 01, 2008 at 11:39:05AM -0500, Serge E. Hallyn wrote:
> > > Quoting Bastian Blank (bastian at waldi.eu.org):
> > > > Why is it not enough to use the pid of the ns creator? The ns cgroups
> > >
> > > pids wrap around
> > Oops, yes.
> > > > But I think I have a different problem. Currently, namespaces are
> > > > destructed if the last process using them exits. You change that, they
> > > > will survive until the cgroup dies. Or is that cgroup destructed when
> > > > there are no longer processes using the nsproxy? As the commit message
> > > > speaks about "pid wraparound" as problem, I doubt that.
> > >
> > > Correct. Having the namespaces stick around, and being able to attach
> > > to an empty container, was something Paul Menage had wanted IIRC.
> > It may produce problems with pid namespaces. The namespace is cleared if
> > the child reaper dies and I'm not sure how well it behaves without a new
> > one, which you can't create.
> > > But I'll leave that as is for now, until I hear something other than
> > > "this is so wrong it isn't funny" from Pavel :)
> > I'm not sure if it is funny to add another piece which may hold
> > filesystems open. Currently we can have different namespaces. All of
> > them are attached to processes and can be removed with kill. Now this
> > code adds another copy to an (automatically created) cgroup.
> > IMHO, the cgroup should be destructed automatically if the nsproxy is
> > about to die.
> I certainly don't think your caution is unwarranted. I like to keep the
> refcounting in all of this as simple as possible.
And, as always, those calling for caution are vindicated. It turns out I
was grabbing a double-refcount on the nsproxy when an ns_cgroup is cloned.
After fixing that, I get warnings about potential circular locking
involving cgroup_mutex and namespace_sem. This is because cgroup_mutex
depends on namespace_sem, but now doing rmdir on a once-filled ns_cgroup
But again, this patch was resent to solicit comment on the general
approach. So I will put this patch aside again, unless I hear:
1. From Pavel, that he actually would like to use this approach for
2. From Paul, that he still has a need for entering empty cgroups.
Otherwise, there is still the point of view (held, I believe, by Eric)
that the right thing to do is provide the monitoring and control over
containers that we need through proper namespace semantics and exported