How much of a mess does OpenVZ make? ;) Was: What can OpenVZ do?

Alexey Dobriyan adobriyan at
Sun Mar 1 12:56:59 PST 2009

On Sun, Mar 01, 2009 at 02:02:31PM -0600, Serge E. Hallyn wrote:
> Quoting Alexey Dobriyan (adobriyan at
> > On Fri, Feb 27, 2009 at 01:31:12AM +0300, Alexey Dobriyan wrote:
> > > This is collecting and start of dumping part of cleaned up OpenVZ C/R
> > > implementation, FYI.
> > 
> > OK, here is second version which shows what to do with shared objects
> > (cr_dump_nsproxy(), cr_dump_task_struct()), introduced more checks
> > (still no unlinked files) and dumps some more information including
> > structures connections (cr_pos_*)
> > 
> > Dumping pids in under thinking because in OpenVZ pids are saved as
> > numbers due to CLONE_NEWPID is not allowed in container. In presense
> > of multiple CLONE_NEWPID levels this must present a big problem. Looks
> > like there is now way to not dump pids as separate object.
> > 
> > As result, struct cr_image_pid is variable-sized, don't know how this will
> > play later.
> > 
> > Also, pid refcount check for external pointers is busted right now,
> > because /proc inode pins struct pid, so there is almost always refcount
> > vs ->o_count mismatch.
> > 
> > No restore yet. ;-)
> Hi Alexey,
> thanks for posting this.  Of course there are some predictable responses
> (I like the simplicity of pure in-kernel, Dave will not :) but this
> needs to be posted to make us talk about it.
> A few more comments that came to me while looking it over:
> 1. cap_sys_admin check is unfortunate.  In discussions about Oren's
> patchset we've agreed that not having that check from the outset forces
> us to consider security with each new patch and feature, which is a good
> thing.

Removing CAP_SYS_ADMIN on restore?

> 2. if any tasks being checkpointed are frozen, checkpoint has the
> side effect of thawing them, right?

Haven't tried, but should be a bug, yes. It will be "thaw or kill"
depending on "flags".

> 3. wrt pids, i guess what you really want is to store the pids from
> init_tsk's level down to the task's lowest pid, right?  Then you
> manually set each of those on restart?  Any higher pids of course
> don't matter.

Yes, numbers are really meant to be from init_tsk level.

> 4. do you have any thoughts on what to do with the mntns info at
> restart?  Will you try to detect mounts which need to be re-created?
> How?

Haven't thought, but it will be tricky for sure :^)

> 5. Since you're always setting f_pos, this won't work straight over
> a pipe?  Do you figure that's just not a worthwhile feature?

So far there were no loops when dumping data structures, but I _think_
there will be some, so seeking over dumpfile would be inevitable.

> Were you saying (in response to Dave) that you're having private
> discussions about whether to pursue posting this as an alternative
> to Oren's patchset?  If so, any updates on those discussions?

Right now, no.

More information about the Containers mailing list