How much of a mess does OpenVZ make? ;) Was: What can OpenVZ do?

Dave Hansen dave at linux.vnet.ibm.com
Thu Feb 12 15:04:05 PST 2009


On Thu, 2009-02-12 at 14:10 -0800, Andrew Morton wrote:
> On Thu, 12 Feb 2009 13:51:23 -0800
> Dave Hansen <dave at linux.vnet.ibm.com> wrote:
> 
> > On Thu, 2009-02-12 at 11:42 -0800, Andrew Morton wrote:
> > > On Thu, 12 Feb 2009 13:30:35 -0600
> > > Matt Mackall <mpm at selenic.com> wrote:
> > > 
> > > > On Thu, 2009-02-12 at 10:11 -0800, Dave Hansen wrote:
> > > > 
> > > > > > - In bullet-point form, what features are missing, and should be added?
> > > > > 
> > > > >  * support for more architectures than i386
> > > > >  * file descriptors:
> > > > >   * sockets (network, AF_UNIX, etc...)
> > > > > >   * device files
> > > > >   * shmfs, hugetlbfs
> > > > >   * epoll
> > > > >   * unlinked files
> > > > 
> > > > >  * Filesystem state
> > > > >   * contents of files
> > > > >   * mount tree for individual processes
> > > > >  * flock
> > > > >  * threads and sessions
> > > > >  * CPU and NUMA affinity
> > > > >  * sys_remap_file_pages()
> > > > 
> > > > > I think the real question is: where are the dragons hiding? Some of
> > > > > these are known to be hard. And some of them are critical to checkpointing
> > > > > typical applications. If you have plans or theories for implementing all
> > > > of the above, then great. But this list doesn't really give any sense of
> > > > whether we should be scared of what lurks behind those doors.
> > > 
> > > > How close has OpenVZ come to implementing all of this?  I think the
> > > > implementation is fairly complete?
> > 
> > I also believe it is "fairly complete".  At least able to be used
> > practically.
> > 
> > > If so, perhaps that can be used as a guide.  Will the planned feature
> > > have a similar design?  If not, how will it differ?  To what extent can
> > > we use that implementation as a tool for understanding what this new
> > > implementation will look like?
> > 
> > Yes, we can certainly use it as a guide.  However, there are some
> > barriers to being able to do that:
> > 
> > dave at nimitz:~/kernels/linux-2.6-openvz$ git diff v2.6.27.10... | diffstat | tail -1
> >  628 files changed, 59597 insertions(+), 2927 deletions(-)
> > dave at nimitz:~/kernels/linux-2.6-openvz$ git diff v2.6.27.10... | wc 
> >   84887  290855 2308745
> > 
> > Unfortunately, the git tree doesn't have that great of a history.  It
> > appears that the forward-ports are just applications of huge single
> > patches which then get committed into git.  This tree has also
> > historically contained a bunch of stuff not directly related to
> > checkpoint/restart like resource management.
> > 
> > We'd be idiots not to take a hard look at what has been done in OpenVZ.
> > But, for the time being, we have absolutely no shortage of things that
> > we know are important and know have to be done.  Our largest problem is
> > not finding things to do, but is our large out-of-tree patch that is
> > growing by the day. :(
> > 
> 
> Well we have a chicken-and-eggish thing.  The patchset will keep
> growing until we understand how much of this:
> 
> > dave at nimitz:~/kernels/linux-2.6-openvz$ git diff v2.6.27.10... | diffstat | tail -1
> >  628 files changed, 59597 insertions(+), 2927 deletions(-)
> 
> we will be committed to if we were to merge the current patchset.

Here's the measurement that Alexey suggested:

dave at nimitz:~/kernels/linux-2.6-openvz$ git diff v2.6.27.10... kernel/cpt/ | diffstat 
 Makefile        |   53 +
 cpt_conntrack.c |  365 ++++++++++++
 cpt_context.c   |  257 ++++++++
 cpt_context.h   |  215 +++++++
 cpt_dump.c      | 1250 ++++++++++++++++++++++++++++++++++++++++++
 cpt_dump.h      |   16 
 cpt_epoll.c     |  113 +++
 cpt_exports.c   |   13 
 cpt_files.c     | 1626 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 cpt_files.h     |   71 ++
 cpt_fsmagic.h   |   16 
 cpt_inotify.c   |  144 ++++
 cpt_kernel.c    |  177 ++++++
 cpt_kernel.h    |   99 +++
 cpt_mm.c        |  923 +++++++++++++++++++++++++++++++
 cpt_mm.h        |   35 +
 cpt_net.c       |  614 ++++++++++++++++++++
 cpt_net.h       |    7 
 cpt_obj.c       |  162 +++++
 cpt_obj.h       |   62 ++
 cpt_proc.c      |  595 ++++++++++++++++++++
 cpt_process.c   | 1369 ++++++++++++++++++++++++++++++++++++++++++++++
 cpt_process.h   |   13 
 cpt_socket.c    |  790 ++++++++++++++++++++++++++
 cpt_socket.h    |   33 +
 cpt_socket_in.c |  450 +++++++++++++++
 cpt_syscalls.h  |  101 +++
 cpt_sysvipc.c   |  403 +++++++++++++
 cpt_tty.c       |  215 +++++++
 cpt_ubc.c       |  132 ++++
 cpt_ubc.h       |   23 
 cpt_x8664.S     |   67 ++
 rst_conntrack.c |  283 +++++++++
 rst_context.c   |  323 ++++++++++
 rst_epoll.c     |  169 +++++
 rst_files.c     | 1648 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 rst_inotify.c   |  196 ++++++
 rst_mm.c        | 1151 +++++++++++++++++++++++++++++++++++++++
 rst_net.c       |  741 +++++++++++++++++++++++++
 rst_proc.c      |  580 +++++++++++++++++++
 rst_process.c   | 1640 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 rst_socket.c    |  918 +++++++++++++++++++++++++++++++
 rst_socket_in.c |  489 ++++++++++++++++
 rst_sysvipc.c   |  633 +++++++++++++++++++++
 rst_tty.c       |  384 +++++++++++++
 rst_ubc.c       |  131 ++++
 rst_undump.c    | 1007 ++++++++++++++++++++++++++++++++++
 47 files changed, 20702 insertions(+)
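For anyone who wants to slice that output further, here is a rough sketch that sums the per-file counts by file-name prefix, which separates the checkpoint (cpt_) side from the restore (rst_) side. The function name lines_by_prefix and the "other" bucket are made up for illustration, and the sample input is only three rows copied from the listing above, so the printed numbers are for the sample and not for the whole directory.

```python
import re
from collections import defaultdict

# Matches diffstat rows such as " cpt_dump.c      | 1250 +++++".
ROW = re.compile(r"^\s*(\S+)\s*\|\s*(\d+)\b")

def lines_by_prefix(diffstat_text):
    """Sum changed-line counts per file-name prefix (text before the first '_').

    Hypothetical helper: files without an underscore land in "other".
    """
    totals = defaultdict(int)
    for line in diffstat_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skips the "N files changed" summary and blank lines
        name, count = m.group(1), int(m.group(2))
        prefix = name.split("_", 1)[0] if "_" in name else "other"
        totals[prefix] += count
    return dict(totals)

# Three rows taken from the diffstat above, plus its summary line.
sample = """\
 Makefile        |   53 +
 cpt_epoll.c     |  113 +++
 rst_epoll.c     |  169 +++++
 47 files changed, 20702 insertions(+)
"""

print(lines_by_prefix(sample))
```

Feeding it the full diffstat output instead of the sample would give the cpt/rst split for the whole directory.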

One important thing that leaves out is the interaction that this code
has with the rest of the kernel.  That's critically important when
considering long-term maintenance, and I'd be curious how the OpenVZ
folks view it. 
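A crude way to put a ceiling on that interaction is to subtract the kernel/cpt/ numbers from the whole-tree diffstat quoted earlier. This is only a sketch: the remainder also includes everything in the tree that is unrelated to checkpoint/restart (resource management, for one), so it is an upper bound and not a measurement of the coupling itself. The variable names are mine.

```python
# Figures copied from the two diffstat summaries in this thread.
total_files, total_insertions = 628, 59597   # whole tree vs. v2.6.27.10
cpt_files, cpt_insertions = 47, 20702        # kernel/cpt/ only

# Everything that lives outside kernel/cpt/. An upper bound on the
# checkpoint/restart hooks in the rest of the kernel, since this also
# counts unrelated OpenVZ features.
outside_files = total_files - cpt_files
outside_insertions = total_insertions - cpt_insertions

print(outside_files, outside_insertions)
```

So at most 581 files and 38895 inserted lines outside kernel/cpt/ could be involved; the real number for checkpoint/restart alone is presumably a good deal smaller.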

> Now, we've gone in blind before - most notably on the
> containers/cgroups/namespaces stuff.  That hail mary pass worked out
> acceptably, I think.  Maybe we got lucky.  I thought that
> net-namespaces in particular would never get there, but it did.
> 
> That was a very large and quite long-term-important user-visible
> feature.
> 
> checkpoint/restart/migration is also a long-term-...-feature.  But if
> at all possible I do think that we should go into it with our eyes a
> little less shut.

One thing Ingo has asked for that I understand a bit more clearly is a
programmatic statement of what is and is not covered by this current
code.  That's certainly one eye-opening activity which I'll get to
immediately.

-- Dave


