C/R review

Wed Mar 18 09:00:08 PDT 2009

On Wed, 2009-03-18 at 06:19 -0400, Oren Laadan wrote:
> >> +The checkpoint image format is composed of records consisting of a
> >> +pre-header that identifies its contents, followed by a payload. (The
> >> +idea here is to enable parallel checkpointing in the future in which
> >> +multiple threads interleave data from multiple processes into a single
> >> +stream).
> > 
> > I have my doubts about parallel checkpoint especially how large container
> > should be to need this and how much more complex code will it results in.
> 
> Doubts about the need ?   if I recall correctly IBM expressed interest in
> checkpointing containers with hundreds/thousands of processes that are
> spread among tens and hundreds of CPUs (multi-processor machine).

At the same time, I'd throw this kind of feature out the window in a
second if it meant getting a smaller or more understandable patch.  It
certainly isn't needed now.

Alexey, I'm really just assuming here, but I'd guess that a normal VPS
has a memory footprint between hundreds of MB and a few GB, right?  We're
also talking almost entirely about RAM contents being moved here, because
all the other data is a very small portion of the whole.  Unless creating
the checkpoint is maxing out one CPU, the entire problem this is solving
has to do with I/O bandwidth and availability.

If we really have I/O bandwidth problems, we should probably solve that
at the I/O level and not the checkpoint level.

My only other concern would be on systems with really high NUMA ratios.
A parallel checkpoint there just makes sense because shipping everything
across an interconnect could get really expensive.  We could move the
checkpoint process around to each node as we checkpoint its stuff, but
it would be a little silly to do a serial checkpoint on 1000 NUMA nodes
like that.

Anyway, I do think we should just concentrate on a single-stream
checkpoint for now.  We have a lot of other problems to solve before we
get to 1000 node NUMA machines.
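
To make the single-stream case concrete, here is a rough userspace sketch
of the record layout quoted at the top: a pre-header that identifies the
contents, followed by a payload.  This is only an illustration in Python,
not the patch's actual format.  The field names, field sizes, and the idea
of tagging each record with an owning pid are all assumptions made up for
the example.

```python
import io
import struct

# Hypothetical pre-header: record type, owning pid, payload length.
# These fields and their sizes are assumptions for illustration only;
# they are not taken from the actual checkpoint image format.
PRE_HEADER = struct.Struct("<IIQ")


def write_record(stream, rec_type, pid, payload):
    """Write one record: fixed-size pre-header, then the raw payload."""
    stream.write(PRE_HEADER.pack(rec_type, pid, len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield (rec_type, pid, payload) tuples until the stream ends."""
    while True:
        hdr = stream.read(PRE_HEADER.size)
        if not hdr:
            return
        if len(hdr) < PRE_HEADER.size:
            raise ValueError("truncated pre-header")
        rec_type, pid, length = PRE_HEADER.unpack(hdr)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated payload")
        yield rec_type, pid, payload


def demo():
    """Round-trip three records from two made-up pids through one stream."""
    buf = io.BytesIO()
    write_record(buf, 1, 100, b"task state")
    write_record(buf, 2, 200, b"memory page")
    write_record(buf, 2, 100, b"another page")
    buf.seek(0)
    return list(read_records(buf))
```

Because every record carries its own length and (in this sketch) an owner
tag, a reader can skip or demultiplex records without knowing who wrote
them.  That is the property that would let several writers interleave
into one stream later, and a serial writer pays almost nothing for it.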

-- Dave