[RFC][PATCH 2/2] CR: handle a single task with private memory maps

Louis Rilling Louis.Rilling at kerlabs.com
Tue Aug 5 02:19:56 PDT 2008


On Mon, Aug 04, 2008 at 08:51:37PM -0700, Joseph Ruscio wrote:
> As somewhat of a tangent to this discussion, I've been giving some  
> thought to the general strategy we talked about during the summit. The  
> checkpointing solution we built at Evergrid sits completely in userspace 
> and is solely focused on checkpointing parallel codes (e.g. MPI). That 
> approach required us to virtualize a whole slew of resources (e.g. PIDs) 
> that will be far better supported in the kernel through this effort. On 
> the other hand, there isn't anything inherent to checkpointing the memory 
> in a process that requires it to be in a kernel. During a restart, you 
> can map and load the memory from the checkpoint file in userspace as 
> easily as in the kernel. Since the cost of checkpointing HPC codes is 

Hmm, for unusual mappings this may not be so easy to reproduce from
userspace if binaries are statically linked. I agree that with
dynamically linked applications, LD_PRELOAD allows one to record the
actual memory mappings and restore them at restart.
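
To make the LD_PRELOAD idea concrete, here is a minimal sketch of an mmap()
interposer that records each successful mapping so that it could be replayed
at restart. The names map_record, map_log, recorded_mapping_count() and
recorded_mapping() are hypothetical and only serve the illustration. The
sketch assumes glibc on Linux, is not thread-safe, and ignores munmap(),
mremap(), brk() and mmap64(), all of which a real tool would have to track.

```c
#define _GNU_SOURCE             /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

#define MAX_RECORDS 1024        /* arbitrary limit for this sketch */

/* One entry per successful mmap() call (hypothetical layout). */
struct map_record {
    void  *addr;                /* address actually returned by mmap() */
    size_t length;
    int    prot;
    int    flags;
    int    fd;
    off_t  offset;
};

static struct map_record map_log[MAX_RECORDS];
static size_t map_log_count;

typedef void *(*mmap_fn)(void *, size_t, int, int, int, off_t);

/* Interposed mmap(): forward to the next definition (libc), then log. */
void *mmap(void *addr, size_t length, int prot, int flags, int fd,
           off_t offset)
{
    static mmap_fn real_mmap;

    if (!real_mmap)
        real_mmap = (mmap_fn)dlsym(RTLD_NEXT, "mmap");
    if (!real_mmap)
        return MAP_FAILED;      /* no underlying mmap() found */

    void *res = real_mmap(addr, length, prot, flags, fd, offset);

    /* Record only successful mappings, and only while the table has room. */
    if (res != MAP_FAILED && map_log_count < MAX_RECORDS) {
        struct map_record rec = { res, length, prot, flags, fd, offset };
        map_log[map_log_count++] = rec;
    }
    return res;
}

/* Accessors that a checkpoint routine could use to dump the log. */
size_t recorded_mapping_count(void)
{
    return map_log_count;
}

const struct map_record *recorded_mapping(size_t i)
{
    return i < map_log_count ? &map_log[i] : NULL;
}
```

Built as a shared object (for instance with "gcc -shared -fPIC", adding
"-ldl" on older glibc) and listed in LD_PRELOAD, this only catches calls
that go through the dynamic linker, which is exactly why statically linked
binaries are the hard case.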

> fairly dominated by checkpointing their large memory footprints, memory 
> checkpointing is an area of ongoing research with many different 
> solutions.
>
> It might be desirable for the checkpointing implementation to be modular 
> enough that a userspace application or library could select to handle 
> certain resources on their own. Memory is the primary one that comes to 
> mind.

I definitely agree with you about this flexibility. Actually in
Kerrighed, during the next 3 years, we are going to study an API for
collaborative checkpoint/restart between kernel and userspace, in order to
allow such HPC apps to checkpoint huge memory efficiently (e.g. when reaching
states where saving small parts is enough), or to rebuild their data from
partial/older states.
I hope that this study will bring useful ideas that could be applied to
containers as well.

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs - IRISA
Skype: louis.rilling			Campus Universitaire de Beaulieu
Phone: (+33|0) 2 99 84 71 52		Avenue du General Leclerc
Fax: (+33|0) 2 99 84 71 71		35042 Rennes CEDEX - France
http://www.kerlabs.com/

