[RFC][PATCH 2/2] CR: handle a single task with private memory maps

Louis Rilling Louis.Rilling at kerlabs.com
Thu Aug 7 02:29:43 PDT 2008


On Wed, Aug 06, 2008 at 09:15:46AM -0700, Joseph Ruscio wrote:
>
> On Aug 5, 2008, at 9:23 AM, Dave Hansen wrote:
>
>> On Mon, 2008-08-04 at 20:51 -0700, Joseph Ruscio wrote:
>>> It might be desirable for the checkpointing implementation to be
>>> modular enough that a userspace application or library could choose to
>>> handle certain resources on its own. Memory is the primary one that
>>> comes to mind.
>>
>> How would you propose making it modular?
>>
>> -- Dave
>>
>
>
> Well, it seems to me that the initial focus here is on live migration of
> traditional enterprise applications, e.g. databases, app-servers, etc. I
> think this is the right focus given how much utility the general
> enterprise is finding in capabilities like VMotion. Providing this
> mobility to applications without the overhead of traditional VMs would
> be very valuable.
>
> On the other hand, I've been primarily focused on checkpointing
> large-scale MPI jobs to provide fault tolerance, and that use-case is
> somewhat different than the live-migration one. These checkpoints have
> huge RAM footprints (in-core checkpointing is not an option), require
> coordination across large numbers of servers, and involve some number of
> open files on an enormous parallel filesystem plus some scratch files
> open on the local disk/ramdisk. The jobs generally have very simple
> process trees with one process per core, or one process with a thread
> for each core.
>
> To support these kinds of jobs, one would ideally instruct the Container 
> checkpointer to ignore network resources, dynamically allocated private 
> memory, and the contents of open files. You'd be relying on the Container 
> checkpointer to recreate processes, open file descriptors, threads,
> thread synchronization primitives, and IPC mechanisms (including shm).
>
> As far as the mechanism is concerned, I'd defer to the more experienced
> kernel developers here. I assume that passing a bitmask of flags as an
> argument into the checkpoint syscall would be frowned upon, and anyway
> redundant, as it's unlikely that the mask would change within a container
> from checkpoint to checkpoint. If each container is going to have a
> cgroup filesystem directory, then we could have one or more files along
> the lines of /proc/sys/kernel/randomize_va_space that turn features off
> for that Container. The default settings after Container creation would
> be a complete in-kernel checkpoint/migration.

Have you thought about mechanisms or interfaces that would let the kernel's
checkpointing subsystem and the application or run-time cooperate to build
the checkpoint image efficiently and to restart from it?
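
To make the per-container knob idea quoted above concrete, here is a
minimal user-space sketch in C. It only models the bookkeeping: one skip
bit per resource class, toggled by writing "0" or "1" to a named control
file, with the all-zero default meaning a complete in-kernel checkpoint.
Every name in it (the CR_SKIP_* flags, the skip_* file names,
cr_knob_write, cr_should_save) is hypothetical and does not correspond to
any existing kernel or cgroup interface.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical skip flags, one per resource class named in the proposal. */
enum {
    CR_SKIP_NONE        = 0,      /* default: complete in-kernel checkpoint */
    CR_SKIP_NETWORK     = 1 << 0, /* network resources */
    CR_SKIP_PRIVATE_MEM = 1 << 1, /* dynamically allocated private memory */
    CR_SKIP_FILE_DATA   = 1 << 2  /* contents of open files */
};

/* Hypothetical control-file names, one file per flag. */
struct cr_knob {
    const char *name;
    unsigned int bit;
};

static const struct cr_knob cr_knobs[] = {
    { "skip_network",     CR_SKIP_NETWORK },
    { "skip_private_mem", CR_SKIP_PRIVATE_MEM },
    { "skip_file_data",   CR_SKIP_FILE_DATA },
};

/*
 * Model a write of "0" or "1" (optionally newline-terminated) to the
 * control file called name. Returns 0 on success, -1 if the file name or
 * the value is not recognized; the mask is left untouched on error.
 */
int cr_knob_write(unsigned int *mask, const char *name, const char *value)
{
    size_t i;
    int on;

    assert(mask != NULL && name != NULL && value != NULL);

    if (strcmp(value, "1") == 0 || strcmp(value, "1\n") == 0)
        on = 1;
    else if (strcmp(value, "0") == 0 || strcmp(value, "0\n") == 0)
        on = 0;
    else
        return -1;

    for (i = 0; i < sizeof(cr_knobs) / sizeof(cr_knobs[0]); i++) {
        if (strcmp(name, cr_knobs[i].name) == 0) {
            if (on)
                *mask |= cr_knobs[i].bit;
            else
                *mask &= ~cr_knobs[i].bit;
            return 0;
        }
    }
    return -1;
}

/* Nonzero if the checkpointer should still save this resource class. */
int cr_should_save(unsigned int mask, unsigned int resource_bit)
{
    return (mask & resource_bit) == 0;
}
```

With this model, an MPI run-time would write "1" to skip_private_mem once
after creating the container, and each later checkpoint would consult
cr_should_save() instead of taking a flags argument. It says nothing about
how the application would then supply its own memory image at restart,
which is the part my question is about.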

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes