[PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation
orenl at cs.columbia.edu
Wed Mar 18 02:15:05 PDT 2009
An alternative: the task that created the container is the parent
(outside the container) of the container init(1). In turn, init(1) creates
a special 'monitor' thread that monitors the restart, and the outside task
reaps the exit status of that thread (and only that thread).
[Hmmm... thinking about this: what happens if the container init(1) calls
clone() with CLONE_PARENT? Does that not create a sort of competing
container init(1)?]
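To make the "reap that thread and only that thread" part concrete, here is
a rough userspace sketch. It is not code from the patch set. It assumes,
for simplicity, that the monitor is a direct child of the waiting task
(in the scheme above it is created by init(1), so a real implementation
would need some other way to hand its status to the outside task). The
names spawn_monitor(), reap_monitor() and enum restart_outcome are made up
for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical outcome classification; not part of any real 'cr' tool. */
enum restart_outcome {
    RESTART_OK,      /* monitor exited with status 0 */
    RESTART_FAILED,  /* monitor exited with a non-zero status */
    RESTART_KILLED,  /* monitor was terminated by a signal */
    RESTART_UNKNOWN  /* waitpid() failed or returned something unexpected */
};

/*
 * Stand-in for the 'monitor' thread: a child that exits immediately with
 * the given code. ASSUMPTION: it is a direct child of the caller.
 */
static pid_t spawn_monitor(int exit_code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(exit_code);  /* child: report the restart result and leave */
    return pid;            /* parent: monitor pid, or -1 if fork() failed */
}

/*
 * Reap exactly one child, the monitor, and classify how it ended.
 * Passing an explicit pid to waitpid() leaves any other children
 * (for example the restarted tasks) unreaped.
 */
static enum restart_outcome reap_monitor(pid_t monitor_pid)
{
    int status;
    pid_t ret;

    assert(monitor_pid > 0);
    do {
        ret = waitpid(monitor_pid, &status, 0);
    } while (ret == -1 && errno == EINTR);  /* retry if interrupted */

    if (ret != monitor_pid)
        return RESTART_UNKNOWN;
    if (WIFEXITED(status))
        return WEXITSTATUS(status) == 0 ? RESTART_OK : RESTART_FAILED;
    if (WIFSIGNALED(status))
        return RESTART_KILLED;
    return RESTART_UNKNOWN;
}
```

A caller would do something like reap_monitor(spawn_monitor(0)) and treat
anything other than RESTART_OK as a failed restart. Because the status
comes from the monitor alone, it is kept separate from the exit status of
the restarted application itself.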
Cedric Le Goater wrote:
>> Again, how would 'cr' obtain exit status for these tasks, and how would
>> it distinguish failure from normal operation?
> Here's our solution to this issue.
> mcr maintains in its kernel container object an exitcode attribute for
> the mcr-restart process. This process is detached from the fork tree of
> the restarted application.
> When the restart is finished, an mcr-wait command can be called to reap
> this exitcode. This makes it possible to distinguish an exit of the
> application process from an exit of the mcr-restart process.
> This is a must-have for batch managers in an HPC environment.