Testing Linux-CR -- Some Documentation

Raghu D K dk.raghu at gmail.com
Wed May 18 00:09:54 PDT 2011


Hi,

>  1. add '-l logfile' arguments to checkpoint and restart commands,
>     to put more debug messages into 'logfile'  (which must not yet
>     exist)
>  2. add '-v' argument to checkpoint and restart for debugging
>  3. look at /var/log/syslog for lots of error messages, assuming
>     you have CONFIG_CHECKPOINT_DEBUG (or whatever that is called)
>     set in your kernel
>  4. after doing checkpoint, use 'ckptinfo', which came with the
>     user-cr programs, to analyze the checkpoint image

I have done all of these, and I even tried "ckptinfo", which every
time reports the error "unexpected end of file".


> I suspect what happened to you, though, is that you left file
> descriptors open.  If you look at counterloop/crcounter.c in
> the tests, it does 'for i in (1..100) close(i)'.  The problem
> with not doing this is that the program you are checkpointing has
> inherited file descriptors from its parent task, and, at restart,
> it has no way to recreate those.
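
For a shell script such as my-test.sh, the descriptor cleanup described
above could be sketched roughly as follows. This is only an illustration,
not code from tests-cr: the helper name close_inherited_fds is
hypothetical, and the range 3..9 is an assumption, chosen because POSIX sh
only guarantees single-digit descriptor numbers in redirections (the quoted
description of crcounter.c loops up to 100 with close()).

```shell
#!/bin/sh
# Hypothetical helper (not part of user-cr or tests-cr): close the
# descriptors 3..9 that this shell may have inherited from its parent.
# Assumption: only single-digit descriptors are handled, since that is
# all POSIX sh guarantees for redirection operators.
close_inherited_fds() {
    fd=3
    while [ "$fd" -le 9 ]; do
        # "exec N>&-" closes descriptor N in the current shell;
        # closing a descriptor that is not open is harmless.
        eval "exec $fd>&-"
        fd=$((fd + 1))
    done
}
```

Calling close_inherited_fds near the top of the script and then pointing
stdin, stdout and stderr at regular files (for example
"exec </dev/null >out.log 2>&1", where out.log is a placeholder name)
would leave the task without descriptors inherited from the launching
terminal. Whether that alone is enough for restart to succeed here is
untested.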

I am not testing the sample scripts. I wrote a sample script of my own
because I am not able to understand how Linux-CR is supposed to work.
1. Is it mandatory to have the freezer cgroup mounted with
"mount -tcgroup -o freezer cgroup /cgroup"?
2. Do we have to launch programs using "nsexec" to be able to
checkpoint and restart them?

I have tried all the "--help" options but failed to get the
described results. Even the "self_checkpoint" and "self_restart" code
provided in the "linux/Documentation/checkpoint" folder does not
execute as described in "usage.txt".

The only application that shows a positive result is
"/test-cr/simple/ckpt.c".

Warm Regards,
Raghu

On Mon, May 16, 2011 at 6:57 PM, Serge E. Hallyn <serge at hallyn.com> wrote:
> Quoting Raghu D K (dk.raghu at gmail.com):
>> Hello All,
>>
>> I changed the "#!/bin/sh" line to point to "bash", but I still see issues
>> when using the "git://www.linux-cr.org/pub/git/tests-cr" scripts.
>> I am probably missing something or misunderstanding it. I am a
>> little confused about the usage of the user-space applications "checkpoint"
>> and "restart" and the applications in the "test-cr" folder.
>>
>> I wrote a sample shell script "my-test.sh" and tried the following
>> without much success.
>>
>> #!/bin/sh
>> #
>> #
>> #***********************************************************************************
>>
>> echo "Incrementing variable ..."
>> COUNT=$1
>> X=0
>> while [ $X -le $COUNT ];
>> do
>>         X=$(( $X + 1 ))
>>         echo "Value of X =" $X
>>         sleep 1
>> done
>>
>>
>> $ cd ~/user-cr
>> $ mount -tcgroup -o freezer cgroup /cgroup
>> $ mkdir -p /cgroup/1
>> $ nsexec -z5000 my-test.sh 100 &
>> $ echo 5000 > /cgroup/1/tasks
>> $ echo FROZEN > /cgroup/1/freezer.state
>>
>> $ checkpoint 5000 > ckpt.image
>>
>> This generated a "ckpt.image" file of size 2594550 bytes
>>
>> $ ckptinfo -epv ckpt.image
>> info: [@8] object   1 HDR_HEADER len 72
>> info: [@80] object   4 HDR_BUFFER len 73
>> info: [@153] object   4 HDR_BUFFER len 73
>> info: [@226] object   4 HDR_BUFFER len 73
>> ...
>> unexpected end of file (read 0 of 8)
>>
>> $ kill -9 5000
>> $ echo THAWED > /cgroup/1/freezer.state
>> $ ./restart < ckpt.image
>>
>> This one shows the error "Bad file descriptor". What am I missing?
>
> First, you can find more information about what went wrong in a
> few ways:
>
>  1. add '-l logfile' arguments to checkpoint and restart commands,
>     to put more debug messages into 'logfile'  (which must not yet
>     exist)
>  2. add '-v' argument to checkpoint and restart for debugging
>  3. look at /var/log/syslog for lots of error messages, assuming
>     you have CONFIG_CHECKPOINT_DEBUG (or whatever that is called)
>     set in your kernel
>  4. after doing checkpoint, use 'ckptinfo', which came with the
>     user-cr programs, to analyze the checkpoint image
>
> I suspect what happened to you, though, is that you left file
> descriptors open.  If you look at counterloop/crcounter.c in
> the tests, it does 'for i in (1..100) close(i)'.  The problem
> with not doing this is that the program you are checkpointing has
> inherited file descriptors from its parent task, and, at restart,
> it has no way to recreate those.
>
> -serge
>

