[Fuego] target_reboot during test_run

Fri Jan 13 00:18:32 UTC 2017

> -----Original Message-----
> From Maciej Pijanowski on  Thursday, January 12, 2017 2:44 PM
>
> Recently we've started to use Fuego as a test framework for our project.
> We had some success with the first tests, but then we began to struggle
> when it came to testing our system upgrade process. It requires us to
> perform a target reboot during the test. For example, a part of a test
> (or one of the tests) would be to check the currently mounted rootfs
> partition, perform a target reboot, and after the reboot check again
> which of the rootfs partitions has been mounted. A few issues have been
> spotted there:

Thanks for reporting these issues.  Reboot is an area that could use some
improvement and this is helpful.
> 
>    - systemlogs.before were missing - I understand that the reason is that
>      systemlogs are stored under /tmp and transferred to host in post_test
>      function; if a reboot occurs, /tmp is cleared and the logs are gone

There are two ways to get around this.  One would be to make the
log file location configurable, i.e., put something in the board file
and store the logs in a more persistent area of the file system.
This seems like the best way to handle it, and shouldn't be too hard.

I think this could be done by modifying function ov_rootfs_logread()
in overlays/base/base-distrib.fuegoclass.  That would at least put the
'before' system log somewhere else.  We could have this function check
FUEGO_TARGET_LOG_DIR or some such variable, and use that instead of
/tmp (if it were defined) as the location for system logs.
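
As a rough sketch of that fallback: FUEGO_TARGET_LOG_DIR is only the
variable name floated above, not an existing Fuego setting, and the
helper names and log file naming below are hypothetical.  The real
ov_rootfs_logread() would need to build its target-side path the same way.

```shell
#!/bin/sh
# Hypothetical helper: choose the directory for system logs on the target.
# FUEGO_TARGET_LOG_DIR is an assumed board-file variable; /tmp stays the default.
target_log_dir() {
    echo "${FUEGO_TARGET_LOG_DIR:-/tmp}"
}

# Build the path for the 'before' or 'after' system log.
# The file naming here is illustrative only.
syslog_path() {
    echo "$(target_log_dir)/fuego-syslog.$1.txt"
}

syslog_path before
```

A board whose /tmp is cleared on boot would then set the variable in its
board file, and boards that leave it unset would behave as they do today.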

The other way around this problem would be to just ignore the error.
Likely, the system is dropping out of post_test due to the missing
log (failure of the get operation).  The system logs are not essential
for test operation.  They are just used to check for Oopses during the
test.  We could probably wrap the fetching of the test logs in a
set +e / set -e pair (to avoid erroring out if there's a problem).
However, this seems like papering over the problem.
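
For reference, a minimal sketch of what that wrapping could look like.
fetch_log is a stand-in for the real transfer command, and
fetch_log_tolerant is a hypothetical name; neither is an existing
Fuego function.

```shell
#!/bin/sh
set -e

# Stand-in for the real transfer command; it fails the way a 'get' would
# when the log no longer exists on the target.
fetch_log() {
    [ -f "$1" ]
}

# Fetch a log without aborting the script if the transfer fails.
fetch_log_tolerant() {
    set +e                 # do not exit on a failed fetch
    fetch_log "$1"
    rc=$?
    set -e                 # restore exit-on-error for the rest of the script
    if [ "$rc" -ne 0 ]; then
        echo "warning: could not fetch $1 (rc=$rc), continuing" >&2
    fi
    return 0
}

fetch_log_tolerant /nonexistent/syslog.before
echo "post_test continues"
```

The warning at least leaves a trace that the log was missing, which a
bare set +e would not.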

>    - there are no testlogs on host
Probably this is an artifact of post_test not completing (and the
logs getting wiped out on a reboot).  We probably need a configurable
log directory on the target to keep the testlog from disappearing
mid-test as well.  'report_append' expects the testlog to be persistent
on the target, and clearly won't work across reboots in configurations
like yours.

>    - overall test fails due to connection loss; after reboot it enters
>      post_test function immediately - commands inside test_deploy function
>      located after target_reboot command are never executed

Hmmm.  I'm not sure that target_reboot should be in test_deploy.
It seems like it should be in test_run instead, if you're actually doing
testing of the reboot itself, or the reboot is an integral part of the test.

> 
> The same failure happens when trying to run the default Benchmark.Reboot
> test with the default testplan. To be more specific: after reboot it
> enters the post_test function and the report function is never executed.
> 
> function test_run {
>      # MAX_REBOOT_RETRIES can be defined in the board's file.
>      # Otherwise, the default is 20 retries
>      retries=${MAX_REBOOT_RETRIES:-20}
>      target_reboot $retries
>      report "cd $FUEGO_HOME/fuego.$TESTDIR; ./reboot"
> }
> 
> From what I've found there [1] it looks like Benchmark.Reboot works (or
> used to work) properly.
>
> I've also found a FIXME note in the documentation regarding the report
> function [2]. Does this apply in my case?

I hadn't seen this note added by someone.  Thanks for bringing it to my
attention!

The 'report' operation should finish, and the testlog should be on the
target in /tmp, before a target_reboot operation proceeds in
test_run.  Looking at the code in report, it should finish and the log
should be created before the next operation in the test script runs.

I think that note is referring to a test program (which could be a script)
which has an embedded 'reboot' in it.  The FIXTHIS comment would
be correct in such a case; the connection to the host during the command
will be severed, and if the script has more activity following the reboot, it
won't be recorded in the log (nor would the script be restarted, for that
matter).  But I'm not sure how this would work anyway.  I don't
know of any scripts that would run on target and expect to continue
at the next command after a reboot operation.  We can do this type
of reboot in Fuego since the scripts are run on host.
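
To illustrate why running on the host makes this possible, here is a
sketch of a host-side loop that polls until the target answers again.
This is not the actual target_reboot code; probe_target, wait_for_target,
the TARGET variable and the delay value are all assumptions for the
example.

```shell
#!/bin/sh
# Stand-in probe: run a trivial command over the board's transport.
# TARGET is a hypothetical variable holding the board's ssh address.
probe_target() {
    ssh -o ConnectTimeout=5 "$TARGET" true
}

# Poll the target until it responds or the retry budget is used up.
wait_for_target() {
    retries=${1:-20}      # same default as MAX_REBOOT_RETRIES in Benchmark.Reboot
    delay=${2:-5}         # seconds between probes (assumed value)
    i=0
    while [ "$i" -lt "$retries" ]; do
        if probe_target; then
            return 0      # target is back; the host script can continue
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1              # target never came back
}
```

Because the loop lives on the host, the commands that follow it (such as
checking which rootfs partition is mounted) still run after the target
has gone down and come back up.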

> 
> My question is: what is the current state of target_reboot during
> test_run phase? Maybe I'm missing something or not running
> Benchmark.Reboot
> properly - if so, what would be the correct way?

One quick thing you could do is find all uses of
/tmp in fuego-core/engine/scripts/functions.sh
and fuego-core/engine/overlays/base/base-distrib.fuegoclass
and replace them with something else (maybe /usr/tmp or /home/tmp),
something that doesn't get wiped clean during your startup.

By my count there are 9 uses of /tmp in functions.sh (not counting
comments) and 2 in base-distrib.fuegoclass, so it shouldn't be too
hard to just do a search and replace, to see if that solves the problem.
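
One possible way to do the count and the replacement, sketched on a
small stand-in file rather than the real functions.sh.  The helper names
are made up for this example, and /home/tmp is only an example value.
Note that the sed expression skips whole-line comments only, so a /tmp
in a trailing comment or inside a longer path would also be rewritten;
review the result before using it.

```shell
#!/bin/sh
# NEW_DIR is an example value; pick a directory that survives reboot on your target.
NEW_DIR=${NEW_DIR:-/home/tmp}

# Count /tmp occurrences on lines that are not whole-line comments.
count_tmp_uses() {
    # -o prints each match on its own line, so wc -l counts occurrences
    grep -v '^[[:space:]]*#' "$1" | grep -o '/tmp' | wc -l | tr -d ' '
}

# Print the file with /tmp replaced everywhere except whole-line comments.
replace_tmp_uses() {
    sed '/^[[:space:]]*#/!s|/tmp|'"$NEW_DIR"'|g' "$1"
}

# Demo on a stand-in file
sample=$(mktemp)
cat > "$sample" <<'EOF'
# logs live in /tmp by default
logfile=/tmp/syslog.before
cmd "rm -f /tmp/a /tmp/b"
EOF

count_tmp_uses "$sample"
replace_tmp_uses "$sample"
rm -f "$sample"
```

replace_tmp_uses writes to standard output, so the original file stays
untouched until the output has been checked.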

Can you do that and get back to me with the results?

If it works, I'll work up a proper solution that involves using
a default of /tmp and allowing the board file to override
the log file directory.

Thanks.
 -- Tim

P.S. It looks like there's a bug in target_cleanup anyway, that
I found looking through the code.  It does a "rm -rf /tmp/*", which
seems really aggressive and potentially dangerous.  It should
only remove the fuego-related files from /tmp, and not everything.
Thanks for reporting this issue and giving us a chance to review this
part of Fuego.
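
A narrower cleanup could look something like the sketch below.  The
'fuego.' name prefix is an assumed convention for illustration, and
cleanup_fuego_files is a hypothetical name, not the existing
target_cleanup function.  It also assumes the target's find supports
-mindepth and -maxdepth (GNU, BSD and most busybox builds do).

```shell
#!/bin/sh
# Remove only Fuego-owned entries from a directory instead of 'rm -rf /tmp/*'.
cleanup_fuego_files() {
    dir=${1:-/tmp}
    # -mindepth/-maxdepth 1 limits removal to direct children of $dir;
    # only names starting with the assumed 'fuego.' prefix are matched.
    find "$dir" -mindepth 1 -maxdepth 1 -name 'fuego.*' -exec rm -rf {} +
}

# Demo in a scratch directory
demo=$(mktemp -d)
touch "$demo/fuego.syslog.before" "$demo/other-app.lock"
cleanup_fuego_files "$demo"
ls "$demo"
rm -rf "$demo"
```

Files belonging to other programs on the target (other-app.lock in the
demo) are left alone.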