[Fuego] Mark Job status as Pass during Target Board restart using kexec utility

Tim.Bird at sony.com
Tue Oct 9 17:13:50 UTC 2018


> -----Original Message-----
> From: Dhinakar Kalyanasundaram 
> 
> Hi All,
> 
> I have requirement where I need to load different kernels and execute test
> suites meant for each of them.

This is a great use case to support.   Let's see if we can get it working!

> In this regard I created a job and just called them in fuego_test.sh as shown
> below:
> 
> ........
> 
> report "cd /mnt/opt/mytest; sh kexec.sh"

Do you run kexec.sh only once per test, or do you
run it multiple times?

> 
> ........
>  
> The job runs fine and kernel is replaced and restarted but the job status at
> the end will be marked as 'FAILURE' because of the error in the execution log
> snippet provided below.
> 
> ....
> 
> Executing kexec -d -e. . .!!!
> arch_process_options:148: command_line: (null)
> arch_process_options:150: initrd: (null)
> arch_process_options:151: dtb: (null)
> Write failed: Broken pipe
> Build step 'Execute shell' marked build as failure
> ....

So the ssh session is going to drop during the execution of the test.
That is what I believe is causing your broken pipe.

There are two issues:
1) We need to turn off the shell's error handling (on the host, running
fuego_test.sh).  Normally, the commands in fuego_test.sh are run
with the shell in 'set -e' mode, which causes the test to error out
if any single command fails (returns a non-zero exit code).
There are numerous places where we temporarily disable that, by adding
    set +e
    (some lines that might fail)
    set -e
around the lines that might have a problem.

You could try that here:
set +e
report "cd /mnt/opt/mytest; sh kexec.sh"
set -e
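
To illustrate the 'set -e' behavior described above, here is a small
standalone sketch (not Fuego code) that runs the same failing command
twice in child shells: once under 'set -e', and once with 'set +e' /
'set -e' wrapped around it.  The variable names are only for
illustration.

```shell
#!/bin/bash
# Under 'set -e', the child shell exits at 'false' and never reaches the echo.
strict=$(bash -c 'set -e; false; echo "reached"' 2>/dev/null) || true

# Same sequence, with errexit temporarily disabled around the failing command.
guarded=$(bash -c 'set -e; set +e; false; set -e; echo "reached"')

echo "strict:  '${strict}'"     # prints: strict:  ''
echo "guarded: '${guarded}'"    # prints: guarded: 'reached'
```

The first child shell stops at the failing command, which is what
happens to fuego_test.sh when the ssh session drops.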

Then, that leads to the second issue:
2) log file and exit code handling

Fuego's report function is structured to issue the requested command
on the target, save the result code from the operation, and redirect
the output from the command into the Fuego test log (all on the target
board).  This log is later collected from the board by Fuego, after all
'report' and 'report_append' functions are run (during post_test).

In the case of kexec.sh, I'm not sure where the exit code should go.  Does
kexec.sh even return?  How does one assess the success of the operation (by whether
the kexec'ed kernel boots?)
Also, clearly operations to the board filesystem are interrupted when the
transition to the new kernel occurs.
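
One possible way to answer the "did the new kernel boot?" question is
to poll the board after the kexec and compare the running kernel
release with the expected one.  The sketch below is only an
illustration of that idea: 'wait_until' and 'kernel_is' are
hypothetical helpers, not existing Fuego functions, and the probe
command is passed in by the caller.

```shell
#!/bin/bash
# Hypothetical helper: retry a probe command until it succeeds or the
# attempts run out.
# $1 - max attempts, $2 - seconds between attempts, rest - probe command
function wait_until {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 0; i < attempts; i++)); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Hypothetical check: succeeds when the printed kernel release matches.
# $1 - expected release, rest - command that prints the running release
function kernel_is {
  local expected=$1
  shift
  [ "$("$@" 2>/dev/null)" = "$expected" ]
}
```

In a test this might be called as
wait_until 30 2 kernel_is "$EXPECTED_RELEASE" cmd "uname -r"
where EXPECTED_RELEASE is a made-up variable, and assuming 'cmd' passes
the remote output through to stdout.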

I have been thinking about adding a new 'report' routine that handles logging
and the exit code differently.  Rather than save the log into a file on the board,
it would save the log into a file on the host.  This means that when the board
hangs, or transitions to a new kernel, we would get a partial log on the host
side (consisting of data that had already been transmitted from the test program
on the board over ssh).

Here is a possible new function (I haven't tested this):

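function report_live_no_fail {
# $1 - remote shell command
  is_empty "$1"

  # Disable errexit so a dropped ssh session does not abort the test here.
  set +e
  # The ';' before '}' is required for the shell to parse the command group.
  cmd "{ $1; } 2>&1" | tee -a "$LOGDIR/livelog.txt"
  # Exit status of 'cmd' (first stage of the pipeline), not of 'tee'.
  RESULT=${PIPESTATUS[0]}
  set -e

  export REPORT_RETURN_VALUE=${RESULT}
  return ${RESULT}
}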

The bash PIPESTATUS variable can be used to get the return code from
the 'cmd' function. In your case, this is likely to always be an 
error.  We would need to come up with a mechanism for actually
determining the real result of the kexec.sh operation.  If we don't care,
and are just going to check the log to determine the status, then we
can ignore the return result here.
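
As a standalone illustration of PIPESTATUS (not Fuego code), the sketch
below pipes a failing command into tee and reads both exit codes.
'fake_cmd' is a hypothetical stand-in for 'cmd' on a dropped
connection; it returns 255, which is the status ssh itself exits with
when an error occurs.

```shell
#!/bin/bash
# Hypothetical stand-in for a remote command whose connection drops:
# it prints some output, then fails with status 255.
function fake_cmd {
  echo "partial output before the drop"
  return 255
}

log=$(mktemp)
fake_cmd | tee -a "$log" >/dev/null
# Copy PIPESTATUS immediately; the next command overwrites it.
statuses=("${PIPESTATUS[@]}")

echo "cmd status: ${statuses[0]}, tee status: ${statuses[1]}"
# prints: cmd status: 255, tee status: 0
```

Without PIPESTATUS, $? would only report the status of tee, which is 0
here, and the partial output still lands in the log file.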

Also, this introduces a new log file 'livelog.txt', which has log data saved
on the host during execution of the test.  We would want to add
a routine to merge this into the testlog.txt file, somewhere in post_test.

If the only report functions called during a test are 'report_live_no_fail',
then we can just copy or cat $LOGDIR/livelog.txt into $LOGDIR/testlog.txt
(see how this is done for hostlog.txt in functions.sh already).
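
A minimal sketch of such a merge step could look like the following.
'merge_livelog' is a hypothetical name, and the log directory is
passed as a parameter here rather than read from $LOGDIR so the
function can be tried outside of Fuego.  I haven't checked this
against what functions.sh does for hostlog.txt.

```shell
#!/bin/bash
# Hypothetical helper: append the host-side live log to the test log.
# $1 - log directory (the role $LOGDIR plays above)
function merge_livelog {
  local logdir=$1
  # Nothing to do if no live log was produced during the test.
  if [ -f "$logdir/livelog.txt" ]; then
    cat "$logdir/livelog.txt" >> "$logdir/testlog.txt"
  fi
}
```

It would be called as: merge_livelog "$LOGDIR"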

We may also need to turn off shell debugging, similar to what function
log_this() does.  (But maybe not, it's worth experimenting to find out).

> I tried many tricks to make the job 'PASS' like modifying the 'report',
> 'safe_cmd' functions in functions.sh to comment out the handling to ignore
> abrupt restart of the board by the handler after 'kexec' command, but it did
> not work.
> 
> I also checked 'Benchmark.reboot' test available under fuego-
> core/engine/tests/ folder to get some clues but we need to use only 'kexec'
> in our target board as normal 'reboot' doesn't work in our board (because of
> some known issues).
> 
> So only way to change the kernel is via 'kexec' and I want that job to 'PASS'
> ignoring the errors thrown above.
>
> Please let me know if anyone knows how to do it.
> 
> Thanks much in advance.

Can you try adding the function as described above, and calling it like so:

report_live_no_fail  "cd /mnt/opt/mytest; sh kexec.sh"

and let me know if that works?  If not, try it with additional 'set +e' and
'set -e' wrapped around it.  Or you can also do this:

report_live_no_fail "cd /mnt/opt/mytest; sh kexec.sh" || true

When commands are executed in bash as part of a conditional 
sequence (using && and ||), the normal error handling of 'set -e'
is suppressed, and only the result of the whole sequence is used.
Since 'true' always returns 0, this sequence cannot trigger a shell
error.
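
Here is a small standalone sketch of that '|| true' behavior (not Fuego
code).  'always_fails' is a hypothetical stand-in for a report function
that returns an error, and 'survived' is just a marker variable.

```shell
#!/bin/bash
set -e

# Hypothetical stand-in for a report function that returns an error.
function always_fails {
  return 1
}

# Without '|| true', this call would terminate the script under 'set -e'.
always_fails || true

survived=yes
echo "still running after the failure"
```

The echo is reached even though 'set -e' is active, because the failing
call is part of a || list whose overall result is 0.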

Overall, I think that having an alternative for gathering the log data
live from the board, as opposed to via the board-based log file, will
be useful for other situations. If it works well, we may even want to
change to that style of log capture by default.

Let me know what you find out.

Thanks,
 -- Tim


