[Fuego] RFC - new board-control interface

Tim.Bird at sony.com
Thu Feb 15 19:45:48 UTC 2018


Hello fellow Fuegoans!

As if I didn't have enough sticks in the fire already (ha ha, pun intended),
here is another RFC for a new feature I'm adding to Fuego, to create an
interface for hardware control of the device under test.

This has been prompted by two issues:
1) We need a hardware interface for provisioning boards, for some test scenarios, and
2) Fuego really needs to be able to recover from tests that hang the device under test.
(For me, LTP's inotify06 does this on my beaglebone board, as the board is currently configured.)

The general idea is that I am introducing a new board variable called BOARD_CONTROL.
This defines the system used for managing a board in someone's farm (e.g. r4d, labgrid,
ttc, or something else).  This is different from, and orthogonal to, PLATFORM
(which really should be renamed to DISTRO), which specifies a set of runtime capabilities
and programs on the device under test, and TRANSPORT, which specifies the communication
mechanism between Fuego and the device under test.

Currently, the only supported BOARD_CONTROL is 'ttc', because that's what I have in
my lab.  However, I'm open to adding support for other people's board control
systems.
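
For illustration, the board-file side of this might look like the sketch below.
The target name is a made-up value.  TTC_TARGET is the variable the patch passes
to ttc, and any other variables a board control system needs would presumably be
set the same way.

```shell
# Hypothetical board file fragment (values are assumptions, not from a real lab)
BOARD_CONTROL="ttc"      # which board control system manages this board
TTC_TARGET="bbb"         # hypothetical name of this board as known to ttc
```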

Also currently, the only supported board control function is ov_board_control_reboot.
Having defined this, I then add support to Fuego, in the post_test function,
for attempting to recover (by rebooting the board) when a test has hung it.
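
The recovery logic can be sketched in isolation as follows.  This is only an
outline under stated assumptions: cmd, ov_board_control_reboot,
ov_transport_connect and abort_job are replaced by trivial stand-ins so the
snippet runs by itself, and board_alive and recover_board are hypothetical names
used to simulate a hung board.  In the patch the logic sits directly in
post_test and only runs after a signal was received.

```shell
#!/bin/bash
# Stand-ins for Fuego functions, for illustration only.
board_alive="no"    # hypothetical flag: simulates a board that has hung

cmd() { [ "$board_alive" = "yes" ]; }             # succeeds only if the board answers
ov_board_control_reboot() { echo "rebooting via $BOARD_CONTROL"; board_alive="yes"; }
ov_transport_connect() { echo "waiting for board, up to $1 retries"; }
abort_job() { echo "$1"; return 1; }              # the real abort_job ends the job

recover_board() {
    # nothing to do if the board still responds
    if cmd "true" ; then
        echo "board is responsive"
        return 0
    fi
    # without a board control system there is no way to reboot
    if [ -z "${BOARD_CONTROL}" ] ; then
        abort_job "ERROR: Cannot connect to board for test post-processing"
        return 1
    fi
    ov_board_control_reboot
    ov_transport_connect "${MAX_REBOOT_RETRIES:-20}"
    cmd "true" || abort_job "ERROR: Cannot connect to board after reboot"
}

BOARD_CONTROL="ttc"
recover_board
```

With BOARD_CONTROL unset and an unresponsive board, recover_board takes the
abort path instead of attempting a reboot.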

That Fuego has been missing this for so long is indicative of Fuego's positioning
in the Open Source test landscape.  For other systems, which integrate boot-time
and provisioning testing (like kernelci and lava), rebooting the board is a
fundamental and unavoidable part of their test procedures.  Fuego is positioned
much more as a runtime test system, which assumes that the board is up, and that
few tests actually put the board into a non-responsive condition.

Below is the patch which introduces these concepts.  Note that I have not written
the documentation for this yet, or provided any example boards with BOARD_CONTROL
support (although I have some unpublished board files in my lab that use this
new board variable).

Note that Fuego's interface to the board management software is via the
Linux command line (as opposed to, say, using a C, python, or shell library).
Fuego has a very 'Unix-y' approach to interfacing with external software.
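
To show what the command-line approach could mean for adding another board
control system, here is a hedged sketch.  The 'examplectl' entry, its options
and EXAMPLECTL_BOARD are invented placeholders, not a real tool.  The function
name reboot_command is also hypothetical, and it only prints the command line
that would be run instead of running it.

```shell
#!/bin/bash
# Print the reboot command line for the configured board control system.
# Only the "ttc" case mirrors the patch; "examplectl" is a made-up placeholder.
reboot_command() {
    case "$BOARD_CONTROL" in
    "ttc")
        echo "$TTC $TTC_TARGET reboot"
        ;;
    "examplectl")
        echo "examplectl --board $EXAMPLECTL_BOARD power-cycle"
        ;;
    *)
        echo "unsupported BOARD_CONTROL ${BOARD_CONTROL}" >&2
        return 1
        ;;
    esac
}

# Assumed values, for demonstration only
TTC="ttc"
TTC_TARGET="bbb"
BOARD_CONTROL="ttc"
reboot_command    # prints: ttc bbb reboot
```

Each new system is just one more case entry that builds a command line, so
nothing beyond the external tool itself has to be installed or linked in.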

The patch itself is surprisingly short, but please consider
the concepts carefully, and let me know what you think.

Thanks and regards,
 -- Tim

Here's the patch, which is already in my master branch (but I'm open
to making changes based on feedback on the principles here).
-----------
Subject: [PATCH] core: add support for hardware reset

Support performing a hardware reboot of a board, if the system
detects that the board is no longer responsive.  Introduce
a new hardware reboot function, which requires that the board
provide a BOARD_CONTROL variable, indicating the board control
system being used.  Other board variables may be required by
the board control software.

The only board control system supported currently is 'ttc'.

Signed-off-by: Tim Bird <tim.bird at sony.com>
---
 engine/overlays/base/base-board.fuegoclass | 12 ++++++++++++
 engine/scripts/functions.sh                | 19 ++++++++++++++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/engine/overlays/base/base-board.fuegoclass b/engine/overlays/base/base-board.fuegoclass
index bdda9c3..8c9b5f8 100644
--- a/engine/overlays/base/base-board.fuegoclass
+++ b/engine/overlays/base/base-board.fuegoclass
@@ -188,3 +188,15 @@ function ov_transport_cmd() {
    ;;
   esac
 }
+
+# function to reboot the board
+function ov_board_control_reboot() {
+  case "$BOARD_CONTROL" in
+  "ttc")
+    $TTC $TTC_TARGET reboot
+    ;;
+  *)
+    abort_job "Error reason: unsupported BOARD_CONTROL ${BOARD_CONTROL}"
+    ;;
+  esac
+}
diff --git a/engine/scripts/functions.sh b/engine/scripts/functions.sh
index 46c3d8e..1eee444 100755
--- a/engine/scripts/functions.sh
+++ b/engine/scripts/functions.sh
@@ -29,6 +29,7 @@ function signal_handler {
     # if we got here, something went wrong.  Let's clean up and leave
     echo "in signal_handler"
     unlock_build_dir
+    export FUEGO_RECEIVED_SIGNAL="true"
     if [[ "$FUEGO_TEST_PHASES" == *post_test* ]] ; then
         echo "##### doing fuego phase: post_test (from signal handler) #####"
         post_test
@@ -618,6 +619,22 @@ function post_test {
     trap post_term_handler SIGTERM
     trap - SIGHUP SIGALRM SIGINT ERR EXIT
 
+    if [ "${FUEGO_RECEIVED_SIGNAL}" = "true" ] ; then
+        # the board may have hung (a kernel oops)
+        # see if the board is responsive, and if not, try to reboot it
+        set +e
+        if ! cmd "true" ; then
+            if [ -n "${BOARD_CONTROL}" ] ; then
+                ov_board_control_reboot
+                ov_transport_connect ${MAX_REBOOT_RETRIES:-20}
+                cmd "true" || abort_job "ERROR: Cannot connect to board after reboot\n"
+            else
+                abort_job "ERROR: Cannot connect to board for test post-processing"
+            fi
+        fi
+        set -e
+    fi
+
     # log test completion message.
     # but don't let user confuse termination with success
     ov_logger "Test $TESTDIR is finished - maybe successfully"
@@ -671,7 +688,7 @@ function target_reboot {
   # pass max_retries to ov_transport_connect
   ov_transport_connect $1
   set -e
-  cmd "true" || abort_job "FAIL: Cannot connect to board after reboot\n"
+  cmd "true" || abort_job "ERROR: Cannot connect to board after reboot\n"
 }
 
 # $1 - tarball template
-- 
1.9.1



