[Ksummit-discuss] [CORE TOPIC] stable workflow

Bird, Timothy Tim.Bird at am.sony.com
Wed Aug 3 04:47:35 UTC 2016



> -----Original Message-----
> From: ksummit-discuss-bounces at lists.linuxfoundation.org [mailto:ksummit-
> discuss-bounces at lists.linuxfoundation.org] On Behalf Of Shuah Khan
> On 07/29/2016 08:28 AM, Steven Rostedt wrote:
> > On Fri, 29 Jul 2016 11:59:47 +0300
> > Laurent Pinchart <laurent.pinchart at ideasonboard.com> wrote:
> >
> >> Another limitation of kselftest is the lack of standardization for logging and
> >> status reporting. This would be needed to interpret the test output in a
> >> consistent way and generate reports. Regardless of whether we extend kselftest
> >> to cover device drivers this would in my opinion be worth fixing.
> >>
> >
> > Perhaps this should be a core topic at KS.
> >
> 
> Yes definitely. There has been some effort in standardizing,
> but not enough. We can discuss and see what would make the
> kselftest more usable without adding external dependencies.
> 
> One thing we could do is add a script to interpret the test output
> and turn it into a usable format.

Just FYI on what Fuego [1] does here:

It basically has to take the output from tests with many different output formats
and convert each one into a single pass/fail value per test for the Jenkins interface.

It uses a short shell function called log_compare to scan a log (the test program
output) for a regular expression.  The function is passed a test_name, a match_count,
a regular_expression, and a result_category.  The result category is "p" for positive
or "n" for negative.  The regular expression is passed to
"grep -E <regular_expression> <logfile> | wc -l" and the resulting count is compared
to the match_count.  If the count matches, an additional comparison is made between
the logfile filtered by the regular_expression and a previously saved filtered log.
If both the number of occurrences and the filtered content match, the test is
considered to have succeeded.  The test_name is used to find the previously saved
filtered log.

Here is the code, in case the description is not clear:
function log_compare {
# 1 - test_name, 2 - match_count, 3 - regular_expression, 4 - n/p (i.e. negative or positive)

  cd "$FUEGO_LOGS_PATH/${JOB_NAME}/testlogs"
  LOGFILE="${NODE_NAME}.${BUILD_ID}.${BUILD_NUMBER}.log"
  PARSED_LOGFILE="${NODE_NAME}.${BUILD_ID}.${BUILD_NUMBER}.${4}.log"

  if [ -e "$LOGFILE" ]; then
    # count the lines that match the regular expression
    current_count=$(grep -E "${3}" "$LOGFILE" 2>&1 | wc -l)
    if [ "$current_count" -eq "$2" ]; then
      # save the filtered log and compare it with the reference copy
      grep -E "${3}" "$LOGFILE" | tee "$PARSED_LOGFILE"
      local TMP_P
      # assign on a separate line so $? reflects diff, not the 'local' builtin
      TMP_P=$(diff -u "${WORKSPACE}/../ref_logs/${JOB_NAME}/${1}_${4}.log" "$PARSED_LOGFILE" 2>&1)
      if [ $? -ne 0 ]; then
        echo -e "\nFuego error reason: Unexpected test log output:\n$TMP_P\n"
        check_create_functional_logrun "test error"
        false
      else
        check_create_functional_logrun "passed"
        true
      fi
    else
      echo -e "\nFuego error reason: Mismatch in expected ($2) and actual ($current_count) pos/neg ($4) results. (pattern: $3)\n"
      check_create_functional_logrun "failed"
      false
    fi
  else
    echo -e "\nFuego error reason: 'logs/${JOB_NAME}/testlogs/$LOGFILE' is missing.\n"
    check_create_functional_logrun "test error"
    false
  fi

  cd -
}

This is called with a line like the following:
   log_compare "$TESTNAME" "11" "^Test-.*OK" "p"
or
   log_compare "$TESTNAME" "0" "^Test-.*Failed" "n"

The reason for the match_count is that many tests that Fuego runs have
lots of sub-tests (LTP being a prime example), and you want to figure out
whether you're getting the number of positive or negative results that
you expect.  The match_count is sometimes parameterized, so that you can
tune the system to ignore some failures.
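
For illustration only, parameterizing the expected count might look
something like the following.  The LTP_EXPECTED_PASS variable and its
default value are made up for this sketch, not actual Fuego settings:

   # hypothetical per-board override of the expected pass count;
   # fall back to a default of 11 if nothing overrides it
   EXPECTED_PASS="${LTP_EXPECTED_PASS:-11}"
   log_compare "$TESTNAME" "$EXPECTED_PASS" "^Test-.*OK" "p"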

The system ships with <test_name>_p.log and <test_name>_n.log files
(previously filtered log files) for each test.
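
Continuing the made-up example above, the positive reference file
(<test_name>_p.log) would simply hold the expected filtered output,
something like:

   Test-01: OK
   Test-02: OK
   ...
   Test-11: OK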

I think in general you want a system that provides default expected results
while still allowing developers to tune it for individual sub-tests that
fail for some reason on their system.  One of the biggest problems with
tests is that users often don't have a baseline of what they should expect
to see (what is "good" output vs. what actually shows a problem).

'grep -E <regular_expression>' is about the most basic thing you can do
in terms of parsing a log.  Fuego also includes a python-based parser to
extract benchmarking data, for use in charting and threshold regression
checking, but that seems like overkill for a first pass
at this with kselftest. (IMHO)
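
Just as a rough sketch of what that grep-level approach would look like
for kselftest, assuming (purely for illustration) summary lines of the
form "selftests: <name> [PASS]" / "[FAIL]", since the exact format is
what would need standardizing:

   # count pass/fail summary lines in a captured kselftest run
   pass_count=$(grep -E -c "^selftests: .* \[PASS\]" kselftest.log)
   fail_count=$(grep -E -c "^selftests: .* \[FAIL\]" kselftest.log)
   echo "passed: $pass_count, failed: $fail_count"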

FWIW I'm interested in how this shakes out because I want to wrap kselftest into
Fuego.  I'm not on the list for the summit, but I'd like to stay in the discussion via e-mail.
 -- Tim

[1] http://bird.org/fuego/FrontPage

P.S. By the way, don't use the above log_compare code directly; it's shown here to illustrate the approach, and the authoritative version is in the Fuego sources.

