[Fuego] Some stray thoughts on the parser generalization

Fri Jun 29 01:44:01 UTC 2018

> -----Original Message-----
> From: Tim.Bird at sony.com <Tim.Bird at sony.com>
> Sent: Thursday, June 28, 2018 5:09 PM
> To: fuego at lists.linuxfoundation.org; daniel.sangorrin at toshiba.co.jp
> Subject: Some stray thoughts on the parser generalization
> 
> I've been thinking about the parser generalization that was discussed at the
> Fuego Jamboree.  I have a few ideas to toss out, in no particular order:
> 
> 1) there's some boilerplate code that every parser.py has at the beginning,
> that IMHO it would be good to try to eliminate.
> The lines with sys.path.insert... and
> import common as plib
> would be nice to eliminate.
> 
> Rather than running the parser.py as a standalone program, why don't we structure
> it as a plugin module instead?

I was thinking about a new Python library (called, e.g., "fuegoparse") that you can use from
any project (e.g., installed with pip3) and that can also be called from the command line.

Example usage as a library:

import fuegoparse
import json

testparser = fuegoparse.TestlogParser(test="LTP", output_format="fuego_run_json")
data = testparser.parse("./testlog.txt")
with open('run.json', 'w') as outfile:
    json.dump(data, outfile)

Example as a command:

$ fuegoparse -t LTP --log ./testlog.txt --format fuego_run_json -o run.json

Other output formats could be TAP, KernelCI, Squad, JUnit, etc.
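
As a rough sketch of how the command-line side could be wired up, here is an
argparse front end that accepts the flags from the example command above.
Nothing here exists yet: the program name, the flag names, and the defaults are
all assumptions taken from that example.

```python
import argparse


def build_arg_parser():
    # Hypothetical CLI: flag names mirror the example command, nothing more.
    ap = argparse.ArgumentParser(prog="fuegoparse")
    ap.add_argument("-t", "--test", required=True, help="test name, e.g. LTP")
    ap.add_argument("--log", required=True, help="path to the test log")
    ap.add_argument("--format", default="fuego_run_json", help="output format")
    ap.add_argument("-o", "--output", default="run.json", help="output file")
    return ap


# Parse the same arguments as in the example command.
args = build_arg_parser().parse_args(
    ["-t", "LTP", "--log", "./testlog.txt",
     "--format", "fuego_run_json", "-o", "run.json"]
)
print(args.test, args.log, args.format, args.output)
```

The parsed values would then be handed to the library call shown earlier, so the
command and the library share one code path.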

> Currently we invoke the parser with:
> run_python $PYTHON_ARGS $FUEGO_CORE/engine/tests/${TESTDIR}/parser.py
> (with a whole lot of Fuego-specific environment variables).
> 
> I think it would be good to refactor this, so that the fuego core (functions.sh)
> calls a single program, indicating the test, the log file, and a parser name.
> Many fuego parsers could be combined by declaring a single regex pattern
> in the fuego_test.sh (similar to what is done with log_compare).
> 
> So, something like the following instead:
> run_python $PYTHON_ARGS --log=$FUEGO_RW/logs/.../testlog --test=$TESTDIR
> --parser=TAP13
> or
> run_python $PYTHON_ARGS --log=$FUEGO_RW/logs/.../testlog --test=$TESTDIR
> --parser=2part-regex --parser-arg="regex_string= ^TEST-(\d+) (.*)$"

Cool, this is very close to what I had in mind.
It looks like this could become a whole new project (the first spinoff of the Fuego project!).
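
To make the "2part-regex" idea concrete, here is a minimal sketch of such a
generic parser. It assumes the regex has exactly two groups (testcase name, then
result text) and is applied line by line; the function name and the sample log
are made up for illustration.

```python
import re


def parse_two_part(log_text, regex_string):
    """Return {testcase_name: result_text} for every matching log line."""
    # MULTILINE makes ^ and $ match at each line boundary in the log.
    pattern = re.compile(regex_string, re.MULTILINE)
    results = {}
    for match in pattern.finditer(log_text):
        name, result = match.group(1), match.group(2)
        results[name] = result
    return results


# Hypothetical log content; lines that do not match are simply skipped.
sample_log = "TEST-1 PASS\nsome noise\nTEST-2 FAIL\n"
print(parse_two_part(sample_log, r"^TEST-(\d+) (.*)$"))
```

With something like this, a test would only need to declare its regex in
fuego_test.sh instead of shipping its own parser.py.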

We also need to think about isolating other parts (cross-build, dependency checks, transport abstractions)
so they can be reused. Then we could finally say that Fuego is just the glue code that puts all those parts together.

> 2) I'm starting to come to the conclusion that the testcase name needs to
> be very free-form.  That is, it should be allowed to have spaces and punctuation.
> Many tests use a description of the test as the only unique identifier for
> the test.  That is, they don't use numbered testcases.  I strongly prefer moving
> away from numbers as testcase names, as a number provides very little
> human-usable information about the testcase.

Sure, the more flexible it is for the user, the better.

> I think the run.json can handle arbitrary strings for testcase names, but
> I fear that a lot of our parser and ftc code can not.

We can always fix it if that is the case. Python dictionary keys are supposed to
handle any string, including spaces.
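
A quick check of that claim, using made-up testcase descriptions as keys and a
round trip through the json module:

```python
import json

# Hypothetical free-form testcase names with spaces and punctuation.
results = {
    "open() fails with ENOENT on a missing file": "PASS",
    "write: short buffer, O_NONBLOCK (pipe)": "FAIL",
}

# Serialize to JSON text and read it back; the keys should survive unchanged.
encoded = json.dumps(results)
decoded = json.loads(encoded)
print(decoded == results)
```

So the data structures themselves are fine; any problems would be in code that
splits or pattern-matches on the names.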

> 3) I'm also starting to think that the structured data is a pain to manage,
> and it might be better to do most of the work in a flat format.  The  charting
> code uses a mixture of both structured (nested objects) and flat testcase
> names, and I think there's a lot of duplicate code lying around that handles
> the conversion back and forth, that could probably be coalesced into a
> single set of library routines.

Sorry, not sure what you mean by structured data here.
Are you talking about the JSON output format?
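
If it is about nested objects versus flat testcase names, then the conversion
in both directions could live in one pair of library routines. This is only a
sketch under my own assumptions: a "." separator, plain dicts, and invented
test names. Note that a "." separator would clash with free-form testcase names
that contain dots, so a real version would need a safer scheme.

```python
def flatten(nested, prefix=""):
    """Turn nested dicts into a flat {"a.b.c": value} mapping."""
    flat = {}
    for key, value in nested.items():
        name = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def unflatten(flat):
    """Rebuild nested dicts from a flat {"a.b.c": value} mapping."""
    nested = {}
    for name, value in flat.items():
        node = nested
        *parents, leaf = name.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical test set / testcase names.
nested = {"syscalls": {"open01": "PASS", "open02": "FAIL"}, "math": {"abs01": "PASS"}}
flat = flatten(nested)
print(flat)
print(unflatten(flat) == nested)
```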

> That's it for now.  I'm just dumping my brain - not requesting anyone to
> work on anything.

Bdump ;)

Thanks,
Daniel