[Fuego] status update

Bird, Timothy Tim.Bird at sony.com
Wed Jun 28 00:47:03 UTC 2017



> -----Original Message-----
> From: Daniel Sangorrin on Monday, June 26, 2017 11:04 PM
>
> I've been working on the parsing code. Here is a list of things that I
> managed to complete, and things that need some discussion:
> 
> Completed tasks:
> - Output a single run.json file for each job's run/build that captures both the
> metadata and the test results.

Sounds good.   This is probably the right approach.

> - Merge run.json files for a test_suite into a single results.json file that flot
> can visualize.

So, just to clarify: the run.json file has the results for multiple metrics from
a single execution (or 'run') of a test, and the results.json file has the
results for multiple runs of the test?  That is, results.json holds, for a
single Fuego host, results from different runs and from different boards?
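
Just to make sure we are picturing the same shapes, here is roughly what I have
in mind (the field names below are only placeholders to illustrate the
structure, not a proposal for the actual schema):

    # one run: a single execution of a test, with multiple metrics
    run_json = {
        "test_suite": "Benchmark.dbench",
        "board": "beaglebone",
        "build_number": 12,
        "metrics": {"Throughput": 45.2},
    }

    # results.json: many runs aggregated for flot, possibly from
    # different boards on the same Fuego host
    results_json = {
        "Benchmark.dbench": [
            run_json,
            {"board": "minnowboard", "build_number": 7,
             "metrics": {"Throughput": 51.8}},
        ]
    }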

>    + I have added concurrency locks to protect results.json from concurrent
> writes.
> - Add HTML output support to flot (similar to the one in AGL JTA, but not
>   exactly the same yet).
> - Fixed the test Functional.tiff (AGL test) and confirmed that it works on
> docker and beaglebone black.
>    + There are many more AGL tests to fix.
> - Fixed several bugs that occur when a test fails in an unexpected
>   way.

Sounds good.  Thanks!
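
On the concurrency locks: I assume this means holding an exclusive lock across
the whole read-merge-write of results.json.  Just as a sketch of what I mean
(not necessarily how you implemented it):

    import fcntl
    import json

    def append_run(results_path, run_data):
        # hold an exclusive lock for the entire read-merge-write cycle,
        # so two parsers finishing at the same time can't clobber each other
        with open(results_path, "a+") as f:
            fcntl.flock(f, fcntl.LOCK_EX)
            f.seek(0)
            text = f.read()
            results = json.loads(text) if text else {}
            results.setdefault(run_data["test_suite"], []).append(run_data)
            f.seek(0)
            f.truncate()
            json.dump(results, f, indent=2)
            # the lock is released when the file is closed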
 
> Discarded tasks:
> - Output a jUnit XML file so that the "Test Results Analyzer" plugin can
>   display it.
>   + This is working, but it isn't as flexible as I'd like. The new HTML
>     output support that I added to flot should deprecate it.
OK.

> - Ability to download a PNG/SVG from the flot plugin directly.
>   + I managed to get this working by using the canvas2image library or the
>     canvas.toDataURL interface. Unfortunately, flot doesn't store the axes'
>     information in the canvas, so only the plotting space is saved. There is
>     a library to accomplish this task [1], but there seems to be a version
>     mismatch with the javascript libraries in fuego and it didn't work. I
>     decided to postpone this.
OK.

> 
> Pending tasks:
> - Add report functionality
>    + I have removed the generation of plot.png files, in part because I want
>      to do that directly from the flot plugin, and in part because I think it
>      is more useful if we integrate it into the future ftc report command.
> - Add more information to the run.json files.
>    + I am trying to produce a schema that is very close to the one used in
>      KernelCI. I can probably make it compatible.

That sounds good.  I'm very interested in the schema.  I believe
that Milo Casagrande mentioned something about groups, which I don't think
we have yet.  Everything else in your analysis from April
(https://lists.linuxfoundation.org/pipermail/fuego/2017-April/000448.html)
seems to map reasonably well between Fuego and KernelCI fields.

>    + There is some information that needs to be added to Fuego.
>      Unfortunately, I will probably have to fix a lot of files:
>        1) Each test_case (remember test_suite > test_set > test_case) should
>           be able to store a list of measurements. Each measurement would
>           consist of a name, value, units, duration, and maybe more, such as
>           error messages specific to the test_case or expected values
>           (e.g. expected to fail, or expected to be greater than 50).
I think this needs to be somewhere, but possibly not in the results schema.
For example, I don't want every listing of dbench results to have to report
the units for each benchmark metric.  These should be fairly static
and we should be able to define them per-test, not per-run.  Things
like thresholds are a bit different, and we may need to record them
per-run, since the success/failure threshold could be different depending
on the board, or even changed by the user to fine-tune the regression-checking.
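
In other words, I picture something like a static per-test definition of the
metrics, plus a small per-run (or per-board) record of the thresholds.  Very
roughly (these names are just for illustration, not a schema proposal):

    # static: defined once per test, alongside the test definition
    dbench_metrics = {
        "Throughput": {"units": "MB/s", "higher_is_better": True},
    }

    # dynamic: recorded per run, since it can differ per board or be
    # tuned by the user for regression checking
    run_thresholds = {
        "Throughput": {"threshold": 50.0, "check": "greater_than"},
    }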

>        2) Each test_case should have a status (PASS, FAIL..) that is not
> necessarily the same as the test_set/suite status.
Agreed.

>    + Add vcs_commit information (git or tarball information)
This is expected to be captured by the test version (in test.yaml),
and the test version should be saved in the run.json file.
But I could see saving the test source code version in the run.json file also.
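
That is, run.json might carry something like this (field names here are just
illustrative):

    run_info = {
        "test_version": "1.1",            # from test.yaml
        "test_source_commit": "1a2b3c4",  # optional git/tarball information
    }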

> - Handle failed runs better. Sometimes the test fails very early, before even
>   building it.
> - I am not sure what to do with the "reference.log" files.
>    + Currently they are used to store thresholds, but these are probably
>      board-dependent.
>    + This is probably related to the discussion with Rafael about
>      parameterized builds. We should be able to define the threshold for a
>      specific test_case's measure.

Reference logs should be savable by the user, to compare with future runs.
The system we have now uses parsed testlogs, which are generated using
log_compare and very simple line-based regular-expression filters (using 'grep').
It will be much more flexible and robust to compare run.json files instead
of a parsed log and a reference log.

The purpose of these is to save the data from a "known good" run, so that
regressions can be detected when the data from a current run differs from that.
This can include sub-test failures that we have decided to ignore or to
postpone resolution of.

I think once we have a system in place to save all the sub-test metric data
from the testlogs (using a parser) in json format, we can eliminate these.
We should be able to replace reference.log with reference.json (which is just
a saved run.json file).  These reference files are a key thing that I would
like Fuego to make easy to share between developers (and sites).

I have already started working on a json difference program (called 'jdiff')
that compares two json files and reports the differences between them.
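
The core of it is just a recursive walk over the two parsed files.  Something
like the following sketch (very simplified; the real tool will need options for
ignoring fields, tolerances on metric values, and so on):

    import json
    import sys

    def jdiff(a, b, path=""):
        # recursively report keys and values that differ between two
        # parsed json trees
        if isinstance(a, dict) and isinstance(b, dict):
            for key in sorted(set(a) | set(b)):
                if key not in a:
                    print("only in second: %s/%s" % (path, key))
                elif key not in b:
                    print("only in first: %s/%s" % (path, key))
                else:
                    jdiff(a[key], b[key], path + "/" + key)
        elif a != b:
            # lists and scalars are compared as whole values here
            print("%s: %r != %r" % (path, a, b))

    if __name__ == "__main__":
        with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
            jdiff(json.load(f1), json.load(f2))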

On the issue of where to save them: my current thinking is that they should be
saved somehow at the 'board' level.  That is, tests will definitely have
different results per-board.
But there may be such a thing as a reference file that is dependent on the kernel
version, or the distribution, or some other parameter.  We should discuss the
naming and storage of these.
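
Just to have something concrete to react to (not a proposal I'm attached to),
the board name, and optionally other parameters, could be encoded into the
reference file name, for example:

    reference-<test_name>-<board>.json
    reference-<test_name>-<board>-<kernel_version>.json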

> - Remove testplans?
>    + I was thinking that we could replace testplans with custom scripts that
>      call ftc add-jobs.

Why do we want to remove them?  I think they serve a useful function: expressing
a set of tests (with their specs) to execute in sequence.

> - Create a staging folder for tests that do not work, or for files that are
>   not used.
>    + Or maybe at least list them on the wiki.

Currently, tests that are not listed in a testplan are functionally 'dead'
(although a user can create a job for a single test and try it out).
Maybe it would be good to have a 'staging' folder for tests that are
under development or conversion (like a lot of the AGL tests).  I agree that
we should have some notion of the "approved and likely to work" tests,
and that should be expressed somehow in the test placement or in the
documentation.
 -- Tim



