[Fuego] status update
daniel.sangorrin at toshiba.co.jp
Wed Jun 28 01:58:24 UTC 2017
> -----Original Message-----
> From: Bird, Timothy [mailto:Tim.Bird at sony.com]
> Sent: Wednesday, June 28, 2017 9:47 AM
> > - Merge run.json files for a test_suite into a single results.json file that flot
> > can visualize.
> So, just to clarify, the run.json file has the results for multiple metrics for
> a single execution (or 'run') of a test, and the results.json file has
> results for multiple runs of the test? That is, results.json has results
> for a single Fuego host with results from different runs and on different boards?
Yes, that's correct. It's kind of a text "database". The size of this file will increase
with the number of runs, so it may not scale. We will probably need to implement
logrotate-like functionality until we can submit all runs to a proper database
on a centralized server (maybe one running kernelci or something similar).
> > Pending tasks:
> > - Add report functionality
> > + I have removed the generation of plot.png files. In part because I want to
> > do that directly from the
> > flot plugin, and in part because I think it is more useful if we integrate it in
> > the future ftc report command,
> > - Add more information to the run.json files.
> > + I am trying to produce a schema that is very close to the one used in
> > Kernel.CI. Probably I can make it compatible.
> That sounds good. I'm very interested in the schema. I believe
> that Milo Casagrande mentioned something about groups, that I don't think
> we have yet. Everything else in your analysis from April
> I think shows some analog between Fuego and KernelCI fields.
Groups are called "test_sets" in KernelCI and can contain an array of "test_cases".
# I'm thinking about using KernelCI's nomenclature for this. What do you think?
# I also want to rename platform to toolchain, fwver to kernel_version, spec to
# test_spec, and testplan to test_plan...
In Fuego, we have a similar concept. If you look at bonnie's reference.log, you will
notice that there are several groups/test_sets, each with multiple tests/test_cases
inside (e.g. the test_cases Block, PerChr, and Rewrite).
I am making a schema that is compatible with KernelCI but that also allows
having test_sets inside test_sets. For example, "LTP > RT tests > func > sched_jitter"
contains two levels of test_sets ("RT tests" and "func").
[Note] In the past, tests.info used to store this test_set > test_case information. Now it
is provided by the test's parser.py and reference.log. parser.py covers the
test_sets/test_cases that were actually executed (which depends on the spec), whereas
reference.log contains all possible test_sets/test_cases for that test (including those
that were skipped because of the selected test_spec). We still need to decouple the
reference thresholds from this information, though.
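To make the nesting concrete, here is a sketch of what such a run fragment and a traversal over it could look like (the field names are my own illustration modeled on the KernelCI-style names discussed above, not the final schema):

```python
# Hypothetical run.json fragment with nested test_sets, illustrating
# "LTP > RT tests > func > sched_jitter". Field names are assumptions.
run = {
    "test_suite": "LTP",
    "test_sets": [
        {
            "name": "RT tests",
            "test_sets": [
                {
                    "name": "func",
                    "test_cases": [
                        {"name": "sched_jitter", "status": "PASS"}
                    ],
                }
            ],
        }
    ],
}

def iter_test_cases(node, path=()):
    """Yield (path, test_case) pairs, recursing through nested test_sets."""
    for ts in node.get("test_sets", []):
        yield from iter_test_cases(ts, path + (ts["name"],))
    for tc in node.get("test_cases", []):
        yield path, tc
```

A consumer such as a report generator could then flatten the hierarchy without caring how deeply the test_sets nest.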
> > + There is some information that needs to be added to Fuego.
> > Unfortunately, I will probably have to fix a lot of files:
> > 1) Each test_case (remember test_suite > test_set > test_case) should
> > be able to store a list of
> > measurements. Each measurement would consist of a name, value,
> > units, duration and maybe more
> > such as error messages specific to the test_case or expected values
> > (e.g. expected to fail, or expected to
> > be greater than 50).
> I think this needs to be somewhere, but possibly not in the results schema.
> For example, I don't want every listing of dbench results to have to report
> the units for each benchmark metric. These should be fairly static
> and we should be able to define them per-test, not per-run. Things
> like thresholds are a bit different, and we may need to record them
> per-run, since the success/failure threshold could be different depending
> on the board, or even changed by the user to fine-tune the regression-checking.
The reasons I wanted to add units to the schema were:
- AGL was using them in their HTML output reports, and they do make reports more readable.
If we don't have this information in the results.json file, flot will need to get it
from somewhere else (e.g. a separate json file). The problem is that we would need to update
that file every time we add a new test. I'd rather keep that information inside the test directory.
What do you think?
- The KernelCI format also allows including units in its measurements (not strictly required).
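One way to reconcile the two positions above (static per-test metadata vs. per-run values) might look like this sketch; the table layout and function name are purely illustrative assumptions:

```python
# Hypothetical sketch: keep unit metadata per test (static, defined once),
# and merge it into raw per-run measurements only when rendering a report.
# The UNITS table and its "test.metric" key format are assumptions.
UNITS = {
    "Dhrystone.score": "DMIPS",
    "bonnie.Block": "KB/s",
}

def annotate(test_name, measurements):
    """Attach units from the per-test table to raw per-run measurements."""
    return [
        dict(m, units=UNITS.get(f"{test_name}.{m['name']}", ""))
        for m in measurements
    ]
```

This way every run.json stays small (no repeated unit strings), while reports can still show "12345 KB/s" instead of a bare number.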
> > - Handle failed runs better. Sometimes the test fails very early, before even
> > building it.
> > - I am not sure what to do with the "reference.log" files
> > + Currently they are used to store thresholds, but these are probably board
> > dependent.
> > + This is probably related to the discussion with Rafael about parameterized
> > builds. We should
> > be able to define the threshold for a specific test_case's measure.
> reference logs should be savable by the user, to compare with future runs.
> The system we have now uses parsed testlogs, which are generated using
> log_compare and very simple line-based regular-expression filters (using 'grep').
> It will be much more flexible and robust to compare run.json files instead
> of a parsed log and a reference log.
> The purpose of these is to save the data from a "known good" run, so that
> regressions can be detected when the data from a current run differs from that.
> This can include sub-test failures, that we have decided to ignore or postpone
> resolution of.
> I think once we have in place a system to save all the sub-test metric data
> from the testlogs (using a parser) in json format, then we can eliminate these.
> We should be able to replace reference.log with reference.json (which is just
> a saved run.json file). This is a key thing that I would like Fuego to be able to
> share easily between developers (and sites).
> I have already started working on a json difference program (called 'jdiff')
> to compare 2 json files and report the differences between the two.
> On the issue of where to save them, currently they should be saved somehow
> at the 'board' level. That is, tests will definitely have different results per-board.
> But there may be such a thing as a reference file that is dependent on the kernel
> version, or the distribution, or some other parameter. We should discuss the
> naming and storage of these.
How about saving them at the board's testplan level, and also allowing users to try
different ones through the ftc run-test interface?
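The jdiff idea of comparing a saved reference.json against a current run.json could work roughly like this sketch (my own illustration; the real jdiff tool mentioned above may behave differently):

```python
# Hypothetical sketch of a jdiff-style comparison: recursively collect the
# paths whose values differ between a reference JSON object and a current one.
# An empty result would mean "no regression relative to the known-good run".
def jdiff(ref, cur, path=""):
    """Return a list of (path, ref_value, cur_value) differences."""
    diffs = []
    if isinstance(ref, dict) and isinstance(cur, dict):
        for key in sorted(set(ref) | set(cur)):
            diffs += jdiff(ref.get(key), cur.get(key), f"{path}/{key}")
    elif ref != cur:
        diffs.append((path, ref, cur))
    return diffs
```

Comparing structured json this way is indeed more robust than diffing grep-filtered log lines, since a changed value is reported with its full path into the schema.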
> > - Remove testplans?
> > + I was thinking that we can substitute testplans by custom scripts that call
> > ftc add-jobs
> Why do we want to remove them? I think they serve a useful function - expressing
> a set of tests (with their specs) to execute in sequence.
OK. I just wanted to mention that they are redundant. But I agree that they are useful.
> > - Create a staging folder for tests that do not work or files that are not used.
> > + Or maybe at least list them up on the wiki.
> Currently, if they are not listed in a testplan, the tests are functionally 'dead'.
> (Although a user can create a job for a single test and try it out).
> Maybe it would be good to have a 'staging' folder for tests that are
> under development or conversion (like a lot of the AGL tests). I agree that
> we should have some notion of the "approved and likely to work" tests,
> and that should be expressed somehow in the test placement or in
Maybe we can put this information in the test.yaml file (you mentioned something
about evaluating tests with stars). Actually, I would like to know your plans for the
test.yaml files in general. I haven't looked into them deeply yet.