[Fuego] status update

Wed Jun 28 04:55:45 UTC 2017

> -----Original Message-----
> From: Daniel Sangorrin on Tuesday, June 27, 2017 6:58 PM
> > -----Original Message-----
> > From: Bird, Timothy [mailto:Tim.Bird at sony.com]
> > Sent: Wednesday, June 28, 2017 9:47 AM
> > > - Merge run.json files for a test_suite into a single results.json file
> > > that flot can visualize.
> >
> > So, just to clarify, the run.json file has the results for multiple metrics for
> > a single execution (or 'run') of a test, and the results.json file has
> > results for multiple runs of the test?  That is, results.json has results
> > for a single Fuego host with results from different runs and on different
> > boards?
> 
> Yes, that's correct. It's kind of a text "database". The size of this file will
> increase with the number of runs so it may not scale. Probably we will need to
> implement a logrotate-like functionality once we can submit all runs to a proper
> database in a centralized server (maybe one running kernelci or something similar).
For a central server, I think we'll definitely have to use a database.
If we end up using the same (or a 'similar-enough') schema, we may be
able to reuse parts of the kernelci setup for our own Fuego central server.
That is, copy their database configuration and maybe reuse their server-side
web front-end.  Maybe we could end up sharing code and helping each other
out. It would be nice not to reinvent that wheel.

However, I don't want to require a database setup for the average developer
(with one host and one or a few boards).
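
For the single-host case, the merge step could stay file-based. Here is a
minimal sketch, assuming each run leaves a run.json in its own subdirectory
of a log directory. The directory layout, the "runs" key and the
merge_run_files() name are assumptions for illustration, not the actual
Fuego layout or schema.

```python
import json
import tempfile
from pathlib import Path


def merge_run_files(log_dir, out_name="results.json"):
    """Merge every <log_dir>/<run>/run.json into one results file."""
    log_dir = Path(log_dir)
    runs = []
    # Sort the paths so the merged list has a stable order.
    for run_file in sorted(log_dir.glob("*/run.json")):
        with run_file.open() as f:
            runs.append(json.load(f))
    out_path = log_dir / out_name
    with out_path.open("w") as f:
        json.dump({"runs": runs}, f, indent=2)
    return out_path


# Small demonstration with two made-up runs.
with tempfile.TemporaryDirectory() as tmp:
    for i, value in enumerate([10.5, 11.2]):
        run_dir = Path(tmp) / f"run{i}"
        run_dir.mkdir()
        (run_dir / "run.json").write_text(json.dumps({"run": i, "value": value}))
    merged = json.loads(merge_run_files(tmp).read_text())
    print(len(merged["runs"]))  # 2
```

Because the output is rebuilt from the individual run.json files each time,
old runs can be rotated out simply by removing their directories.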

> 
> > > Pending tasks:
> > > - Add report functionality
> > >    + I have removed the generation of plot.png files. In part because I want to
> > >      do that directly from the flot plugin, and in part because I think it is
> > >      more useful if we integrate it in the future ftc report command.
> > > - Add more information to the run.json files.
> > >    + I am trying to produce a schema that is very close to the one used in
> > > Kernel.CI. Probably I can make it compatible.
> >
> > That sounds good.  I'm very interested in the schema.  I believe
> > that Milo Casagrande mentioned something about groups, that I don't think
> > we have yet.  Everything else in your analysis from April
> > (https://lists.linuxfoundation.org/pipermail/fuego/2017-April/000448.html)
> > I think shows some analog between Fuego and KernelCI fields.
> 
> Groups are called "test_sets" in kernel CI and can contain an array of
> "test_cases".
> # I'm thinking about using kernel ci's nomenclature for this. What do you think?
Did they ever give us an example of what their LTP results look like?
I think Milo described them, but I don't recall actually seeing a json file.
LTP is probably the most complicated test we'll need to handle.

I'm not sure I like their nomenclature.  They have 3 things:
test_suite, test_set and test_case.
I guess these are roughly the same as our:
test_plan, test, and (unnamed by us) individual sub-test case.
(but it's unclear these are exact analogs).

I find these three levels confusing  - particularly because a test_suite
in kernelCI can point to both test_sets and test_cases.

> # I also want to rename platform to toolchain, fwver to kernel_version, spec to
> # test_spec, testplan to test_plan...
Agree on rename of platform to toolchain, fwver to kernel_version.

Our 'spec' is essentially the same as their test_set 'parameters' object.
Note that their 'test_case' can have a 'parameters' object as well.

I'm thinking of test_case as something like: LTP.syscall.kill10
where, given that they support multiple measurements per test_case, maybe
they would classify this as:
 - test_suite LTP
 - test_set syscall
 - test_case: kill10
 - measurement: ? (does kill10 do more than one measurement?)
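
To make that mapping concrete, here is a minimal sketch that splits a
dotted name such as LTP.syscall.kill10 into the three levels. The dotted
naming convention, the CasePath type and split_test_name() are assumptions
for illustration; neither Fuego nor KernelCI defines them.

```python
from typing import NamedTuple


class CasePath(NamedTuple):
    """One test case, addressed by the three KernelCI-style levels."""
    test_suite: str
    test_set: str
    test_case: str


def split_test_name(dotted):
    """Split 'suite.set.case' into its three parts."""
    # maxsplit=2 keeps any further dots inside the test_case name.
    suite, test_set, case = dotted.split(".", 2)
    return CasePath(suite, test_set, case)


path = split_test_name("LTP.syscall.kill10")
print(path.test_suite, path.test_set, path.test_case)  # LTP syscall kill10
```

Measurements would then hang off the test_case, which is where the open
question about kill10 comes in.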

> 
> In Fuego, we have a similar concept. If you see bonnie's reference.log [1] you
> will notice that there are several groups/test_sets and multiple tests/test_cases
> inside each group:
> 
> test_set: Sequential_Output
> test_cases: Block, PerChr, Rewrite

In this case, is Block a set of measurements, or a single measurement?

> 
> I am making a schema that is compatible with Kernel CI but that will also
> allow having test_sets inside test_sets. For example: "LTP > RT tests > func >
> sched_jitter" contains 2 levels of test_sets (RT tests and func).
2 levels: RT tests and func?
or just 2 test_sets?

I'm not that familiar with LTP, so is 'func' actually nested under 'RT tests'?
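
If nesting is allowed, the structure becomes recursive. Here is a minimal
sketch that takes the quoted example at face value (func nested under
RT tests, which is exactly what I'm asking about) and walks it. The
"name", "test_sets" and "test_cases" keys and walk_test_cases() are
hypothetical, not the KernelCI or Fuego schema.

```python
def walk_test_cases(test_set, prefix=()):
    """Yield the full path of every test case below a (nested) test_set."""
    path = prefix + (test_set["name"],)
    for case in test_set.get("test_cases", []):
        yield path + (case,)
    # Recurse into child test_sets, if any.
    for child in test_set.get("test_sets", []):
        yield from walk_test_cases(child, path)


ltp = {
    "name": "LTP",
    "test_sets": [
        {
            "name": "RT tests",
            "test_sets": [
                {"name": "func", "test_cases": ["sched_jitter"]},
            ],
        },
    ],
}

paths = [" > ".join(p) for p in walk_test_cases(ltp)]
print(paths)  # ['LTP > RT tests > func > sched_jitter']
```

A flat KernelCI-style layout is just the special case where no test_set has
a "test_sets" entry of its own.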

> 
> [1] https://bitbucket.org/tbird20d/fuego-core/src/805adb067afc492382ee23bc9178c059b90c043e/engine/tests/Benchmark.bonnie/reference.log?at=next&fileviewer=file-view-default
> 
> [Note] In the past, tests.info used to store this test_set > test_case
> information. Now this information is actually provided by the test's parser.py
> and reference.log. The parser.py's information includes information for the
> test_sets/test_cases that were actually executed (depends on the spec), whereas
> reference.log contains all possible test_sets/test_cases for that test (including
> those that were skipped somehow because of the selected test_spec). We need to
> decouple the reference thresholds from this information though.
> 
> > >    + There is some information that needs to be added to Fuego.
> > >      Unfortunately, I will probably have to fix a lot of files:
> > >        1) Each test_case (remember test_suite > test_set > test_case) should
> > >           be able to store a list of measurements. Each measurement would
> > >           consist of a name, value, units, duration and maybe more, such as
> > >           error messages specific to the test_case or expected values
> > >           (e.g. expected to fail, or expected to be greater than 50).
> > I think this needs to be somewhere, but possibly not in the results schema.
> > For example, I don't want every listing of dbench results to have to report
> > the units for each benchmark metric.  These should be fairly static
> > and we should be able to define them per-test, not per-run.  Things
> > like thresholds are a bit different, and we may need to record them
> > per-run, since the success/failure threshold could be different depending
> > on the board, or even changed by the user to fine-tune the regression-checking.
> 
> The reasons I wanted to add units to the schema were:
>   - AGL was using them on their HTML output reports, and it does make
>     reports more readable.
>     If we don't have this information in the results.json file, flot will need
>     to get it from somewhere else (e.g. a json file). The problem is that we will
>     need to update that file every time we add a new test. I'd rather have that
>     information inside the test directory.
>     What do you think?
OK - that makes sense.

>   - Kernel CI format [2][3] allows including units in their measures (not strictly
>     required) as well.
> 
> [2] https://api.kernelci.org/schema-test-case.html
> [3] https://api.kernelci.org/json-schema/1.0/measurement.json
> 
> > > - Handle failed runs better. Sometimes the test fails very early, before even
> > >   building it.
> > > - I am not sure what to do with the "reference.log" files
> > >    + Currently they are used to store thresholds, but these are probably board
> > >      dependent.
> > >    + This is probably related to the discussion with Rafael about parameterized
> > >      builds. We should be able to define the threshold for a specific
> > >      test_case's measure.
> >
> > reference logs should be savable by the user, to compare with future runs.
> > The system we have now uses parsed testlogs, which are generated using
> > log_compare and very simple line-based regular-expression filters (using 'grep').
> > It will be much more flexible and robust to compare run.json files instead
> > of a parsed log and a reference log.
> >
> > The purpose of these is to save the data from a "known good" run, so that
> > regressions can be detected when the data from a current run differs from that.
> > This can include sub-test failures, that we have decided to ignore or postpone
> > resolution of.
> >
> > I think once we have in place a system to save all the sub-test metric data
> > from the testlogs (using a parser) in json format, then we can eliminate these.
> > We should be able to replace reference.log with reference.json (which is just
> > a saved run.json file).  This is a key thing that I would like Fuego to be able
> > to share easily between developers (and sites).
> >
> > I have already started working on a json difference program (called 'jdiff')
> > to compare 2 json files and report the differences between the two.
> >
> > On the issue of where to save them, currently they should be saved somehow
> > at the 'board' level.  That is, tests will definitely have different results
> > per-board.
> > But there may be such a thing as a reference file that is dependent on the kernel
> > version, or the distribution, or some other parameter.  We should discuss the
> > naming and storage of these.
> 
> How about saving them in the board's testplan, and also allowing users to try
> different ones through the ftc run-test interface?
I don't follow this.  In /fuego-ro/boards?

They are definitely test-specific, but I'm not sure I want them in /fuego-core/engine/tests/<testname>.
That will pollute the fuego-core directory with site-specific data.

I think they need to go into /fuego-rw somewhere, as they can be user-generated (and possibly
user-downloaded).
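
Wherever the reference.json ends up, comparing it against a new run.json
comes down to a recursive walk over two parsed json documents. Here is a
minimal sketch of that idea. It is not the 'jdiff' program mentioned
above; json_diff() and the sample field names are made up for
illustration.

```python
def json_diff(old, new, path=""):
    """Return a list of (path, old_value, new_value) for every difference.

    A key missing on one side is reported with None on that side, so a
    missing key and an explicit null are not distinguished. Lists are
    compared as whole values.
    """
    diffs = []
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else key
            if key not in old:
                diffs.append((child, None, new[key]))
            elif key not in new:
                diffs.append((child, old[key], None))
            else:
                diffs.extend(json_diff(old[key], new[key], child))
    elif old != new:
        diffs.append((path, old, new))
    return diffs


reference = {"test": "bonnie", "Sequential_Output": {"Block": 100, "Rewrite": 50}}
current = {
    "test": "bonnie",
    "Sequential_Output": {"Block": 80, "Rewrite": 50, "PerChr": 30},
}

for entry in json_diff(reference, current):
    print(entry)
# ('Sequential_Output.Block', 100, 80)
# ('Sequential_Output.PerChr', None, 30)
```

A regression check would sit on top of this and decide which differences
matter, for example by applying a per-board threshold to numeric values.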

> 
> > > - Remove testplans?
> > >    + I was thinking that we can substitute testplans by custom scripts that call
> > >      ftc add-jobs
> >
> > Why do we want to remove them?  I think they serve a useful function -
> > expressing a set of tests (with their specs) to execute in sequence.
> 
> OK. I just wanted to mention that they are redundant. But I agree that they are
> useful.
> 
> > > - Create a staging folder for tests that do not work or files that are not used.
> > >    + Or maybe at least list them up on the wiki.
> >
> > Currently, if they are not listed in a testplan, the tests are functionally 'dead'.
> > (Although a user can create a job for a single test and try it out).
> > Maybe it would be good to have a 'staging' folder for tests that are
> > under development or conversion (like a lot of the AGL tests). I agree that
> > we should have some notion of the "approved and likely to work" tests,
> > and that should be expressed somehow in the test placement or in
> > documentation.
> 
> Maybe we can write this information on the test.yaml file (you mentioned
> something about evaluating tests with stars).
I hadn't considered putting the rating information into the test.yaml file,
but an indicator of test 'readiness' might work there.  I have test version,
which, if the number is less than 1.0, is a proxy for indicating that the 
test is not really considered valid yet (ie, it's pre-release quality).
Note that the version field in test.yaml is the version of the fuego test, not
the version of the test program used in the test.
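
As a rough illustration of using the version as a readiness proxy, here
is a minimal sketch. It assumes the version is a dotted number such as
"0.9" or "1.2"; is_prerelease() is a hypothetical helper, not something
ftc provides.

```python
def is_prerelease(version):
    """Return True if a Fuego test version is below 1.0."""
    # Only the leading component matters for the "< 1.0" check.
    major = int(str(version).split(".")[0])
    return major < 1


for version in ("0.9", "1.0", "2.1"):
    print(version, is_prerelease(version))
```

A staging list could then be generated from this check instead of being
maintained by hand.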

> Actually, I would like to know your plans with the
> test.yaml files in general. I haven't looked into them deeply yet.
They were created as the place to hold data used for packaging a test
(for test distribution outside of the fuego-core repository).  Currently,
they have a manifest of files and some extra information related to 
packaging a test (author, license, version, etc.).  I will formally define
these, clean them up, and add them for all the tests in the repository
when we roll out "test packages" as an officially supported feature. 

Right now, test packages are implemented as just a proof of concept.  More work
needs to be done server side (with a rating system, security to prevent
malware, etc) before this feature is ready. (ie - not this release).
You can think of them like rpm .spec files, or debian control files.
An example of one is in fuego-core/engine/tests/Functional.bc/test.yaml
(Now that I look at it, it's out of date.  It lists the base_script name, and that
is no longer needed, as the base_script is now always 'fuego_test.sh').
 -- Tim