[Fuego] status update
daniel.sangorrin at toshiba.co.jp
Wed Jun 28 08:43:25 UTC 2017
# I've added Milo to the Cc.
> -----Original Message-----
> From: Bird, Timothy [mailto:Tim.Bird at sony.com]
> Sent: Wednesday, June 28, 2017 1:56 PM
> > > > Pending tasks:
> > > > - Add report functionality
> > > > + I have removed the generation of plot.png files, in part because I want to
> > > > do that directly from the flot plugin, and in part because I think it is more
> > > > useful if we integrate it into the future ftc report command.
> > > > - Add more information to the run.json files.
> > > > + I am trying to produce a schema that is very close to the one used in
> > > > Kernel.CI. Probably I can make it compatible.
> > >
> > > That sounds good. I'm very interested in the schema. I believe
> > > that Milo Casagrande mentioned something about groups, that I don't
> > think
> > > we have yet. Everything else in your analysis from April
> > > (https://lists.linuxfoundation.org/pipermail/fuego/2017-April/000448.html)
> > > I think shows some analog between Fuego and KernelCI fields.
> > Groups are called "test_sets" in kernel CI and can contain an array of
> > "test_cases".
> > # I'm thinking about using kernel ci's nomenclature for this. What do you
> > think?
> Did they ever give us an example of what their LTP results look like?
> I think Milo described them, but I don't recall actually seeing a json file.
> LTP is probably the most complicated test we'll need to handle.
Milo sent 3 hackbench json files but not an LTP one. LTP normally works with 3 levels
(e.g.: LTP > syscalls > kill01), so that is not a problem.
However, LTP now also includes 2 more test suites (the Open POSIX
test suite and the real-time test suite) with their own test sets and test cases.
For that reason, you could end up with 4 nested levels (unless you create 3 test suites
from the same source code).
In any case, I don't see the need to restrict ourselves to only 3 nesting levels.
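To illustrate what arbitrary nesting could look like, here is a minimal Python sketch. The dictionary layout (the keys "name", "test_sets", "test_cases" and "status") is only an assumption for illustration; it is not the actual Kernel CI or Fuego schema.

```python
# Hypothetical layout: a node may hold test_cases and further test_sets.
ltp = {
    "name": "LTP",
    "test_sets": [
        {"name": "syscalls", "test_cases": [{"name": "kill01", "status": "PASS"}]},
        {
            "name": "RT tests",
            "test_sets": [
                {"name": "func", "test_cases": [{"name": "sched_jitter", "status": "PASS"}]},
            ],
        },
    ],
}


def walk(node, prefix=()):
    """Yield (path, status) for every test_case, at any nesting depth."""
    path = prefix + (node["name"],)
    for case in node.get("test_cases", []):
        yield " > ".join(path + (case["name"],)), case["status"]
    for child in node.get("test_sets", []):
        yield from walk(child, path)


for path, status in walk(ltp):
    print(path, status)
```

Because the walk is recursive, 3 levels and 4 levels are handled by the same code.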
> I'm not sure I like their nomenclature. They have 3 things:
> test_suite, test_set and test_case.
> I guess these are roughly the same as our:
> test_plan, test, and (unnamed by us) individual sub-test case.
> (but it's unclear these are exact analogs).
I think this is a better analogy:
test_suite, test_set, test_case == test_name, groupname, test == bonnie, sequential_output, rewrite
The concept of test_plan is not in kernel ci afaik (maybe it is in LAVA under a different name, such as batch jobs?).
> I find these three levels confusing - particularly because a test_suite
> in kernelCI can point to both test_sets and test_cases.
Actually that makes sense. For example, suppose you have a simple
test suite (hello world) with one single test case. Then you don't really
need to define a test_set.
> > # I also want to rename platform to toolchain, fwver to kernel_version, spec
> > to
> > # test_spec, testplan to test_plan...
> Agree on rename of platform to toolchain, fwver to kernel_version.
> Our 'spec' is essentially the same as their test_set 'parameters' object.
> Note that their 'test_case' can have a 'parameters' object as well.
It's something like that. But I think we should write the name of the test_spec
at the test_suite level in the schema because we do not support per-test_set
parameters at the moment.
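The renames discussed above could be applied to existing records with something like the following Python sketch. It assumes the fields are top-level keys of a flat dictionary, which may not match the real run.json layout, and the example values are made up.

```python
# Old field name -> proposed new name (taken from the renames discussed above).
RENAMES = {
    "platform": "toolchain",
    "fwver": "kernel_version",
    "spec": "test_spec",
    "testplan": "test_plan",
}


def rename_fields(run):
    """Return a copy of a run dict with old top-level keys renamed."""
    return {RENAMES.get(key, key): value for key, value in run.items()}


# Example values are made up for illustration.
old_run = {"platform": "debian-armhf", "fwver": "4.9.0", "spec": "default", "board": "bbb"}
print(rename_fields(old_run))
```

Keys that are not in the table (such as "board" here) pass through unchanged.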
> I'm thinking of test_case as something like: LTP.syscall.kill10
> where, given that they support multiple measurements per test_case, maybe
> they would classify this as:
> - test_suite LTP
> - test_set syscall
> - test_case: kill10
> - measurement: ? (does kill10 do more than one measurement?)
Functional test cases, such as LTP test cases, normally just finish with a return value (TPASS, TFAIL, TBROK, etc.)
so you don't really need measurements for them (unless you want to store each checkpoint/assertion inside
the test case).
Benchmark test cases, on the other hand, may have more than one measurement. For example, netperf
returns the network throughput but also the CPU utilization. Fuego's Benchmark.netperf currently is
test_cases: cpu, net
test_cases: cpu, net
I think that what we want to achieve is actually:
measurements: [cpu, net]
measurements: [cpu, net]
measurements: [cpu, net]
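As a rough Python sketch of a test_case that carries a list of measurements: the test_case name, the field names ("name", "value", "units") and the numbers are all assumptions for illustration, not taken from a real netperf run or from the Kernel CI schema.

```python
# Hypothetical record: one netperf test_case carrying two measurements.
# Field names and values are illustrative only.
test_case = {
    "name": "TCP_STREAM",
    "measurements": [
        {"name": "net", "value": 94.1, "units": "Mbit/s"},
        {"name": "cpu", "value": 12.5, "units": "%"},
    ],
}


def measurement(case, name):
    """Return the value of the named measurement, or None if absent."""
    for m in case["measurements"]:
        if m["name"] == name:
            return m["value"]
    return None


print(measurement(test_case, "net"), measurement(test_case, "cpu"))
```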
> > In Fuego, we have a similar concept. If you see bonnie's reference.log  you
> > will
> > notice that there are several groups/test_sets and multiple tests/test_cases
> > inside
> > each group:
> > test_set: Sequential_Output
> > test_cases: Block, PerChr, Rewrite
> In this case, is Block a set of measurements, or a single measurement?
A single one, but it could be multiple if, for example, we ran it for different sector sizes.
> > I am making a schema that is compatible with Kernel CI but that will also
> > allow
> > having test_sets inside test_sets. For example: "LTP > RT tests > func >
> > sched_jitter"
> > contains 2-levels of test_sets (RT tests and func).
> 2 levels: RT tests and func?
> or just 2 test_sets?
> I'm not that familiar with LTP, so is 'func' actually nested under 'RT tests'?
Yes. RT tests have "func", "stress" and "perf" test sets.
> >  https://bitbucket.org/tbird20d/fuego-core/src/805adb067afc492382ee23bc9178c059b90c043e/engine/tests/Benchmark.bonnie/reference.log?at=next&fileviewer=file-view-default
> > [Note] In the past, tests.info used to store this test_set > test_case
> > information. Now this information
> > is actually provided by the test's parser.py and reference.log. The parser.py's
> > information includes
> > information for the test_sets/test_cases that were actually executed
> > (depends on the spec), whereas reference.log
> > contains all possible test_sets/test_cases for that test (including those that
> > were skipped somehow because
> > of the selected test_spec). We need to decouple the reference thresholds
> > from this information though.
> > > > + There is some information that needs to be added to Fuego.
> > > > Unfortunately, I will probably have to fix a lot of files:
> > > > 1) Each test_case (remember test_suite > test_set > test_case)
> > should
> > > > be able to store a list of
> > > > measurements. Each measurement would consist of a name, value,
> > > > units, duration and maybe more
> > > > such as error messages specific to the test_case or expected values
> > > > (e.g. expected to fail, or expected to
> > > > be greater than 50).
> > > I think this needs to be somewhere, but possibly not in the results schema.
> > > For example, I don't want every listing of dbench results to have to report
> > > the units for each benchmark metric. These should be fairly static
> > > and we should be able to define them per-test, not per-run. Things
> > > like thresholds are a bit different, and we may need to record them
> > > per-run, since the success/failure threshold could be different depending
> > > on the board, or even changed by the user to fine-tune the regression-
> > checking.
> > The reasons I wanted to add units to the schema were:
> > - AGL was using them on their HTML output reports, and it does make
> > reports more readable.
> > If we don't have this information in the results.json file, flot will need to get
> > it
> > from somewhere else (e.g. a json file). The problem is that we will need to
> > update
> > that file every time we add a new test. I'd rather have that information
> > inside the test directory.
> > What do you think?
> OK - that makes sense.
> > - Kernel CI format  allows including units in their measures (not strictly
> > required) as well.
> >  https://api.kernelci.org/schema-test-case.html
> >  https://api.kernelci.org/json-schema/1.0/measurement.json
> > > > - Handle failed runs better. Sometimes the test fails very early, before
> > even
> > > > building it.
> > > > - I am not sure what to do with the "reference.log" files
> > > > + Currently they are used to store thresholds, but these are probably
> > board
> > > > dependent.
> > > > + This is probably related to the discussion with Rafael about
> > parameterized
> > > > builds. We should
> > > > be able to define the threshold for a specific test_case's measure.
> > >
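A per-board threshold for a specific measure might be checked along these lines. This is only a Python sketch: the board names, the "test_set.test_case" key format and the limits are hypothetical, and this is not how reference.log is parsed today.

```python
import operator

# Hypothetical per-board thresholds: board -> measure name -> (operator, limit).
THRESHOLDS = {
    "bbb": {"Sequential_Output.Block": ("ge", 50)},
    "minnow": {"Sequential_Output.Block": ("ge", 200)},
}

OPS = {"ge": operator.ge, "le": operator.le}


def check(board, measure, value):
    """Return True if value meets the board's threshold (or none is defined)."""
    rule = THRESHOLDS.get(board, {}).get(measure)
    if rule is None:
        return True
    op, limit = rule
    return OPS[op](value, limit)


print(check("bbb", "Sequential_Output.Block", 120))
print(check("minnow", "Sequential_Output.Block", 120))
```

The same measured value can pass on one board and fail on another, which is the reason for keeping thresholds outside the test itself.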
> > > reference logs should be savable by the user, to compare with future runs.
> > > The system we have now uses parsed testlogs, which are generated using
> > > log_compare and very simple line-based regular-expression filters (using
> > 'grep').
> > > It will be much more flexible and robust to compare run.json files instead
> > > of a parsed log and a reference log.
> > >
> > > The purpose of these is to save the data from a "known good" run, so that
> > > regressions can be detected when the data from a current run differs from
> > that.
> > > This can include sub-test failures, that we have decided to ignore or
> > postpone
> > > resolution of.
> > >
> > > I think once we have in place a system to save all the sub-test metric data
> > > from the testlogs (using a parser) in json format, then we can eliminate
> > these.
> > > We should be able to replace reference.log with reference.json (which is
> > just
> > > a saved run.json file). This is a key thing that I would like Fuego to be able
> > to
> > > share easily between developers (and sites).
> > >
> > > I have already started working on a json difference program (called 'jdiff')
> > > to compare 2 json files and report the differences between the two.
> > >
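The idea of comparing two parsed json files can be sketched in a few lines of Python. This is not the jdiff program mentioned above, only an illustration of the approach, and the data is made up.

```python
def diff(old, new, path=""):
    """Return a list of (path, old_value, new_value) for differing leaves.

    A key missing on one side is reported with None for that side.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for key in sorted(set(old) | set(new)):
            changes += diff(old.get(key), new.get(key), f"{path}/{key}")
        return changes
    return [] if old == new else [(path, old, new)]


# Made-up data standing in for a saved reference run and a current run.
reference = {"kill01": "PASS", "kill02": "FAIL"}
current = {"kill01": "FAIL", "kill02": "FAIL", "kill03": "PASS"}
print(diff(reference, current))
```

Here kill02 fails in both runs, so it is not reported; that matches the case of a known failure that was accepted in the reference run.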
> > > On the issue of where to save them, currently they should be saved
> > somehow
> > > at the 'board' level. That is, tests will definitely have different results per-
> > board.
> > > But there may be such a thing as a reference file that is dependent on the
> > kernel
> > > version, or the distribution, or some other parameter. We should discuss
> > the
> > > naming and storage of these.
> > How about saving them at the board's testplan, and also allow users to try
> > different ones through the ftc run-test interface?
> I don't follow this. In /fuego-ro/boards?
Sorry, I forgot to say that testplans should also be per-board/user-generated and therefore not in fuego-core.
> They are definitely test-specific, but I'm not sure I want them in /fuego-core/engine/tests/<testname>.
> That will pollute the fuego-core directory with site-specific data.
> I think they need to go into /fuego-rw somewhere, as they can be user-generated (and possibly
I think that we should add a configuration file in which the user can specify the paths containing their boards, testplans, etc.
# I think Rafael mentioned something about this.
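Such a file could be as simple as an INI file read with Python's configparser. In this sketch the section name, the key names and the directories under /fuego-rw are hypothetical; none of them are existing Fuego settings.

```python
import configparser

# Hypothetical config contents; section and key names are not existing Fuego settings.
EXAMPLE = """
[paths]
boards = /fuego-rw/boards
testplans = /fuego-rw/testplans
"""


def load_paths(text):
    """Parse the config text and return the [paths] section as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["paths"])


print(load_paths(EXAMPLE))
```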