[Fuego] unified output format work (as of July 14)

Daniel Sangorrin daniel.sangorrin at toshiba.co.jp
Tue Jul 18 01:07:07 UTC 2017


Hi Tim,

> -----Original Message-----
> From: Bird, Timothy [mailto:Tim.Bird at sony.com]
> Sent: Saturday, July 15, 2017 9:15 AM
> To: fuego at lists.linuxfoundation.org
> Cc: Daniel Sangorrin
> Subject: unified output format work (as of July 14)
> 
> Daniel,
> 
> Thanks for putting your work on the unified output format (which I will abbreviate to UOF, henceforth) in your bitbucket branch.
> 
> I had a chance to look at the implementation and a few questions came up:
> 
> The Benchmark.bonnie run.json file has a bunch of measurements
> that are not in the bonnie test log. Specifically, the run.json file has
> measurements for "latency" for several of the test cases, but I don't
> see these anywhere in the bonnie output.  Also, the parser.py program
> never parses anything from the bonnie test log that it calls a 'latency'.

Good catch. 

I am planning to upgrade Bonnie++ from the old 1.03e tarball to the latest version
(1.97.3), which does include 'latency'. I will keep compatibility with the old version
when I add the necessary entries to parser.py.
 
> I believe these ('latency' measurement definitions) are being added
> because they are present in the reference.json file.  Can you explain
> why they are there?

Yes.

The reference.json file contains a list of expected test cases and measurements.
If a test case is not executed, or a measurement is not taken, then run.json
reflects that with the word SKIP. It is then up to the 'criteria' field to decide
whether that counts as an error or not. There are many reasons why a test could be
skipped. For example, Bonnie++ does not report some measurements when the machine
is too fast and the results are within the error margin.
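
To make that concrete, here is a rough sketch of the idea, written as Python dicts
rather than the actual JSON, and with names that are only illustrative (not the
final schema):

    # Illustrative only: a reference.json entry and the matching run.json entry
    reference = {
        "test_cases": [
            {"name": "Sequential_Output.Block",
             "measurements": [
                 {"name": "speed",   "unit": "K/sec", "criteria": "> 0"},
                 {"name": "latency", "unit": "ms",    "criteria": "< 1000"},
             ]},
        ],
    }

    # In run.json the same entries get the measured values filled in.
    # A measurement that was not taken shows up as "SKIP", and the
    # 'criteria' field decides whether that counts as an error or not.
    run = {
        "test_cases": [
            {"name": "Sequential_Output.Block",
             "measurements": [
                 {"name": "speed",   "value": 123456, "status": "PASS"},
                 {"name": "latency", "value": "SKIP", "status": "PASS"},
             ]},
        ],
    }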

By the way, I am still thinking about how the user is going to override the criteria and threshold
values.
 
> I'm a bit worried that neither run.json nor reference.json is very human-readable
> anymore.  The amount of information in them is now quite big - at least
> for the bonnie test, which itself has about 28 measurements.

LTP will be much worse ;_;

> I'm also worried about the difficulty involved in programming the visualization
> and report generation tools due to the nesting of test_set,test_case,measurements.
> I think you mentioned that you took out arbitrary nesting.  Is that correct?

Arbitrary nesting is still supported and working.

For the reporting and visualization tools, I am implementing the code that merges the
run.json files into a single results.json; there will be one results.json per test.
The results.json format will have a much simpler and more human-readable schema.
In particular, I will discard information that is not relevant for visualization, such
as criteria, thresholds, or intermediate status values.
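
As a rough sketch, the merge step could look something like this (the file locations
and field names are assumptions on my side, not the final code):

    import glob
    import json

    # Sketch: merge the per-run run.json files of one test into a single
    # results.json, keeping only what visualization needs (no criteria,
    # no thresholds, no intermediate status values).
    results = []
    for path in sorted(glob.glob("*/run.json")):      # hypothetical log layout
        with open(path) as f:
            run = json.load(f)
        results.append({
            "board": run.get("board"),
            "timestamp": run.get("timestamp"),
            "test_cases": [
                {"name": tc["name"],
                 "measurements": [
                     {"name": m["name"], "value": m["value"], "status": m["status"]}
                     for m in tc.get("measurements", [])
                 ]}
                for tc in run.get("test_cases", [])
            ],
        })

    with open("results.json", "w") as f:
        json.dump(results, f, indent=2)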

> In terms of workflow, a tester will run a test, and will determine the success or failure
> by using reference.json.  A tester will want to customize the pass criteria for their board,
> based on their environment, bug and release priorities, or other factors.  Also, we want
> testers to be able to share these pass criteria (results thresholds, pass/fail counts, and items
> to ignore) - so I think they should be easy to read and manipulate so I'd like to investigate
> 1) allow for human manipulation in an easier non-json format
> 2) creating them automatically with tools
> 
> I'm thinking about possibly generating the reference.json file (that a computer can
> use easily) from a more compact and human-readable non-json-formatted file:
> Something like this:
> Sequential_output.Block.speed > 0 K/sec
> Sequential_output.Block.CPU < 100 %CPU
> Sequential_output.Block.PerChr.speed > 0 K/sec
> Sequential_output.Block.CPU < 100 %CPU
> Sequential_output.Rewrite.speed > 0 K/sec
> Sequential_output.Block.fail_count = 0

Yes, testers must be able to adjust the pass criteria and benchmark thresholds without
modifying the reference.json files, but everything also needs to work out of the box,
without any adjustments. For that reason, I'm providing default criteria and threshold
values in the reference.json files that can be overridden by testers. Testers should be
able to override these and other parameters (such as the test spec parameters) in an
easy way. I was thinking that the override format could be just a JSON file that
includes only the "diff" information required.
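
For example, the override file could be a pruned copy of the defaults that only names
the entries the tester wants to change, merged on top of reference.json. A minimal
sketch of that merge, with purely illustrative keys:

    import json

    def merge_overrides(defaults, overrides):
        # Recursively overlay the tester's overrides on the default values.
        merged = dict(defaults)
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_overrides(merged[key], value)
            else:
                merged[key] = value
        return merged

    defaults  = {"Sequential_Output.Block.speed": {"criteria": "> 0", "unit": "K/sec"}}
    overrides = {"Sequential_Output.Block.speed": {"criteria": "> 5000"}}  # the tester's "diff"
    print(json.dumps(merge_overrides(defaults, overrides), indent=2))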
 
> The run.json file got a lot more complicated than I thought it would.
> I'm a bit worried about this.  I'm not sure that I like all the reference data
> embedded here.  I think you had a rationale for this (or maybe it was Milo),
> but could you refresh my memory on why it needs to be in the run.json
> file rather than kept separate.  (For example, we could copy the reference.json
> file used to evaluate the results into the log directory, if needed).

That can be done.

The parser just uses reference.json to initialize the run data in a simple way, then
traverses it to fill in the measured values and apply the criteria. I can simplify the
'run_data' structure just before exporting it to run.json.
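
A simplified sketch of that flow (the structure and field names are made up for
illustration; the real parser is more involved):

    import json

    # 1) initialize run_data from the expected structure in reference.json
    with open("reference.json") as f:
        run_data = json.load(f)

    # 2) values parsed from the test log (here just a hard-coded example)
    measured = {"Sequential_Output.Block.speed": 123456}

    # 3) traverse run_data, fill in measured values, mark the rest as SKIP;
    #    a real implementation would also evaluate each 'criteria' here
    for tc in run_data.get("test_cases", []):
        for m in tc.get("measurements", []):
            key = tc["name"] + "." + m["name"]
            m["value"] = measured.get(key, "SKIP")

    # 4) the structure could be simplified here, just before exporting it
    with open("run.json", "w") as f:
        json.dump(run_data, f, indent=2)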

> I think it might be useful if the parser.py program allowed for introspection
> of the measurement names.  Something like: './parser.py -l' to show just a list of
> names, without the measurement values from a run.

Good idea. Actually, that could be done from ftc in a more generic way by parsing
the test's reference.json file, instead of implementing it in each test's parser.py.
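
Something along these lines, reusing the same illustrative schema as above (the ftc
sub-command name is just an example):

    import json

    # Sketch: list a test's measurement names by walking its reference.json,
    # e.g. for a hypothetical 'ftc list-measurements bonnie' command.
    with open("reference.json") as f:
        reference = json.load(f)

    for tc in reference.get("test_cases", []):
        for m in tc.get("measurements", []):
            print(tc["name"] + "." + m["name"])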

> I'm still working through it, but it's hard to evaluate the format without the
> associated report generation code.  Matching the JTA-AGL reporting capabilities is
> ultimately the purpose behind the UOF in the first place.

The HTML reporting code is inside mod.js. The problem is that this is my 2nd iteration
over the parser code, so I need to rewrite the function that generates results.json files
from a number of run.json files.

The other part that still needs to be implemented is 'ftc report', which will produce
an HTML/PDF report with graphs, tables, and user-provided information.
 
> Do we have examples of the reports that AGL-JTA produced, so we can compare
> with those and make sure we're not losing any features?

You can see some snapshots here: 
http://elinux.org/images/6/6d/Japan_Technical_Jamboree_60.pdf

The current HTML report code does not provide links to collapse the test cases under
their test sets; it just shows everything in separate tables (one table per test set).
 
> Here are the goals of the UOF project:
>  - allow for capturing Functional sub-test results in a reportable and visualizable way
>    - only benchmarks supported reporting of individual metric data previously
>  - allow reports to be generated in multiple formats (HTML, PDF, etc.)
>  - allow reports and visualization to be possible with aggregate data
>    - that is, allow comparing multiple runs from the same host, or with other
>    runs from other hosts (or developers)
>  - allow for saving data in a database
>  - support for reporting measurements of all possible test (that is, universality
>   of the format)
> 
> A background goal is that it's as easy as possible to develop tests (including the base script,
> parser programs, and pass-criteria (reference) data.

I agree with all of the goals. Maybe we can discuss a roadmap today in the EG-CIAT meeting.

Thanks,
Daniel
