[Fuego] LPC Increase Test Coverage in a Linux-based OS

Wed Nov 9 00:21:05 UTC 2016

> -----Original Message-----
> From: Guillermo Adrian Ponce Castañeda on Tuesday, November 08, 2016 11:38 AM
>
> I am a co-author of this code and I must confess that it was more or less my
> fault that it was written in Perl.

No blame intended. :-)

> 
> Regarding how many logs the program analyzes, I think it is nowhere near
> 5000; it is much less. But taking into account that some logs are similar, I think it is
> possible that some logs that haven't been tested are going to work, but who
> knows :).
> 
> 
> And about the output file, right now it delivers a comma separated list of
> numbers, without headers. This is because this code is part of a bigger tool; I
> think that code is not open source yet, but that doesn't matter I guess. The
> thing here is that I think the output could be changed into json like you
> suggested, and I can try to translate the code from Perl to Python. I'm still not
> sure how long it's gonna take, but I can sure try.

Well, don't do any re-writing just yet.  I think we need to consider the
output format some more, and decide whether it makes sense to have a
single vs. multiple parsers first.
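
To make the output format question concrete, here is a minimal sketch of
turning a header-less line of comma separated counts into JSON.  The column
names and their order are assumptions made up for illustration; the actual
layout of the count.pl output is not described in this thread.

```python
import csv
import io
import json

# Hypothetical column names: the real meaning and order of the
# columns emitted by count.pl is not stated in this thread.
FIELDS = ["total", "pass", "fail", "skip", "xfail"]

def csv_counts_to_json(csv_text, fields=FIELDS):
    """Convert header-less rows of integers into a JSON list of records."""
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        values = [int(cell) for cell in row]
        if len(values) != len(fields):
            raise ValueError(
                "expected %d columns, got %d" % (len(fields), len(values)))
        records.append(dict(zip(fields, values)))
    return json.dumps(records, indent=2)

print(csv_counts_to_json("120,115,3,2,0\n"))
```

Naming the fields is the main gain here: a consumer no longer has to know
the column order to read the counts.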

An important issue here is scalability of the project, and making it easy
to allow (and incentivize) other developers to create and maintain
parsers for the log files.  Or, to help encourage people to use a common
format either initially, or by conversion from their current log format.
The only way to scale this is by having 3rd parties adopt the format, and
be willing to maintain compatibility with it over time.

I think it's important to consider what will motivate people to adopt a common
log format.  They either need to 1) write a parser for their current format, or
2) identify an existing parser which is close enough and modify it to
support their format, or 3) convert their test output directly to the desired
format.  This will be some amount of work whichever route people take.
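
As a rough sketch of routes 1) and 2), a parser-per-format layout could look
something like the following.  This is not how count.pl or Fuego is
structured; the registry, the function names and the two example formats
(an automake-like summary and TAP-like "ok"/"not ok" lines) are illustrative
assumptions only.

```python
import re

# Maps a format name to a function that returns {"pass": n, "fail": n},
# or None if the log does not look like that format.
PARSERS = {}

def parser(name):
    """Decorator that registers a parser function under a format name."""
    def register(func):
        PARSERS[name] = func
        return func
    return register

@parser("automake-summary")
def parse_automake(log):
    # Matches summary lines such as "# PASS:  5" and "# FAIL:  1".
    passed = re.search(r"^# PASS:\s+(\d+)", log, re.M)
    failed = re.search(r"^# FAIL:\s+(\d+)", log, re.M)
    if passed and failed:
        return {"pass": int(passed.group(1)), "fail": int(failed.group(1))}
    return None

@parser("ok-not-ok")
def parse_tap_like(log):
    # Counts result lines of the form "ok 1 - name" / "not ok 2 - name".
    ok = len(re.findall(r"^ok\b", log, re.M))
    not_ok = len(re.findall(r"^not ok\b", log, re.M))
    if ok or not_ok:
        return {"pass": ok, "fail": not_ok}
    return None

def parse_log(log):
    """Try each registered parser; return (format_name, counts) or (None, None)."""
    for name, func in PARSERS.items():
        counts = func(log)
        if counts is not None:
            return name, counts
    return None, None
```

With this shape, supporting a new format means adding one small function,
and a "close enough" existing parser can be copied and adjusted without
touching the others.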

I think what will be of value is having tools that read and process the format,
and provide utility to those who use the format for output.  So I want to do a bit
of a survey on what tools (visualizers, aggregators, automated processors, 
notifiers, etc.) might be useful to different developer groups, and make sure
the format is something that can be used by existing tools or by envisioned
future tools, that would be valuable to community members.

In more high-level terms, we should be trying to create a double-sided network effect,
where use (output) of the format drives tool creation, and tool usage
of the format (input) drives format popularity.

Can you describe a bit more what tools, if any, you use to view the results,
or any other processing systems that the results are used with?  If you are reviewing
results manually, are there steps you are doing now by hand that you'd like to
do automatically in the future, that a common format would help you with?

I'll go first - Fuego is currently just using the standard Jenkins "weather" report
and 'list of recent overall pass/failure' for each test. So we don't have anything
visualizing the results of sub-tests, or even displaying the counts for each test run, at the moment.
Daniel Sangorrin has just recently proposed a facility to put LTP results into spreadsheet format,
to allow visualizing test results over time via spreadsheet tools.  I'd like to add better
sub-test visualization in the future, but that's lower on our priority list at the moment.

Also in the future, we'd like to do test results aggregation, to allow for data mining
of results from tests on different hardware platforms and embedded distributions.
This will require that the parsed log output be machine-readable, and consistent.
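
As one small example of what consistent, machine-readable per-sub-test
results would allow, here is a sketch that compares two runs by test name
instead of by counts.  The record layout (a JSON object mapping a sub-test
name to "PASS" or "FAIL") and the test names are assumptions for
illustration, not a proposed format.

```python
import json

def find_regressions(previous, current):
    """Return names of sub-tests that passed before and do not pass now."""
    return sorted(
        name for name, result in current.items()
        if previous.get(name) == "PASS" and result != "PASS"
    )

def count(results, status):
    """Count how many sub-tests have the given status."""
    return sum(1 for r in results.values() if r == status)

# Hypothetical results from two runs of the same test suite.
old_run = json.loads('{"test_a": "PASS", "test_b": "FAIL", "test_c": "PASS"}')
new_run = json.loads('{"test_a": "FAIL", "test_b": "PASS", "test_c": "PASS"}')

# The failure count is 1 in both runs, so counts alone hide the change.
print(count(old_run, "FAIL"), count(new_run, "FAIL"))
# Comparing by name shows that test_a regressed.
print(find_regressions(old_run, new_run))
```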
 -- Tim

> On Mon, Nov 7, 2016 at 6:26 PM, Bird, Timothy <Tim.Bird at am.sony.com> wrote:
> 
> 
> 	Victor,
> 
> 	Thanks for raising this topic.  I think it's an important one.  I have
> 	some comments below, inline.
> 
> 	> -----Original Message-----
> 	> From: Victor Rodriguez on Saturday, November 05, 2016 10:15 AM
> 	>
> 	> This week I presented a case study on the lack of test
> 	> log output standardization in the majority of packages that are used
> 	> to build the current Linux distributions. This was presented as a BOF
> 	> ( https://www.linuxplumbersconf.org/2016/ocw/proposals/3555 ) during
> 	> the Linux Plumbers Conference.
> 	>
> 	> It was a productive discussion that let us share the problem that we
> 	> have in the current projects that we use every day to build a
> 	> distribution (either an embedded or a cloud-based distribution).
> 	> The open source projects don't follow a standard output log format to
> 	> print the passing and failing tests that they run during packaging
> 	> time ( "make test" or "make check" ).
> 	>
> 	> The Clear Linux project is using a simple Perl script that helps them
> 	> to count the number of passing and failing tests (which should be
> 	> trivial if we could have a single standard output among all the projects,
> 	> but we don’t):
> 	>
> 	>
> 	> https://github.com/clearlinux/autospec/blob/master/autospec/count.pl
> 	>
> 	> # perl count.pl <build.log>
> 
> 	A few remarks about this.  This will be something of a stream of ideas, not
> 	very well organized.  I'd like to avoid requiring too many different
> 	language skills in Fuego.  In order to write a test for Fuego, we already require
> 	knowledge of shell script, python (for the benchmark parsers) and json formats
> 	(for the test specs and plans).  I'd be hesitant to adopt something in perl, but maybe
> 	there's a way to leverage the expertise embedded in your script.
> 
> 	I'm not that fond of the idea of integrating all the parsers into a single program.
> 	I think it's conceptually simpler to have a parser per log file format.  However,
> 	I haven't looked in detail at your parser, so I can't really comment on its
> 	complexity.  I note that 0day has a parser per test (but I haven't checked to
> 	see if they re-use common parsers between tests.)  Possibly some combination
> 	of code-driven and data-driven parsers is best, but I don't have the experience
> 	you guys do with your parser.
> 
> 	If I understood your presentation, you are currently parsing
> 	logs for thousands of packages. I thought you said that about half of the
> 	20,000 packages in a distro have unit tests, and I thought you said that
> 	your parser was covering about half of those (so, about 5000 packages currently).
> 	And this is with 26 log formats parsed so far.
> 
> 	I'm guessing that packages have a "long tail" of formats, with them getting
> 	weirder and weirder the farther out on the tail of formats you get.
> 
> 	Please correct my numbers if I'm mistaken.
> 
> 	> Examples of real package build logs:
> 	>
> 	> https://kojipkgs.fedoraproject.org//packages/gcc/6.2.1/2.fc25/data/logs/x86_64/build.log
> 	>
> 	> https://kojipkgs.fedoraproject.org//packages/acl/2.2.52/11.fc24/data/logs/x86_64/build.log
> 	>
> 	> So far that simple (and not well engineered) parser has found 26
> 	> “standard” outputs ( and counting ) .
> 
> 	This is actually remarkable, as Fuego is only handling the formats for the
> 	standalone tests we ship with Fuego.  As I stated in the BOF, we have two
> 	mechanisms, one for functional tests that uses shell, grep and diff, and
> 	one for benchmark tests that uses a very small python program that uses
> 	regexes.  So, currently we only have 50 tests covered, but many of these
> 	parsers use very simple one-line grep regexes.
> 
> 	Neither of these Fuego log results parser methods supports tracking individual
> 	subtest results.
> 
> 	> The script has the flaw that it
> 	> does not recognize the names of the tests in order to detect
> 	> regressions. Maybe one test was passing in the previous release and is
> 	> failing in the new one (while another one started passing), and then
> 	> the number of failing tests remains the same.
> 
> 	This is a concern with the Fuego log parsing as well.
> 
> 	I would like to modify Fuego's parser to not just parse out counts, but to
> 	also convert the results to something where individual sub-tests can be
> 	tracked over time.  Daniel Sangorrin's recent work converting the output
> 	of LTP into excel format might be one way to do this (although I'm not
> 	that comfortable with using a proprietary format - I would prefer CSV
> 	or json, but I think Daniel is going for ease of use first.)
> 
> 	I need to do some more research, but I'm hoping that there are Jenkins
> 	plugins (maybe xUnit) that will provide tools to automatically handle
> 	visualization of test and sub-test results over time.  If so, I might
> 	try converting the Fuego parsers to produce that format.
> 
> 	> To be honest, before presenting at LPC I was very confident that this
> 	> script ( or another version of it , much smarter ) could be the beginning
> 	> of the solution to the problem we have. However, during the discussion
> 	> at LPC I understood that this might be a huge effort (not sure if
> 	> bigger) in order to solve the nightmare we already have.
> 
> 	So far, I think you're solving a somewhat different problem than Fuego is,
> 	and in one sense are much farther along than Fuego.  I'm hoping we can
> 	learn from your experience with this.
> 
> 	I do think we share the goal of producing a standard, or at least a recommendation,
> 	for a common test log output format.  This would help the industry going forward.
> 	Even if individual tests don't produce the standard format, it will help 3rd parties
> 	write parsers that conform the test output to the format, as well as encourage the
> 	development of tools that utilize the format for visualization or regression checking.
> 
> 	Do you feel confident enough to propose a format?  I don't at the moment.
> 	I'd like to survey the industry for 1) existing formats produced by tests (which you have good experience
> 	with, and which are maybe already captured well by your perl script), and 2) existing tools
> 	that use common formats as input (e.g. the Jenkins xunit plugin).  From this I'd like
> 	to develop some ideas about the fields that are most commonly used, and a good language to
> 	express those fields. My preference would be JSON - I'm something of an XML naysayer, but
> 	I could be talked into YAML.  Under no circumstances do I want to invent a new language for
> 	this.
> 
> 	> Tim Bird participated in the BOF and recommended that I send a mail to
> 	> the Fuego project team in order to look for more input and ideas about
> 	> this topic.
> 	>
> 	> I really believe in the importance of attacking this problem before we
> 	> have a bigger problem.
> 	>
> 	> All feedback is more than welcome
> 
> 	Here is how I propose moving forward on this.  I'd like to get a group together to study this
> 	issue.  I wrote down a list of people at LPC who seem to be working on test issues.  I'd like to
> 	do the following:
> 	 1) perform a survey of the areas I mentioned above
> 	 2) write up a draft spec
> 	 3) send it around for comments (to which individuals and lists is an open issue)
> 	 4) discuss it at a future face-to-face meeting (probably at ELC or maybe next year's plumbers)
> 	 5) publish it as a standard endorsed by the Linux Foundation
> 
> 	Let me know what you think, and if you'd like to be involved.
> 
> 	Thanks and regards,
> 	 -- Tim
> 
> 
> 
> 
> 
> 
> --
> 
> - Guillermo Ponce