[Fuego] LPC Increase Test Coverage in a Linux-based OS
daniel.sangorrin at toshiba.co.jp
Thu Nov 10 04:07:10 UTC 2016
> -----Original Message-----
> From: fuego-bounces at lists.linuxfoundation.org [mailto:fuego-bounces at lists.linuxfoundation.org] On Behalf Of Bird, Timothy
> Sent: Wednesday, November 09, 2016 9:21 AM
> To: Guillermo Adrian Ponce Castañeda
> Cc: fuego at lists.linuxfoundation.org
> Subject: Re: [Fuego] LPC Increase Test Coverage in a Linux-based OS
> I'll go first - Fuego is currently just using the standard Jenkins "weather" report
> and 'list of recent overall pass/failure' for each test. So we don't have anything
> visualizing the results of sub-tests, or even displaying the counts for each test run, at the moment.
> Daniel Sangorrin has just recently proposed a facility to put LTP results into spreadsheet format,
> to allow visualizing test results over time via spreadsheet tools. I'd like to add better
> sub-test visualization in the future, but that's lower on our priority list at the moment.
Actually, the spreadsheet format I'm using is basically CSV plus some colors to easily
distinguish failed from passed tests. It can be opened with LibreOffice or exported to
CSV format. I can also add direct CSV output to my script.
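For illustration, here is a minimal sketch of the kind of CSV output I have in
mind (the test names and results below are made up for the example):

    import csv

    # hypothetical parsed results: test name -> PASS/FAIL
    results = {
        'syscall01': 'PASS',
        'syscall02': 'FAIL',
        'mmap01': 'PASS',
    }

    with open('results.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerow(['test', 'result'])
        for name, result in sorted(results.items()):
            writer.writerow([name, result])

A file like that can be diffed between runs or opened directly in LibreOffice.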
> Also in the future, we'd like to do test results aggregation, to allow for data mining
> of results from tests on different hardware platforms and embedded distributions.
> This will require that the parsed log output be machine-readable, and consistent.
> -- Tim
> > On Mon, Nov 7, 2016 at 6:26 PM, Bird, Timothy <Tim.Bird at am.sony.com> wrote:
> > Victor,
> > Thanks for raising this topic. I think it's an important one. I have
> > some comments below, inline.
> > > -----Original Message-----
> > > From: Victor Rodriguez on Saturday, November 05, 2016 10:15 AM
> > >
> > > This week I presented a case study on the problem of the lack of
> > > test log output standardization in the majority of packages that are
> > > used to build current Linux distributions. This was presented as a BOF
> > > ( https://www.linuxplumbersconf.org/2016/ocw/proposals/3555 )
> > > during the Linux Plumbers Conference.
> > >
> > > It was a productive discussion that let us share the problem we have
> > > in the projects that we use every day to build a distribution
> > > (whether an embedded or a cloud-based distribution). The open source
> > > projects don't follow a standard output log format for printing the
> > > passing and failing tests that they run during packaging time
> > > ("make test" or "make check").
> > >
> > > The Clear Linux project is using a simple Perl script that helps them
> > > count the number of passing and failing tests (which would be trivial
> > > if we had a single standard output format among all the projects, but
> > > we don't):
> > >
> > > https://github.com/clearlinux/autospec/blob/master/autospec/count.pl
> > >
> > > # perl count.pl <build.log>
> > A few remarks about this. This will be something of a stream of ideas,
> > not very well organized. I'd like to avoid requiring too many different
> > language skills in Fuego. In order to write a test for Fuego, we
> > already require knowledge of shell script, python (for the benchmark
> > parsers) and json formats (for the test specs and plans). I'd be
> > hesitant to adopt something in perl, but maybe there's a way to
> > leverage the expertise embedded in your script.
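Just as a thought experiment, the core counting logic could probably be
ported to a small python script along these lines (the patterns below are
only examples, not the actual regexes from count.pl):

    import re
    import sys

    # example patterns for two common "make check" summary styles;
    # count.pl recognizes many more formats than this
    PASS_RE = re.compile(r'^(PASS|ok)\b', re.MULTILINE)
    FAIL_RE = re.compile(r'^(FAIL|not ok)\b', re.MULTILINE)

    log = open(sys.argv[1]).read()
    print('passed:', len(PASS_RE.findall(log)))
    print('failed:', len(FAIL_RE.findall(log)))

That said, the hard part is the long tail of log formats, not the counting.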
> > I'm not that fond of the idea of integrating all the parsers into a
> > single program. I think it's conceptually simpler to have a parser per
> > log file format. However, I haven't looked in detail at your parser, so
> > I can't really comment on its complexity. I note that 0day has a parser
> > per test (but I haven't checked to see if they re-use common parsers
> > between tests.) Possibly some combination of code-driven and
> > data-driven parsers is best, but I don't have the experience you guys
> > do with your parser.
> > If I understood your presentation, you are currently parsing logs for
> > thousands of packages. I thought you said that about half of the
> > 20,000 packages in a distro have unit tests, and that your parser was
> > covering about half of those (so, about 5000 packages currently).
> > And this is with 26 log formats parsed so far.
> > I'm guessing that packages have a "long tail" of formats, with the
> > formats getting weirder and weirder the farther out on the tail you go.
> > Please correct my numbers if I'm mistaken.
> > > Examples of real packages build logs:
> > >
> > > https://kojipkgs.fedoraproject.org//packages/gcc/6.2.1/2.fc25/data/logs/x86_64/build.log
> > > https://kojipkgs.fedoraproject.org//packages/acl/2.2.52/11.fc24/data/logs/x86_64/build.log
> > >
> > > So far that simple (and not well engineered) parser has found 26
> > > "standard" outputs (and counting).
> > This is actually remarkable, as Fuego is only handling the formats for
> > the standalone tests we ship with Fuego. As I stated in the BOF, we
> > have two mechanisms: one for functional tests that uses shell, grep
> > and diff, and one for benchmark tests that uses a very small python
> > program that uses regexes. So, currently we only have 50 tests
> > covered, but many of these parsers use very simple one-line grep
> > regexes.
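(For readers not familiar with Fuego: a benchmark parser really is tiny.
A hypothetical one, with a made-up metric name and log pattern, is roughly
this shape:)

    # parser sketch for a hypothetical benchmark test
    import re

    # one regex that extracts the score from the test log
    SCORE_RE = re.compile(r'^Score:\s+([0-9.]+)', re.MULTILINE)

    def parse(log_text):
        # return the benchmark metric found in the log, if any
        m = SCORE_RE.search(log_text)
        return {'score': float(m.group(1))} if m else {}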
> > Neither of these Fuego log results parser methods supports tracking
> > individual subtest results.
> > > The script has the flaw that it does not recognize the names of the
> > > tests, so it cannot detect regressions. Maybe one test that was
> > > passing in the previous release fails in the new one while another
> > > starts passing, and then the number of failing tests remains the
> > > same.
> > This is a concern with the Fuego log parsing as well.
> > I would like to modify Fuego's parser to not just parse out counts,
> > but also to convert the results to something where individual
> > sub-tests can be tracked over time. Daniel Sangorrin's recent work
> > converting the output of LTP into Excel format might be one way to do
> > this (although I'm not that comfortable with using a proprietary
> > format - I would prefer CSV or JSON, but I think Daniel is going for
> > ease of use first.)
> > I need to do some more research, but I'm hoping that there are Jenkins
> > plugins (maybe xUnit) that will provide tools to automatically handle
> > visualization of test and sub-test results over time. If so, I might
> > try converting the Fuego parsers to produce that format.
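The Jenkins xUnit plugin consumes JUnit-style XML reports, so if the
parsers emitted something like the minimal (hand-written, purely
illustrative) report below, Jenkins could chart each sub-test over time:

    <testsuite name="LTP" tests="3" failures="1">
      <testcase name="syscall01"/>
      <testcase name="syscall02">
        <failure message="unexpected return code"/>
      </testcase>
      <testcase name="mmap01"/>
    </testsuite>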
> > > To be honest, before presenting at LPC I was very confident that
> > > this script (or a much smarter version of it) could be the beginning
> > > of the solution to the problem we have. However, during the
> > > discussion at LPC I understood that this might be a huge effort
> > > (perhaps even a bigger one) to solve the nightmare we already have.
> > So far, I think you're solving a somewhat different problem than Fuego
> > is, and in one sense are much farther along than Fuego. I'm hoping we
> > can learn from your experience with this.
> > I do think we share the goal of producing a standard, or at least a
> > recommendation, for a common test log output format. This would help
> > the industry going forward. Even if individual tests don't produce the
> > standard format, it will help 3rd parties write parsers that conform
> > the test output to the format, as well as encourage the development of
> > tools that utilize the format for visualization or regression
> > checking.
> > Do you feel confident enough to propose a format? I don't at the
> > moment. I'd like to survey the industry for 1) existing formats
> > produced by tests (which you have good experience with, and which is
> > maybe already captured well by your perl script), and 2) existing
> > tools that use common formats as input (e.g. the Jenkins xunit
> > plugin). From this I'd like to develop some ideas about the fields
> > that are most commonly used, and a good language to express those
> > fields. My preference would be JSON - I'm something of an XML
> > naysayer, but I could be talked into YAML. Under no circumstances do I
> > want to invent a new language for this.
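To make the discussion concrete, a single record in such a JSON format
could look something like this (the field names are entirely made up,
just to show the shape a proposal might take):

    {
      "test": "LTP",
      "subtest": "syscall02",
      "result": "FAIL",
      "duration_sec": 1.42,
      "board": "beaglebone-black",
      "timestamp": "2016-11-10T04:07:10Z"
    }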
> > > Tim Bird participated in the BOF and recommended that I send a mail
> > > to the Fuego project team in order to look for more input and ideas
> > > about this topic.
> > >
> > > I really believe in the importance of attacking this problem before
> > > we have a bigger one.
> > >
> > > All feedback is more than welcome.
> > Here is how I propose moving forward on this. I'd like to get a group
> > together to study this issue. I wrote down a list of people at LPC who
> > seem to be working on test issues. I'd like to do the following:
> > 1) perform a survey of the areas I mentioned above
> > 2) write up a draft spec
> > 3) send it around for comments (to which individuals and lists is an
> > open issue)
> > 4) discuss it at a future face-to-face meeting (probably at ELC or
> > maybe next year's plumbers)
> > 5) publish it as a standard endorsed by the Linux Foundation
> > Let me know what you think, and if you'd like to be involved.
> > Thanks and regards,
> > -- Tim
> > --
> > - Guillermo Ponce