Test log output in split format
| Here are some common log output formats: | Here are some common log output formats: |
| See also Other test systems and Test results formats | See also [[Other test systems]] and [[Test results formats]] |
|
== Discussion from Fuego list == Victor Rodriquez wrote (on November 8, 2016): (see here for discussion thread.) |
== Discussion from Fuego list == Victor Rodriquez wrote (on November 8, 2016):{{BR}}(see [[https://lists.linuxfoundation.org/pipermail/fuego/2016-November/000103.html|here]] for discussion thread.)
|
| This week I presented a case study on the lack of test log output standardization in the majority of packages that are used to build the current Linux distributions. This was presented as a BOF (https://www.linuxplumbersconf.org/2016/ocw/proposals/3555) during the Linux Plumbers Conference. | This week I presented a case study on the lack of test log output standardization in the majority of packages that are used to build the current Linux distributions. This was presented as a BOF (https://www.linuxplumbersconf.org/2016/ocw/proposals/3555) during the Linux Plumbers Conference. |
| It was a productive discussion that let us share the problem that we have in the current projects that we use every day to build a distribution (whether an embedded or a cloud-based distribution). The open source projects don't follow a standard output log format to print the passing and failing tests that they run during packaging time ("make test" or "make check"). | It was a productive discussion that let us share the problem that we have in the current projects that we use every day to build a distribution (whether an embedded or a cloud-based distribution). The open source projects don't follow a standard output log format to print the passing and failing tests that they run during packaging time ("make test" or "make check"). |
| The Clear Linux project is using a simple Perl script that helps them to count the number of passing and failing tests (which would be trivial if we had a single standard output among all the projects, but we don't): | The Clear Linux project is using a simple Perl script that helps them to count the number of passing and failing tests (which would be trivial if we had a single standard output among all the projects, but we don't): |
| https://github.com/clearlinux/autospec/blob/master/autospec/count.pl | https://github.com/clearlinux/autospec/blob/master/autospec/count.pl |
| # perl count.pl <build.log> | # perl count.pl <build.log> |
| Examples of real package build logs: | Examples of real package build logs: |
| https://kojipkgs.fedoraproject.org//packages/gcc/6.2.1/2.fc25/data/logs/x86_64/build.log https://kojipkgs.fedoraproject.org//packages/acl/2.2.52/11.fc24/data/logs/x86_64/build.log | https://kojipkgs.fedoraproject.org//packages/gcc/6.2.1/2.fc25/data/logs/x86_64/build.log https://kojipkgs.fedoraproject.org//packages/acl/2.2.52/11.fc24/data/logs/x86_64/build.log |
| So far that simple (and not well engineered) parser has found 26 "standard" outputs (and counting). The script has the flaw that it does not recognize the names of the tests, which is needed in order to detect regressions. Maybe one test was passing in the previous release and is failing in the new one, yet the number of failing tests remains the same. | So far that simple (and not well engineered) parser has found 26 "standard" outputs (and counting). The script has the flaw that it does not recognize the names of the tests, which is needed in order to detect regressions. Maybe one test was passing in the previous release and is failing in the new one, yet the number of failing tests remains the same. |
| To be honest, before presenting at LPC I was very confident that this script (or another, much smarter version of it) could be the beginning of the solution to the problem we have. However, during the discussion at LPC I came to understand that this might be a huge effort (not sure if bigger) in order to solve the nightmare we already have. | To be honest, before presenting at LPC I was very confident that this script (or another, much smarter version of it) could be the beginning of the solution to the problem we have. However, during the discussion at LPC I came to understand that this might be a huge effort (not sure if bigger) in order to solve the nightmare we already have. |
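The regression problem described above (equal failure counts hiding a newly failing test) can be illustrated with a minimal Python sketch. This is not code from count.pl or Fuego; the function name `find_regressions`, the test names, and the "PASS"/"FAIL" status strings are all assumptions made for illustration.

```python
from typing import Dict, List


def find_regressions(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return names of tests that passed in `previous` but fail in `current`."""
    return sorted(
        name
        for name, status in current.items()
        if status == "FAIL" and previous.get(name) == "PASS"
    )


# Hypothetical per-test results for two releases of the same package.
previous_run = {"test_open": "PASS", "test_read": "FAIL", "test_write": "PASS"}
current_run = {"test_open": "PASS", "test_read": "PASS", "test_write": "FAIL"}

# Both runs contain exactly one failure, so a count-only parser sees no change.
previous_failures = list(previous_run.values()).count("FAIL")
current_failures = list(current_run.values()).count("FAIL")

# Comparing by test name exposes the regression in test_write.
regressions = find_regressions(previous_run, current_run)
print(previous_failures, current_failures, regressions)
```

The point of the sketch is only that a name-to-status mapping, rather than a pair of counters, is the smallest data structure that makes regressions visible.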
|
| ---- |
| Tim Bird responded: A few remarks about this. This will be something of a stream of ideas, not very well organized. I'd like to avoid requiring too many different language skills in Fuego. In order to write a test for Fuego, we already require knowledge of shell script, Python (for the benchmark parsers) and JSON formats (for the test specs and plans). I'd be hesitant to adopt something in Perl, but maybe there's a way to leverage the expertise embedded in your script. | Tim Bird responded: A few remarks about this. This will be something of a stream of ideas, not very well organized. I'd like to avoid requiring too many different language skills in Fuego. In order to write a test for Fuego, we already require knowledge of shell script, Python (for the benchmark parsers) and JSON formats (for the test specs and plans). I'd be hesitant to adopt something in Perl, but maybe there's a way to leverage the expertise embedded in your script. |
| I'm not that fond of the idea of integrating all the parsers into a single program. I think it's conceptually simpler to have a parser per log file format. However, I haven't looked in detail at your parser, so I can't really comment on its complexity. I note that 0day has a parser per test (but I haven't checked to see if they re-use common parsers between tests). Possibly some combination of code-driven and data-driven parsers is best, but I don't have the experience you guys do with your parser. | I'm not that fond of the idea of integrating all the parsers into a single program. I think it's conceptually simpler to have a parser per log file format. However, I haven't looked in detail at your parser, so I can't really comment on its complexity. I note that 0day has a parser per test (but I haven't checked to see if they re-use common parsers between tests). Possibly some combination of code-driven and data-driven parsers is best, but I don't have the experience you guys do with your parser. |
| If I understood your presentation, you are currently parsing logs for thousands of packages. I thought you said that about half of the 20,000 packages in a distro have unit tests, and I thought you said that your parser was covering about half of those (so, about 5000 packages currently). And this is with 26 log formats parsed so far. | If I understood your presentation, you are currently parsing logs for thousands of packages. I thought you said that about half of the 20,000 packages in a distro have unit tests, and I thought you said that your parser was covering about half of those (so, about 5000 packages currently). And this is with 26 log formats parsed so far. |
| I'm guessing that packages have a "long tail" of formats, with them getting weirder and weirder the farther out on the tail of formats you get. | I'm guessing that packages have a "long tail" of formats, with them getting weirder and weirder the farther out on the tail of formats you get. |
| Please correct my numbers if I'm mistaken. | Please correct my numbers if I'm mistaken. |
{{{#!IndentPre
indent=2
> So far that simple (and not well engineered) parser has found 26
> standard outputs ( and counting ) .
}}}
| |
| This is actually remarkable, as Fuego is only handling the formats for the standalone tests we ship with Fuego. As I stated in the BOF, we have two mechanisms, one for functional tests that uses shell, grep and diff, and one for benchmark tests that uses a very small Python program that uses regexes. So, currently we only have 50 tests covered, but many of these parsers use very simple one-line grep regexes. | This is actually remarkable, as Fuego is only handling the formats for the standalone tests we ship with Fuego. As I stated in the BOF, we have two mechanisms, one for functional tests that uses shell, grep and diff, and one for benchmark tests that uses a very small Python program that uses regexes. So, currently we only have 50 tests covered, but many of these parsers use very simple one-line grep regexes. |
| Neither of these Fuego log results parser methods supports tracking individual subtest results. | Neither of these Fuego log results parser methods supports tracking individual subtest results. |
{{{#!IndentPre
indent=2
> The script has the fail that it
> does not recognize the name of the tests in order to detect
> regressions. Maybe one test was passing in the previous release and in
> the new one is failing, and then the number of failing tests remains
> the same.
}}}
| |
| This is a concern with the Fuego log parsing as well. | This is a concern with the Fuego log parsing as well. |
| I would like to modify Fuego's parser to not just parse out counts, but to also convert the results to something where individual sub-tests can be tracked over time. Daniel Sangorrin's recent work converting the output of LTP into Excel format might be one way to do this (although I'm not that comfortable with using a proprietary format - I would prefer CSV or JSON, but I think Daniel is going for ease of use first). | I would like to modify Fuego's parser to not just parse out counts, but to also convert the results to something where individual sub-tests can be tracked over time. Daniel Sangorrin's recent work converting the output of LTP into Excel format might be one way to do this (although I'm not that comfortable with using a proprietary format - I would prefer CSV or JSON, but I think Daniel is going for ease of use first). |
| I need to do some more research, but I'm hoping that there are Jenkins plugins (maybe xUnit) that will provide tools to automatically handle visualization of test and sub-test results over time. If so, I might try converting the Fuego parsers to produce that format. | I need to do some more research, but I'm hoping that there are Jenkins plugins (maybe xUnit) that will provide tools to automatically handle visualization of test and sub-test results over time. If so, I might try converting the Fuego parsers to produce that format. |
| ... | ... |
| I do think we share the goal of producing a standard, or at least a recommendation, for a common test log output format. This would help the industry going forward. Even if individual tests don't produce the standard format, it will help 3rd parties write parsers that conform the test output to the format, as well as encourage the development of tools that utilize the format for visualization or regression checking. | I do think we share the goal of producing a standard, or at least a recommendation, for a common test log output format. This would help the industry going forward. Even if individual tests don't produce the standard format, it will help 3rd parties write parsers that conform the test output to the format, as well as encourage the development of tools that utilize the format for visualization or regression checking. |
| Do you feel confident enough to propose a format? I don't at the moment. I'd like to survey the industry for 1) existing formats produced by tests (which you have good experience with, and which may already be captured well by your Perl script), and 2) existing tools that use common formats as input (e.g. the Jenkins xUnit plugin). From this I'd like to develop some ideas about the fields that are most commonly used, and a good language to express those fields. My preference would be JSON - I'm something of an XML naysayer, but I could be talked into YAML. Under no circumstances do I want to invent a new language for this. | Do you feel confident enough to propose a format? I don't at the moment. I'd like to survey the industry for 1) existing formats produced by tests (which you have good experience with, and which may already be captured well by your Perl script), and 2) existing tools that use common formats as input (e.g. the Jenkins xUnit plugin). From this I'd like to develop some ideas about the fields that are most commonly used, and a good language to express those fields. My preference would be JSON - I'm something of an XML naysayer, but I could be talked into YAML. Under no circumstances do I want to invent a new language for this. |
| ... | ... |
| Here is how I propose moving forward on this. I'd like to get a group together to study this issue. I wrote down a list of people at LPC who seem to be working on test issues. I'd like to do the following: * perform a survey of the areas I mentioned above * write up a draft spec * send it around for comments (to which individuals and lists? is an open issue) * discuss it at a future face-to-face meeting (probably at ELC or maybe next year's Plumbers) * publish it as a standard endorsed by the Linux Foundation | Here is how I propose moving forward on this. I'd like to get a group together to study this issue. I wrote down a list of people at LPC who seem to be working on test issues. I'd like to do the following: * perform a survey of the areas I mentioned above * write up a draft spec * send it around for comments (to which individuals and lists? is an open issue) * discuss it at a future face-to-face meeting (probably at ELC or maybe next year's Plumbers) * publish it as a standard endorsed by the Linux Foundation |
| ---- Victor wrote later: | ---- Victor wrote later: |
| After talking with Guillermo we came up with the idea of moving our parsers into the Fuego modules | After talking with Guillermo we came up with the idea of moving our parsers into the Fuego modules |
| We are going to attack this problem with two solutions, happy to hear feedback | We are going to attack this problem with two solutions, happy to hear feedback |
| * 1) Merge the parsers we have into the Fuego infrastructure * 2) Provide an API to the new developers (and current maintainers of the existing packages) to check if their logs are easy to track * 'easy to track' means that we can get the status and name of each test * if the parser can't read the log file we suggest that the developer adapt their tests to a standard (such as CMake or autotools) |
| To be honest it seems like a titanic job to change all the packages to a standard log output (especially since there are things from the 80's) but we can make the new ones fit the standards we have and suggest that the maintainers fit into one. | To be honest it seems like a titanic job to change all the packages to a standard log output (especially since there are things from the 80's) but we can make the new ones fit the standards we have and suggest that the maintainers fit into one. |
| Tim, I think that we should make a call for action to the Linux community. Do you think a publication might be useful? Maybe LWN or someplace else? | Tim, I think that we should make a call for action to the Linux community. Do you think a publication might be useful? Maybe LWN or someplace else? |
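The "easy to track" check proposed above (can we get the status and name of each test?) might look roughly like the following Python sketch. It is not the API Victor and Guillermo proposed, which the thread does not show; `parse_results` and `is_easy_to_track` are hypothetical names. The only format it recognizes is the "PASS: name" / "FAIL: name" style of result line printed by the Automake test harness, and the sample log is invented.

```python
import re
from typing import Dict

# Automake-style result lines, e.g. "PASS: test_open" or "FAIL: test_write".
RESULT_LINE = re.compile(r"^(PASS|FAIL|SKIP|XFAIL|XPASS|ERROR): (\S+)")


def parse_results(log_text: str) -> Dict[str, str]:
    """Map each test name found in the log to its reported status."""
    results = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            status, name = match.groups()
            results[name] = status
    return results


def is_easy_to_track(log_text: str) -> bool:
    """A log is 'easy to track' here if at least one named result is found."""
    return bool(parse_results(log_text))


# Invented build log fragments for illustration.
trackable_log = "make check-TESTS\nPASS: test_open\nFAIL: test_write\n# TOTAL: 2\n"
opaque_log = "running tests...\n2 tests ran, 1 failed\n"

parsed = parse_results(trackable_log)
print(parsed, is_easy_to_track(trackable_log), is_easy_to_track(opaque_log))
```

A real checker would need one such pattern per supported format; when none matches, the tool would fall back to suggesting a standard harness, as the proposal describes.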
|
| ---- |
Discussion summary https://packages.debian.org/sid/autopkgtest | = Discussion summary = * the ClearLinux project has a program count.pl (Perl script) which has embedded in it about 26 different parsers for log formats, and can produce counts of passing and failing tests, based on build logs and test logs (produced using 'make' and 'make test' for the packages). * it produces text output with a comma-separated list of numbers * something like '<package>,100,80,20,0,0' * visualization is done by combining the CSV files and creating graphs from the data * Fuego 1.0 does not provide counts or fancy visualization at the moment (only pass/fail at the level of a Jenkins job (Fuego test), and plots for some Benchmark measures). * There are some existing systems for testing packages in Debian and Yocto: * https://packages.debian.org/sid/autopkgtest * https://wiki.yoctoproject.org/wiki/Ptest |
| * essential elements of a good output format are: * per testcase: * status * test identifier (string) * duration (?) |
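As a rough illustration of the per-testcase elements listed above, here is a minimal Python sketch that serializes them as JSON (the format Tim said he would prefer). No format was agreed on in the thread: the class name `CaseResult`, the field names, and the choice of seconds for the duration are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class CaseResult:
    """One testcase record: identifier and status, plus an optional duration."""
    identifier: str                            # test identifier (string)
    status: str                                # e.g. "PASS" or "FAIL" (assumed values)
    duration_seconds: Optional[float] = None   # listed as "duration (?)" above


# Invented results for illustration.
results = [
    CaseResult("test_open", "PASS", 0.12),
    CaseResult("test_write", "FAIL"),
]

# Serialize the records as a JSON array of objects.
document = json.dumps([asdict(r) for r in results], indent=2)
print(document)
```

Keeping the duration optional reflects the question mark in the list: a parser can still emit a valid record for logs that report only a name and a status.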