Fuego 1.2 wiki

OSS Test Vision in split format


{{TableOfContents}}
This page describes aspects of the Open Source Test vision for the Fuego project, along with some ideas for implementing specific parts of this vision.

= overview of concepts =
== Letter to ksummit discuss ==
Here's an e-mail Tim sent to the ksummit-discuss list in October, 2016:

{{{
I have some ideas on Open Source testing that I'd like to throw out there for discussion. Some of these I have been stewing on for a while, while some came to mind after talking to people at recent conference events.

Sorry - this is going to be long...

First, it would be nice to increase the amount of testing we do, by having more test automation. (ok, that's a no-brainer). Recently there has been a trend towards more centralized testing facilities, like the zero-day stuff or board farms used by kernelci. That makes sense, as this requires specialized hardware, setup, or skills to operate certain kinds of test environments. As one example, an automated test of kernel boot requires automated control of power to a board or platform, which is not very common among kernel developers. A centralized test facility has the expertise and hardware to add new test nodes relatively cheaply. They can do this more quickly and much less expensively than the first such node by an individual new to testing.

However, I think to make great strides in test quantity and coverage, it's important to focus on ease of use for individual test nodes. My vision would be to have tens of thousands of individual test nodes running automated tests on thousands of different hardware platforms and configurations and workloads.

The kernel selftest project is a step in the right direction for this, because it allows any kernel developer to easily (in theory) run automated unit tests for the kernel. However, this is still a manual process. I'd like to see improved standards and infrastructure for automating tests.

It turns out there are lots of manual steps in the testing and bug-fixing process with the kernel (and other Linux-related software). It would be nice if a new system allowed us to capture manual steps, and over time convert them to automation.

Here are some problems with the manual process that I think need addressing:

1) How does an individual know what tests are valid for their platform? Currently, this is a manual decision. In a world with thousands or tens of thousands of tests, this will be very difficult. We need to have automated mechanisms to indicate which tests are relevant for a platform. Test definitions should include a description of the hardware they need, or the test setup they need. For example, it would be nice to have tests indicate that they need to be run on a node with USB gadget support, or on a node with the gadget hardware from a particular vendor (e.g. a particular SOC), or with a particular hardware phy (e.g. Synopsys). As another example, if a test requires that the hardware physically reboot, then that should be indicated in the test. If a test requires that a particular button be pressed (and that the button be available to be pressed), it should be listed. Or if the test requires that an external node (such as a wifi endpoint, CANbus endpoint, or i2C device) be available to participate in the test, that should be indicated. There should be a way for the test nodes which provide those hardware capabilities, setups, or external resources to identify themselves. Standards should be developed for how a test node and a test can express these capabilities and requirements. Also, standards need to be developed so that a test can control those external resources to participate in tests. Right now each test framework handles this in its own way (if it provides support for it at all).

I heard of a neat setup at one company where the video output from a system was captured by another video system, and the results analyzed automatically. This type of test setup currently requires an enormous investment of expertise, and possibly specialized hardware. Once such a setup is performed in a few locations, it makes much more sense to direct tests that need such facilities to those locations, than it does to try to spread the expertise to lots of different individuals (although that certainly has value also).

For a first pass, I think the kernel CONFIG variables needed by a test should be indicated, and they could be compared with the config for the device under test. This would be a start on the expression of the dependencies between a test and the features of the test node.

2) how do you connect people who are interested in a particular test with a node that can perform that test?

My proposal here is simple - for every subsystem of the kernel, put a list of test nodes in the MAINTAINERS file, to indicate nodes that are available to test that subsystem. Tests can be scheduled to run on those nodes, either whenever new patches are received for that sub-system, or when a bug is encountered and developers for that subsystem want to investigate it by writing a new test. Tests or data collection instructions that are now provided manually would be converted to formal test definitions, and added to a growing body of tests. This should help people re-use test operations that are common. Capturing test operations that are done manually into a script would need to be very easy (possibly itself automated), and it would need to be easy to publish the new test for others to use.

Basically, in the future, it would be nice if when a person reported a bug, instead of the maintainer manually walking someone through the steps to identify the bug and track down the problem, they could point the user at an existing test that the user could easily run.

I imagine a kind of "test app store", where a tester can select from thousands of tests according to their interest. Also, people could rate the tests, and maintainers could point people to tests that are helpful to solve specific problems.

3) How does an individual know how to execute a test and how to interpret the results?

For many features or sub-systems, there are existing tools (e.g. bonnie for filesystem tests, netperf for networking tests, or cyclictest for realtime), but these tools have a variety of options for testing different aspects of a problem or for dealing with different configurations or setups. Online you can find tutorials for running each of these, and for helping people interpret the results. A new test system should take care of running these tools with the proper command line arguments for different test aspects, and for different test targets ('device-under-test's).

For example, when someone figures out a set of useful arguments to cyclictest for testing realtime on a beaglebone board, they should be able to easily capture those arguments to allow another developer using the same board to easily re-use those test parameters, and interpret the cyclictest results, in an automated fashion. Basically we want to automate the process of finding out "what options do I use for this test on this board, and what the heck number am I supposed to look at in this output, and what should its value be?".

Another issue is with interpretation of test results from large test suites. One notorious example of this is LTP. It produces thousands of results, and almost always produces failures or results that can be safely ignored on a particular board or in a particular environment. It requires a large amount of manual evaluation and expertise to determine which items to pay attention to from LTP. It would be nice to be able to capture this evaluation, and share it with others with either the same board, or the same test environment, to allow them to avoid duplicating this work.

Of course, this should not be used to gloss over bugs in LTP or bugs that LTP is reporting correctly and actually need to be paid attention to.

4) How should this test collateral be expressed, and how should it be collected, stored, shared and re-used?

There are a multitude of test frameworks available. I am proposing that as a community we develop standards for test packaging which include this type of information (test dependencies, test parameters, results interpretation). I don't know all the details yet. For this reason I am coming to the community to see how others are solving these problems and to get ideas for how to solve them in a way that would be useful for multiple frameworks. I'm personally working on the Fuego test framework - see http://bird.org/fuego, but I'd like to create something that could be used with any test framework.

5) How to trust test collateral from other sources (tests, interpretation)

One issue which arises with this type of sharing (or with any type of sharing) is how to trust the materials involved. If a user puts up a node with their own hardware, and trusts the test framework to automatically download and execute a never-before-seen test, this creates a security and trust issue. I believe this will require the same types of authentication and trust mechanisms (e.g. signing, validation and trust relationships) that we use to manage code in the kernel.

I think this is more important than it sounds. I think the real value of this system will come when tens of thousands of nodes are running tests where the system owners can largely ignore the operation of the system, and instead the test scheduling and priorities can be driven by the needs of developers and maintainers who the test node owners have never interacted with.

Finally, 6) What is the motivation for someone to run a test on their hardware?

Well, there's an obvious benefit to executing a test if you are personally interested in the result. However, I think the benefit of running an enormous test system needs to be de-coupled from that immediate direct benefit. I think we should look at this the same way we look at other crowd-sourced initiatives, like Wikipedia. While there is some small benefit for someone producing an individual page edit, we need to move beyond that to the benefit to the community of the cumulative effort.

I think that if we want tens of thousands of people to run tests, then we need to increase the cost/benefit ratio for the system. First, you need to reduce the cost so that it is very cheap, in all of [time|money|expertise|ongoing attention], to set up and maintain a test node. Second, there needs to be a real benefit that people can measure from the cumulative effect of participating in the system. I think it would be valuable to report bugs found and fixed by the system as a whole, and possibly to attribute positive results to the output provided by individual nodes. (Maybe you could 'game-ify' the operation of test nodes.)

Well, if you are still reading by now, I appreciate it. I have more ideas, including more details for how such a system might work, and what types of things it could accomplish. But I'll save that for smaller groups who might be more directly interested in this topic.

To get started, I will begin working on a prototype of a test packaging system that includes some of the ideas mentioned here: inclusion of test collateral, and package validation. I would also like to schedule a "test summit" of some kind (maybe associated with ELC or Linaro Connect, or some other event), to discuss standards in the area I propose.

I welcome any response to these ideas. I plan to discuss them at the upcoming test framework mini-jamboree in Tokyo next week, and at Plumbers (particularly during the 'testing and fuzzing' session) the week following. But feel free to respond to this e-mail as well.

Thanks.
 -- Tim Bird
}}}
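
Item 1 of the letter asks for a way for tests to state what they require and for nodes to state what they provide. A minimal sketch of that matching step follows. The class names, node names, and capability strings are all hypothetical, chosen only for illustration; the letter does not define a format, and this is not Fuego's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A test node and the capabilities it advertises (hypothetical names)."""
    name: str
    provides: set = field(default_factory=set)


@dataclass
class Requirements:
    """A test and the capabilities it needs from a node (hypothetical names)."""
    test_name: str
    requires: set = field(default_factory=set)


def eligible_nodes(test, nodes):
    """Return the nodes whose advertised capabilities cover all of the test's needs."""
    return [n for n in nodes if test.requires <= n.provides]


nodes = [
    Node("lab-bbb-1", {"power-control", "usb-gadget"}),
    Node("desk-rpi-2", {"wifi-endpoint"}),
]
boot_test = Requirements("kernel_boot", {"power-control"})
gadget_test = Requirements("usb_gadget_io", {"usb-gadget", "power-control"})
button_test = Requirements("button_press", {"button-press"})

for t in (boot_test, gadget_test, button_test):
    print(t.test_name, "->", [n.name for n in eligible_nodes(t, nodes)])
```

A test with no eligible node (here the hypothetical button test) would be skipped or routed to another lab rather than run and reported as a failure.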

= Ideas related to the vision =

== Capturing tests easily ==
 * should be easy to capture a command line sequence, and test the results
 * maybe do an automated capture and format into a clitest file that can be used as a here document inside a fuego test script?
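
One possible shape for the capture step is sketched below. It assumes a prompt-style transcript (a `$ ` prompt line followed by the expected output) of the kind that clitest-like tools consume. The function names are hypothetical, and the exact file layout a Fuego test script would embed is an assumption.

```python
import subprocess


def capture(commands):
    """Run each shell command and return (command, stdout) pairs."""
    results = []
    for cmd in commands:
        done = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        results.append((cmd, done.stdout))
    return results


def to_transcript(results, prompt="$ "):
    """Format captured pairs as a prompt-style transcript.

    Each command is written after the prompt, followed by the output
    that was observed when it was captured.
    """
    lines = []
    for cmd, output in results:
        lines.append(prompt + cmd)
        lines.extend(output.splitlines())
    return "\n".join(lines) + "\n"


# Formatting a pre-recorded capture; capture(["echo hello", "true"])
# would produce the same pairs on a typical POSIX shell.
print(to_transcript([("echo hello", "hello\n"), ("true", "")]))
```

The resulting text could then be pasted into a test script as a here document and replayed on another board, comparing new output against the recorded lines.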

== test collateral ==
 * does it need to be board-specific?
 * elements of test collateral:
   * test dependencies:
     * kernel config values needed
     * kernel features needed:
       * proc filesystem
       * sys filesystem
       * trace filesystem
     * test hardware needed
     * test node setup features
       * ability to reboot the board
       * ability to soft-reset the board
       * ability to install a new kernel
     * presence of certain programs on target
       * bc
       * top, ps, /bin/sh, bash?
 * already have:
   * CAPABILITIES?
   * pn and reference logs
   * positive and negative result counts (specific to board)
   * test specs indicate parameters for the test
   * test plans indicate different profiles (method to match test to test environment - e.g. filesystem test with type of filesystem hardware)
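
The "kernel config values needed" dependency could be checked along the following lines. This is a sketch only: the function names are hypothetical, the config text is a made-up sample, and treating both `y` and `m` as satisfying a requirement is an assumption that a real test might need to tighten.

```python
def parse_kernel_config(text):
    """Parse kernel .config text into a dict of option name -> value."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skips comments, including "# CONFIG_X is not set"
        key, sep, value = line.partition("=")
        if sep:
            options[key] = value
    return options


def missing_options(required, config):
    """Return the required options that are neither built in (y) nor modular (m)."""
    return [opt for opt in required if config.get(opt) not in ("y", "m")]


# Sample config for a device under test (illustrative values only).
device_config = parse_kernel_config("""
CONFIG_PROC_FS=y
CONFIG_SYSFS=y
# CONFIG_FTRACE is not set
CONFIG_USB_GADGET=m
""")

needs = ["CONFIG_PROC_FS", "CONFIG_FTRACE", "CONFIG_USB_GADGET"]
print(missing_options(needs, device_config))
```

An empty result would mean the test's kernel config dependencies are met on that board; any names returned give the reason the test should be skipped there.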

== test app store ==
 * need a repository where tests can be downloaded
   * like Jenkins plugin repository
   * like debian package feed
 * need a client for browsing tests, installing tests, updating tests
 * store a test in github, and just refer to different tests in different git repositories?
 * test ratings
 * test metrics (how many bugs found)

== authenticating tests ==
 * need to prevent malicious tests
 * packages should be signed by an authority, after review by someone
   * who? the Fuego maintainers?  This would turn into a bottleneck
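
As a stand-in for real package signing, the sketch below checks a test package against a set of digests recorded at review time. This is an assumption made to keep the example self-contained: a deployed system would more likely use public-key signatures from the reviewing authority, and the function names and package contents here are hypothetical.

```python
import hashlib


def package_digest(data):
    """Return the SHA-256 hex digest of a test package's bytes."""
    return hashlib.sha256(data).hexdigest()


def is_reviewed(data, reviewed_digests):
    """Accept a package only if its digest was recorded by a reviewer."""
    return package_digest(data) in reviewed_digests


# Digest recorded when the (hypothetical) package was reviewed.
reviewed_pkg = b"#!/bin/sh\necho test body\n"
reviewed = {package_digest(reviewed_pkg)}

print(is_reviewed(reviewed_pkg, reviewed))                    # unchanged package
print(is_reviewed(reviewed_pkg + b"echo extra\n", reviewed))  # modified after review
```

A digest list only shows that a package is unchanged since review. It does not say who reviewed it, which is the part that signatures and trust relationships would have to cover.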

== test system metrics ==
 * number of bugs found and fixed in upstream software
 * number of bugs found and fixed in test system
 * bug categories (See [[Metrics]])