Is this thing on?
This post describes the status of the various pieces of B2G test automation.
We use a Jenkins instance to run continuous integration tests for B2G, using B2G emulators. Unfortunately, it hasn't been able to run any tests for several weeks because of incompatibilities between the emulator and the headless Amazon AWS Linux VMs we run the CI on, which arose from the hardware-acceleration work in B2G. Michael Wu has identified a new VM configuration that does work (Ubuntu 12.04 + Xorg + xorg-video-dummy), and I'm busy switching our CI over to new VMs with this configuration. The WebAPI tests are already running again, and the rest will be soon.
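For the curious: the usual way to use the xorg-video-dummy driver is an xorg.conf that declares a "dummy" device so Xorg can run without real display hardware. Something along these lines should be close to what such a setup needs, though the identifiers, VideoRam value, and mode below are illustrative rather than our exact production config:

```
# Illustrative xorg.conf for a headless VM using the dummy video driver.
# The essential piece is Driver "dummy"; the rest is boilerplate wiring.
Section "Device"
    Identifier "dummy_device"
    Driver     "dummy"
    VideoRam   256000
EndSection

Section "Monitor"
    Identifier  "dummy_monitor"
    HorizSync   31.5-48.5
    VertRefresh 50.0-70.0
EndSection

Section "Screen"
    Identifier "dummy_screen"
    Device     "dummy_device"
    Monitor    "dummy_monitor"
    SubSection "Display"
        Depth 24
        Modes "1024x768"
    EndSubSection
EndSection
```

This gives the emulator a real X server (and thus working GL paths) without needing a GPU or an attached display.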
As soon as tests are rolling again normally, those of us most closely involved in B2G test automation (myself, Malini Das, Andrew Halberstadt, and Geo Mealer) will institute some informal sheriffing on Autolog (a TBPL look-alike) to help keep track of test failures. If you’d like to help with this effort, let me know.
Our B2G test automation has gone down for weeks at a time on several occasions over the past few months, usually because a change elsewhere in B2G broke either the emulator or the environment we run it in. With that caveat, here's where each test suite stands:
Mochitest: will be running again soon. We're currently only running the subset of tests that used to be run by Fennec. We know we want to run all of them, but doing so currently produces so many timeouts that the harness aborts, so we'll need to spend some time triaging them. We also know we want to change the way we run mochitests so that they run out-of-process: bug 777871.
XPCShell tests: running locally with good results, thanks to Mihnea Balaur, an A-Team intern. We will add them to the CI after mochitests.
Reftests: Andrew Halberstadt has these running locally and is working to understand test failures (bug 773842). He will get them running on a daily basis on a Linux desktop with an Nvidia GPU, reporting to the same Autolog instance used by our Jenkins CI. If we need more frequent coverage and running them on the Amazon VMs would provide useful data, we can do that. The reftest runner also needs to be modified to run tests out-of-process: bug 778072.
Eideticker: Malini Das is working to adapt William Lachance’s Eideticker harness to B2G. This will be used to generate frame-rate data for inter- and intra-app transitions. The testing will be performed on panda boards. See bug 769167.
Other performance tests: There are no plans at this time to port talos to B2G. Malini has written a simple performance test using Marionette, which tracks the amount of time needed to launch each of the Gaia apps on an emulator. This has suffered from the same emulator problems described above and needs to be moved to a new VM. The test currently reports to a new graphserver system called Datazilla, which isn't in production yet. Once it goes live, we'll be able to analyze the data to see whether the current test is giving us useful numbers, and what other tests would be worth writing.
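The timing side of such a test is straightforward. As a rough sketch (not Malini's actual code): drive each launch through Marionette, wall-clock it, and collect the results per app. Here the `launch` callable is a hypothetical stand-in for the Marionette commands that start an app and wait for it to be ready:

```python
import time

def time_app_launches(apps, launch):
    """Measure how long each Gaia app takes to launch.

    apps:   list of app names to test.
    launch: callable taking one app name; it should launch the app
            and return only once the app is ready. In the real test
            this would be implemented with Marionette calls against
            the emulator; here it's an injected stand-in.

    Returns a dict mapping app name -> launch time in seconds,
    suitable for reporting to a system like Datazilla.
    """
    results = {}
    for app in apps:
        start = time.time()
        launch(app)
        results[app] = time.time() - start
    return results
```

Injecting the launch step also makes the timing logic trivially testable without an emulator.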
Gaia integration tests: James Lal has recently added these. I’ll hook these up to CI soon.
The emulator is not an ideal test platform for several reasons, most notably poor performance and the fact that it doesn’t provide the real hardware environment that we care about. But actual phones are often not good automation targets either; they tend to suffer from problems relating to networking, power consumption, and rebooting that make them a nightmare to deal with in large-scale automation. Because of this, we’re going to target panda boards for test automation on real hardware. This is the same platform that will be used for Fennec automation, so we can leverage a lot of that team’s work.
There are several things needed for this to happen; see bug 777530. We need to get B2G panda builds in production using buildbot; we need to figure out how to flash B2G onto pandas remotely; we need to adapt all the test runners to work with the panda boards; and we need to write mozharness scripts for the B2G unit tests, to allow them to run on rel-eng's infrastructure.
For reftests, we also need to solve "the resolution problem": we can't set the pandas to a resolution that would allow the reftest window to be exactly 800×1000, which is the resolution test authors assume when writing reftests. Running reftests at other resolutions is possible, but we don't know how many false passes we might be getting, and analyzing the tests to try to determine this is laborious.
There are a lot of dependencies here, so I don't have a very good ETA. But when this work is done, we will transition all of our testing to pandas on rel-eng infrastructure, except for the WebAPI tests, which have been written specifically for the emulator. This means the tests will show up on TBPL; they'll be available on try; and they'll benefit from formal sheriffing. The emulator WebAPI tests will eventually be transitioned to rel-eng as well, if/when rel-eng starts making emulator builds.