JGriffin's Blog

Is this thing on?

Mozilla A-Team: B2G Test Automation Update

This post describes the status of the various pieces of B2G test automation.

Jenkins Continuous Integration

We use a Jenkins instance to run continuous integration tests for B2G, using B2G emulators.  Unfortunately, it has been unable to run any tests for several weeks because of incompatibilities between the emulator and the headless Amazon AWS Linux VMs that host the CI; these incompatibilities arose from the work on hardware acceleration in B2G.  Michael Wu has identified a new VM configuration that does work (Ubuntu 12.04 + Xorg + xorg-video-dummy), and I’m busy switching our CI over to new VMs with this configuration.  The WebAPI tests are already running again, and the rest will be soon.

As soon as tests are rolling again normally, those of us most closely involved in B2G test automation (myself, Malini Das, Andrew Halberstadt, and Geo Mealer) will institute some informal sheriffing on Autolog (a TBPL look-alike) to help keep track of test failures.  If you’d like to help with this effort, let me know.

Automation Stability

Our B2G test automation has gone down for weeks at a time on several occasions over the past few months.   Typically this has one of two causes:

  1. Changes to B2G which break the emulator.  These are identified fairly quickly, but can take a week or longer to resolve, as they require engineering resources that are busy with other things.  Now that B2G has reached “feature complete” stage, it may be that such breaking changes will be less frequent.  Usually, this kind of breakage prevents the emulator from launching successfully, rather than resulting in a build error.  To help identify these more quickly, I will write a simple “launch the emulator” test which gets performed after every build; if this test fails, it will automatically message the B2G mailing list.
  2. Changes to non-Marionette code in mozilla-central which break Marionette.  Typically these changes have occurred in the remote debugger, but we’ve also seen them with JS and browser code.  To address this, we’re working on getting Marionette unit tests in TBPL using desktop Firefox:  bug 770769.  Once these are live, changes which break Marionette will get caught by try or mozilla-inbound and won’t be allowed to propagate to mozilla-central where they end up breaking B2G CI.
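As a rough sketch of the “launch the emulator” check described in the first item, the following assumes a hypothetical launch command that exits with status 0 once the emulator has booted, and a hypothetical `notify` callable that would post to the mailing list.  Neither is taken from the actual B2G tooling.

```python
import subprocess
import sys


def emulator_launches(launch_cmd, timeout_s=120):
    """Return True if the launch command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(launch_cmd, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hang or a missing executable both count as a failed launch.
        return False
    return result.returncode == 0


def smoke_test(launch_cmd, notify, timeout_s=120):
    """Run the launch check and call notify(message) only on failure."""
    ok = emulator_launches(launch_cmd, timeout_s)
    if not ok:
        notify("Emulator failed to launch: %s" % " ".join(launch_cmd))
    return ok


if __name__ == "__main__":
    # Stand-in command; a real job would invoke the emulator launch script here.
    smoke_test([sys.executable, "-c", "raise SystemExit(0)"], notify=print)
```

Treating a timeout as a failure matters here, since the breakage described above usually shows up as an emulator that never finishes launching rather than as a build error.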

Test Harness Status

WebAPI:  running again, 2 intermittent oranges: bug 760199 and bug 779217.

Mochitest:  will be running soon.  We’re currently only running the subset of tests that used to be run by Fennec.  We know we want to run all of them, but running all of them results in so many timeouts that the harness aborts.  We’ll need to spend some time triaging these.  We also know we want to change the way we run mochitests so that we can run them out-of-process: bug 777871.

XPCShell tests:  running locally with good results, thanks to Mihnea Balaur, an A-Team intern.  We will add them to the CI after mochitests.

Reftests:  Andrew Halberstadt has these running locally and is working to understand test failures (bug 773842).  He will get them running on a daily basis on a Linux desktop with an Nvidia GPU, reporting to the same Autolog instance used by our Jenkins CI.  If we need more frequent coverage and running them on the Amazon VMs would provide useful data, we can do that.  The reftest runner also needs to be modified so that it runs tests out-of-process: bug 778072.

Eideticker:  Malini Das is working to adapt William Lachance’s Eideticker harness to B2G.  This will be used to generate frame-rate data for inter- and intra-app transitions.  The testing will be performed on panda boards.  See bug 769167.

Other performance tests:  There are no plans at this time to port talos to B2G.  Malini has written a simple performance test using Marionette, which tracks the amount of time needed to launch each of the Gaia apps on an emulator.  This has suffered from the same emulator problems described above, and needs to be moved to a new VM.  This test currently reports to a new graphserver system called Datazilla, which isn’t in production yet.  Once it goes live, we’ll be able to analyze the data and see whether the current test provides useful data, and what other tests would be useful to write.
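To make the launch-time measurement concrete, here is a minimal sketch of timing a set of app launches.  The `launch` callable is hypothetical; it stands in for whatever Marionette call starts an app and blocks until the app is ready, and this is not the code of the actual test.

```python
import time


def time_launches(app_names, launch):
    """Return {app_name: seconds} using launch(app_name) to start each app."""
    timings = {}
    for name in app_names:
        start = time.perf_counter()
        launch(name)  # assumed to block until the app reports it is ready
        timings[name] = time.perf_counter() - start
    return timings
```

The resulting dictionary is the kind of per-app data that could then be posted to a graph server.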

Gaia integration tests:  James Lal has recently added these.  I’ll hook these up to CI soon.

Panda Boards

The emulator is not an ideal test platform for several reasons, most notably poor performance and the fact that it doesn’t provide the real hardware environment that we care about.  But actual phones are often not good automation targets either; they tend to suffer from problems relating to networking, power consumption, and rebooting that make them a nightmare to deal with in large-scale automation.  Because of this, we’re going to target panda boards for test automation on real hardware.  This is the same platform that will be used for Fennec automation, so we can leverage a lot of that team’s work.

There are several things needed for this to happen; see bug 777530.  We need to get B2G panda builds in production using buildbot, figure out how to flash B2G on pandas remotely, adapt all the testrunners to work with the panda boards, and write mozharness scripts for B2G unit tests so that they can run in rel-eng’s infrastructure.

For reftests, we also need to figure out “the resolution problem”:  the fact that we can’t set the pandas to a resolution that would allow the reftest window to be exactly 800×1000, which is the resolution that test authors assume when writing reftests.  Running reftests at other resolutions is possible, but we don’t know how many false passes we might be seeing, and analyzing the tests to try and determine this is laborious.

There are a lot of dependencies here, so I don’t have a very good ETA.  But when this work is done, we will transition all testing to pandas on rel-eng infrastructure, except for the WebAPI tests, which have been written specifically for the emulator.  This means the tests will show up on TBPL, they’ll be available on try, and they will benefit from formal sheriffing.  The emulator WebAPI tests will eventually be transitioned to rel-eng as well, if/when rel-eng starts making emulator builds.

B2G and WebAPI testing in Emulators

Malini Das and I have been working on a new test framework called Marionette, in which tests of a Gecko-based product (B2G, Fennec, etc.) are driven remotely, à la Selenium.  Marionette has client and server components; the server side is embedded inside Gecko, and the client side runs on a (possibly remote) host PC.  The two components communicate using a JSON protocol over a TCP socket.  The Marionette JSON protocol is based loosely on the Selenium JSON Wire Protocol; it defines a set of commands that the Marionette server inside Gecko knows how to execute.
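The general shape of such a client/server exchange can be sketched in a few lines of Python.  This is only a toy: the newline-delimited framing, the `echo` command, and the reply layout are assumptions made for illustration, not the actual Marionette wire format.

```python
import json
import socket
import threading


def handle_command(command):
    """Toy command table standing in for the server side inside Gecko."""
    if command.get("name") == "echo":
        return {"ok": True, "value": command.get("args")}
    return {"ok": False, "error": "unknown command"}


def serve_once(listener):
    """Accept one connection and answer one newline-terminated JSON command."""
    conn, _ = listener.accept()
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        request = json.loads(stream.readline())
        stream.write(json.dumps(handle_command(request)) + "\n")
        stream.flush()


def send_command(port, name, args=None):
    """Client side: send one command and return the decoded reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        with sock.makefile("rw", encoding="utf-8") as stream:
            stream.write(json.dumps({"name": name, "args": args}) + "\n")
            stream.flush()
            return json.loads(stream.readline())


def demo():
    """Start a one-shot server on a free local port and send it one command."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    reply = send_command(port, "echo", ["hello"])
    server.join()
    listener.close()
    return reply
```

The point of the sketch is the direction of flow: every exchange starts with a request from the client, and the server only ever replies.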

This differs from past approaches to remote automation in that we don’t need any extra software (i.e., a SUTAgent) running on the device, we don’t need special access to the device via something like adb (although we do use adb to manage emulators), nor do tests need to be particularly browser-centric.  These differences seem advantageous when thinking about testing B2G.

The first use case to which we might apply Marionette in B2G seems to be WebAPI testing in emulators.  There are some WebAPI features that we can’t test well in an automated manner using either desktop builds or real devices, such as WebSMS.  But we can write automated tests for these using emulators, since we can manipulate the emulator’s hardware state and emulators know how to “talk” to each other for the purposes of SMS and telephony.

Since Marionette tests are driven from the client side, they’re written in Python.  This is what a WebSMS test in Marionette might look like:

from marionette import Marionette

if __name__ == '__main__':
    # launch the emulators that will do the sending and receiving
    sender = Marionette(emulator=True)

    receiver = Marionette(emulator=True)

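    # set up the SMS event listener on the receiver
    # (the "smsreceived" event name is an assumption)
    receiver.execute_script("""
        var sms_body = "";
        window.addEventListener("smsreceived",
                                 function(m) { sms_body = m.body });
    """)
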
    # send the SMS event on the sender
    message = "hello world!"
    sender.execute_script("navigator.sms.send(%d, '%s');" %
        (receiver.emulator.port, message))

    # verify the message was received by the receiver
    assert(receiver.execute_script("return sms_body;") == message)

The JavaScript portions of the test could be split into a separate file from the Python, for easier editing and syntax highlighting.  Here’s the adjusted Python file:

from marionette import Marionette

if __name__ == '__main__':
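    # launch the emulators that will do the sending and receiving and
    # load the JS scripts for each
    # (load_script and the file name are assumed, illustrative names)
    sender = Marionette(emulator=True)
    sender.load_script("test_sms.js")
    receiver = Marionette(emulator=True)
    receiver.load_script("test_sms.js")

    # set up the SMS event listener on the receiver
    receiver.execute_script_function("setup_sms_listener")
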
    # send the SMS event on the sender
    message = "hello world!"
    target = receiver.emulator.port
    sender.execute_script_function("send_sms", [target, message])

    # verify the message was received by the receiver
    assert(receiver.execute_script_function("get_sms_body") == message)

And here’s the JavaScript file:

function send_sms(target, msg) {
    navigator.sms.send(target, msg);
}

var sms_body = "";

function setup_sms_listener() {
    // the "smsreceived" event name is an assumption
    window.addEventListener("smsreceived",
                            function(m) { sms_body = m.body });
}

function get_sms_body() {
    return sms_body;
}

Both of these options are just about usable in Marionette right now.  Note that the test is driven, and some of the test logic (like asserts) resides on the client side, in Python.  This makes synchronization between multiple emulators straightforward, and provides a natural fit for Python libraries that will be used to interact with the emulator’s battery and other hardware.
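One practical wrinkle with client-driven tests is that SMS delivery is asynchronous, so an assertion made immediately after sending may run before the message arrives.  A small polling helper on the Python side is one way to handle that.  The helper below is a sketch; `wait_for` is a hypothetical name and not part of Marionette.

```python
import time


def wait_for(condition, timeout_s=30.0, interval_s=0.5):
    """Poll condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout_s)
        time.sleep(interval_s)
```

In the hybrid tests above, the final check could then be written as `assert wait_for(lambda: receiver.execute_script("return sms_body;")) == message`, which keeps the synchronization logic on the client side.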

What if we wanted JavaScript-only WebAPI tests in emulators, without any Python?  Driving a multiple-emulator test from JavaScript running in Gecko introduces some complications, chief among them the necessity of sharing state between the tests, the emulators, and the Python testrunner, all from within the context of the JavaScript test.  We can imagine such a test might look like this:

var message = "hello world!";
var device_number = Marionette.get_device_number(Marionette.THIS_DEVICE);

if (device_number == 1) {
  // we're being run in the "sender"

  // wait for the test in the other emulator to be in a ready state
  Marionette.wait_for_state(Marionette.THAT_DEVICE, Marionette.STATE_READY);

  // send the SMS
  navigator.sms.send(Marionette.get_device_port(Marionette.THAT_DEVICE), message);
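} else {
  // we're being run in the "receiver"

  // notify Marionette that this test is asynchronous
  // (test_is_async, finish, and set_state are assumed names, as is the
  // "smsreceived" event)
  Marionette.test_is_async();

  // set up the event listener
  window.addEventListener("smsreceived",
                          function (m) {
                            // perform the test assertion and notify Marionette
                            // that the test is finished
                            is(m.body, message, "Wrong message body received");
                            Marionette.finish();
                          });

  // notify Marionette we're in a ready state
  Marionette.set_state(Marionette.STATE_READY);
}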

Conceptually, this is more similar to xpcshell tests, but implementing support for this kind of test in Marionette (or inside the existing xpcshell harness) would require substantial additional work. As it currently exists, Marionette is designed with a client-server architecture, in which information flows from the client (the Python part) to the server (inside Gecko) using TCP requests, and then back. Implementing the above JS-only test syntax would require us to implement the approximate reverse, in which requests could be initiated at will from within the JS part of the test, and this would require non-trivial changes to Marionette in several different areas, as well as requiring new code to handle the threading and synchronization that would be required.

Do you think the Python/JS hybrid tests will be sufficient for WebAPI testing in emulators?

WebGL Conformance Tests now in GrafxBot

GrafxBot has been updated to include the mochitest version of the WebGL Conformance Tests.  When you run GrafxBot tests using the new version, it will run the usual reftests first, followed by the new WebGL tests.  Both sets of test results are posted to the database at the end of the test run.

The WebGL tests may be skipped for a couple of reasons:  they’ll be skipped if you have a Mac running an OS X version older than 10.6, or if WebGL isn’t enabled in Firefox on your machine, which could happen if you don’t have supported hardware or drivers.  GrafxBot doesn’t try to force-enable either WebGL or accelerated layers.
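The skip rules can be summarized as a small decision function.  This is only an illustration in Python of the logic described above; GrafxBot itself is a Firefox add-on, and the function and argument names here are made up.

```python
def webgl_skip_reason(os_name, os_version, webgl_enabled):
    """Return a reason string if the WebGL tests should be skipped, else None.

    os_version is a tuple such as (10, 5); all names here are illustrative.
    """
    if os_name == "mac" and tuple(os_version) < (10, 6):
        return "Mac OS X older than 10.6"
    if not webgl_enabled:
        return "WebGL is not enabled in this Firefox"
    return None
```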

Partially to support these tests, GrafxBot now reports some additional details about Firefox’s acceleration status, similar to what you see in about:support:

webgl results 132 pass / 7 fail
webgl renderer Google Inc. — ANGLE — OpenGL ES 2.0 (ANGLE
acceleration mode 2/2 Direct3D 10
d2d enabled true
directwrite enabled true: 6.1.7600.20830, font cache n/a

I encourage users to download and run the new version; I’d like to get some feedback before I update it on AMO, to make sure users aren’t running into problems with the new tests.

The new version of GrafxBot can be downloaded here.

Latest Tinderbox Build URLs

The automation tools team creates a variety of automation tools that test a wide range of things.  There are times when these tools need to locate the latest tinderbox build for a given platform, in order to test against.  In the past, this task involved spidering along the FTP site that is home to tinderbox builds.

Now, however, there’s a much easier way:  a web service which returns a JSON document that always contains the latest tinderbox build URLs for all platforms.  This is made possible by Christian Legnitto’s awesome Mozilla Pulse, which sends messages to consumers when certain buildbot events (among other things) occur.  I’ve written a Python library, pulsebuildmonitor, which makes it even easier to act as a consumer for these messages, and layered a small web service on top of that.

The service is described at http://brasstacks.mozilla.com/latestbuilds/README, and the actual JSON is served at http://brasstacks.mozilla.com/latestbuilds/.

Currently this only works for mozilla-central, but I could easily extend it to other trees if needed.
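A tool that consumes this service might pick out one platform’s URL along the following lines.  The layout of the JSON shown here (tree, then platform, then a `url` field) and the example URLs are made up for illustration; the README above describes the real format.

```python
import json

# Made-up sample document; the real service's layout may differ.
SAMPLE = """
{
  "mozilla-central": {
    "linux": {"url": "http://example.invalid/builds/linux/firefox.tar.bz2"},
    "win32": {"url": "http://example.invalid/builds/win32/firefox.zip"}
  }
}
"""


def latest_build_url(document, tree, platform):
    """Return the build URL for tree/platform, or None if it is not listed."""
    data = json.loads(document)
    entry = data.get(tree, {}).get(platform)
    return entry["url"] if entry else None
```

In real use, the `document` argument would be the body fetched from the service, for example with `urllib.request.urlopen`, rather than the inline sample.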

ProfileManager icons requested!

The Profile Manager that has been bundled with Firefox from time immemorial is going to be removed from Firefox builds soon after Firefox 4 ships; see bug 214675.  Firefox will still support multiple profiles; it just won’t have a built-in UI for managing them.

Instead, a few of us on the Mozilla Automation Tools team have been busy building a standalone replacement.  This will be available as a separate download, and will include a lot of cool features not available in the current incarnation of Profile Manager, like the ability to back up and restore profiles.  For background, see bug 539524, and this wiki page.

There are builds available to play with, but exercise caution: these builds are beta quality, and they may contain bugs that cause profile corruption or other problems.  If you do decide to play with it, you may want to back up your profiles first.

Currently, the icon for the new Profile Manager is the default xulrunner icon.

This doesn’t seem very interesting for a new Profile Manager, and I lack even rudimentary graphics skills, so I’d like to request help!  If you have some graphics experience and would like to contribute to a cool new Mozilla tool, please submit an icon you think would be awesome as an attachment to bug 605576.  Icons should be in PNG format, preferably 48×48 or 64×64, and should be freely distributable.  The creator of the icon that is selected will be mentioned in Profile Manager’s about box, and will have the satisfaction of knowing that their icon is seen every time the new Profile Manager is used.

GrafxBot Results Update

Thanks to the many thousands of you who have downloaded GrafxBot and submitted test results to Mozilla!  In case you’re curious about what we’ve done with all that data, here are some statistics:

  • According to AMO, GrafxBot has been downloaded about 4700 times.
  • We have nearly 30,000 sets of test results that were submitted to our database, for a total of 3.7 million tests.
  • The test results span 282 unique video cards on Windows, 30 on Mac, and 171 on Linux.
  • The failure rate (which is somewhat subjective given the manual pass/fail mechanism) averages around 0.4% on Windows, 0.5% on Mac, and 1.1% on Linux.
  • Test data has resulted in a total of 37 bugs being filed.

Aside from the raw test results, many of you have submitted useful comments.  Some have noted that fonts look bad when Firefox is accelerated; others have described scrolling or other issues.  Not all of these problems can be detected by GrafxBot, so if you notice problems like these when browsing, I encourage you to file a bug report in Bugzilla, under Core -> Graphics.  If you submit a bug report, please include the details of your graphics hardware, and include a screenshot if possible.

GrafxBot continues to be updated along with Firefox betas, so I encourage interested folks to keep running GrafxBot with each beta release.  Thanks for all your help in making Firefox 4 the fastest ever!