JGriffin's Blog

Is this thing on?

A-Team Update, July 29, 2015


Treeherder: We’ve added to mozlog the ability to create error summaries which will be used as the basis for automatic starring.  The Treeherder team is working on implementing database changes which will make it easier to add support for that.  On the front end, there’s now a “What’s Deployed” link in the footer of the help page, to make it easier to see what commits have been applied to staging and production.  Job details are now shown in the Logviewer, and a mockup has been created of additional Logviewer enhancements; see bug 1183872.

MozReview and Autoland: Work continues to allow autoland to work on inbound; MozReview has been changed to carry forward r+ on revised commits.

Bugzilla: The ability to search attachments by content has been turned off; BMO documentation has been started at https://bmo.readthedocs.org.

Perfherder/Performance Testing: We’re working towards landing Talos in-tree.  A new Talos test measuring tab-switching performance has been created (TPS, or Talos Page Switch); e10s Talos has been enabled on all platforms for PGO builds on mozilla-central.  Some usability improvements have been made to Perfherder – https://treeherder.mozilla.org/perf.html#/graphs.

TaskCluster: Successful OSX cross-compilation has been achieved; working on the ability to trigger these on Try and sorting out details related to packaging and symbols.  Work on porting Linux tests to TaskCluster is blocked due to problems with the builds.

Marionette: The Marionette-WebDriver proxy now works on Windows.  Documentation on using this has been added at https://developer.mozilla.org/en-US/docs/Mozilla/QA/Marionette/WebDriver.

Developer Workflow: A kill_and_get_minidump method has been added to mozcrash, which allows us to get stack traces out of Windows mochitests in more situations, particularly plugin hangs.  Linux xpcshell debug tests have been split into two chunks in buildbot in order to reduce E2E times, and chunks of mochitest-browser-chrome and mochitest-devtools-chrome have been re-normalized by runtime across all platforms.  Now that mozharness lives in the tree, we’re planning on removing the “in-tree configs”, and consolidating them with the previously out-of-tree mozharness configs (bug 1181261).
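
Re-normalizing chunks by runtime is essentially a partitioning problem. The sketch below is not the code used by the harnesses; it only illustrates one common heuristic (assign tests longest-first to whichever chunk is currently shortest). The test names and runtimes are made up for the example.

```python
import heapq


def balance_chunks(runtimes, total_chunks):
    """Assign tests to chunks so that chunk runtimes come out roughly equal.

    runtimes: dict mapping test name -> runtime in seconds (hypothetical data).
    Greedy heuristic: take tests longest-first and always add the next test
    to the chunk that is currently the shortest.
    """
    heap = [(0.0, index) for index in range(total_chunks)]
    heapq.heapify(heap)
    chunks = [[] for _ in range(total_chunks)]
    ordered = sorted(runtimes.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        total, index = heapq.heappop(heap)
        chunks[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return chunks


runtimes = {"test_a": 300, "test_b": 200, "test_c": 200, "test_d": 100, "test_e": 100}
chunks = balance_chunks(runtimes, 2)
totals = [sum(runtimes[name] for name in chunk) for chunk in chunks]
print(chunks, totals)
```

A greedy pass like this does not guarantee an optimal split, but it is cheap and usually lands close enough for chunking purposes.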

Tools: We’re testing an auto-backfill tool which will automatically retrigger coalesced jobs in Treeherder that precede a failing job.  The goal is to reduce the turnaround time required for this currently manual process, which should in turn reduce tree closure times related to test failures.
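
The backfill idea can be sketched as follows. This is an illustration only, not the tool's code: the status strings and the list-of-pushes representation are assumptions made for the example.

```python
def jobs_to_backfill(statuses):
    """Return indexes of coalesced jobs that directly precede the first failure.

    statuses: per-push results for one job type, oldest push first. Each entry
    is "success", "failure" or "coalesced" (hypothetical labels; "coalesced"
    means the job was skipped on that push).
    """
    if "failure" not in statuses:
        return []
    failing = statuses.index("failure")
    backfill = []
    # Walk back from the failing push until we reach a push where the job ran.
    for index in range(failing - 1, -1, -1):
        if statuses[index] != "coalesced":
            break
        backfill.append(index)
    return sorted(backfill)


history = ["success", "coalesced", "coalesced", "failure"]
print(jobs_to_backfill(history))
```

Retriggering exactly those pushes narrows the regression range to a single push without rerunning jobs that already have results.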

The Details


Treeherder/Automatic Starring

  • We’re generating error summaries now that will serve as the basis for automatic starring work.

Treeherder/Front End

  • New “What’s Deployed” feature in Help footer to view stage/prod deployment status
  • Logviewer now contains the full ‘Job Info’ aka. tinderbox printlines (bug 1092209)
  • Created a mock of logviewer UI changes (bug 1183872)

Perfherder/Performance Testing

  • Working towards moving Talos code in-tree (bug 787200)
  • New Talos test TPS (Talos Page Switch) (bug 1166132)
  • Fixed a few data ingestion/duplication cases.
  • Adjusting calculation of suite summaries to match graph server, not finished yet (tracking: bug 1184968)
  • e10s Talos is enabled on all platforms, but only runs on mozilla-central for PGO builds; broken tests and big regressions are tracked in bug 1144120
  • Perfherder is easier to use, with some polish on test selection and the compare view; most importantly, we have found a few odd bugs that have caused duplicate data to show up. Check it out: https://treeherder.mozilla.org/perf.html#/graphs
  • Starting the work of moving Android Talos to Autophone (bug 1170685)

MozReview/Autoland


  • bug 1184079 – Fix for autopublishing when authenticating to MozReview via BMO cookies
  • bug 1178025 – Commits table looks nicer
  • bug 1175166 – r+ is now carried forward on commits from level 3 authors

TaskCluster Support

Mobile Automation

  • Continued work on porting Android Talos tests to Autophone; the remaining work is to figure out how to post results and to ensure the tests run regularly and reliably.
  • Support for the Android stock browser and Dolphin has been added to mozbench (bug 1103134)

Dev Workflow

  • Created patch that replaces mach’s logger with mozlog. Still several rough edges and perf issues to iron out

Media Automation

  • The new MSE rewrite is now enabled by default on Nightly, and we’re replacing a few tests in response: bug 1186943 – detection of video stalls has to respond to new internal strings from the new MSE implementation by :jya.
  • firefox-media-tests mozharness log is now parsed into steps for Treeherder’s Log Viewer
  • Fixed a problem with automation scripts for WebRTC tests for Windows 64.

General Automation

  • Moved mozlog.structured to top-level mozlog, and released mozlog 3.0
  • Added a kill_and_get_minidump method to mozcrash (bug 890026). As a result we’re getting minidumps out of Windows mochitests under more circumstances (in particular, plugin hangs in certain intermittently failing tests).
  • The MozillaPulse consumer now supports listening to multiple exchanges simultaneously (bug 1180897).
  • Bug 1186420 – Autophone – update requirements and deploy thclient 1.6
  • Bughunter moved to SCL3 without interruption
  • Bug 1185498 – Sisyphus – Bughunter – consume urls directly from Socorro
  • linux debug xpcshell was split into two chunks to reduce E2E times (bug 1185499)
  • runtimes for mochitest-browser-chrome and mochitest-devtools have been renormalized across all platforms
  • Allow Firefox UI tests to determine where to get Firefox crash symbols for releases and improve reproducibility
  • Testing auto-backfill in production (bug 1180732)
  • Now that mozharness lives in the tree, we’re going to remove the “in-tree configs”, which will consolidate mozharness options and make maintenance simpler (bug 1181261)

ActiveData


  • ActiveData requires monitoring on all nodes before it can be left alone for more than a day without failing:
    • Made a fork of Supervisor to run simple cron jobs – the biggest task was finding and installing (and compiling!) the C libraries used
    • Added Supervisor to spot instances to monitor ES; not just the process, but also query response time.  Also monitoring the indexing jobs.
  • Replicated OrangeFactor to ActiveData so that a master’s student (and the public) can query or extract it.

Marionette


  • Landed Proxy support via capabilities
  • Updating cookie support to return httpOnly flag
  • Added a --version arg to Marionette (bug 1183157)
  • Landing support for W3C Compatible Drivers in Selenium Tree and released 2.46.1 so users can use it.
  • Wrote a small guide to use it https://developer.mozilla.org/en-US/docs/Mozilla/QA/Marionette/WebDriver
  • Marionette<->WebDriver Proxy now works on Windows, Linux and OSX as of 0.3.0

Automation and Tools Team Update, July 16 2015

The Automation and Tools Team (the A-Team, for short) is a large team that oversees a diverse set of services, tools and test harnesses used by nearly everyone at Mozilla.  We’re borrowing a page from Release Engineering and publishing a series of updates to inform people about what we’re up to, in the hopes of fostering better visibility and inter-team coordination.


Treeherder and Automatic Starring: Our focus for Treeherder in Q3 will be improving the signal-to-noise ratio for dealing with intermittent oranges. An overall design has been agreed to for the “automatic starring” project, and work has begun; final rollout is likely in Q4. This quarter, we’ll also stop spamming Bugzilla with comments for each intermittent, but we will put in place an alternate notification system for people who rely on Bugzilla orange comments to determine when an intermittent needs attention. We’ve also agreed on a redesign for the Logviewer that should result in a more useful and intuitive interface.
MozReview and Autoland:  MozReview now offers to publish review requests when you push, so it isn’t necessary to visit MozReview’s UI. Work has started on adding support for autoland-to-inbound, which will allow developers to push changes to inbound directly from MozReview… no more battling tree closures!
Performance: Work continues on Perfherder’s “comparison mode”, a view that compares Talos performance data between two revisions. See wlach’s blog post for more details.
TaskCluster Support: We’re helping Release Engineering migrate from Buildbot to TaskCluster; this quarter we’re standing up Linux tests in TaskCluster and getting OS X cross-compilation to work so that we can move those builds to the cloud.
BMO now has tests running in continuous integration using TaskCluster and reporting to Treeherder.
Mobile Automation: mochitest-chrome for Android is now live! Work is also underway to enable debug reftests on Android emulators, and significant reliability improvements have been landed in Autophone.
Desktop Automation: Work is in progress to get Thread Sanitizer (TSan) builds running on try and to split gTest into its own test chunk. We’re also working towards applying --run-by-dir to mochitest-plain, in order to improve isolation and enable smarter chunking in CI.
Developer Workflow: We’re adding test-selection flexibility to the reftest harness as a prelude to making ‘mach try’ work with more test types.

The Details

Treeherder/Automatic Starring
  • Work has started on backend work needed to support automatic starring, including db simplification, and db unification (so each tree doesn’t have its own database).  Bug 1179263 tracks this work.  As a side effect of this work, Treeherder code should become less complex and easier to maintain.
  • Work has started on identifying what needs to happen in order to turn off Bugzilla comments for intermittents, and to create an alternative notification mechanism instead.  Bug 1179310 tracks this work.
Treeherder/Front End
  • New shortcuts for Logviewer, Delete Classification plus improved classification save
  • Design work is in progress for collapsing chunks in Treeherder in order to reduce visual noise in bug 1163064
Perfherder/Performance Testing
  • Evaluating alerts generated from PerfHerder
  • Improvements to compare chooser and viewer inside of PerfHerder
  • Work towards building a new tab switching test (bug 1166132)
MozReview/Autoland
  • Automatic publishing of reviews upon pushing
  • Known bug: people using cookie auth may experience bug 1181886
  • Better error message when MozReview’s Bugzilla session has expired (bug 1178811)
  • Pruned user database to improve user searching (bug 1171274)
  • Work is progressing on autoland-to-inbound (bug 1128039)
TaskCluster Support
  • Ability to schedule Linux64 tests on try (tests not running yet due to a couple blockers) – bug 1171033
  • Working on OSX cross-compilation, which will allow us to move OSX builds to the cloud; this will make OSX builds much faster in CI.
Mobile Automation
  • Autophone detects USB lock-ups and gracefully restarts. This is a huge improvement in system reliability.
  • Continued work on getting Android Talos tests ported to Autophone (bug 1170685)
  • Updated manifests and mozharness configs for mochitest-chrome (bug 1026290)
  • Determined total-chunks requirements for Android 4.3 Debug reftests (bug 1140471)
  • Re-wrote robocop harness to significantly improve run-time efficiency (bug 1179981)
Dev Workflow
  • Helped RelEng resolve some problems that were preventing them from landing mozharness in the tree.  This opens the door to a lot of future dev workflow improvements, including better unification of the ways we run automated tests in continuous integration and locally.  We’ve wanted this for years and it’s great to see it finally happen.
  • Did some work on top of jgraham’s patch to make mach use mozlog structured logging
Platform QA
  • We had to respond to the breakup of .tests.zip into several files to keep our Jenkins instance running.
  • Getting firefox-media-tests to satisfy Tier-2 Treeherder visibility requirements involves changing how Treeherder accommodates non-buildbot jobs (e.g., bug 1182299)
General Automation
  • Working on running multiple tests/manifests through reftests harness as a prelude for supporting |mach try| for more test types.
  • Created patch to move mozlog.structured to the top level package (and what was previously there to mozlog.unstructured)
  • Figured out the series of steps needed to produce a usable Thread Sanitizer enabled linux build on our infra
  • Separating out gTest into a separate job in CI – bug 1179955.
  • More memory optimizations (motivation: releng query for Chris Atlee:  query slow tests)
    • run staging environment as stability test for production
    • change the ETL procedure so that pushing changes to prod is easier (moving toward a standard procedure)
  • import treeherder data markup to active data (motivation: characterizing test failures)
    • ateam query: summary of test failures, stars and resolutions (bug 1161268, bug 1172048)
    • subtests are too large for download of more than one day – working on code to only pull what’s required


Mozilla A-Team: B2G Test Automation Update

This post describes the status of the various pieces of B2G test automation.

Jenkins Continuous Integration

We use a Jenkins instance to run continuous integration tests for B2G, using B2G emulators.  Unfortunately, this has been unable to run any tests for several weeks due to incompatibilities between the emulator and the headless Amazon AWS Linux VMs we have been running the CI on, which have arisen due to the work on hardware acceleration in B2G.  Michael Wu has identified a new VM configuration which does work (Ubuntu 12.04 + Xorg + xorg-video-dummy), and I’m busy switching our CI over to new VMs of this configuration.  The WebAPI tests are already running again, and the rest will be soon.

As soon as tests are rolling again normally, those of us most closely involved in B2G test automation (myself, Malini Das, Andrew Halberstadt, and Geo Mealer) will institute some informal sheriffing on Autolog (a TBPL look-alike) to help keep track of test failures.  If you’d like to help with this effort, let me know.

Automation Stability

Our B2G test automation has gone down for weeks at a time on several occasions over the past few months.   Typically this has one of two causes:

  1. Changes to B2G which break the emulator.  These are identified fairly quickly, but can take a week or longer to resolve, as they require engineering resources that are busy with other things.  Now that B2G has reached “feature complete” stage, it may be that such breaking changes will be less frequent.  Usually, this kind of breakage prevents the emulator from launching successfully, rather than resulting in a build error.  To help identify these more quickly, I will write a simple “launch the emulator” test which gets performed after every build; if this test fails, it will automatically message the B2G mailing list.
  2. Changes to non-Marionette code in mozilla-central which break Marionette.  Typically these changes have occurred in the remote debugger, but we’ve also seen them with JS and browser code.  To address this, we’re working on getting Marionette unit tests in TBPL using desktop Firefox:  bug 770769.  Once these are live, changes which break Marionette will get caught by try or mozilla-inbound and won’t be allowed to propagate to mozilla-central where they end up breaking B2G CI.
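
A “launch the emulator” check of the kind described in item 1 could be as small as the sketch below. It treats “still running after a grace period” as a successful launch, which is a simplification, and it uses a Python child process as a stand-in because the real emulator command line is not shown here. Sending the message to the mailing list is left out.

```python
import subprocess
import sys
import time


def stays_running(command, grace_seconds):
    """Start command and report whether it is still alive after grace_seconds."""
    process = subprocess.Popen(command)
    try:
        time.sleep(grace_seconds)
        return process.poll() is None
    finally:
        # Clean up so the check never leaves a stray process behind.
        if process.poll() is None:
            process.kill()
        process.wait()


# Stand-in for the real emulator launch command (an assumption for this sketch).
fake_emulator = [sys.executable, "-c", "import time; time.sleep(30)"]
if not stays_running(fake_emulator, grace_seconds=1.0):
    print("emulator failed to launch; notify the mailing list")
```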

Test Harness Status

WebAPI:  running again, 2 intermittent oranges: bug 760199 and bug 779217.

Mochitest:  will be running soon.  We’re currently only running the subset of tests that used to be run by Fennec.  We know we want to run all of them, but running all of them results in so many timeouts that the harness aborts.  We’ll need to spend some time triaging these.  We also know we want to change the way we run mochitests so that we can run them out-of-process: bug 777871.

XPCShell tests:  running locally with good results, thanks to Mihnea Balaur, an A-Team intern.  We will add them to the CI after mochitests.

Reftests:  Andrew Halberstadt has these running locally and is working to understand test failures (bug 773842).  He will get them running on a daily basis on a Linux desktop with an Nvidia GPU, reporting to the same Autolog instance used by our Jenkins CI.  If we need more frequent coverage and running them on the Amazon VMs would provide useful data, we can do that.  The reftest runner also needs to be modified so that it runs tests out-of-process: bug 778072.

Eideticker:  Malini Das is working to adapt William Lachance’s Eideticker harness to B2G.  This will be used to generate frame-rate data for inter- and intra-app transitions.  The testing will be performed on panda boards.  See bug 769167.

Other performance tests:  There are no plans at this time to port talos to B2G.  Malini has written a simple performance test using Marionette, which tracks the amount of time needed to launch each of the Gaia apps on an emulator.  This has suffered from the same emulator problems described above, and needs to be moved to a new VM.  This test currently reports to a new graphserver system called Datazilla, which isn’t in production yet.  Once it goes live, we’ll be able to analyze the data and see whether the current test provides useful data, and what other tests would be useful to write.

Gaia integration tests:  James Lal has recently added these.  I’ll hook these up to CI soon.

Panda Boards

The emulator is not an ideal test platform for several reasons, most notably poor performance and the fact that it doesn’t provide the real hardware environment that we care about.  But actual phones are often not good automation targets either; they tend to suffer from problems relating to networking, power consumption, and rebooting that make them a nightmare to deal with in large-scale automation.  Because of this, we’re going to target panda boards for test automation on real hardware.  This is the same platform that will be used for Fennec automation, so we can leverage a lot of that team’s work.

There are several things needed for this to happen; see bug 777530.  First, we need to get B2G panda builds in production using buildbot; we need to figure out how to flash B2G on pandas remotely; we need to adapt all the testrunners to work with the panda boards; and we need to write mozharness scripts for B2G unit tests, to allow them to work in rel-eng’s infrastructure.

For reftests, we also need to figure out “the resolution problem”:  the fact that we can’t set the pandas to a resolution that would allow the reftest window to be exactly 800×1000, which is the resolution that test authors assume when writing reftests.  Running reftests at other resolutions is possible, but we don’t know how many false passes we might be seeing, and analyzing the tests to try and determine this is laborious.

There are a lot of dependencies here, so I don’t have a very good ETA.  But when this work is done, we will transition all of testing to pandas on rel-eng infrastructure, except for the WebAPI tests which have been written specifically for the emulator.  This means the tests will show up on TBPL; they’ll be available on try; they will benefit from formal sheriffing. The emulator WebAPI tests will eventually be transitioned to rel-eng as well, if/when rel-eng starts making emulator builds.

Writing WebAPI tests for B2G using Marionette

At Mozilla, we have many different testing frameworks, each of which fills a different niche (although there is definitely some degree of overlap among them). For testing WebAPIs in B2G, some of these existing frameworks can be utilized, depending on the API. For example, mozSettings and mozContacts can be tested using mochitests, since there isn’t much, if anything, that’s device-specific to them. (We’re not currently running mochitests on B2G devices, but will be soon.)

But there are many other WebAPIs which are not testable using any of our standard frameworks, because tests for them need to interact with hardware in interesting ways, and most of our frameworks are designed to operate entirely within a gecko context, and thus have no ability to directly access hardware.

Malini Das and I have been working on a new framework called Marionette which can help. Marionette is a remote test driver, so it can remotely execute test steps within a gecko process while retaining the ability to interact with the outside world, including devices running B2G. When this is combined with the B2G emulator’s ability to query and set hardware state, we have a solution for testing a number of WebAPIs that would be difficult or impossible to test otherwise.

To illustrate how this works, I’m going to walk through the entire process of writing WebAPI tests for mozBattery and mozTelephony, to be run on B2G emulators. We already have such tests running in continuous integration, reporting to autolog. If developers add new Marionette WebAPI tests, they will be run and reported here as well. Eventually, they will likely be migrated over to TBPL.

Building the emulator

These tests will be run on the emulator, so you’ll have to build the B2G Ice Cream Sandwich emulator first, if you don’t have one already.  You’ll need to do this on linux, preferably Ubuntu.  Make sure to install the build prerequisites before you begin, if you haven’t built B2G before.

git clone https://github.com/andreasgal/B2G
cd B2G
make sync              # get a cup of coffee, this takes quite a while
make config-qemu-ics   # get another cup of coffee
make gonk              # get another drink, but I think you've had enough coffee by now

You should now have an emulator, which you can launch using:


After you’ve verified the emulator is working, close it again.

Running a Marionette sanity test

Now we’ll run a single Marionette test to verify that everything is working as expected.   First, ensure that you have Python 2.7 on your system.  Then, install some prerequisites:

pip install manifestdestiny   # or: easy_install manifestdestiny
pip install mozhttpd          # or: easy_install mozhttpd
pip install mozprocess        # or: easy_install mozprocess

Now, from the directory where you cloned the B2G repo:

cd gecko/testing/marionette/client/marionette
python runtests.py --emulator --homedir /path/to/B2G/repo \

If everything has gone well, you should see something like the following:

TEST-START test_simpletest_sanity.py
test_is (test_simpletest_sanity.SimpletestSanityTest) ... ok
test_isnot (test_simpletest_sanity.SimpletestSanityTest) ... ok
test_ok (test_simpletest_sanity.SimpletestSanityTest) ... ok

Ran 3 tests in 2.952s


passed: 3
failed: 0
todo: 0

Writing a battery test

The B2G emulator allows you to arbitrarily set the battery level and charging state, by telnetting into the emulator’s console port and issuing certain commands.  Marionette has an EmulatorBattery class which abstracts these operations, and allows you to interact with the emulator’s battery using a very simple API.
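
To make the console interaction concrete, here is a rough sketch of the kind of commands involved. It assumes the B2G emulator accepts the stock Android emulator console commands (power capacity, power status); whether EmulatorBattery sends exactly these strings is an assumption, and the helper name is made up for this example.

```python
def battery_console_commands(level, charging):
    """Build emulator console commands for a battery level in [0.0, 1.0].

    Hypothetical helper: it only builds the command strings and does not
    open a telnet connection.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0.0 and 1.0")
    percent = int(round(level * 100))
    status = "charging" if charging else "discharging"
    return ["power capacity %d" % percent, "power status %s" % status]


print(battery_console_commands(0.25, charging=False))
```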

A simple example is given in the EmulatorBattery documentation on MDN.  Save this example to a file named test_battery_example.py, and run this command:

python runtests.py --emulator --homedir /path/to/B2G/repo /path/to/test_battery_example.py

Marionette should launch an emulator and run the test; when it’s done you should see:

TEST-START test_battery_example.py
test_level (test_battery_example.TestBatteryLevel) ... ok

Ran 1 test in 0.391s


passed: 1
failed: 0
todo: 0

How it works

This test, like all Marionette Python tests, is written using Python’s unittest framework, which provides the assert methods used in the test.  Other methods used by the test are provided by the Marionette and EmulatorBattery classes.

When the test executes this line:

self.marionette.emulator.battery.level = 0.25

the EmulatorBattery class telnets into the emulator and sets the battery’s level.  We then read the level back (which invokes another telnet command) to verify that the emulator’s battery state was updated as expected.  And finally, we execute a snippet of JavaScript inside gecko:

moz_level = self.marionette.execute_script("return navigator.mozBattery.level;")

and verify that it returns the same battery level as the emulator is reporting directly.
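
The three steps above (set the level, read it back, compare with what Gecko reports) can be seen in isolation in the sketch below. It does not use Marionette at all: FakeEmulatorBattery and FakeMarionette are in-memory stand-ins invented for this example, so the shape of the test can be run anywhere.

```python
import unittest


class FakeEmulatorBattery(object):
    """In-memory stand-in for the emulator's battery (hypothetical)."""

    def __init__(self):
        self.level = 1.0


class FakeMarionette(object):
    """Stand-in for a Marionette session; answers only the one script used below."""

    def __init__(self, battery):
        self._battery = battery

    def execute_script(self, script):
        if script == "return navigator.mozBattery.level;":
            return self._battery.level
        raise NotImplementedError(script)


class TestBatteryLevel(unittest.TestCase):
    def setUp(self):
        self.battery = FakeEmulatorBattery()
        self.marionette = FakeMarionette(self.battery)

    def test_level(self):
        # 1. set the level on the (fake) emulator
        self.battery.level = 0.25
        # 2. read it back from the emulator
        self.assertEqual(self.battery.level, 0.25)
        # 3. ask Gecko (here: the fake) what it sees
        moz_level = self.marionette.execute_script(
            "return navigator.mozBattery.level;")
        self.assertEqual(moz_level, self.battery.level)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBatteryLevel)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```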

More tests with hardware interaction

In addition to battery interaction, the B2G emulator allows you to query and set the state of other properties normally set by hardware, like GPS location, network status, and various sensors.  Tests for all these could be written in a similar way.  It probably makes sense to make classes for these similar to EmulatorBattery which abstract the details of getting and setting the state of the underlying hardware.  I would encourage WebAPI developers to add as many WebAPI tests as possible; if you would like us to add convenience classes, please ping us on IRC (jgriffin and mdas, on #ateam or #b2g) or file a bug under Testing:Marionette.
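
As a rough idea of what such a convenience class might look like for GPS location, here is a sketch. EmulatorGeo is hypothetical and not part of Marionette; it assumes the emulator console accepts the stock Android "geo fix" command, and it takes the function that delivers console commands as a parameter so the example runs without an emulator.

```python
class EmulatorGeo(object):
    """Hypothetical convenience class in the spirit of EmulatorBattery."""

    def __init__(self, send_command):
        # send_command: callable that delivers one console command string.
        self._send = send_command
        self._position = None

    @property
    def position(self):
        # Returns the last value set here; it is not queried from the emulator.
        return self._position

    @position.setter
    def position(self, value):
        latitude, longitude = value
        if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
            raise ValueError("latitude/longitude out of range")
        # The Android emulator console expects longitude before latitude.
        self._send("geo fix %s %s" % (longitude, latitude))
        self._position = (latitude, longitude)


sent = []
geo = EmulatorGeo(sent.append)
geo.position = (37.3894, -122.0819)
print(sent)
```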

Multi-emulator tests

There are some WebAPIs which cannot be completely tested using a single device or emulator, like telephony and SMS.  Marionette can help with these too, as Marionette can be used to manipulate two emulator instances which are capable of communicating with each other.

In any tests run with the --emulator switch, Marionette launches an emulator before running the tests, and this emulator is associated with an instance of the Marionette class available to the test as self.marionette. Tests can invoke a second emulator instance using self.get_new_emulator(), and these emulator instances can call and text each other using their port numbers as their phone numbers.

To illustrate how this works, Malini has written an example test in which one emulator is used to dial another, and the caller’s number is verified on the receiver. See this example at https://developer.mozilla.org/en/Marionette/Marionette_Python_Tests/Emulator_Integrated_Tests#Manage_Multiple_Emulators.

If you save this example to test_dial_example.py and run the command:

python runtests.py --emulator --homedir /path/to/B2G/repo /path/to/test_dial_example.py

you should see Marionette launch one emulator, and then after it starts execution of the test, you should see a second emulator instance launch. After the test is done, you should see a successful report, similar to the one shown for the battery test.
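
The “port number as phone number” convention is easy to picture with a toy model. The classes below are stand-ins invented for this sketch and have nothing to do with Marionette's real emulator handling; they only show that the number dialed is the receiver's port and the caller ID seen is the sender's port.

```python
class FakeEmulator(object):
    """Stand-in for an emulator instance; only tracks incoming calls."""

    def __init__(self, port):
        self.port = port
        self.incoming_callers = []


class FakeNetwork(object):
    """Routes a dialed number to the emulator listening on that port."""

    def __init__(self, emulators):
        self._by_port = dict((emulator.port, emulator) for emulator in emulators)

    def dial(self, caller, number):
        receiver = self._by_port[int(number)]
        receiver.incoming_callers.append(str(caller.port))


# 5554 and 5556 are the console ports the Android emulator uses by default
# for the first two instances.
sender = FakeEmulator(5554)
receiver = FakeEmulator(5556)
network = FakeNetwork([sender, receiver])

network.dial(sender, str(receiver.port))
print(receiver.incoming_callers)
```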

We currently have a few tests for mozTelephony, but many more could be added, and new tests should be added for SMS/MMS as well.

Adding new tests to the B2G continuous integration

When new tests are ready to be added to the CI, they should be checked into gecko under their dom component, e.g., dom/telephony/test/marionette. They should be added to the manifest.ini file in the same directory, and then for new manifest.ini files, the path to the .ini file should be added to the master manifest at http://mxr.mozilla.org/mozilla-central/source/testing/marionette/client/marionette/tests/unit-tests.ini. After this is done, it should be picked up by the B2G CI, after the gecko fork of B2G is updated, where it will be reported along with the other tests to autolog.
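
To give a feel for the manifest layout, the sketch below parses a small made-up manifest and separates test entries from included manifests. The sample content, including the "[include:...]" section syntax, is an assumption about the manifest format rather than a copy of the real unit-tests.ini, and the parsing uses Python's generic configparser instead of the actual manifest library.

```python
import configparser

SAMPLE_MANIFEST = """
[include:dom/telephony/test/marionette/manifest.ini]

[test_battery_example.py]
[test_dial_example.py]
"""


def split_manifest(text):
    """Return (tests, includes) found in an ini-style manifest."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    tests, includes = [], []
    for section in parser.sections():
        if section.startswith("include:"):
            includes.append(section[len("include:"):])
        else:
            tests.append(section)
    return tests, includes


tests, includes = split_manifest(SAMPLE_MANIFEST)
print(tests, includes)
```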

Caveats, provisos, and miscellanea

B2G builds go to sleep after 60 seconds of inactivity.  In the emulator, this “sleep” will completely lock up Marionette if it occurs while a test is running.  This is very inconvenient while testing.  See bug 739476. Until some better mechanism of handling this is available, I usually edit gecko/b2g/apps/b2g.js to increase the value of the power.screen.timeout pref before building, to prevent the emulator from going to sleep.
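
If you would rather script that edit than do it by hand, something like the following would work. It is a sketch: the sample pref line is an assumed example of how the pref appears in b2g.js, and the function operates on a string so you can try it without a B2G checkout.

```python
import re


def set_screen_timeout(prefs_text, seconds):
    """Return prefs_text with the power.screen.timeout value replaced."""
    pattern = re.compile(r'(pref\("power\.screen\.timeout",\s*)\d+(\s*\);)')
    new_text, count = pattern.subn(
        lambda match: match.group(1) + str(seconds) + match.group(2),
        prefs_text)
    if count == 0:
        raise ValueError("power.screen.timeout pref not found")
    return new_text


sample = 'pref("power.screen.timeout", 60);\n'
print(set_screen_timeout(sample, 86400))
```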

The current test failures in autolog are being tracked as bug 751403 and bug 751406.

Network access in the emulator currently doesn’t seem to work (see https://github.com/andreasgal/B2G/issues/287).  This prevents some parts of Gaia from working correctly but doesn’t interfere with the above style of WebAPI tests, none of which rely on Gaia or network access.

Building the emulator is very time-consuming, mostly due to the time required to sync all the various repos needed by B2G.  We hope to be able to post emulator builds for download soon, after a few details are worked out.

More reading

What is Marionette

Marionette Python tests

Marionette Emulator tests

the Marionette class

the Emulator class

Please contribute tests

There are many WebAPIs which are less tested than they could be.  Please help us expand test coverage by contributing tests in areas similar to those described above.    If you need help, contact :jgriffin or :mdas on IRC, or file a bug under Testing:Marionette.

B2G and WebAPI testing in Emulators

Malini Das and I have been working on a new test framework called Marionette, in which tests of a Gecko-based product (B2G, Fennec, etc.) are driven remotely, à la Selenium.  Marionette has client and server components; the server side is embedded inside Gecko, and the client side runs on a (possibly remote) host PC.  The two components communicate using a JSON protocol over a TCP socket.  The Marionette JSON protocol is based loosely on the Selenium JSON Wire Protocol; it defines a set of commands that the Marionette server inside Gecko knows how to execute.
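
To illustrate what “JSON over a TCP socket” involves, here is a sketch of packing and unpacking JSON messages for a byte stream. The length-prefixed framing ("length:json") and the field names in the sample command are assumptions made for this example, not a specification of the Marionette protocol.

```python
import json


def encode_packet(message):
    """Serialize a dict as b'<length>:<json>' so packets can be split on a stream."""
    body = json.dumps(message).encode("utf-8")
    return str(len(body)).encode("ascii") + b":" + body


def decode_packet(data):
    """Decode one packet from the front of data; return (message, remaining bytes)."""
    length, _, rest = data.partition(b":")
    size = int(length)
    return json.loads(rest[:size].decode("utf-8")), rest[size:]


# Hypothetical command; the real protocol's command and field names may differ.
command = {"type": "executeScript", "value": "return 1 + 1;"}
wire = encode_packet(command)
decoded, leftover = decode_packet(wire)
print(wire, decoded, leftover)
```

Some kind of framing is needed because TCP delivers a byte stream, so the receiver must be able to tell where one JSON message ends and the next begins.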

This differs from past approaches to remote automation in that we don’t need any extra software (i.e., a SUTAgent) running on the device, we don’t need special access to the device via something like adb (although we do use adb to manage emulators), nor do tests need to be particularly browser-centric.  These differences seem advantageous when thinking about testing B2G.

The first use case to which we might apply Marionette in B2G seems to be WebAPI testing in emulators.  There are some WebAPI features that we can’t test well in an automated manner using either desktop builds or real devices, such as WebSMS.  But we can write automated tests for these using emulators, since we can manipulate the emulator’s hardware state and emulators know how to “talk” to each other for the purposes of SMS and telephony.

Since Marionette tests are driven from the client side, they’re written in Python.  This is what a WebSMS test in Marionette might look like:

from marionette import Marionette

if __name__ == '__main__':
    # launch the emulators that will do the sending and receiving
    sender = Marionette(emulator=True)

    receiver = Marionette(emulator=True)

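    # setup the SMS event listener on the receiver
    # (the event name "smsreceived" is an assumption)
    receiver.execute_script("""
        var sms_body = "";
        window.addEventListener("smsreceived",
                                function(m) { sms_body = m.body });
    """)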


    # send the SMS event on the sender
    message = "hello world!"
    sender.execute_script("navigator.sms.send(%d, '%s');" %
        (receiver.emulator.port, message))

    # verify the message was received by the receiver
    assert(receiver.execute_script("return sms_body;") == message)

The JavaScript portions of the test could be split into a separate file from the Python, for easier editing and syntax highlighting.  Here’s the adjusted Python file:

from marionette import Marionette

if __name__ == '__main__':
    # launch the emulators that will do the sending and receiving and
    # load the JS scripts for each
    sender = Marionette(emulator=True)

    receiver = Marionette(emulator=True)

    # setup the SMS event listener on the receiver
    receiver.execute_script_function("setup_sms_listener")

    # send the SMS event on the sender
    message = "hello world!"
    target = receiver.emulator.port
    sender.execute_script_function("send_sms", [target, message])

    # verify the message was received by the receiver
    assert(receiver.execute_script_function("get_sms_body") == message)

And here’s the JavaScript file:

function send_sms(target, msg) {
    navigator.sms.send(target, msg);
}

var sms_body = "";

function setup_sms_listener() {
    // the event name "smsreceived" is an assumption
    window.addEventListener("smsreceived",
                            function(m) { sms_body = m.body });
}

function get_sms_body() {
    return sms_body;
}

Both of these options are just about usable in Marionette right now.  Note that the test is driven, and some of the test logic (like asserts) resides on the client side, in Python.  This makes synchronization between multiple emulators straightforward, and provides a natural fit for Python libraries that will be used to interact with the emulator’s battery and other hardware.

What if we wanted JavaScript-only WebAPI tests in emulators, without any Python?  Driving a multiple-emulator test from JavaScript running in Gecko introduces some complications, chief among them the necessity of sharing state between the tests, the emulators, and the Python testrunner, all from within the context of the JavaScript test.  We can imagine such a test might look like this:

var message = "hello world!";
var device_number = Marionette.get_device_number(Marionette.THIS_DEVICE);

if (device_number == 1) {
  // we're being run in the "sender"

  // wait for the test in the other emulator to be in a ready state
  Marionette.wait_for_state(Marionette.THAT_DEVICE, Marionette.STATE_READY);

  // send the SMS
  navigator.sms.send(Marionette.get_device_port(Marionette.THAT_DEVICE), message);
} else {
  // we're being run in the "receiver"

  // notify Marionette that this test is asynchronous

  // setup the event listener
                          function (m) {
                                         // perform the test assertion and notify Marionette
                                         // that the test is finished
                                         is(m.body, message, "Wrong message body received");
                          });

  // notify Marionette we're in a ready state
}

Conceptually, this is more similar to xpcshell tests, but implementing support for this kind of test in Marionette (or inside the existing xpcshell harness) would require substantial additional work.  Marionette is currently designed with a client-server architecture, in which requests flow from the client (the Python part) to the server (inside Gecko) over TCP, and responses flow back.  Supporting the JS-only test syntax above would require roughly the reverse: requests initiated at will from within the JS part of the test.  That would mean non-trivial changes to Marionette in several different areas, plus new code to handle the threading and synchronization that would be required.
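To make the direction of that flow concrete, here is a toy request/response exchange in Python.  It is only a sketch of the general client-server pattern: the JSON message format and the function names are invented for illustration and are not Marionette's wire protocol, and a local socket pair stands in for the TCP connection.

```python
import json
import socket
import threading


def serve_one_request(conn):
    """Toy server: wait for one request, then answer it. It never speaks first."""
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        request = json.loads(stream.readline())
        reply = {"id": request["id"], "received": request["script"]}
        stream.write(json.dumps(reply) + "\n")
        stream.flush()


def send_request(conn, request):
    """Toy client: send one request and block until the reply arrives."""
    with conn.makefile("rw", encoding="utf-8") as stream:
        stream.write(json.dumps(request) + "\n")
        stream.flush()
        return json.loads(stream.readline())


# A connected pair of sockets stands in for the client/server TCP link.
client_end, server_end = socket.socketpair()
server = threading.Thread(target=serve_one_request, args=(server_end,))
server.start()
reply = send_request(client_end, {"id": 1, "script": "return sms_body;"})
server.join()
client_end.close()
print(reply)
```

In this shape the server has no way to start a conversation; it can only answer.  A JS-only test would need the side running inside Gecko to initiate requests as well, which is the reversal described above.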

Do you think the Python/JS hybrid tests will be sufficient for WebAPI testing in emulators?

OrangeFactor changes: Talos, bug correlations

Talos oranges now in OrangeFactor

When OrangeFactor (aka WOO) premiered, it did not include Talos oranges in its calculations.  There were many reasons for this, including the fact that Talos oranges were quite rare at the time.

As philor noted last week, that is no longer the case; there are now several frequent Talos oranges on the Android platform.  Because of this, I’ve just added Talos oranges into OrangeFactor.  The result is that the OrangeFactor has jumped from 4.01 (678 failures in 169 testruns) to 5.44 (921 failures in 169 testruns) on mozilla-central.
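For readers who want to check the arithmetic, here is a minimal sketch, assuming the OrangeFactor is simply the number of failures divided by the number of testruns (which is what the figures above suggest).  The function name is mine, not something from the OrangeFactor code.

```python
def orange_factor(failures, testruns):
    """Average number of failures (oranges) per testrun."""
    if testruns <= 0:
        raise ValueError("testruns must be positive")
    return failures / testruns


before = orange_factor(678, 169)  # without Talos oranges
after = orange_factor(921, 169)   # with Talos oranges included
print("before: %.2f  after: %.2f  increase: %.2f" % (before, after, after - before))
```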

New bug correlations view

Mark Côté has recently implemented a new view in OrangeFactor, the bug correlations view.  This view shows bugs which occur together on the same commit.  We’ve already had a couple of suggestions for this page which we’re going to implement:  add bug summaries, and show the actual revision numbers for the correlations.  If anyone has other suggestions, please file a bug under Testing:Orange Factor.

Upcoming changes

Next up:  adding the ability to guess when an orange was introduced.  Stay tuned!

GoFaster: deeper data analysis

For the GoFaster project, RelEng and the A-Team have been working on various tasks which we hope will get the total commit-to-all-tests-done time down to 2 hours for the main branches (try excluded).  This total turnaround time was 6-8 hours a couple of months ago when we began this project.

We’ve recently made some improvements that seriously reduce the total machine time required to run all tests for a given commit.  These include hiding the mochitest results table, removing packed.js from mochitest, and streamlining individual slow tests (see bug 674738, bug 676412, and bug 670229).  Together these have reduced the total machine time for tests from about 40 hours to around 25 hours per commit, a big win.

However, the total turnaround times are still much slower than our goal.

We already knew that PGO builds are slow, and jhford is working on turning on-demand builds into non-PGO builds and making PGO builds every four hours (bug 658313).  However, we needed a way to dig deeper into the data to see what our other pain points are.

Will Lachance made some awesome build charts which help us visualize what’s going on in these buildbot jobs.  Clicking any commit will show a chart that displays all the relevant buildbot jobs in relative clock time; this makes it easier to see where the bottlenecks are.
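The idea behind reading such a chart can be sketched in a few lines of Python: put every job on a timeline relative to the commit, and the job that finishes last determines the end-to-end time.  The job names and minute values below are invented for illustration; they are not taken from the build charts.

```python
def end_to_end_minutes(jobs):
    """Minutes from commit until the last job finishes.

    jobs is a list of (name, start_min, end_min) tuples, with times measured
    in minutes relative to the commit.
    """
    return max(end for _name, _start, end in jobs)


def last_finishing_job(jobs):
    """Name of the job that finishes last, i.e. the one gating turnaround."""
    return max(jobs, key=lambda job: job[2])[0]


# Hypothetical jobs for one commit.
jobs = [
    ("linux opt build", 5, 66),
    ("linux opt mochitest-1", 70, 95),
    ("linux64 opt build", 5, 219),
]
print(end_to_end_minutes(jobs), last_finishing_job(jobs))
```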

Build times

Display the build chart for just about any commit (e58e98a89827 for instance), and you’ll see the problem right away:  just about every commit includes builds that far exceed 2 hours.  These aren’t always opt builds, and they sometimes occur even on our ‘fast’ OS:  linux.  Check out 5d9989c3bff6, which has a linux64 opt build that takes 214 min, compared to the linux32 opt build that takes 61 minutes.  198c7de0699d has an OSX 10.5 debug build that takes 171 minutes, but the 10.6 debug build takes only 82 minutes.  Clearly, we can’t hit our 2-hour goal with builds that take 2+ hours.  What’s going on?

It’s necessary to spend a little time digging through build logs to find out.  It turns out there are multiple factors.

  1. We already know that PGO builds are slow, particularly on Windows.  Once bug 658313 lands, we expect the overall situation to improve dramatically.
  2. On some builds, the ‘update’ step includes a full ‘hg clone’ of mozilla-central, while others use ‘hg pull -u’.  Below is a graph of update times; the average time for an update that includes ‘hg clone’ is 12.9 min, for those that use ‘hg pull’ the average is 0.6 min.  Each full clone is costing us an average of 12 minutes.

  3. On some build slaves, we do a full build (with no obj dir from a previous build), on others we do an incremental build.  Below is a graph showing incremental vs full compile times for opt and debug builds.  On average, full compiles are taking 17 minutes longer than incremental ones.

  4. We have a mix of slow and fast slaves.  This can easily be seen in the below graph of linux compile times.  On linux and linux64 builds, full compiles with moz2-linux(64)-* slaves are slow (those > 75 min), while those made with linux(64)-ix-* slaves are fast (those < 75 min).  32-bit mac builds show a similar split, with those on moz2-darwin9* slaves slow, and those on bm-xserve* slaves fast.  Hardware doesn’t appear to create a significant difference for windows and 64-bit mac builds.

  5. On macosx64 machines, the ‘alive test’ step takes an average of 6 min (vs 1 min on other OSes).
  6. The ‘checking clobber times’ step often takes just a couple of seconds; however, when this step actually results in some clobbering being done, it can take up to 21 minutes (average: 6 min).

When all these factors coincide, we can get builds (which include compile, update, and other steps) that exceed 4 hours.  This suggests doing away with on-demand PGO builds may not in itself get us to our 2-hour goal.

From this data, two of the more obvious ways to improve our build times might be:

  1. Investigate retiring slow linux and 32-bit mac build slaves.
  2. Investigate ways to reduce clobbering.  Clobbering itself takes time (see bullet #6 above), but also indirectly costs time through increased update and compile times.  Currently, about 51% of our builds are operating on clobbered slaves, requiring full hg clones and full compiles.  If this number could be reduced, we might see a significant reduction in our average turnaround times.
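To get a rough feel for what clobbering costs, the averages quoted above can be combined in a few lines of Python.  This is a back-of-the-envelope sketch, not a measurement: it assumes that every clobbered build pays the full-clone penalty, the full-compile penalty, and the average clobber time, and the 25% figure is a made-up target used only for comparison.

```python
# Averages quoted in this post, in minutes.
CLONE_UPDATE_MIN = 12.9      # 'update' step with a full hg clone
PULL_UPDATE_MIN = 0.6        # 'update' step with hg pull -u
FULL_COMPILE_EXTRA_MIN = 17  # full compile vs incremental compile
CLOBBER_STEP_MIN = 6         # average when clobbering actually happens


def clobber_penalty_minutes():
    """Extra minutes a clobbered build pays, assuming all three costs apply."""
    update_extra = CLONE_UPDATE_MIN - PULL_UPDATE_MIN
    return update_extra + FULL_COMPILE_EXTRA_MIN + CLOBBER_STEP_MIN


def expected_penalty_minutes(clobbered_fraction):
    """Average extra minutes per build for a given fraction of clobbered builds."""
    if not 0.0 <= clobbered_fraction <= 1.0:
        raise ValueError("clobbered_fraction must be between 0 and 1")
    return clobbered_fraction * clobber_penalty_minutes()


current = expected_penalty_minutes(0.51)       # about 51% of builds today
hypothetical = expected_penalty_minutes(0.25)  # made-up target, for illustration
print("penalty per clobbered build: %.1f min" % clobber_penalty_minutes())
print("average per build at 51%%: %.1f min" % current)
print("average per build at 25%%: %.1f min" % hypothetical)
```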

Test times

According to Will’s build charts, the E2E time for tests is often within our 30-minute target range.  The exception is mochitest-other on debug builds, which often takes from 60 to 90 minutes.  We could improve this situation somewhat by splitting mochitest-browser-chrome (the longest-running chunk of mochitest-other) into its own test job.

Additionally, wait times for test slaves running android and win 7 tests are sometimes non-trivial; see e.g. the details for commit 97216ae0fc04.  We should try to understand why this happens; the graph of test wait times doesn’t show a clear trend, other than highlighting the fact that wait times for windows and android are usually worse than on the other OSes.



GoFaster: hiding the mochitest results table

I’m sure anyone who has ever submitted a patch to a Mozilla tree is familiar with this drill:

  1. hg push
  2. check TBPL, wait
  3. check TBPL again, wait some more
  4. go to Starbucks for a caramel macchiato, install a new OS on your laptop, review all the patches in your queue, plan next winter’s tropical vacation, check TBPL, and….
  5. wait some more

Recently, the total end-to-end time from submit to all-tests-done has been around 6-8 hours, depending on load.  That’s too long, and RelEng and the A-Team think we can do something about it.  For the past couple of months we’ve been working on the GoFaster project; our goal is to get that turnaround time down to 2 hours.  We have a list of tasks, and recently one of these landed with some significant improvements.

Cameron McCormack wrote a patch which hides the mochitest results table when MOZ_HIDE_RESULTS_TABLE=1 (see: bug 479352).  The initial version of this patch caused frequent hangs during mochitest-1/5.  We didn’t discover the reason behind this, but I updated the patch to hide the results table in a different way, and the hang vanished.  I pushed this change to mozilla-central, and Cameron made a table displaying before and after durations for all the test runs.

The results?  That one change saves about 13 hours of machine time per checkin.  The entire suite of unit tests which prior to that change took about 40 machine-hours to run now takes 27.  Wow!

What kind of improvement in the end-to-end time does that translate into?  I’m not sure.  Sam Liu, an A-Team intern, has been working on a dashboard to help track this, but it’s currently using canned (stale) data.  RelEng is working on exposing live data to be consumed by the dashboard, and when that’s ready we should be able to easily track the effect of changes like this in the overall time.

Meanwhile, check out the project’s wiki page or attend one of our meetings.  If you have thoughts on ways we can improve our total turnaround time, we’d love to hear from you.

WebGL Conformance Tests now in GrafxBot

GrafxBot has been updated to include the mochitest version of the WebGL Conformance Tests.  When you run GrafxBot tests using the new version, it will run the usual reftests first, followed by the new WebGL tests.  Both sets of test results are posted to the database at the end of the test run.

The WebGL tests may be skipped for a couple of reasons:  they’ll be skipped if you have a Mac running an OS X version older than 10.6, or if WebGL isn’t enabled in Firefox on your machine, which could happen if you don’t have supported hardware or drivers.  GrafxBot doesn’t try to force-enable either WebGL or accelerated layers.

Partially to support these tests, GrafxBot now reports some additional details about Firefox’s acceleration status, similar to what you see in about:support:

webgl results 132 pass / 7 fail
webgl renderer Google Inc. — ANGLE — OpenGL ES 2.0 (ANGLE
acceleration mode 2/2 Direct3D 10
d2d enabled true
directwrite enabled true: 6.1.7600.20830, font cache n/a

I encourage users to download and run the new version; I’d like to get some feedback before I update it on AMO, to make sure users aren’t running into problems with the new tests.

The new version of GrafxBot can be downloaded here.

Latest Tinderbox Build URLs

The automation tools team creates a variety of automation tools that test a wide range of things.  There are times when these tools need to locate the latest tinderbox build for a given platform, in order to test against.  In the past, this task involved spidering along the FTP site that is home to tinderbox builds.

Now, however, there’s a much easier way:  a web service which returns a JSON document that always contains the latest tinderbox build URLs for all platforms.  This is made possible by Christian Legnitto’s awesome Mozilla Pulse, which sends messages to consumers when certain buildbot events (among other things) occur.  I’ve written a Python library, pulsebuildmonitor, which makes it even easier to act as a consumer for these messages, and layered a small web service on top of that.

The result is described at http://brasstacks.mozilla.com/latestbuilds/README, and the actual JSON is served at http://brasstacks.mozilla.com/latestbuilds/.

Currently this only works for mozilla-central, but I could easily extend it to other trees if needed.
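A tool consuming the service might look something like the sketch below.  The document layout used here (tree name, then platform, then URL) is purely an assumption for illustration, as are the example URLs; check the README for the real structure.  The sample is parsed from a string so the snippet runs without network access.

```python
import json

# Hypothetical document layout; the real service's JSON may be shaped differently.
SAMPLE_DOCUMENT = """
{
  "mozilla-central": {
    "linux": "http://example.invalid/builds/linux/firefox.tar.bz2",
    "win32": "http://example.invalid/builds/win32/firefox.zip"
  }
}
"""


def latest_build_url(document_text, tree, platform):
    """Return the build URL for tree/platform, or None if it is not listed."""
    document = json.loads(document_text)
    return document.get(tree, {}).get(platform)


print(latest_build_url(SAMPLE_DOCUMENT, "mozilla-central", "linux"))
print(latest_build_url(SAMPLE_DOCUMENT, "mozilla-central", "macosx"))
```

In a real tool, the document text would come from an HTTP GET of the latestbuilds URL rather than from a string constant.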

