JGriffin's Blog

Is this thing on?

Category Archives: Mozilla

B2G and WebAPI testing in Emulators

Malini Das and I have been working on a new test framework called Marionette, in which tests of a Gecko-based product (B2G, Fennec, etc.) are driven remotely, à la Selenium.  Marionette has client and server components; the server side is embedded inside Gecko, and the client side runs on a (possibly remote) host PC.  The two components communicate using a JSON protocol over a TCP socket.  The Marionette JSON protocol is based loosely on the Selenium JSON Wire Protocol; it defines a set of commands that the Marionette server inside Gecko knows how to execute.
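
To make this concrete, here's a rough sketch of what sending a single command to the server might look like from Python.  The command name, fields, default port, and length-prefixed framing below are illustrative assumptions, not a specification of the actual protocol.

import json
import socket

def send_command(sock, command):
    # Serialize the command and frame it; "<length>:<json>" framing is
    # assumed here purely for illustration.
    payload = json.dumps(command)
    sock.sendall(('%d:%s' % (len(payload), payload)).encode('utf-8'))

if __name__ == '__main__':
    # 2828 is assumed to be the port the Marionette server listens on
    sock = socket.create_connection(('localhost', 2828))
    # 'executeScript' is a hypothetical command name
    send_command(sock, {'type': 'executeScript', 'value': 'return 1 + 1;'})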

This differs from past approaches to remote automation in that we don’t need any extra software (i.e., a SUTAgent) running on the device, we don’t need special access to the device via something like adb (although we do use adb to manage emulators), nor do tests need to be particularly browser-centric.  These differences seem advantageous when thinking about testing B2G.

The first use case to which we might apply Marionette in B2G seems to be WebAPI testing in emulators.  There are some WebAPI features that we can’t test well in an automated manner using either desktop builds or real devices, such as WebSMS.  But we can write automated tests for these using emulators, since we can manipulate the emulator’s hardware state and emulators know how to “talk” to each other for the purposes of SMS and telephony.

Since Marionette tests are driven from the client side, they’re written in Python.  This is what a WebSMS test in Marionette might look like:

from marionette import Marionette

if __name__ == '__main__':
    # launch the emulators that will do the sending and receiving
    sender = Marionette(emulator=True)
    assert(sender.emulator.is_running)
    assert(sender.start_session())

    receiver = Marionette(emulator=True)
    assert(receiver.emulator.is_running)
    assert(receiver.start_session())

    # setup the SMS event listener on the receiver
    receiver.execute_script("""
        var sms_body = "";
        window.addEventListener("smsreceived",
                                 function(m) { sms_body = m.body });

    """)

    # send the SMS event on the sender
    message = "hello world!"
    sender.execute_script("navigator.sms.send(%d, '%s');" %
        (receiver.emulator.port, message))

    # verify the message was received by the receiver
    assert(receiver.execute_script("return sms_body;") == message)

The JavaScript portions of the test could be split into a separate file from the Python, for easier editing and syntax highlighting.  Here’s the adjusted Python file:

from marionette import Marionette

if __name__ == '__main__':
    # launch the emulators that will do the sending and receiving and
    # load the JS scripts for each
    sender = Marionette(emulator=True)
    assert(sender.emulator.is_running)
    assert(sender.start_session())
    assert(sender.load_script('test_sms.js'))

    receiver = Marionette(emulator=True)
    assert(receiver.emulator.is_running)
    assert(receiver.start_session())
    assert(receiver.load_script('test_sms.js'))

    # setup the SMS event listener on the receiver
    receiver.execute_script_function("setup_sms_listener")

    # send the SMS event on the sender
    message = "hello world!"
    target = receiver.emulator.port
    sender.execute_script_function("send_sms", [target, message])

    # verify the message was received by the receiver
    assert(receiver.execute_script_function("get_sms_body") == message)

And here’s the JavaScript file:

function send_sms(target, msg) {
    navigator.sms.send(target, msg);
}

var sms_body = "";

function setup_sms_listener() {
    window.addEventListener("smsreceived",
                            function(m) { sms_body = m.body });
}

function get_sms_body() {
    return sms_body;
}

Both of these options are just about usable in Marionette right now.  Note that the test is driven from the client side, and some of the test logic (like the asserts) lives there, in Python.  This makes synchronization between multiple emulators straightforward, and provides a natural fit for Python libraries that will be used to interact with the emulator’s battery and other hardware.
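
For instance, the Python side could manipulate emulator hardware state through the Android emulator’s telnet console, entirely outside of Gecko.  The helper below is a hypothetical sketch of what such a library call might look like; it isn’t part of Marionette itself.

import telnetlib

def set_battery_capacity(console_port, percent):
    # The Android emulator console accepts commands like "power capacity <n>";
    # console_port is the emulator's console port (the same number used as the
    # "phone number" in the SMS example above).
    console = telnetlib.Telnet('localhost', console_port)
    console.read_until(b'OK')                         # wait for the console banner
    console.write(b'power capacity %d\n' % percent)
    console.read_until(b'OK')
    console.close()

# e.g., to simulate a low battery on the sender:
# set_battery_capacity(sender.emulator.port, 15)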

What if we wanted JavaScript-only WebAPI tests in emulators, without any Python?  Driving a multiple-emulator test from JavaScript running in Gecko introduces some complications, chief among them the necessity of sharing state between the tests, the emulators, and the Python testrunner, all from within the context of the JavaScript test.  We can imagine such a test might look like this:


var message = "hello world!";
var device_number = Marionette.get_device_number(Marionette.THIS_DEVICE);

if (device_number == 1) {
  // we're being run in the "sender"

  // wait for the test in the other emulator to be in a ready state
  Marionette.wait_for_state(Marionette.THAT_DEVICE, Marionette.STATE_READY);

  // send the SMS
  navigator.sms.send(Marionette.get_device_port(Marionette.THAT_DEVICE), message);
}
else {
  // we're being run in the "receiver"

  // notify Marionette that this test is asynchronous
  Marionette.test_pending();

  // setup the event listener
  window.addEventListener("smsreceived",
                          function (m) { 
                                         // perform the test assertion and notify Marionette 
                                         // that the test is finished
                                         is(m.body, message, "Wrong message body received"); 
                                         Marionette.test_finished();
                                       }
                         );

  // notify Marionette we're in a ready state
  Marionette.set_state(Marionette.STATE_READY);
}

Conceptually, this is more similar to xpcshell tests, but implementing support for this kind of test in Marionette (or inside the existing xpcshell harness) would require substantial additional work.  As it currently exists, Marionette has a client-server architecture, in which information flows from the client (the Python part) to the server (inside Gecko) via TCP requests, and then back.  Supporting the JS-only syntax above would require roughly the reverse: requests initiated at will from within the JS part of the test.  That would mean non-trivial changes to Marionette in several different areas, as well as new code to handle the threading and synchronization involved.

Do you think the Python/JS hybrid tests will be sufficient for WebAPI testing in emulators?

OrangeFactor changes: Talos, bug correlations

Talos oranges now in OrangeFactor

When OrangeFactor (aka WOO) premiered, it did not include Talos oranges in its calculations.  There were many reasons for this, including the fact that Talos oranges were quite rare at the time.

As philor noted last week, that is no longer the case; there are now several frequent Talos oranges on the Android platform.  Because of this, I’ve just added Talos oranges into OrangeFactor.  The result is that the OrangeFactor (the average number of failures per testrun) has jumped from 4.01 (678 failures in 169 testruns) to 5.44 (921 failures in 169 testruns) on mozilla-central.

New bug correlations view

Mark Côté has recently implemented a new view in OrangeFactor, the bug correlations view.  This view shows bugs which occur together on the same commit.  We’ve already had a couple of suggestions for this page which we’re going to implement:  add bug summaries, and show the actual revision numbers for the correlations.  If anyone has other suggestions, please file a bug under Testing:Orange Factor.

Upcoming changes

Next up:  adding the ability to guess when an orange was introduced.  Stay tuned!

GoFaster: deeper data analysis

For the GoFaster project, releng and the A-team have been working on various tasks which we hope will get the total commit-to-all-tests-done time down to 2 hours for the main branches (try excluded).  This total turnaround time was 6-8 hours a couple of months ago when we began this project.

We’ve recently made some improvements that seriously reduce the total machine time required to run all tests for a given commit.  These include hiding the mochitest results table, removing packed.js from mochitest, and streamlining individual slow tests (see bug 674738, bug 676412, and bug 670229).  Together these have reduced the total machine time for tests from about 40 hours to around 25 hours per commit, a big win.

However, the total turnaround times are still much slower than our goal:

We already knew that PGO builds are slow, and jhford is working on turning on-demand builds into non-PGO builds and making PGO builds every four hours (bug 658313).  However, we needed a way to dig deeper into the data to see what our other pain points are.

Will Lachance made some awesome build charts which help us visualize what’s going on in these buildbot jobs.  Clicking any commit will show a chart that displays all the relevant buildbot jobs in relative clock time; this makes it easier to see where the bottlenecks are.

Build times

Display the build chart for just about any commit (e58e98a89827 for instance), and you’ll see the problem right away:  just about every commit includes builds that far exceed 2 hours.  These aren’t always opt builds, and they sometimes occur even on our ‘fast’ OS:  linux.  Check out 5d9989c3bff6, which has a linux64 opt build that takes 214 min, compared to the linux32 opt build that takes 61 minutes.  198c7de0699d has an OSX 10.5 debug build that takes 171 minutes, but the 10.6 debug build takes only 82 minutes.  Clearly, we can’t hit our 2-hour goal with builds that take 2+ hours.  What’s going on?

It’s necessary to spend a little time digging through build logs to find out.  It turns out there are multiple factors.

  1. We already know that PGO builds are slow, particularly on Windows.  Once bug 658313 lands, we expect the overall situation to improve dramatically.
  2. On some builds, the ‘update’ step includes a full ‘hg clone’ of mozilla-central, while others use ‘hg pull -u’.  Below is a graph of update times; the average time for an update that includes ‘hg clone’ is 12.9 min, while for those that use ‘hg pull’ the average is 0.6 min.  Each full clone is costing us an average of about 12 minutes.

  3. On some build slaves we do a full build (with no obj dir from a previous build); on others we do an incremental build.  Below is a graph showing incremental vs full compile times for opt and debug builds.  On average, full compiles take 17 minutes longer than incremental ones.

  4. We have a mix of slow and fast slaves.  This can easily be seen in the below graph of linux compile times.  On linux and linux64 builds, full compiles on moz2-linux(64)-* slaves are slow (those > 75 min), while those on linux(64)-ix-* slaves are fast (those < 75 min).  32-bit mac builds show a similar split, with those on moz2-darwin9* slaves slow and those on bm-xserve* slaves fast.  Hardware doesn’t appear to make a significant difference for windows and 64-bit mac builds.

  5. On macosx64 machines, the ‘alive test’ step takes an average of 6 min (vs 1 min on other OS’s).
  6. The ‘checking clobber times’ step often takes just a couple of seconds; however, when this step actually results in some clobbering being done, it can take up to 21 minutes (average: 6 min).

When all these factors coincide, we can get builds (which include compile, update, and other steps) that exceed 4 hours.  This suggests doing away with on-demand PGO builds may not in itself get us to our 2-hour goal.

From this data, two of the more obvious ways to improve our build times might be:

  1. Investigate retiring slow linux and 32-bit mac build slaves.
  2. Investigate ways to reduce clobbering.  Clobbering itself takes time (see bullet #6 above), but also indirectly costs time through increased update and compile times.  Currently, about 51% of our builds are operating on clobbered slaves, requiring full hg clones and full compiles.  If this number could be reduced, we might see a significant reduction in our average turnaround times.

Test times

According to Will’s build charts, the end-to-end time for tests is often within our 30-minute target range.  The exception is mochitest-other on debug builds, which often takes from 60 to 90 minutes.  We could improve this situation somewhat by splitting mochitest-browser-chrome (the longest-running chunk of mochitest-other) into its own test job.

Additionally, wait times for test slaves running android and win 7 tests are sometimes non-trivial; see e.g. the details for commit 97216ae0fc04.  We should try to understand why this happens; the graph of test wait times doesn’t show a clear trend, other than highlighting the fact that wait times for windows and android are usually worse than for the other OS’s.


GoFaster: hiding the mochitest results table

I’m sure anyone who has ever submitted a patch to a Mozilla tree is familiar with this drill:

  1. hg push
  2. check TBPL, wait
  3. check TBPL again, wait some more
  4. go to Starbucks for a caramel macchiato, install a new OS on your laptop, review all the patches in your queue, plan next winter’s tropical vacation, check TBPL, and….
  5. wait some more

Recently, the total end-to-end time from submit to all-tests-done has been around 6-8 hours, depending on load.  That’s too long, and RelEng and the A-Team think we can do something about it.  For the past couple of months we’ve been working on the GoFaster project; our goal is to get that turnaround time down to 2 hours.  We have a list of tasks, and recently one of these landed with some significant improvements.

Cameron McCormack wrote a patch which hides the mochitest results table when MOZ_HIDE_RESULTS_TABLE=1 (see bug 479352).  The initial version of this patch caused frequent hangs during mochitest-1/5.  We didn’t discover the reason behind this, but I updated the patch to hide the results table in a different way, and the hang vanished.  I pushed this change to mozilla-central, and Cameron made a table displaying before and after durations for all the test runs.
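
For anyone running mochitests locally, this is just an ordinary environment variable; something like the sketch below should work, though the runtests.py path is only an example and depends on your object directory layout.

import os
import subprocess

env = os.environ.copy()
env['MOZ_HIDE_RESULTS_TABLE'] = '1'   # hide the in-page results table
subprocess.check_call(['python', 'runtests.py'],
                      cwd='obj-ff-dbg/_tests/testing/mochitest',  # example objdir path
                      env=env)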

The results?  That one change saves about 13 hours of machine time per checkin.  The entire suite of unit tests, which prior to that change took about 40 machine-hours to run, now takes 27.  Wow!

What kind of improvement in the end-to-end time does that translate into?  I’m not sure.  Sam Liu, an A-Team intern, has been working on a dashboard to help track this, but it’s currently using canned (stale) data.  RelEng is working on exposing live data to be consumed by the dashboard, and when that’s ready we should be able to easily track the effect of changes like this in the overall time.

Meanwhile, check out the project’s wiki page or attend one of our meetings.  If you have thoughts on ways we can improve our total turnaround time, we’d love to hear from you.

WebGL Conformance Tests now in GrafxBot

GrafxBot has been updated to include the mochitest version of the WebGL Conformance Tests.  When you run GrafxBot tests using the new version, it will run the usual reftests first, followed by the new WebGL tests.  Both sets of test results are posted to the database at the end of the test run.

The WebGL tests may be skipped for a couple of reasons:  they’ll be skipped if you’re on a Mac running a version of OS X older than 10.6, or if WebGL isn’t enabled in Firefox on your machine, which could happen if you don’t have supported hardware or drivers.  GrafxBot doesn’t try to force-enable either WebGL or accelerated layers.

Partially to support these tests, GrafxBot now reports some additional details about Firefox’s acceleration status, similar to what you see in about:support:

webgl results: 132 pass / 7 fail
webgl renderer: Google Inc. -- ANGLE -- OpenGL ES 2.0 (ANGLE 0.0.0.541)
acceleration mode: 2/2 Direct3D 10
d2d enabled: true
directwrite enabled: true (6.1.7600.20830, font cache n/a)

I encourage users to download and run the new version; I’d like to get some feedback before I update it on AMO, to make sure users aren’t running into problems with the new tests.

The new version of GrafxBot can be downloaded here.

Latest Tinderbox Build URLs

The automation tools team creates a variety of automation tools that test a wide range of things.  There are times when these tools need to locate the latest tinderbox build for a given platform to test against.  In the past, this task involved spidering through the FTP site that is home to tinderbox builds.

Now, however, there’s a much easier way:  a web service which returns a JSON document that always contains the latest tinderbox build URLs for all platforms.  This is made possible by Christian Legnitto’s awesome Mozilla Pulse, which sends messages to consumers when certain buildbot events (among other things) occur.  I’ve written a Python library, pulsebuildmonitor, which makes it even easier to act as a consumer for these messages, and layered a small web service on top of that.

The result is http://brasstacks.mozilla.com/latestbuilds/README, or get the actual JSON at http://brasstacks.mozilla.com/latestbuilds/.
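
Consuming the service from Python is about as simple as it gets.  The sketch below assumes the JSON document simply maps platform names to build URLs; check the README above for the exact layout.

import json
import urllib2

latest = json.load(urllib2.urlopen('http://brasstacks.mozilla.com/latestbuilds/'))
for platform, url in latest.items():
    print '%s: %s' % (platform, url)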

Currently this only works for mozilla-central, but I could easily extend it to other trees if needed.

ProfileManager 1.0_beta1

ProfileManager is a standalone app that can be used to manage profiles for Firefox and other xulrunner apps. The profile manager which is built into Firefox is going away after 4.0, so this new app will be the best choice for managing profiles in future Firefox versions, but it works great with 4.0 and earlier versions as well, not to mention Thunderbird.

Some of its features:

  • easy profile backup and restore
  • ability to save/restore profiles to zip archives (which makes it easy to move them between machines)
  • ability to manage multiple versions of Firefox, and associate profiles with specific Firefox versions
  • allows the user to launch any profile with any version of Firefox installed on their system, as shown in the graphic below

You can download a build of the 1.0_beta1 version from ftp://ftp.mozilla.org/pub/utilities/profilemanager/1.0_beta1/.  The others who have worked on this (principally Jeffrey Hammel and Mark Côté) and I would love feedback; feel free to leave comments or file a bugzilla bug if you find problems.

Note: by default, ProfileManager works with Firefox profiles.  To use it with the profiles of a different xulrunner app, pass the name of the app to it as an argument, e.g., ‘profilemanager thunderbird’.


ProfileManager icons requested!

The Profile Manager which has been bundled with Firefox from time immemorial is going to be removed from Firefox builds soon after Firefox 4 ships; see bug 214675.  Firefox will still support multiple profiles, it just won’t have a built-in UI for managing them.

Instead, a few of us on the Mozilla Automation Tools team have been busy building a standalone replacement.  This will be available as a separate download, and will include a lot of cool features not available in the current incarnation of Profile Manager, like the capability to backup and restore profiles.  For background, see bug 539524, and this wiki page.

There are builds available to play with, but exercise caution, as these builds are beta quality, and it’s possible there may be bugs therein which would cause profile corruption or other problems.  If you do decide to play with it, you may want to backup your profiles first.

Currently, the icon for the new Profile Manager is the default xulrunner icon.

This doesn’t seem very interesting for a new Profile Manager, and I lack even rudimentary graphics skills, so I’d like to request help!  If you have some graphics experience and would like to contribute to a cool new Mozilla tool, please submit an icon you think would be awesome as an attachment to bug 605576.  Icons should be in PNG format, preferably 48×48 or 64×64, and should be freely distributable.  The creator of the icon that is selected will be mentioned in Profile Manager’s about box, and will have the satisfaction of knowing that their icon is seen every time the new Profile Manager is used.

GrafxBot Results Update

Thanks to the many thousands of you who have downloaded GrafxBot and submitted test results to Mozilla!  In case you’re curious about what we’ve done with all that data, here are some statistics:

  • According to AMO, GrafxBot has been downloaded about 4700 times.
  • We have nearly 30,000 sets of test results that were submitted to our database, for a total of 3.7 million tests.
  • The test results span 282 unique video cards on Windows, 30 on Mac, and 171 on Linux.
  • The failure rate (which is somewhat subjective given the manual pass/fail mechanism) averages around 0.4% on Windows, 0.5% on Mac, and 1.1% on Linux.
  • Test data has resulted in a total of 37 bugs being filed.

Aside from the raw test results, many of you have submitted useful comments.  Some have noted that fonts look bad when Firefox is accelerated; others have described scrolling or other issues.  Not all of these problems can be detected by GrafxBot, so if you notice problems like these when browsing, I encourage you to file a bug report in Bugzilla, under Core -> Graphics.  If you submit a bug report, please include the details of your graphics hardware, and include a screenshot if possible.

GrafxBot continues to be updated along with Firefox betas, so I encourage interested folks to continue running GrafxBot each beta release.  Thanks for all your help in making Firefox 4 the fastest ever!

Introducing Grafx Bot

One of the new features of Firefox 4 is graphics hardware acceleration.   This, along with the new layers code, will help improve Firefox performance during things like page rendering and full-screen video.

Firefox’s hardware acceleration interacts with a machine’s graphics hardware via DirectX or OpenGL, depending on platform.  These interactions tend to be very sensitive to the graphics environment on the system (e.g., the specific video card(s) on the system, how much VRAM is available, the version of the video driver, the OS version, etc).  In fact, there are so many permutations of the relevant factors that we can’t test them all internally.  We need help from the community, so we can get exposure on as many unique hardware environments as possible.

To answer this need, I developed Grafx Bot.  It’s an add-on that end users can download, which runs a suite of automatic tests on their machine that exercises interesting aspects of hardware acceleration.  At the end of the tests, it allows users to post their results to Mozilla, where the data will be collected and analyzed, and hopefully lead to bug fixes and more reliable code for hardware acceleration than we’d otherwise have.

How it works

Grafx Bot is basically a new UI wrapped around Mozilla’s existing reftest framework.  Reftest works by comparing the visual output of two different pages, which are supposed to be rendered identically.  For example, this page and this page should be rendered identically, even though their markup is different.  If reftest detects that the pages are rendered differently, it fails the test.
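
If you strip away the harness, the comparison itself amounts to a pixel-for-pixel diff of two snapshots.  Reftest does this inside the browser using canvas; the PIL-based sketch below (with hypothetical snapshot filenames) is only a back-of-the-envelope illustration of the idea.

from PIL import Image, ImageChops

def images_identical(path_a, path_b):
    # difference() returns a per-pixel diff image; an empty bounding box
    # means no pixels differ at all
    diff = ImageChops.difference(Image.open(path_a), Image.open(path_b))
    return diff.getbbox() is None

# e.g., assert images_identical('test_snapshot.png', 'reference_snapshot.png')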

Grafx Bot runs a series of reftests that are designed to exercise the hardware acceleration code.  Primarily, these are tests in the following categories:  css-transitions, layers, ogg-video, scrolling, svg, text, text-decoration, and z-index.  While the tests are executing, users will see images and colors flicker across their screen.

It’s possible that graphics hardware acceleration will produce small but invisible changes in rendering between test pages.  We don’t really care about these; we only care when the differences are visually apparent.  Because of this, when Grafx Bot detects a reftest failure, it will ask the user whether the differences are visible, by displaying both test images side by side.

Once the tests are complete, it will ask the user to click a button which submits the test results to Mozilla.  After the data is in our database, the results will be reviewed, and bugs will be filed as needed.

I recommend that users install Grafx Bot in a clean profile with no other add-ons or extensions installed.  This is because certain add-ons can interfere with Grafx Bot, or can cause it to run tests much slower than it would otherwise be able to.  For instructions on how to create a new profile in Firefox, see this article at support.mozilla.org.

Types of acceleration tested

There are multiple types of hardware acceleration being developed.  Here’s what Grafx Bot tests.

Grafx Bot runs the reftests with MOZ_ACCELERATED=11. This enables acceleration of layers via OpenGL (on Linux and Mac) and Direct3D (on Windows).  On Windows, Grafx Bot also toggles Direct2D acceleration.

The latter type of acceleration can be toggled dynamically, whereas the former cannot.  For this reason, on Windows only, Grafx Bot runs twice as many tests as on the other platforms.  It basically runs each test twice, with different settings, like so:

  • test 1a: test file with D2D enabled vs reference file with D2D enabled
  • test 1b: test file with D2D enabled vs test file with D2D disabled

Test data

Along with the results of all the reftests that Grafx Bot performs, we collect anonymous hardware information.  This allows us to associate test failures with specific hardware.  Users can see the system information we collect before they submit test results by clicking on the “System info” link on Grafx Bot’s homepage.  It looks something like this:

Even though this data is anonymous, it is protected by Grafx Bot’s privacy policy (see the link on the add-on’s homepage), and so is available only to Mozilla and to the user who submitted the test data.

Aggregate data, on the other hand, can be viewed by anyone.  Click here to peruse Grafx Bot data summaries.

More info

Want more information on hardware acceleration features in Firefox?  Check out these blog posts from Firefox graphics developers:

Hardware accelerating Firefox

Firefox video goes up to 11

Firefox and Direct2d: performance analysis

Direct2d: hardware rendering a browser