JGriffin's Blog

Is this thing on?

Monthly Archives: October 2015

Engineering Productivity Update, Oct 21, 2015

It’s Q4, and at Mozilla that means it’s planning season. There’s a lot of work happening to define a Vision, Strategy and Roadmap for all of the projects that Engineering Productivity is working on; I’ll share progress on that over the next couple of updates.

Highlights

Build System: Work is starting on a comprehensive revamp of the build system, which should make it modern, fast, and flexible. A few bits of this are underway (like migration of remaining Makefiles to moz.build); more substantial progress is being planned for Q1 and the rest of 2016.

Bugzilla: Duo 2FA support is coming soon! The necessary Bugzilla changes have landed; we’re just waiting for some licensing details to be sorted out.

Treeherder: Improvements have been made to the way that sheriffs can backfill jobs in order to bisect a regression. Meanwhile, lots of work continues on backend and frontend support for automatic starring.

Perfherder and Performance Testing: Some optimizations were made to Perfherder, which have made it more performant – no one wants a slow performance monitoring dashboard! jmaher and bc are getting close to being able to run Talos on real devices via Autophone; some experimental runs are already showing up on Treeherder.

MozReview and Autoland: It’s no longer necessary to have an LDAP account in order to push commits to MozReview; all that’s needed is a Bugzilla account. This opens the door to contributors using the system. Testing of Autoland is underway on MozReview’s dev instance – expect it to be available in production soon.

TaskCluster Migration: OSX cross-compiled builds are now running in TaskCluster and appearing in Treeherder as Tier-2 jobs, for debug and static checking. The TC static checking build will likely become the official build soon (and the buildbot build retired); the debug build won’t become official until work is done to enable existing test jobs to consume the TC build.

Work is progressing on enabling TaskCluster test jobs for linux64-debug; our goal is to have these all running side-by-side with the buildbot jobs this quarter, so we can compare failure rates before turning off the corresponding buildbot jobs in Q1. Moving these jobs to TaskCluster enables us to chunk them to a much greater degree, which will offer some additional flexibility in automation and improve end-to-end times for these tests significantly.
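To make "chunking" concrete, here is a minimal sketch of splitting a list of tests into near-equal chunks that could run in parallel. It is an illustration only: the function name `chunk` and the splitting rule are assumptions for this example, not the algorithm the test harnesses or TaskCluster actually use.

```python
def chunk(items, total_chunks):
    """Split items into total_chunks contiguous, near-equal slices.

    Hypothetical helper for illustration; real harnesses may balance
    chunks by runtime or by directory instead of by count.
    """
    if total_chunks < 1:
        raise ValueError("total_chunks must be at least 1")
    base, extra = divmod(len(items), total_chunks)
    chunks = []
    start = 0
    for index in range(total_chunks):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

With more chunks, each job handles fewer tests, so the slowest job finishes sooner as long as enough machines are available to run the chunks side by side.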

Mobile Automation: All Android test suites that show in Treeherder can now be run easily using mach.

Dev Workflow: It’s now easier to create new web-platform-tests, thanks to a new |mach web-platform-tests-create| command.

e10s Support: web-platform-tests are now running in e10s mode on linux and OSX platforms. We want to turn these and other tests in e10s mode on for Windows, but have hardware capacity problems. Discussions are underway on how to resolve this in the short-term; longer-term plans include an increase in hardware capacity.

Test Harnesses: run-by-dir is now applied to all mochitest jobs on desktop. This improves test isolation and paves the way for chunking changes which we will use to improve end-to-end times and make bisection turnaround faster. Structured logging has been rolled out to Android reftests; Firefox OS reftests still to come.
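As a rough illustration of the run-by-dir idea, the sketch below groups test paths by their directory and hands each directory to a runner separately, so state left behind by one directory cannot leak into another. The names `group_by_dir`, `run_by_dir` and the `run_directory` callback are hypothetical and are not the mochitest harness' real API.

```python
import posixpath
from collections import defaultdict


def group_by_dir(test_paths):
    """Map each directory to the list of tests it contains."""
    groups = defaultdict(list)
    for path in test_paths:
        groups[posixpath.dirname(path)].append(path)
    # Sort by directory name so the run order is deterministic.
    return dict(sorted(groups.items()))


def run_by_dir(test_paths, run_directory):
    """Call run_directory(directory, tests) once per directory.

    run_directory is an assumed callback that would start a fresh
    browser instance, run the given tests, and return a result.
    """
    return {
        directory: run_directory(directory, tests)
        for directory, tests in group_by_dir(test_paths).items()
    }
```

Because each directory becomes an independent unit, directories can later be moved between chunks without changing which tests share a browser session.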

ActiveData: Work is in progress to build out a model of our test jobs running in CI, so that we can identify pieces of job setup and teardown which are too slow and targets of possible optimization, and so that we can begin to predict the effects of changes to jobs and hardware capacities.

hg.mozilla.org: Mercurial 3.6 will have built-in support for seeding clones from pre-generated bundle files, and will have improved performance for cloning, especially on Windows.

Marionette and WebDriver: Message sequencing is being added to Marionette; this will help prevent synchronization issues where the client mixes up responses. Client-side work is being done in both Python and node.js. ato wrote an article making a case against visibility checks in WebDriver.
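The general idea behind message sequencing can be sketched as follows: the client tags every command with an ID and matches each response to its command by that ID, instead of assuming responses arrive in order. This is a toy model under assumed names (`SequencedClient`, the `id`/`command`/`value` fields); it does not reproduce Marionette's actual wire protocol.

```python
import itertools


class SequencedClient:
    """Toy client that pairs responses with commands by message ID."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._pending = {}  # message ID -> command awaiting a response

    def send(self, command):
        """Build an outgoing message and remember it as pending."""
        message_id = next(self._next_id)
        self._pending[message_id] = command
        return {"id": message_id, "command": command}

    def receive(self, response):
        """Return (command, value) for a response, or reject unknown IDs."""
        message_id = response["id"]
        if message_id not in self._pending:
            raise ValueError("unexpected response id: %r" % (message_id,))
        command = self._pending.pop(message_id)
        return command, response["value"]
```

A response that carries an ID the client never sent, or one it has already handled, is rejected rather than silently attributed to the wrong command.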

Details

bugzilla.mozilla.org

  • bug 1199089 – support for Duo 2FA has landed. It isn’t available just yet, as we’re waiting on the licensing situation to be sorted out.
  • lots of tweaks to the experimental UI

Treeherder

Perfherder/Performance Testing

MozReview/Autoland

TaskCluster Support

  • Landed all patches for cross-mac builds, running fine on inbound/central!
  • [ahal] Got some linux64 tests running (various flavours of mochitest, reftest and xpcshell), though not yet green.

Mobile Automation

  • [gbrown] mach reftest|crashtest|jstestbrowser now supports Firefox for Android (all Android test suites run on treeherder can now be run from mach)

Dev Workflow

  • [jgraham] Added a |mach web-platform-tests-create| target to help with the workflow of creating new web-platform-tests.

Firefox and Media Automation

  • Netflix bandwidth limiting tests blocked because of a problem on Netflix side.
  • Web platform media-source directory no longer being run on our Jenkins since all platforms of web platform tests now run as part of release.
  • We’ve established a roadmap that coordinates moving ui-tests and media-tests in-tree, updating the Marionette test runner and moving media jobs into mozmill-ci

General Automation

  • [jgraham] web-platform-tests-e10s now running across all trees on Mac/Linux (Windows has capacity problems)
  • SETA updated to support new android debug tests
  • run-by-dir is enabled for all desktop mochitests.
  • [ahal] reftest structured logging working on desktop and android (b2g still left to do)

ActiveData

hg.mozilla.org

  • Mercurial 3.6 will have built-in support for seeding clones from pre-generated, externally-hosted bundle files (i.e. the bundleclone extension)
  • Mercurial 3.6 features significant performance improvements for cloning, especially on Windows.

WebDriver

Marionette


Engineering Productivity Update, Oct 1, 2015

We’ve said good-bye to Q3, and are moving on to Q4. Planning for Q4 goals and deliverables is well underway; I’ll post a link to the final versions next update.

Last week, a group of 8-10 people from Engineering Productivity gathered in Toronto to discuss approaches to several aspects of developer workflow. You can look at the notes we took; next up is articulating a formal Vision and Roadmap for 2016, which incorporates both this work and other planning that is ongoing separately for things like MozReview and Treeherder.

Highlights

Bugzilla: Support for 2FA has been enhanced.

Treeherder:

  • The automatic starring backend, along with related database changes, is now in production. In Q4 we’ll be developing a simple UI for this, and by the end of quarter, automatic starring for at least simple failures should be a reality.
  • Treeherder will soon stop posting bug comments for each intermittent failure. Instead OrangeFactor will post periodic summaries on bugs – see: https://groups.google.com/d/msg/mozilla.dev.tree-management/az643p0u4hs/3el7fqIDBwAJ
  • Job Ingestion via Pulse Exchanges is in the final review stages. This will allow projects like Task Cluster to send JSON Schema-validated job data to Treeherder via a Pulse Exchange, rather than our APIs. It also gives developers and testers the ability to ingest production jobs from Task Cluster on their local machine. Blog post: https://cheshirecam.wordpress.com/2015/09/30/treeherder-loading-data-from-pulse/
  • :Goma’s line highlighting and linking in the log viewer are now live. See this blog post for details.
  • Jonathan French, our awesome contractor and contributor, has landed onscreen shortcuts; see this blog post. Jonathan will be moving on to other things soon, and we’ll sorely miss him!

Perfherder and Performance Automation:

  • Work is underway to prototype a UI in Perfherder which can be used for performance sheriffing sans Alert Manager or Graphserver; follow bug 1201154 for more details. Separately, work has been started to allow other performance harnesses (besides Talos) to submit data to Perfherder; see bug 1175295.
  • Talos on linux32 has been turned off; the machines that had been used for this are being repurposed as Windows 7 and Windows 8 test workers, in order to reduce overall wait times on those platforms.
  • The dromaeo DOM Talos test has been enabled on linux64.

MozReview and Autoland: mcote wrote a blog post detailing some of the rough edges in MozReview, and explaining how the team intends to tackle these. dminor blogged about the state of autoland; in short, we’re getting close to rolling out an initial implementation which will work similarly to the current “checkin-needed” mechanism, except, of course, it will be entirely automated. May you never have to worry about closed trees again!

Mobile Automation: gbrown made some additional improvements to mach commands on Android; bc has been busy with a lot of Autophone fixes and enhancements.

Firefox Automation: maja_zf has enabled MSE playback tests on trunk, running per-commit. They will go live at the next buildbot reconfig.

Developer Workflow: numerous enhancements have been made to |mach try|; see the list below in the Details section. run-by-dir has been applied to mochitest-plain on most platforms, and to mochitest-chrome-opt, by kaustabh93, one of the team’s contributors. This reduces test bleedthrough, a source of intermittent failures, and improves our ability to change job chunking without breaking tests.

Build System: gps has improved test package generation, which results in significantly faster builds – a savings of about 5 minutes per build on OSX and Windows in automation; about 90s on linux.

TaskCluster Migration: linux64 debug builds are now running, so ahal is unblocked on getting linux64 debug tests running in TaskCluster.  armenzg has landed mozharness code to support running buildbot jobs via TaskCluster scheduling, via buildbot bridge.

The Details

bugzilla.mozilla.org

Treeherder

Perfherder/Performance Testing

TaskCluster Support

Mobile Automation

  • mach cppunittest now supports Firefox for Android
  • mach test commands now download host utilities for Firefox for Android
  • [bc] Autophone
  • Bug 1202826 – Autophone – 2015-09-09 deployment
  • Bug 1202833 – Autophone – CHARGING state should not prevent Autophone shutdown/restart
  • Bug 1201061 – Autophone – deploy robocop_adobe_flash.html
  • Bug 1196115 – Intermittent Crash Autophone S1S2Test beginning 2015-08-18
  • Bug 1207836 – Autophone – 2015-09-23 deployment
  • Bug 1205864 –  Autophone – phonetest.py:Logcat collects duplicate messages
  • Bug 1206954 – Autophone – better handle failures to submit results to PhoneDash
  • Bug 1209796 – Autophone – next deployment (In progress)
  • Bug 1205836 – Autophone – investigate orange for remote nytimes s1s2
  • Bug 1208782 – Autophone – do not attempt to get response json during Treeherder submission error if response is None
  • Bug 1209647 – Autophone – eliminate startup check for network connectivity
  • Bug 1209651 – Autophone – do not allow logcat device error to prevent setup_job initialization
  • Bug 1209653 – Autophone – after clearing logcat, specifying -b main can hang
  • Bug 1209675 – Autophone – Logcat should use PhoneTest loggerdeco
  • Bug 1209691 – Autophone – handle incorrect logcat dates emitted by devices.
  • jmaher/wlach working to get Autophone Talos reporting results to PerfHerder

Firefox and Media Automation

  • [maja_zf] MSE Video Playback buildbot jobs will be deployed to run per-commit on mozilla-inbound any day now…

General Automation

  • [ahal] started work on reftest using structured logging
  • [ahal] consolidate mochitest + xpcshell’s StructuredLog.jsm
  • [jgraham] Landed new |mach try| implementation that passes test paths rather than manifest paths; this adds support for web-platform-tests in |mach try|
  • [jgraham] Added support for saving and reusing try strings in |mach try|
  • [jgraham] Added Talos support to |mach try|
  • [jgraham] reftest and xpcshell test harnesses now take paths to multiple test locations on the command line and expose more functionality through mach
  • [jmaher] Kaustabh93 has runbydir live for mochitest-plain osx debug, and mochitest-chrome opt;  All that is left is mochitest-chrome debug and linux64 ASAN e10s.
  • [ato] Support for running Marionette tests using `mach try` in review

ActiveData

WebDriver (highlights)

  • [ato] Defined remote end steps for Element Clear command
  • [ato] Element location strategies have been outlined
  • [ato] Added steps to Base64 encode screen capture results
  • [ato] Because implementors have relied on prose from outdated sections, warnings were added to those sections which have yet to be redefined
  • + a ton of various fixes and rewording

Marionette

  • [ato] findChildElement and findChildElements commands removed

bughunter

  • [bc] Have been keeping the system running, helping triage bugs
  • [tomcat] Has been filing bugs, and sent a September status report to an internal set of people.

bugherder

  • bugs 924405/1199788 – Bugherder now uses Bugzilla’s native REST API and can use bugzilla api keys for authentication even when 2FA is enabled.

Firefox build system

  • [gps] Test packaging is now drastically faster in automation. 50% reduction across all platforms. This is a 5+ minute decrease on OS X build jobs!