JGriffin's Blog

Is this thing on?

Tag Archives: A-Team

Automation and Tools Team Update, July 16 2015

The Automation and Tools Team (the A-Team, for short) is a large team that oversees a diverse set of services, tools and test harnesses used by nearly everyone at Mozilla.  We’re borrowing a page from Release Engineering and publishing a series of updates to inform people about what we’re up to, in the hopes of fostering better visibility and inter-team coordination.

Highlights

Treeherder and Automatic Starring: Our focus for Treeherder in Q3 will be improving the signal-to-noise ratio for dealing with intermittent oranges. An overall design has been agreed to for the “automatic starring” project, and work has begun; final rollout is likely in Q4. This quarter, we’ll also stop spamming Bugzilla with comments for each intermittent, but we will put in place an alternate notification system for people who rely on Bugzilla orange comments to determine when an intermittent needs attention. We’ve also agreed on a redesign for the Logviewer that should result in a more useful and intuitive interface.
MozReview and Autoland:  MozReview now offers to publish review requests when you push, so it isn’t necessary to visit the MozReview’s UI. Work has started on adding support for autoland-to-inbound, which will allow developers to push changes to inbound directly from MozReview… no more battling tree closures!
Performance: Work continues on Perfherder’s “comparison mode”, a view that compares Talos performance data between two revisions. See wlach’s blog post for more details.
 
TaskCluster Support: We’re helping Release Engineering migrate from Buildbot to TaskCluster; this quarter we’re standing up Linux tests in TaskCluster and getting OS X cross-compilation to work so that we can move those builds to the cloud.
BMO now has tests running in continuous integration using TaskCluster and reporting to Treeherder.
Mobile Automation: mochitest-chrome for Android is now live! Work is also underway to enable debug reftests on Android emulators, and significant reliability improvements have been landed in Autophone.
Desktop Automation: Work is in progress to get Thread Sanitizer (TSan) builds running on try and to split gTest into its own test chunk. We’re also working towards applying –run-by-dir to mochitest-plain, in order to improve isolation and enable smarter chunking in CI.
Developer Workflow: We’re adding test-selection flexibility to the reftest harness as a prelude to making ‘mach try’ work with more test types.

The Details

Bugzilla.mozilla.org
Treeherder/Automatic Starring
  • Work has started on backend work needed to support automatic starring, including db simplification, and db unification (so each tree doesn’t have its own database).  Bug 1179263 tracks this work.  As a side effect of this work, Treeherder code should become less complex and easier to maintain.
  • Work has started on identifying what needs to happen in order to turn off Bugzilla comments for intermittents, and to create an alternative notification mechanism instead.  Bug 1179310 tracks this work.
Treeherder/Front End
  • New shortcuts for Logviewer, Delete Classification plus improved classification save
  • Design work is in progress for collapsing chunks in Treeherder in order to reduce visual noise in bug 1163064
Perfherder/Performance Testing
  • Evaluating alerts generated from PerfHerder
  • Improvements to compare chooser and viewer inside of PerfHerder
  • Work towards building a new tab switching test (bug 1166132)
MozReview/Autoland
  • Automatic publishing of reviews upon pushing
  • Known bug: people using cookie auth may experience bug 1181886
  • Better error message when MozReview’s Bugzilla session has expired (bug 1178811)
  • Pruned user database to improve user searching (bug 1171274)
  • Work is progressing on autoland-to-inbound (bug 1128039)
TaskCluster Support
  • Ability to schedule Linux64 tests on try (tests not running yet due to a couple blockers) – bug 1171033
  • Working on OSX cross-compilation, which will allow us to move OSX builds to the cloud; this will make OSX builds much faster in CI.
Mobile Automation
  • Autophone detects USB lock-ups and gracefully restarts. This is a huge improvement in system reliability.
  • Continued work on getting Android Talos tests ported to Autophone (bug 1170685)
  • Updated manifests and mozharness configs for mochitest-chrome (bug 1026290)
  • Determined total-chunks requirements for Android 4.3 Debug reftests (bug 1140471)
  • Re-wrote robocop harness to significantly improve run-time efficiency (bug 1179981)
Dev Workflow
  • Helped RelEng resolve some problems that were preventing them from landing mozharness in the tree.  This opens the door to a lot of future dev workflow improvements, including better unification of the ways we run automated tests in continuous integration and locally.  We’ve wanted this for years and it’s great to see it finally happen.
  • Did some work on top of jgraham’s patch to make mach use mozlog structured logging
Platform QA
  • We had to respond to the breakup of .tests.zip into several files to keep our Jenkins instance running.
  • Getting firefox-media-tests to satisfy Tier-2 Treeherder visibility requirements involves changing how Treeherder accommodates non-buildbot jobs (e.g bug 1182299)
General Automation
  • Working on running multiple tests/manifests through reftests harness as a prelude for supporting |mach try| for more test types.
  • Created patch to move mozlog.structured to the top level package (and what was previously there to mozlog.unstructured)
  • Figured out the series of steps needed to produce a usable Thread Sanitizer enabled linux build on our infra
  • Separating out gTest into a separate job in CI – bug 1179955.
ActiveData
  • More memory optimizations (motivation: releng query for Chris Atlee:  query slow tests)
    • run staging environment as stability test for production
    • change etl procedure so pushing changes to prod are easier (moving toward standard procedure)
  • import treeherder data markup to active data (motivation: characterizing test failures
    • ateam query: summary of test failures, stars and resolutions (bug 1161268bug 1172048)
    • subtests are too large for download of more than one day – working on code to only pull what’s required

 

Advertisements