JGriffin's Blog

Is this thing on?

Monthly Archives: July 2015

A-Team Update, July 29, 2015

Highlights

Treeherder: We’ve added to mozlog the ability to create error summaries which will be used as the basis for automatic starring.  The Treeherder team is working on implementing database changes which will make it easier to add support for that.  On the front end, there’s now a “What’s Deployed” link in the footer of the help page, to make it easier to see what commits have been applied to staging and production.  Job details are now shown in the Logviewer, and a mockup has been created of additional Logviewer enhancements; see bug 1183872.

MozReview and Autoland: Work continues to allow autoland to work on inbound; MozReview has been changed to carry forward r+ on revised commits.

Bugzilla: The ability to search attachments by content has been turned off; BMO documentation has been started at https://bmo.readthedocs.org.

Perfherder/Performance Testing: We’re working towards landing Talos in-tree.  A new Talos test measuring tab-switching performance has been created (TPS, or Talos Page Switch); e10s Talos has been enabled on all platforms for PGO builds on mozilla-central.  Some usability improvements have been made to Perfherder – https://treeherder.mozilla.org/perf.html#/graphs.

TaskCluster: Successful OSX cross-compilation has been achieved; working on the ability to trigger these on Try and sorting out details related to packaging and symbols.  Work on porting Linux tests to TaskCluster is blocked due to problems with the builds.

Marionette: The Marionette-WebDriver proxy now works on Windows.  Documentation on using this has been added at https://developer.mozilla.org/en-US/docs/Mozilla/QA/Marionette/WebDriver.

Developer Workflow: A kill_and_get_minidump method has been added to mozcrash, which allows us to get stack traces out of Windows mochitests in more situations, particularly plugin hangs.  Linux xpcshell debug tests have been split into two chunks in buildbot in order to reduce E2E times, and chunks of mochitest-browser-chrome and mochitest-devtools-chrome have been re-normalized by runtime across all platforms.  Now that mozharness lives in the tree, we’re planning on removing the “in-tree configs”, and consolidating them with the previously out-of-tree mozharness configs (bug 1181261).

Tools: We’re testing an auto-backfill tool which will automatically retrigger coalesced jobs in Treeherder that precede a failing job.  The goal is to reduce the turnaround time required for this currently manual process, which should in turn reduce tree closure times related to test failures

The Details

bugzilla.mozilla.org

Treeherder/Automatic Starring

  • We’re generating error summaries now that will serve as the basis for automatic starring work.

Treeherder/Front End

  • New “What’s Deployed” feature in Help footer to view stage/prod deployment status
  • Logviewer now contains the full ‘Job Info’ aka. tinderbox printlines (bug 1092209)
  • Created a mock of logviewer UI changes (bug 1183872)

Perfherder/Performance Testing

  • Working towards moving Talos code in-tree (bug 787200)
  • New Talos test TPS (Talos Page Switch) (bug 1166132)
  • Fixed a few data ingestion/duplication cases.
  • Adjusting calculation of suite summaries to match graph server, not finished yet (tracking: bug 1184968)
  • e10s on all platforms, only runs on mozilla-central for pgo builds, broken tests, big regressions are tracked in bug 1144120
  • perfherder is easier to use, some polish on test selection and the  compare view, and most importantly we have found a few odd bugs that has  caused duplicate data to show up, check it out: https://treeherder.mozilla.org/perf.html#/graphs
  • Starting the work of moving Android Talos to Autophone (bug 1170685)

MozReview/Autoland

  • bug 1184079 – Fix for autopublishing when authenticating to MozReview via BMO cookies
  • bug 1178025 – Commits table looks nicer
  • bug 1175166 – r+ is now carried forward on commits from level 3 authors

TaskCluster Support

Mobile Automation

  • Continued work on porting android talos tests to autophone, remaining work is to figure out posting results and ensuring it runs on a regular basis and reliable.
  • Support for the Android stock browser and Dolphin has been added to mozbench (bug 1103134)

Dev Workflow

  • Created patch that replaces mach’s logger with mozlog. Still several rough edges and perf issues to iron out

Media Automation

  • The new MSE rewrite is now enabled by default on Nightly and we’re replacing a few tests in response: bug 1186943 – detection of video stalls has to repond to new internal strings from new MSE implementation by :jya.
  • firefox-media-tests mozharness log is now parsed into steps for Treeherder’s Log Viewer
  • Fixed a problem with automation scripts for WebRTC tests for Windows 64.

General Automation

  • Moved mozlog.structured to top-level mozlog, and released mozlog 3.0
  • Added a kill_and_get_minidump method to mozcrash (bug 890026). As a result we’re getting minidumps out of Windows mochitests under more circumstances (in particular, plugin hangs in certain intermittently failing tests).
  • The MozillaPulse consumer now supports listening to multiple exchanges simultaneously (bug 1180897).
  • Bug 1186420 – Autophone – update requirements and deploy thclient 1.6
  • Bughunter moved to SCL3 without interruption
  • Bug 1185498 – Sisyphus – Bughunter – consume urls directly from Socorro
  • linux debug xpcshell was split into two chunks to reduce E2E times (bug 1185499)
  • runtimes for mochitest-browser-chrome and mochitest-devtools have been renormalized across all platforms
  • Allow Firefox UI tests to determine where to get Firefox crash symbols for releases and improve reproducibility
  • Testing auto-backfill in production (bug 1180732)
  • Now that mozharness lives in the tree, we’re going to remove the “in-tree configs”, which will consolidate mozharness options and make maintenance simpler (bug 1181261)

ActiveData

  • ActiveData requires monitoring on all nodes before it can be left alone for more than a day without it failing:
    • Made  fork of Supervisor to run simple Cron jobs – the biggest task was  finding and installing (and compiling!) the C libraries used
    • Added  Supervisor to spot instances to monitor ES; not just the process, but  query response time.  Also monitoring the indexing jobs.
  • Replicated OrangeFactor to ActiveData so masters student (and the public) we can query it, or extract it.

Marionette

  • Landed Proxy support via capabilities
  • Updating cookie support to return httpOnly flag
  • Added a –version arg to Marionette (bug 1183157)
  • Landing support for W3C Compatible Drivers in Selenium Tree and released 2.46.1 so users can use it.
  • Wrote a small guide to use it https://developer.mozilla.org/en-US/docs/Mozilla/QA/Marionette/WebDriver
  • Marionette<->WebDriver Proxy now works on Windows, Linux and OSX as of 0.3.0
Advertisements

Automation and Tools Team Update, July 16 2015

The Automation and Tools Team (the A-Team, for short) is a large team that oversees a diverse set of services, tools and test harnesses used by nearly everyone at Mozilla.  We’re borrowing a page from Release Engineering and publishing a series of updates to inform people about what we’re up to, in the hopes of fostering better visibility and inter-team coordination.

Highlights

Treeherder and Automatic Starring: Our focus for Treeherder in Q3 will be improving the signal-to-noise ratio for dealing with intermittent oranges. An overall design has been agreed to for the “automatic starring” project, and work has begun; final rollout is likely in Q4. This quarter, we’ll also stop spamming Bugzilla with comments for each intermittent, but we will put in place an alternate notification system for people who rely on Bugzilla orange comments to determine when an intermittent needs attention. We’ve also agreed on a redesign for the Logviewer that should result in a more useful and intuitive interface.
MozReview and Autoland:  MozReview now offers to publish review requests when you push, so it isn’t necessary to visit the MozReview’s UI. Work has started on adding support for autoland-to-inbound, which will allow developers to push changes to inbound directly from MozReview… no more battling tree closures!
Performance: Work continues on Perfherder’s “comparison mode”, a view that compares Talos performance data between two revisions. See wlach’s blog post for more details.
 
TaskCluster Support: We’re helping Release Engineering migrate from Buildbot to TaskCluster; this quarter we’re standing up Linux tests in TaskCluster and getting OS X cross-compilation to work so that we can move those builds to the cloud.
BMO now has tests running in continuous integration using TaskCluster and reporting to Treeherder.
Mobile Automation: mochitest-chrome for Android is now live! Work is also underway to enable debug reftests on Android emulators, and significant reliability improvements have been landed in Autophone.
Desktop Automation: Work is in progress to get Thread Sanitizer (TSan) builds running on try and to split gTest into its own test chunk. We’re also working towards applying –run-by-dir to mochitest-plain, in order to improve isolation and enable smarter chunking in CI.
Developer Workflow: We’re adding test-selection flexibility to the reftest harness as a prelude to making ‘mach try’ work with more test types.

The Details

Bugzilla.mozilla.org
Treeherder/Automatic Starring
  • Work has started on backend work needed to support automatic starring, including db simplification, and db unification (so each tree doesn’t have its own database).  Bug 1179263 tracks this work.  As a side effect of this work, Treeherder code should become less complex and easier to maintain.
  • Work has started on identifying what needs to happen in order to turn off Bugzilla comments for intermittents, and to create an alternative notification mechanism instead.  Bug 1179310 tracks this work.
Treeherder/Front End
  • New shortcuts for Logviewer, Delete Classification plus improved classification save
  • Design work is in progress for collapsing chunks in Treeherder in order to reduce visual noise in bug 1163064
Perfherder/Performance Testing
  • Evaluating alerts generated from PerfHerder
  • Improvements to compare chooser and viewer inside of PerfHerder
  • Work towards building a new tab switching test (bug 1166132)
MozReview/Autoland
  • Automatic publishing of reviews upon pushing
  • Known bug: people using cookie auth may experience bug 1181886
  • Better error message when MozReview’s Bugzilla session has expired (bug 1178811)
  • Pruned user database to improve user searching (bug 1171274)
  • Work is progressing on autoland-to-inbound (bug 1128039)
TaskCluster Support
  • Ability to schedule Linux64 tests on try (tests not running yet due to a couple blockers) – bug 1171033
  • Working on OSX cross-compilation, which will allow us to move OSX builds to the cloud; this will make OSX builds much faster in CI.
Mobile Automation
  • Autophone detects USB lock-ups and gracefully restarts. This is a huge improvement in system reliability.
  • Continued work on getting Android Talos tests ported to Autophone (bug 1170685)
  • Updated manifests and mozharness configs for mochitest-chrome (bug 1026290)
  • Determined total-chunks requirements for Android 4.3 Debug reftests (bug 1140471)
  • Re-wrote robocop harness to significantly improve run-time efficiency (bug 1179981)
Dev Workflow
  • Helped RelEng resolve some problems that were preventing them from landing mozharness in the tree.  This opens the door to a lot of future dev workflow improvements, including better unification of the ways we run automated tests in continuous integration and locally.  We’ve wanted this for years and it’s great to see it finally happen.
  • Did some work on top of jgraham’s patch to make mach use mozlog structured logging
Platform QA
  • We had to respond to the breakup of .tests.zip into several files to keep our Jenkins instance running.
  • Getting firefox-media-tests to satisfy Tier-2 Treeherder visibility requirements involves changing how Treeherder accommodates non-buildbot jobs (e.g bug 1182299)
General Automation
  • Working on running multiple tests/manifests through reftests harness as a prelude for supporting |mach try| for more test types.
  • Created patch to move mozlog.structured to the top level package (and what was previously there to mozlog.unstructured)
  • Figured out the series of steps needed to produce a usable Thread Sanitizer enabled linux build on our infra
  • Separating out gTest into a separate job in CI – bug 1179955.
ActiveData
  • More memory optimizations (motivation: releng query for Chris Atlee:  query slow tests)
    • run staging environment as stability test for production
    • change etl procedure so pushing changes to prod are easier (moving toward standard procedure)
  • import treeherder data markup to active data (motivation: characterizing test failures
    • ateam query: summary of test failures, stars and resolutions (bug 1161268bug 1172048)
    • subtests are too large for download of more than one day – working on code to only pull what’s required