JGriffin's Blog

Is this thing on?

Engineering Productivity Update, August 13, 2015

From Automation and Tools to Engineering Productivity

Automation and Tools” has been our name for a long time, but it is a catch-all name which can mean anything, everything, or nothing, depending on the context. Furthermore, it’s often unclear to others which “Automation” we should own or help with.

For these reasons, we are adopting the name “Engineering Productivity”. This name embodies the diverse range of work we do, reinforces our mission (https://wiki.mozilla.org/Auto-tools#Our_Mission), promotes immediate recognition of the value we provide to the organization, and encourages a re-commitment to the reason this team was originally created—to help developers move faster and be more effective through automation.

The “A-Team” nickname will very much still live on, even though our official name no longer begins with an “A”; the “get it done” spirit associated with that nickname remains a core part of our identity and culture, so you’ll still find us in #ateam, brainstorming and implementing ways to make the lives of Mozilla’s developers better.

Highlights

Treeherder: Most of the backend work to support automatic starring of intermittent failures has been done. On the front end, several features were added to make it easier for sheriffs and others to retrigger jobs to assist with bisection: the ability to fill in all missing jobs for a particular push, the ability to trigger Talos jobs N times, the ability to backfill all the coalesced jobs of a specific type, and the ability to retrigger all pinned jobs. These changes should make bug hunting much easier.  Several improvements were made to the Logviewer as well, which should increase its usefulness.

Perfherder and performance testing: Lots of Perfherder improvements have landed in the last couple of weeks. See details at wlach’s blog post.  Meanwhile, lots of Talos cleanup is underway in preparation for moving it into the tree.

MozReview: Some upcoming auth changes are explained in mcote’s blog post.

Mobile automation: gbrown has converted a set of robocop tests to the newly enabled mochitest-chrome on Android. This is a much more efficient harness and converting just 20 tests has resulted in a reduction of 30 minutes of machine time per push.

Developer workflow: chmanchester is working on building annotations into moz.build files that will automatically select or prioritize tests based on files changed in a commit. See his blog post for more details. Meanwhile, armenzg and adusca have implemented an initial version of a Try Extender app, which allows people to add more jobs on an existing try push. Additional improvements for this are planned.

Firefox automation: whimboo has written a Q2 Firefox Automation Report detailing recent work on Firefox Update and UI tests. Maja has improved the integration of Firefox media tests with Treeherder so that they now officially support all the Tier 2 job requirements.

WebDriver and Marionette: WebDriver is now officially a living standard. Congratulations to David Burns, Andreas Tolfsen, and James Graham who have contributed to this standard. dburns has created some documentation which describes which WebDriver endpoints are implemented in Marionette.

Version control: The ability to read and extra metadata from moz.build files has been added to hg.mozilla.org. This opens the door to cool future features, like the ability auto file bugs in the proper component and automatically selecting appropriate reviewers when pushing to MozReview. gps has also blogged about some operational changes to hg.mozilla.org which enables easier end-to-end testing of new features, among other things.

The Details

bugzilla.mozilla.org
Treeherder/Automatic Starring
  • almost finished the required changes to the backend (both db schema and data ingestion)
Treeherder/Front End
  • Several retrigger features were added to Treeherder to make merging and bisections easier:  auto fill all missing/coalesced jobs in a push; trigger all Talos jobs N times; backfill a specific job by triggering it on all skipped commits between this commit and the commit that previously ran the job, retrigger all pinned jobs in treeherder.  This should improve bug hunting for sheriffs and developers alike.
  • [jfrench] Logviewer ‘action buttons’ are now centralized in a Treeherder style navbar https://bugzil.la/1183872
  • [jfrench] Logviewer skipped steps are now recognized as non-failures and presented as blue info steps https://bugzil.la/1192195, https://bugzil.la/1192198
  • [jfrench] Middle-mouse-clicking on a job in treeherder now launches the Logviewer https://bugzil.la/1077338
  • [vaibhav] Added the ability to retrigger all pinned jobs (bug https://bugzil.la/1121998)
  • Camd’s job chunking management will likely land next week https://bugzil.la/1163064
Perfherder/Performance Testing
  • [wlach] / [jmaher] Lots of perfherder updates, details here: http://wrla.ch/blog/2015/08/more-perfherder-updates/ Highlights below
  • [wlach] The compare pushes view in Perfherder has been improved to highlight the most important information.
  • [wlach] If your try push contains Talos jobs, you’ll get a url for the Perfherder comparison view when pushing (https://bugzil.la/1185676).
  • [jmaher/wlach] Talos generates suite and test level metrics and perfherder now ingests those data points. This fixes results from internal benchmarks which do their own summarization to report proper numbers.
  • [jmaher/parkouss] Big talos updates (thanks to :parkouss), major refactoring, cleanup, and preparation to move talos in tree.
MozReview/Autoland
Mobile Automation
  •  [gbrown] Demonstrated that some all-javascript robocop tests can run more efficiently as mochitest-chrome; about 20 such tests converted to mochitest-chrome, saving about 30 minutes per push.
  •  [gbrown] Working on “mach emulator” support: wip can download and run 2.3, 4.3, or x86 emulator images. Sorting out cache management and cross-platform issues.
  •  [jmaher/bc] landed code for tp4m/tsvgx on autophone- getting closer to running on autophone soon.
Dev Workflow
  • [ahal] Created patch to clobber compiled python files in srcdir
  • [ahal] More progress on mach/mozlog patch https://bugzil.la/1027665)
  • [chmanchester] Fix to allow ‘mach try’ to work without test arguments (bug 1192484)
Media Automation
  • [maja_zf] firefox-media-tests ‘log steps’ and ‘failure summaries’ are now compatible with Treeherder’s log viewer, making them much easier to browse. This means the jobs now satisfy all Tier-2 Treeherder requirements.
  • [sydpolk] Refactoring of tests after fixing stall detection is complete. I can now take my network bandwidth prototype and merge it in.
Firefox Automation
General Automation
  • Finished adapting mozregression (https://bugzil.la/1132151) and mozdownload (https://bugzil.la/1136822) to S3.
  • (Henrik) Isn’t archive.mozilla.org only a temporary solution, before we move to TC?
  • (armenzg) I believe so but for the time being I believe we’re out of the woods
  • The manifestparser dependency was removed from mozprofile (bug 1189858)
  • [ahal] Fix for https://bugzil.la/1185761
  • [sydpolk] Platform Jenkins migration to the SCL data center has not yet begun in earnest due to PTO. Hope to start making that transition this week.
  • [chmanchester] work in progress to build annotations in to moz.build files to automatically select or prioritize tests based on what changed in a commit. Strawman implementation posted in https://bugzil.la/1184405 , blog post about this work at http://chmanchester.github.io/blog/2015/08/06/defining-semi-automatic-test-prioritization/
  • [adusca/armenzg] Try Extender (http://try-extender.herokuapp.com ) is open for business, however, a new plan will soon be released to make a better version that integrates well with Treeherder and solves some technichal difficulties we’re facing
  • [armenzg] Code has landed on mozci to allow re-triggering tasks on TaskCluster. This allows re-triggering TaskCluster tasks on the try server when they fail.
  • [armenzg] Work to move Firefox UI tests to the test machines instead of build machines is solving some of the crash issues we were facing
ActiveData
  • [ahal] re-implemented test-informant to use ActiveData: http://people.mozilla.org/~ahalberstadt/test-informant/
  • [ekyle] Work on stability: Monitoring added to the rest of the ActiveData machines.  
  • [ekyle] Problem:  ES was not balancing the import workload on the cluster; probably because ES assumes symmetric nodes, and we do not have that.  The architecture was changed to prefer a better distribution of work (and query load) – There now appears to be less OutOfMemoryExceptions, despite test-informant’s queries.
  • [ekyle] More Problems:  Two servers in the ActiveData complex failed: The first was the ActiveData web server; which became unresponsive, even to SSH.  The machine was terminated.  The second server was the ‘master’ node of the ES cluster: This resulted in total data loss, but it was expected to happen eventually given the cheap configuration we have.   Contingency was in place:  The master was rebooted, the  configuration was verified, and data re-indexed from S3.   More nodes would help with this, but given the rarity of the event, the contingency plan in place, and the low number of users, it is not yet worth paying for. 
WebDriver (highlights)
  • [ato] WebDriver is now officially a living standard (https://sny.no/2015/08/living)
  • [ato] Rewrote chapter on extending the WebDriver protocol with vendor-specific commands
  • [ato] Defined the Get Element CSS Value command in specification
  • [ato] Get Element Attribute no longer conflates DOM attributes and properties; introduces new command Get Element Property
  • [ato] Several significant infrastructural issues with the specification was fixed
Marionette
hg.mozilla.org
charts.mozilla.org
  • Project managers for FxOS have a renewed interest in the project tracking, and overall status dashboards.   Talk only, no coding yet.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: