Is this thing on?
For the GoFaster project, releng and the A-team have been working on various tasks which we hope will result in getting the total commit to all-tests-done time down to 2 hours for the main branches (try excluded). This total turnaround time was 6-8 hours a couple of months ago when we began this project.
We’ve recently made some improvements that seriously reduce the total machine time required to run all tests for a given commit. These include hiding the mochitest results table, removing packed.js from mochitest, and streamlining individual slow tests (see bug 674738, bug 676412, and bug 670229). These together have reduced the total machine time for test down from about 40 hours to around 25 hours per commit, a big win.
However, total turnaround times are still well above our 2-hour goal.
We already knew that PGO builds are slow; jhford is working on turning on-demand builds into non-PGO builds and making PGO builds every four hours (bug 658313). However, we needed a way to dig deeper into the data to find our other pain points.
Will Lachance made some awesome build charts that help us visualize what's going on in these buildbot jobs. Clicking any commit shows a chart of all the relevant buildbot jobs laid out in relative clock time, which makes it easier to see where the bottlenecks are.
Display the build chart for just about any commit (e58e98a89827, for instance) and you'll see the problem right away: nearly every commit includes builds that far exceed 2 hours. These aren't always opt builds, and they sometimes occur even on our 'fast' OS, Linux. Commit 5d9989c3bff6, for example, has a linux64 opt build that takes 214 minutes, compared to 61 minutes for the linux32 opt build. Commit 198c7de0699d has an OS X 10.5 debug build that takes 171 minutes, while the 10.6 debug build takes only 82 minutes. Clearly, we can't hit our 2-hour goal with builds that take 2+ hours. What's going on?
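The reasoning behind "we can't hit our 2-hour goal with builds that take 2+ hours" can be sketched as a critical-path calculation. This is a simplified model, not how buildbot or the build charts compute anything: it assumes each platform's work is one sequential chain (wait for a build slave, build, wait for a test slave, run the slowest test job), that chains for different platforms run in parallel, and that the slowest test job takes 30 minutes, matching our test target. The `Chain` type and `turnaround` function are hypothetical; the build times are the ones quoted above for commit 5d9989c3bff6.

```python
from dataclasses import dataclass

# The GoFaster target: commit to all-tests-done in 2 hours.
GOAL_MINUTES = 120


@dataclass
class Chain:
    """Hypothetical model of one platform's sequential path for a commit."""
    name: str
    build_wait: float    # minutes waiting for a build slave
    build: float         # minutes spent building
    test_wait: float     # minutes waiting for a test slave
    slowest_test: float  # minutes for the longest test job on this platform

    def minutes(self) -> float:
        # Steps within one chain happen one after another, so they add up.
        return self.build_wait + self.build + self.test_wait + self.slowest_test


def turnaround(chains: list[Chain]) -> tuple[str, float]:
    """Chains run in parallel, so the slowest one sets the end-to-end time."""
    slowest = max(chains, key=Chain.minutes)
    return slowest.name, slowest.minutes()


if __name__ == "__main__":
    # Build times from commit 5d9989c3bff6; waits assumed to be zero and
    # tests assumed to hit the 30-minute target (both are assumptions).
    chains = [
        Chain("linux64 opt", 0, 214, 0, 30),
        Chain("linux32 opt", 0, 61, 0, 30),
    ]
    name, total = turnaround(chains)
    print(f"{name}: {total} min, meets goal: {total <= GOAL_MINUTES}")
```

Even with zero wait time and tests at target, the linux64 opt chain comes to 244 minutes in this model, so shortening tests or waits alone can't meet the goal while any single build exceeds 120 minutes.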
Finding out requires some digging through build logs. It turns out there are multiple factors.
When all of these factors coincide, we can get builds (which include compile, update, and other steps) that exceed 4 hours. This suggests that doing away with on-demand PGO builds may not by itself get us to our 2-hour goal.
From this data, two of the more obvious ways to improve our build times might be:
According to Will's build charts, the end-to-end (E2E) time for tests is often within our 30-minute target. The exception is mochitest-other on debug builds, which often takes 60 to 90 minutes. We could improve this somewhat by splitting mochitest-browser-chrome (the longest-running part of mochitest-other) into its own test job.
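Why would splitting help? A rough sketch, assuming the suites inside one test job run back to back while separate jobs run in parallel on different slaves: a job's wall-clock time is the sum of its suites, and the test phase ends when the slowest job does. The helper names and the durations below are hypothetical placeholders, not measurements, and the model ignores per-job setup overhead, which a real split would likely add to each job.

```python
# Hypothetical model: a test job is a mapping of suite name -> minutes.
Jobs = dict[str, dict[str, float]]


def job_minutes(suites: dict[str, float]) -> float:
    """Suites inside one job run back to back, so their durations add up."""
    return sum(suites.values())


def wall_clock(jobs: Jobs) -> float:
    """Jobs run in parallel, so the slowest job sets the test-phase time."""
    return max(job_minutes(suites) for suites in jobs.values())


def machine_time(jobs: Jobs) -> float:
    """Total slave time consumed, regardless of parallelism."""
    return sum(job_minutes(suites) for suites in jobs.values())


def split_out(jobs: Jobs, job_name: str, suite: str) -> Jobs:
    """Return a new layout with `suite` moved out of `job_name` into its own job."""
    new_jobs = {name: dict(suites) for name, suites in jobs.items()}
    duration = new_jobs[job_name].pop(suite)
    new_jobs[suite] = {suite: duration}
    return new_jobs


if __name__ == "__main__":
    # Hypothetical durations in minutes, not measurements.
    before = {
        "mochitest-other": {"mochitest-browser-chrome": 50, "remaining suites": 25},
    }
    after = split_out(before, "mochitest-other", "mochitest-browser-chrome")
    print(wall_clock(before), wall_clock(after))      # slowest job, before vs after
    print(machine_time(before), machine_time(after))  # slave time is unchanged
```

With these made-up numbers the slowest test job drops from 75 to 50 minutes while total machine time stays at 75 minutes (more, once setup overhead is counted): splitting trades slave capacity for turnaround rather than saving work.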
Additionally, wait times for test slaves running Android and Windows 7 tests are sometimes significant; see, e.g., the details for commit 97216ae0fc04. We should try to understand why this happens: the graph of test wait times doesn't show a clear trend, other than highlighting that wait times for Windows and Android are usually worse than for the other OSes.