Fantastic read. My only concern is that there wasn't any discussion of the cost of false positives (running a test where it's unnecessary) versus false negatives (incorrectly dismissing a relevant test), since those costs are not symmetrical in their effects.
The cost of a bug slipping through because a test was skipped will be higher than the cost of running a test irrelevant to a commit.
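That asymmetry can be made concrete with an expected-cost comparison. This is a hypothetical sketch, not the post's actual model; the cost weights and the 50:1 ratio are made up for illustration:

```python
# Illustrative (made-up) costs: skipping a relevant test is far more
# expensive than running an irrelevant one.
COST_FALSE_POSITIVE = 1    # run an irrelevant test: some wasted compute
COST_FALSE_NEGATIVE = 50   # skip a relevant test: sheriff time, backout churn

def expected_cost(p_relevant: float, run_test: bool) -> float:
    """Expected cost of scheduling (or skipping) one test, given the
    model's estimated probability that the test is relevant."""
    if run_test:
        return (1 - p_relevant) * COST_FALSE_POSITIVE
    return p_relevant * COST_FALSE_NEGATIVE

# Even at only 5% estimated relevance, the lopsided costs still favor
# running the test:
p = 0.05
print(expected_cost(p, run_test=True))   # 0.95
print(expected_cost(p, run_test=False))  # 2.5
```

With costs this lopsided, skipping only pays off when the model is quite confident the test is irrelevant (here, below p ≈ 0.02).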
Yes, a regression slipping through would far outweigh the benefits of reduced tests. The thing the post didn't make very clear is that, thanks to our integration branch, the chance of a missed regression is still nearly zero. If the scheduling algorithm misses something, the failure will show up on a "backstop" push. These are pushes where we run everything, and then a human code sheriff will inspect any failures; if something was missed, they figure out what caused it and back it out.
So the costs of missed regressions are:
1) More strain on the sheriffs (too much strain means we need to hire more)
2) More backouts, which are annoying to developers and can mess up annotation (though we have ideas to fix the latter).
For the record, the algorithm with the 70% reduction in tests has a regression detection rate almost on par with the baseline (it's only ~3-4% lower). This hasn't seemed to result in much additional strain on the pipeline.
There isn't any discussion of the cost at all. It just says the test run rate is down by 70%; it doesn't say anything about the defect detection rate, even though they say that's their cost function.
10 core-years per day sounds like a lot, but it's only about a 10 kW load, and they've saved 70% of that, or about $20 of opex per day.
One of the authors here. I can't exactly deny that line was added to sound impressive, so guilty as charged. However, the savings are much higher than $20/day for a few reasons:
* Many tasks run on expensive instances (hardware acceleration, Windows)
* We have OSX/Android pools that run on physical devices in a data centre (these are an order of magnitude more expensive than Linux)
* There are ancillary costs. For example, each task generates artifacts, which incur storage costs. Those artifacts are then downloaded, which incurs transfer costs.
* There are also overhead costs (idle time, rebooting, etc.) that aren't counted in the 10 years/day stat.
All these things see a corresponding decrease in costs with fewer tasks.
Is that really all? That would be 3,650 cores running full time. 3 W per core sounds too little for power consumption. And do power costs really dominate the price of running CPUs? I'm guessing the savings here are at least an order of magnitude more than your $20/day.
I get about $1000/day based on some EC2 prices for typical machines I've used, though I'm sure Mozilla's requirements are different and they can negotiate better prices than I can.
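For reference, the arithmetic behind that ballpark looks like this. The per-core-hour rate below is an assumption (roughly spot-instance territory), not Mozilla's actual negotiated pricing:

```python
# Back-of-the-envelope: convert "10 core-years per day" into a dollar
# figure under an assumed price. All rates here are guesses.
CORE_YEARS_PER_DAY = 10
core_hours_per_day = CORE_YEARS_PER_DAY * 365 * 24  # 87,600 core-hours/day

# Assumed blended rate of ~$0.015 per core-hour (spot-ish Linux pricing;
# Windows/macOS/hardware pools would push this up considerably).
price_per_core_hour = 0.015

daily_cost = core_hours_per_day * price_per_core_hour
savings = 0.70 * daily_cost  # the post's 70% reduction

print(f"~${daily_cost:,.0f}/day compute, ~${savings:,.0f}/day saved")
```

At that assumed rate the savings come out around $900-1000/day, consistent with the EC2-based estimate above, and that's before the more expensive Windows/OSX/Android pools or storage and transfer costs.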
> "The cost of a bug slipping through because a test being skipped will be higher than running an irrelevant test to a commit."
It really depends on the type of bug, and perhaps this could be factored into the model by also correlating changesets with outage severity or the complexity of a fix.
"A bug slipping through" in this case just means slipping through to where it's detected on a later push to the integration branch, or failing that, when a more complete set of tests runs when the change is merged into the main branch. In no case will poor scheduling here result in a bug making it into the final product. It's just that it's more costly in human time to detect it later, so currently the entire goal is set at detecting the problem on the first round of testing after a push.
They have a try server that developers can push to in order to run a swath of tests before merging into the integration branch. Outsiders can access it by being vouched for by a Mozilla developer; insiders obviously already have access. Having used it as an outsider, it's kind of a pain, with a lot of setup and options. So having something like `mach try auto` would be awesome for outside devs, in addition to the reduced server costs.