I've been burnt by performance gates on GitHub Actions. One random timing spike and the whole PR turns red. The coefficient of variation math here nails why: GitHub Actions shows a 2.66% CV, which means a 2% performance gate gives you a 45% false positive rate (basically every other run flags a fake regression). No wonder developers stop trusting the check. In my experience the only way to make benchmarks actionable is to run them on deterministic bare-metal runners, whether CodSpeed's or something you host yourself.
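
For anyone who wants to check the 45% figure, here is a minimal sketch of the arithmetic as I understand it. It assumes run-to-run noise is roughly normal, that the gate compares a single measurement against a stable baseline, and that a deviation in either direction counts as a trip. The function name `gate_trip_rate` is made up for illustration and does not come from any tool.

```python
import math

def gate_trip_rate(cv_percent: float, gate_percent: float) -> float:
    """Probability that pure noise moves one measurement past the gate.

    Assumption: noise is normally distributed with a standard deviation
    equal to cv_percent of the mean, and deviations in both directions
    count (two-sided).
    """
    # Express the gate threshold in standard deviations.
    z = gate_percent / cv_percent
    # Two-sided tail of the standard normal: P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

rate = gate_trip_rate(cv_percent=2.66, gate_percent=2.0)
print(f"Noise-only trip rate: {rate:.1%}")
```

Under those assumptions a 2% gate sits only about 0.75 standard deviations from the mean, which is where the roughly 45% comes from. Counting only slowdowns would halve that figure, so the exact number depends on how the gate is defined.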