Why would you expect two different virtual machines to have identical performance?
I would expect that just the cache usage characteristics of "neighbouring" workloads alone would account for at least a 10% variance! Not to mention system bus usage, page table entry churn, etc, etc...
If you need more than 5% accuracy for a benchmark, you absolutely have to use dedicated hosts. Even then, just the temperature of the room would have an effect if you leave Turbo Boost enabled! Not to mention the "silicon lottery" that all overclockers are familiar with...
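For what it's worth, the turbo part is easy to pin down on Linux: with the intel_pstate driver the toggle lives in sysfs (other frequency drivers expose a cpufreq "boost" file instead). A minimal sketch, assuming that setup and root:

```python
# Check and disable Turbo Boost before benchmarking, assuming Linux + intel_pstate.
# Other frequency drivers expose /sys/devices/system/cpu/cpufreq/boost instead.
from pathlib import Path

NO_TURBO = Path("/sys/devices/system/cpu/intel_pstate/no_turbo")

def turbo_enabled() -> bool:
    # "0" means turbo is allowed, "1" means it is disabled.
    return NO_TURBO.read_text().strip() == "0"

def disable_turbo() -> None:
    # Needs root; the setting persists until reboot.
    NO_TURBO.write_text("1")

if __name__ == "__main__":
    if turbo_enabled():
        print("Turbo Boost is on; clock speed will drift with temperature.")
        disable_turbo()
```

The silicon lottery, of course, has no sysfs knob.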
This feels like those engineering classes where we had to calculate stresses in every truss of a bridge to seven figures, and then multiply by ten for safety.
I didn't expect identical performance, but a 10-20% variance is just too big. For example, if https://www.cockroachlabs.com/guides/2021-cloud-report/ got a "slow" GCP virtual machine but a "fast" Azure virtual machine, the final result could totally flip.
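To make the "flip" concrete with made-up round numbers: a genuine 5% gap between providers is smaller than the instance-to-instance noise, so it can easily invert.

```python
# Toy numbers purely for illustration, not real measurements.
true_gcp, true_azure = 100_000, 95_000   # hypothetical "true" ops/sec per provider

slow_gcp = true_gcp * 0.90               # landed on a "slow" GCP instance
fast_azure = true_azure * 1.10           # landed on a "fast" Azure instance

print(slow_gcp, fast_azure)              # 90000.0 vs 104500.0 -- the ranking flips
```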
The more problematic scenario, as mentioned in the article, is when you need to do some sort of performance tuning that can take weeks/months to complete. On the cloud, you either have to keep the virtual machine running the whole time (and hope that a live migration doesn't happen behind the scenes to move it to a different physical host), or do the painful stop/start dance until you get back the "right" virtual machine before proceeding with the actual work.
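That stop/start ritual is basically a rejection-sampling loop. A sketch of it, with launch/benchmark/terminate as hypothetical stand-ins for your provider's API and a short canary run (e.g. a one-minute sysbench):

```python
from typing import Any, Callable

def find_acceptable_instance(
    launch: Callable[[], Any],          # hypothetical: boots a fresh VM, returns a handle
    benchmark: Callable[[Any], float],  # hypothetical: short canary run, returns a score
    terminate: Callable[[Any], None],   # hypothetical: gives the slow VM back
    target_score: float,
    max_attempts: int = 10,
) -> Any:
    """Relaunch until a VM clears the target score, mirroring the manual stop/start dance."""
    for _ in range(max_attempts):
        vm = launch()
        if benchmark(vm) >= target_score:
            return vm                   # keep this one for the weeks-long tuning work
        terminate(vm)
    raise RuntimeError(f"no instance met {target_score} after {max_attempts} attempts")
```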
We discovered this variance a couple of months ago, and this article from talawah.io is actually the first time I have seen anyone else mention it. It still remains a mystery, because we too can't figure out what contributes to the variance using tools like stress-ng, but the variance is real when looking at the MySQL commits/s metric.
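What we could do was at least quantify the spread. A small helper, assuming you've already collected commits/s samples from the same workload on several "identical" instances (the numbers below are placeholders, not our data):

```python
import statistics

def spread(samples: list[float]) -> tuple[float, float, float]:
    """Mean, standard deviation and coefficient of variation of benchmark samples."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return mean, stdev, stdev / mean   # CV above ~0.05 means >5% instance-to-instance noise

# Placeholder values purely for illustration:
print(spread([4100.0, 3550.0, 4020.0, 3480.0, 3900.0]))
```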
> If you need more than 5% accuracy for a benchmark, you absolutely have to use dedicated hosts.
After this ordeal, I am arriving at that conclusion as well. Just the perfect excuse to build a couple of Ryzen boxes.
This is a bit like someone being mystified that their arrival time at a destination across the city is not repeatable to within plus-minus a minute.
There are traffic lights on the way! Other cars! Weather! Etc...
I've heard that Google's internal servers (not GCP!) use special features of the Intel Xeon processors to logically partition the CPU caches. This enables non-prod workloads to coexist with prod workloads with minimal risk of cache thrashing of the prod workload. IBM mainframes go further, splitting at the hardware level, with dedicated expansion slots and the like.
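If that's Intel's Cache Allocation Technology (CAT), it isn't exclusive to Google: Linux exposes it through the resctrl filesystem on CAT-capable Xeons. A rough sketch, assuming root, a single L3 cache domain, and resctrl already mounted at /sys/fs/resctrl (mount -t resctrl resctrl /sys/fs/resctrl); the mask and group name are illustrative:

```python
from pathlib import Path

RESCTRL = Path("/sys/fs/resctrl")

def confine_to_cache_ways(group: str, pid: int, l3_mask: str = "0f") -> None:
    """Put `pid` in a resctrl group limited to the L3 ways selected by `l3_mask`."""
    grp = RESCTRL / group
    grp.mkdir(exist_ok=True)
    (grp / "schemata").write_text(f"L3:0={l3_mask}\n")  # allowed ways on cache domain 0
    (grp / "tasks").write_text(str(pid))                # move the task into the group

# e.g. confine_to_cache_ways("batch", noisy_neighbour_pid) to fence off a batch job
```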
You can't reasonably expect 4-core virtual machines to behave identically to within 5% on a shared platform! That tiny little VM is probably shoulder-to-shoulder with 6 or 7 other tenants on a 28 or 32 core processor. The host itself is likely dual-socket, and some other VM sizes may be present, so up to 60 other VMs could be running on the same host. All sharing memory, network, disk, etc...
The original article was also a network test. Shared fabrics aren't going to return 100% consistent results either. For that, you'd need a crossover cable.
Well, I'll be the first one to admit that I was naive to expect <5% variance prior to this experience. But I guess you are going too far by framing this as common wisdom?
Of course, both the CockroachDB and MongoDB cases could be related, as any performance variance at the instance level could be masked when the instances form a cluster and the workload can be served by any node within the cluster.
You do have a point. I have also seen many benchmarks use cloud instances without any disclaimers, and it always made me raise an eyebrow.
Any such benchmark I do is averaged over a few instances in several availability zones. I also benchmark specifically in the local region that I will be deploying production to. They're not all the same!
Where the cloud is useful for benchmarking is that it's possible to spin up a wide range of "scenarios" at low cost. Want to run a series of tests ranging from 1 to 100 cores in a single box? You can! That's very useful for many kinds of multi-threaded development.
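For the core-count sweep you don't even need separate instance sizes; one big box plus CPU affinity gets you most of the way. A sketch, where workload() is a hypothetical stand-in for whatever you're measuring:

```python
import os
import time

def sweep(workload, max_cores: int) -> dict[int, float]:
    """Time the same workload while restricting the process to 1..max_cores CPUs (Linux)."""
    results = {}
    for n in range(1, max_cores + 1):
        os.sched_setaffinity(0, set(range(n)))   # restrict to cores 0..n-1; threads started after this inherit it
        start = time.perf_counter()
        workload()
        results[n] = time.perf_counter() - start
    os.sched_setaffinity(0, set(range(os.cpu_count())))  # restore full affinity
    return results
```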