Assume up front that none of your measured latencies from a networked software system will be Gaussian, or <exaggeration> you will die a painful death </exaggeration>. Even ping times over the internet have no mean. The only good thing about means is that you can combine them easily, but since they are probably a mathematical fiction here, combining them is even worse. Use T-Digest or one of the other algorithms being highlighted here.
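To make that concrete, here's a minimal sketch of reporting percentiles instead of a mean, using only numpy (the file name is hypothetical; a t-digest library buys you the same kind of summary in a streaming, mergeable form):

    import numpy as np

    # hypothetical raw latency samples, one value per line, in milliseconds
    latencies_ms = np.loadtxt("latencies.txt")

    # report tail-aware percentiles instead of mean +/- stddev;
    # the mean of a heavy-tailed distribution hides the tail entirely
    for p in (50, 90, 99, 99.9):
        print(f"p{p}: {np.percentile(latencies_ms, p):.2f} ms")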
This is why I try to plot a proper graph of the timings behind any "optimization" I see in a PR. Too often I see people making the Gaussian assumption, and even if they're right they usually forget to take the width of the distribution into account (i.e. wow, your speedup is 5% of a standard deviation!)
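Something like this, as a rough sketch (the file names and the before/after split are made up for illustration):

    import numpy as np
    import matplotlib.pyplot as plt

    # hypothetical per-request timings before/after the PR, in ms
    before = np.loadtxt("before.txt")
    after = np.loadtxt("after.txt")

    # overlay the full distributions instead of comparing two means
    plt.hist(before, bins=100, alpha=0.5, label="before")
    plt.hist(after, bins=100, alpha=0.5, label="after")
    plt.xlabel("latency (ms)")
    plt.legend()
    plt.savefig("latency_hist.png")

    # how big is the claimed speedup relative to the spread of the data?
    shift = np.mean(before) - np.mean(after)
    print(f"mean shift = {shift:.3f} ms "
          f"({shift / np.std(before):.1%} of one standard deviation)")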
yep, have made that mistake before. even turned in a write-up for a measurement project in a graduate-level systems course that reported network-performance-dependent measurements as means over trials, with error bars from standard deviations.
sadly, the instructor just gave it an A and moved on. (that said, the amount of work that went into a single semester project was a bit herculean, even if i do say so myself)
or put another way, if you can't model it, you're going to have to sort, or estimate a sort, because that's all that's really left to do.
this shows up in everything from estimating centers with means/percentiles to doing statistical tests with things like the wilcoxon tests.
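a tiny sketch of both ends of that, with made-up lognormal "latencies" (scipy's mannwhitneyu is the wilcoxon rank-sum test for two independent samples):

    import numpy as np
    from scipy.stats import mannwhitneyu

    rng = np.random.default_rng(0)
    # made-up heavy-tailed samples: lognormal, so the mean is dominated by the tail
    a = rng.lognormal(mean=1.0, sigma=1.0, size=5000)
    b = rng.lognormal(mean=0.9, sigma=1.0, size=5000)

    # "estimate a sort": the median and other quantiles are just order statistics
    print("median a:", np.quantile(a, 0.5), "median b:", np.quantile(b, 0.5))

    # rank-based test (wilcoxon rank-sum / mann-whitney u): no gaussian assumption,
    # it only looks at the ordering of the pooled samples
    stat, p = mannwhitneyu(a, b, alternative="two-sided")
    print(f"U = {stat:.0f}, p = {p:.3g}")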