
Other, related points:

- With heavy tails, the sample mean (i.e. the number you can see) is very likely to underestimate the population mean.

- With heavy enough tails, higher moments like variance (and therefore standard deviation) do not exist at all -- they're infinite.

- Critically: With heavy tails, the central limit theorem breaks down. Sums of heavy-tailed samples converge to a normal distribution so slowly it might not realistically ever happen with your finite data. Any computation you do that explicitly or implicitly relies on the CLT will give you junk results!
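A quick simulation makes all three points concrete. This is just my own sketch (numpy, a Pareto distribution with shape 1.1, so the population mean exists but the variance is infinite), not something from the original comment:

    import numpy as np

    rng = np.random.default_rng(0)
    a, n, trials = 1.1, 1_000, 10_000
    true_mean = a / (a - 1)  # population mean of Pareto(shape=a, x_min=1) is a/(a-1) = 11.0

    # numpy's pareto() draws the Lomax form; adding 1 gives the classical Pareto with x_min = 1.
    samples = 1 + rng.pareto(a, size=(trials, n))
    sample_means = samples.mean(axis=1)

    print(f"true mean:                {true_mean:.1f}")
    print(f"median of sample means:   {np.median(sample_means):.1f}")   # typically well below 11
    print(f"fraction underestimating: {(sample_means < true_mean).mean():.2f}")  # well above 0.5

Most samples of 1,000 points come in below the true mean, and the sample means themselves are still strongly right-skewed rather than anything like normal -- that's the CLT failing to kick in at realistic sample sizes.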



Can you elaborate on the part about sample vs population mean?

The way I see it, in these scenarios you aren’t looking at the sample mean. There is no reason to sample your customer base to get an estimate of your average revenue; you can calculate it from the entire population.


Your current customer base is just a sample of your total market, for example. Next year’s (larger) customer base will be slightly closer to the whole set (so different) but still just a sample.


So the point is that the current customer base is a skewed sample of the theoretical customer population.

Therefore we shouldn’t look at the current customer mean profit to predict what would happen to profits if we doubled the customer base.


Exactly. These summaries are often used for prediction, which means historic data is used as a sample of the same distribution as future data.

Even when comparing two different historical data sets you have to be careful: if you're doing anything that resembles hypothesis testing (i.e. trying to figure out if something you changed made a difference) you're not really comparing two historical data sets -- you're trying to compare the underlying distributions from which the historical data sets were drawn, but hoping that the historical data are representative samples from those.


It's not about the doubling; it's about how the distribution changes as you gather more data and get closer to the actual mean (e.g. if you had EVERY possible customer ever, like cable companies).

Take Tesla, for example. Its first customers were enthusiasts, but the mean of that group is not the mean of all current Tesla customers - they would look very different, and had Tesla optimized for the latter we'd be looking at a very different outcome.


With heavy enough tails even the mean might not exist.


Every non-empty set of numbers has a mean.

Perhaps you mean that, in a multimodal distribution, the mean might not resemble any individual member of the population?


A finite set of numbers has a mean, yes. A probability distribution with heavy enough tails does not have a mean (e.g. the Cauchy distribution).


Sure, but let's not scope creep the conversation. This is in the context of talking about summary stats calculated against sets of discrete observations, not properties of abstract probability distributions. So we're talking about doing some basic arithmetic on a list of numbers, not taking integrals.


If your data comes from a probability distribution that doesn't have a mean then calculating the mean of your data is basically meaningless (no pun intended). In a Cauchy distribution, for example, the mean of a million data points is just as likely to be a thousand units off from the true center of the distribution as any single data point is.

This is not scope creep. This is not an abstract academic concern. I've actually seen people run into this in practice--"I quadrupled the size of my dataset, why is my sample mean still wildly inaccurate?"

If you're not aware of the properties of abstract probability distributions then your basic arithmetic on a list of numbers may well be completely useless.
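If you want to see that concretely, here's a rough sketch (my own illustration with a standard Cauchy; the sample mean of n Cauchy draws is itself standard Cauchy, so its spread never shrinks):

    import numpy as np

    rng = np.random.default_rng(1)
    for n in (100, 10_000, 1_000_000):
        # 200 independent experiments, each averaging n Cauchy draws
        means = np.array([rng.standard_cauchy(n).mean() for _ in range(200)])
        q1, q3 = np.percentile(means, [25, 75])
        print(f"n = {n:>9,}: IQR of the sample means = {q3 - q1:.2f}")  # stays around 2 for every n

Averaging ten thousand times more data doesn't tighten the estimate at all.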


"If your data comes from a probability distribution that doesn't have a mean then calculating the mean of your data is basically meaningless (no pun intended)."

I think this is actually the major takeaway for this sort of discussion. All statistical measures carry with them assumptions about the underlying distribution. When you blindly use one without verifying the underlying distribution, you are asking to be lied to by your statistics.

It took me a while, but I've trained myself to actively ask the question "is this appropriate for the underlying distribution" whenever I see a "mean" or a "standard deviation". Spoiler: The answer is usually "no"! Central limit theorem notwithstanding, we encounter a lot of non-Gaussian and outright pathological distributions in the real world.

Average is useful for more than just Gaussian, but it means different things in different distributions, most of them quite unlike what our Gaussian-trained intuition suggests. Standard deviation is really Gaussian-only, or at least, if you insist on using it, you ought to pair it with the skew, kurtosis, or other measures of how non-Gaussian your distribution is.

Remembering that there's only one way to be Gaussian but an infinity of ways to be non-Gaussian is helpful too. This video can help visualize that: https://www.youtube.com/watch?v=iwzzv1biHv8
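As a rough sketch of that habit (my own illustration; the lognormal data here is just a stand-in for a typical right-skewed real-world quantity like revenue per customer):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    data = rng.lognormal(mean=0.0, sigma=2.0, size=50_000)

    print(f"mean:            {data.mean():.2f}")
    print(f"median:          {np.median(data):.2f}")       # far below the mean for skewed data
    print(f"skewness:        {stats.skew(data):.1f}")      # ~0 for Gaussian, large here
    print(f"excess kurtosis: {stats.kurtosis(data):.1f}")  # ~0 for Gaussian, huge here

If skew and kurtosis are far from zero and the mean and median disagree badly, "mean ± standard deviation" is not telling you what you think it is.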


No, the point is that for certain sets of numbers, calculating the mean does not give you a summary statistic. It just gives the average of a bunch of numbers, not a number that is representative of your population.


This is why using Gaussian models to predict something that's fundamentally fat-tailed (e.g. IQ vs. many forms of societal achievement) is doomed to failure.


To add to the arguments already presented, it's not like you never encounter practical scenarios where a mean does not exist.

A good example would be earthquakes. The commonly used Gutenberg–Richter law [1] suggests that the number of earthquakes decreases exponentially with the magnitude, while the intensity increases exponentially with the magnitude. In seismically active regions the frequency and intensity even appear to be inversely proportional. This results in a model where there is no average intensity. Sure, this model is bound to break down eventually, but until the first earthquake that quite literally breaks the mould we likely won't know what the upper bound may be.

[1]: https://en.wikipedia.org/wiki/Gutenberg%E2%80%93Richter_law
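To make that concrete (my own back-of-the-envelope, using the commonly quoted constants b ≈ 1 and radiated energy ∝ 10^(1.5·M), which aren't in the comment above): frequency falls off like 10^(-b·M) while energy grows like 10^(1.5·M), so energy follows a power law with tail exponent b/1.5 ≈ 0.67 < 1, and a power law with exponent below 1 has no finite mean. A quick simulation shows the running average never settling:

    import numpy as np

    rng = np.random.default_rng(3)
    b, n = 1.0, 1_000_000

    # Under Gutenberg-Richter, magnitudes above a threshold are exponentially distributed:
    # P(M > m) = 10**(-b*m)  <=>  exponential with scale 1/(b*ln 10)
    magnitudes = rng.exponential(scale=1 / (b * np.log(10)), size=n)
    energies = 10 ** (1.5 * magnitudes)  # relative radiated energy

    running_mean = np.cumsum(energies) / np.arange(1, n + 1)
    for k in (10**3, 10**4, 10**5, 10**6):
        print(f"mean energy after {k:>9,} quakes: {running_mean[k - 1]:,.0f}")  # typically keeps climbing

Each new record-setting quake yanks the running average up again, and there's no stable value for it to converge to.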


An example of your infinite-variance case: sums of Cauchy-distributed samples don't even converge to a normal distribution.



