Hacker News

In this context "stable" means the thing it means in statistical process control, i.e. the operational definition: no measurements falling outside limits set 2.66 times the mean consecutive difference between observations on either side of the average.

It is a problem -- particularly for software -- that SPC tools do not work with subexponential distributions, but it's separate from the observation that when SPC determines that a process is stable, roughly half of measurements will lie above the average.
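For the record, here is a minimal sketch of that operational definition (in Python rather than a real SPC tool; the Gaussian data and all numbers are purely illustrative):

```python
import random
import statistics

random.seed(0)
# Illustrative stand-in for a stream of process measurements.
data = [random.gauss(10, 1) for _ in range(1000)]

# Moving ranges: absolute differences between consecutive observations.
mr_bar = statistics.mean(abs(b - a) for a, b in zip(data, data[1:]))

# Natural process limits: average +/- 2.66 * average moving range.
center = statistics.mean(data)
upper = center + 2.66 * mr_bar
lower = center - 2.66 * mr_bar

# "Stable" in this operational sense: no measurement outside the limits.
outside = sum(1 for x in data if x < lower or x > upper)
```

The 2.66 constant is what converts the average moving range into an estimate of three standard deviations for an individuals chart.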




To be fair to OP, Wheeler never claims that for stable/in-control/predictable processes roughly half of the measurements will lie above the average. The only claim he makes is that 97% of all data points for a stable process (assuming the process draws from a J-curve or single-mound distribution) will fall between the limit lines.

He can't make this claim (about ~half falling above/below the average line), because one of the core arguments he makes is that XmR charts are usable even when you're not dealing with normal distributions. He argues that the intuition behind how they work is that they detect the presence of more than one probability distribution in the variation of a time series.

Some links below:

Arguments for non-normality:

https://spcpress.com/pdf/DJW220.pdf

https://www.spcpress.com/pdf/DJW354.Sep.19.The%20Normality-M...

Claim of homogeneity detection:

https://www.spcpress.com/pdf/DJW204.pdf
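That homogeneity-detection intuition is easy to demonstrate with a toy example (Python, with made-up numbers: one series whose level shifts halfway through, i.e. two distributions hiding in one time series):

```python
import random
import statistics

random.seed(1)
# First half from one distribution, second half shifted upward.
series = ([random.gauss(10, 1) for _ in range(100)]
          + [random.gauss(14, 1) for _ in range(100)])

# Limits driven entirely by short-term (point-to-point) variation.
mr_bar = statistics.mean(abs(b - a) for a, b in zip(series, series[1:]))
center = statistics.mean(series)
upper = center + 2.66 * mr_bar
lower = center - 2.66 * mr_bar

# The overall average splits the difference, so both halves push points
# past the limits and the chart flags the series as non-homogeneous.
flagged = sum(1 for x in series if x < lower or x > upper)
```

Note that no normality assumption appears anywhere: the limits come from consecutive differences, which is why the shift is what gets detected.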


I don't have the stats-fu to back it up but I would be very surprised if someone could point to a process where XmR charts are useful, but where the mean is not within 10–20 percentiles of the median.
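As a rough sanity check of that intuition (a simulation, not a proof): even a heavily skewed J-curve like the exponential keeps the mean within that band.

```python
import random

random.seed(2)
# Exp(1): a J-shaped distribution where an XmR chart is arguably still usable.
data = [random.expovariate(1.0) for _ in range(100_000)]
mean = sum(data) / len(data)

# Percentile rank of the mean: fraction of observations below it.
# Analytically this is 1 - 1/e, about the 63rd percentile, i.e. roughly
# 13 percentiles above the median.
pct_of_mean = sum(1 for x in data if x < mean) / len(data)
```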


> the operational definition of no measurements outside of 2.66 times the mean consecutive difference between observations

Not even a simple Gaussian distribution can hold up to this standard of "stability" (unless I've misunderstood what you mean here):

    data <- rnorm(1000)                   # i.i.d. standard normal data (mean ~0)
    mcd  <- 2.66 * mean(abs(diff(data)))  # 2.66 * mean consecutive difference
    sum(abs(data) > mcd) / length(data)   # fraction outside the limits
                                          # (|x| works because the data are
                                          # centred at 0)
    [1] 0.002

Unless you are willing to add additional conditions (e.g., symmetry), I still don't see how a criterion that pertains to variance and kurtosis ("no measurements outside of 2.66 times the mean consecutive difference between observations") can imply any strong relationship between the (sample) arithmetic mean (or any other mean) and the (population) median.

In fact, even distributions for which the "arithmetic mean is approximately equal to the median" claim is roughly correct will almost certainly not display the same property when you use some other mean (e.g., geometric or harmonic mean).
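A quick simulation makes that point concrete (Python; the lognormal is chosen purely as an illustration): for lognormal data the geometric mean sits right at the median, while the arithmetic mean lands near the 69th percentile.

```python
import math
import random

random.seed(3)
# Lognormal(0, 1): median = e^0 = 1, arithmetic mean = e^0.5 ~ 1.65.
data = [random.lognormvariate(0.0, 1.0) for _ in range(100_000)]

arith_mean = sum(data) / len(data)
# Geometric mean = exp(mean of logs); for a lognormal this equals the median.
geom_mean = math.exp(sum(math.log(x) for x in data) / len(data))

# Percentile ranks: fraction of observations below each "mean".
pct_arith = sum(1 for x in data if x < arith_mean) / len(data)
pct_geom = sum(1 for x in data if x < geom_mean) / len(data)
```

So which percentile "the mean" occupies depends entirely on which mean you picked, exactly as claimed.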

Either way, if you have some reference that supports the stated claim, I will be very happy to take a look at it (and educate myself in the process).



