This is a great book. I read it a couple years ago and I remember a couple takeaways that apply well to AB testing:
1 - Monitoring tests on an ongoing basis and then calling them as soon as they hit some confidence threshold (like 95%) will give you biased results (there's a quick simulation of this sketched below). It's important to determine your sample size up front and then let the test run all the way through, or at least be aware that the results are less reliable if you stop early.
2 - Testing for multiple metrics requires a much larger sample. If you run a test and then compare conversion rate, purchase amount, pageviews per session, retention, etc. etc., you'll have a much higher error rate since the more things you measure, the more likely you are to get an outlier (see the quick arithmetic below). You either need to run a separate test for each metric or increase your sample size a lot to account for this effect (iirc the math for exactly how much is in the book).
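On point 1, here's a minimal simulation of the peeking problem (my own sketch, not the book's example): both arms draw from the same distribution, so any "significant" result is a false positive, yet checking after every batch and stopping at |z| > 1.96 inflates the error rate far above the nominal 5%.

    # A/A test under the null: any "significant" result is a false positive.
    # Peeking every `batch` observations and stopping at |z| > 1.96 inflates
    # the error rate well above the nominal 5%.
    import numpy as np

    rng = np.random.default_rng(0)

    def run_experiment(n_max=10_000, batch=500, z_crit=1.96):
        a = rng.normal(size=n_max)   # arm A, true effect = 0
        b = rng.normal(size=n_max)   # arm B, identical distribution
        for n in range(batch, n_max + 1, batch):
            diff = a[:n].mean() - b[:n].mean()
            se = np.sqrt(a[:n].var(ddof=1) / n + b[:n].var(ddof=1) / n)
            if abs(diff / se) > z_crit:
                return True          # stopped early -- a false positive here
        return False

    trials = 2_000
    false_positives = sum(run_experiment() for _ in range(trials))
    print(f"false positive rate with peeking: {false_positives / trials:.1%}")
    # Roughly 20-25% with this many looks, versus ~5% if you only test once at n_max.

On point 2, the usual back-of-the-envelope (again mine, not necessarily the book's exact math) is that with k independent metrics each tested at alpha = 0.05, the chance of at least one false positive is 1 - (1 - alpha)^k, and a Bonferroni correction tests each metric at alpha / k instead, which is what drives the sample size up:

    alpha, k = 0.05, 5
    print(f"P(at least one false positive across {k} metrics): "
          f"{1 - (1 - alpha) ** k:.1%}")                          # ~22.6%
    print(f"Bonferroni-corrected per-metric alpha: {alpha / k:.3f}")  # 0.010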
Thanks, I'm glad you enjoyed the book! (Author here -- the website got its first publicity here on HN.)
Regarding AB testing, you might be interested in this recent research, which uses real data from Optimizely to estimate how often people get AB test false positives because they stopped as soon as they hit significance: https://ssrn.com/abstract=3204791
> Specifically, about 73% of experimenters stop the experiment just when a positive effect reaches 90% confidence. Also, approximately 75% of the effects are truly null. Improper optional stopping increases the false discovery rate (FDR) from 33% to 40% among experiments p-hacked at 90% confidence
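For intuition on where a number like 33% comes from, here's a rough Bayes'-rule calculation (my own sketch, not from the paper; the 60% average power figure is purely an assumption for illustration):

    # FDR = P(null | significant), from the base rate of nulls (pi0),
    # the significance threshold (alpha), and average power.
    # The power = 0.60 value is assumed here, not taken from the paper.
    def fdr(pi0, alpha, power):
        false_pos = pi0 * alpha
        true_pos = (1 - pi0) * power
        return false_pos / (false_pos + true_pos)

    print(f"{fdr(pi0=0.75, alpha=0.10, power=0.60):.0%}")   # ~33%
    # Optional stopping effectively inflates alpha above its nominal 0.10,
    # which pushes the FDR higher -- the paper estimates roughly 40%.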
While it may be possible to take the frequentist approach to AB testing, Bayesian inference is becoming the way to go with this.[1] Instead of directly setting up a yes-no hypothesis test with p-values, which are nearly impossible to use correctly[2], Bayesian approaches aim to directly estimate whatever quantity you want. With the Bayesian approach, you get estimates for A, B, and A-B (or whatever combination you want, e.g. (A-B)/A). Each of those estimates is properly called a posterior probability distribution and describes the range of plausible values. The end result is that instead of saying "A is better than B (p < 0.05)" you get a probability distribution of A-B. From that probability distribution you can answer any question you want: the expected difference between A and B (the posterior mean), the probability that A is better than B (just integrate the area above 0), or whatever is needed to make a decision.
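For a concrete (if simplified) picture of what that looks like for conversion rates, here's a minimal Beta-Binomial sketch; the visitor and conversion counts are made up for illustration:

    # Bayesian A/B sketch for conversion rates (illustrative numbers).
    # With a uniform Beta(1, 1) prior, each arm's posterior is
    # Beta(1 + conversions, 1 + non-conversions); the posterior of the
    # difference is approximated by Monte Carlo sampling.
    import numpy as np

    rng = np.random.default_rng(0)

    visitors_a, conversions_a = 10_000, 520
    visitors_b, conversions_b = 10_000, 480

    samples = 200_000
    post_a = rng.beta(1 + conversions_a, 1 + visitors_a - conversions_a, samples)
    post_b = rng.beta(1 + conversions_b, 1 + visitors_b - conversions_b, samples)
    diff = post_a - post_b

    print(f"expected lift A - B: {diff.mean():.4f}")
    print(f"P(A better than B): {(diff > 0).mean():.1%}")
    print(f"95% credible interval for A - B: "
          f"{np.quantile(diff, [0.025, 0.975]).round(4)}")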
> Monitoring tests on an ongoing basis and then calling them as soon as they hit some confidence threshold (like 95%) will give you biased results ...
The Bayesian method doesn't really solve this so much as it answers a fundamentally different question -- modeling how your personal belief changes. Those posteriors typically will not have an interpretation as normalized long-run frequencies. As long as the Bayesian posteriors are not interpreted as frequentist probabilities, they are perfectly acceptable.
That said, this 'peeking' problem can be easily resolved in the frequentist setting, and this is well known in the stats, probability, and (hopefully) ML literature. The core results are really old -- in fact they were classified information during World War II. If you are interested, search for sequential hypothesis tests. They are actually more efficient than their batch cousins.
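For the curious, here's a toy sketch of Wald's sequential probability ratio test (SPRT) for a conversion rate, just to show the shape of the idea; the hypothesized rates and error levels below are arbitrary choices of mine:

    # Toy Wald SPRT for Bernoulli data: test H0: p = p0 vs H1: p = p1 by
    # accumulating the log-likelihood ratio one observation at a time and
    # stopping as soon as it crosses either threshold.
    import math
    import random

    def sprt(observations, p0=0.05, p1=0.06, alpha=0.05, beta=0.20):
        upper = math.log((1 - beta) / alpha)   # accept H1 above this
        lower = math.log(beta / (1 - alpha))   # accept H0 below this
        llr = 0.0
        for n, x in enumerate(observations, start=1):
            llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
            if llr >= upper:
                return "accept H1 (p = p1)", n
            if llr <= lower:
                return "accept H0 (p = p0)", n
        return "no decision yet", len(observations)

    random.seed(0)
    data = [random.random() < 0.06 for _ in range(50_000)]  # true rate 6%
    print(sprt(data))

The punchline is that, on average, a sequential test like this reaches a decision with far fewer samples than a fixed-horizon test with the same error rates.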
Bayesian vs frequentist is an axis orthogonal to sequential/online vs batch. Think of a 2 x 2 grid: you can choose to be in any quadrant you want.
Hi, I didn't see anything on the page about the intended audience -- would you say this is appropriate for someone who has done the basic statistics classes at uni but is now pretty rusty, or would you need a more solid foundation to grasp the content fully?