
I wonder how many people make this error while A/B testing their websites...


Nobody does since we have this thing called the G-test. We can make other errors, but this specific one isn't possible.


I only wish it were so simple. I've presented correct statistical results for A/B tests, only to have people try to argue me into accepting the incorrect results that would follow from this logical error. If I had known less statistics, or had been unwilling to argue with my boss, the error would have been made.

And another common variant of the problem happens when you're testing 10 variations. People want to do a pairwise test on the top and bottom right away, without realizing that, even if all the variations are truly equal, the observed top and bottom frequently look different (see the simulation sketch below). The flip side of that error is that people see the overall G-test report a difference and conclude that the current top variation must be better than the current bottom one, which is again incorrect.
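A quick way to see the first trap is a small simulation: draw 10 variants from the same true conversion rate, cherry-pick the observed best and worst, and run a pairwise test on just that pair. The counts below are made up, and scipy's chi2_contingency with lambda_="log-likelihood" stands in for the G-test, so treat this as a rough sketch rather than anyone's production setup:

    import numpy as np
    from scipy.stats import chi2_contingency

    rng = np.random.default_rng(0)
    n_variants, n_visitors, true_rate = 10, 1000, 0.05  # all variants truly equal
    n_sims, false_positives = 2000, 0

    for _ in range(n_sims):
        conversions = rng.binomial(n_visitors, true_rate, size=n_variants)
        best, worst = conversions.max(), conversions.min()
        # pairwise G-test on the cherry-picked best-vs-worst pair
        table = [[best, n_visitors - best],
                 [worst, n_visitors - worst]]
        _, p, _, _ = chi2_contingency(table, lambda_="log-likelihood")
        if p < 0.05:
            false_positives += 1

    print(f"top-vs-bottom 'significant' at the 5% level: {false_positives / n_sims:.0%}")

Even though every variant has the same true rate, the hand-picked pair comes out "significant" far more often than the nominal 5%, which is exactly the mistake described above.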

There is a lot of subtlety, and just saying, "I have this statistical test that most people don't understand" is not really going to cut it.


Right, which is why I said "We can make other errors, but this specific one isn't possible" in response to the question "I wonder how many people make this error while A/B testing their websites".

I'm familiar with the drawbacks of Taguchi methods, the subtle problems caused by changing the distribution, and the problem of checking the G-test continuously and thereby reducing its effectiveness. But for a simple A/B test (and by that I mean challenger versus champion, served randomly from the backend at a static distribution, say 50-50 throughout the life of the test), unless I need to hit the books again, this specific problem is not possible if everyone on board trusts the G-test (with the Yates correction on, etc.).
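For what it's worth, a minimal sketch of that simple champion-versus-challenger case, with completely made-up counts and scipy's chi2_contingency standing in for the G-test (lambda_="log-likelihood", Yates correction on), might look like this:

    from scipy.stats import chi2_contingency

    # rows: champion, challenger; columns: conversions, non-conversions
    # (counts are hypothetical, just to show the shape of the test)
    table = [[230, 4770],    # champion:   230 conversions out of 5000
             [275, 4725]]    # challenger: 275 conversions out of 5000

    g_stat, p_value, dof, expected = chi2_contingency(
        table,
        correction=True,           # Yates continuity correction, as mentioned above
        lambda_="log-likelihood",  # G-test rather than Pearson chi-square
    )
    print(f"G = {g_stat:.2f}, p = {p_value:.4f}")

Run once on the final table, not peeked at continuously, this is the "trust the G-test" setup described above.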



