You have just pattern-matched to a response without understanding. And you're wrong.

See http://www.evanmiller.org/sequential-ab-testing.html for a very similar methodology presented by the very person you are citing. The difference is that I am aiming to always produce an answer from the test, even if it is a statistical coin flip, while he aims to only produce an answer roughly 5% of the time if there is no difference.
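Concretely, the contrast shows up under the null hypothesis. Here's a quick simulation (the conversion rate, N, and the z > 1.96 cutoff are illustrative assumptions, not numbers from either write-up):

    import math, random

    def run_null_experiment(p=0.05, n=10_000):
        # Both variants convert at the SAME rate p, so any "winner" is noise.
        a = sum(random.random() < p for _ in range(n))
        b = sum(random.random() < p for _ in range(n))
        pooled = (a + b) / (2 * n)
        se = math.sqrt(2 * pooled * (1 - pooled) / n)
        return abs(a - b) / n / se   # z-score of the observed gap

    zs = [run_null_experiment() for _ in range(2_000)]
    # "Always take the leader" declares a winner 100% of the time; a
    # significance rule answers only when the gap clears the cutoff,
    # which happens ~5% of the time under the null.
    print(f"significant at z > 1.96: {sum(z > 1.96 for z in zs) / len(zs):.1%}")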



Maybe I am not understanding your post, but aren't you just declaring a winner after N trials, even if that is not significant? That seems to be a critical distinction here.


Right. We are trying to make a business decision, not conduct science.

If you go with whichever is ahead, you'll tend to get the right answer even for wins that are too small to measure reliably. Yes, there will be mistakes, but the mistakes are guaranteed to be pretty small.

If you insist on a higher standard, you'll learn which choices you can be confident of, but now you need to figure out what to do when the statistics didn't give you a final answer.

I think that the first choice is more useful. Evan prefers to clearly distinguish your coin flips from solid results. But in the end you have to make a decision, and it isn't material what decision you make if the conversion rates are close.
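
A minimal sketch of the "ship whichever is ahead" rule makes the tradeoff visible. The true rates, N, and run count below are made-up assumptions:

    import random

    def pick_leader(p_a, p_b, n):
        # Show each variant to n visitors and ship whichever converted more.
        a = sum(random.random() < p_a for _ in range(n))
        b = sum(random.random() < p_b for _ in range(n))
        return 'A' if a >= b else 'B'

    p_a, p_b, n, runs = 0.050, 0.052, 10_000, 2_000
    wrong = sum(pick_leader(p_a, p_b, n) == 'A' for _ in range(runs)) / runs
    print(f"picked the worse variant {wrong:.0%} of the time, "
          f"but each mistake only costs {p_b - p_a:.3f} per visitor")

When the rates are this close you pick wrong fairly often, but by construction a wrong pick only ever costs you a tiny difference in conversion rate.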


I'm sorry, but I wouldn't call that framework A/B testing so much as running two experiments and seeing who's ahead after some arbitrary period of time.


Well, that is what A/B testing is, with a decision rule to try to make that determination in some sensible way.

Usually the decision rule is stated in terms of a statistical confidence test. Usually the confidence test is done poorly enough that it doesn't mean what people think it does.

And the stopping procedure isn't actually very arbitrary. You choose N based on the most data that you're willing to collect for this experiment. And stop when you're confident about which version will be ahead at that point.

So this procedure leaves you confident of having the best answer that you will get from the most extensive test that your organization is willing to commit to running. And the cost to the organization of running it is capped at sqrt(N) lost conversions.
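
A sketch of that procedure, with thresholds that are my assumptions rather than necessarily the linked article's: fix the largest per-variant sample N you're willing to collect, then stop early once one side's lead in conversions is too big to plausibly reverse by N.

    import math, random

    def sequential_ab(stream, n, z=2.0):
        """stream yields (variant, converted) pairs, variant in {'A', 'B'}."""
        wins = {'A': 0, 'B': 0}
        seen = 0
        for variant, converted in stream:
            wins[variant] += converted
            seen += 1
            lead = abs(wins['A'] - wins['B'])
            if lead >= z * math.sqrt(n) or seen >= 2 * n:
                break   # confident early, or budget exhausted: take the leader
        return max(wins, key=wins.get)

    def visitors(p_a, p_b):   # hypothetical traffic generator
        while True:
            variant = random.choice('AB')
            yield variant, random.random() < (p_a if variant == 'A' else p_b)

    print(sequential_ab(visitors(0.050, 0.060), n=10_000))

Either way the loop ends, you ship the current leader; the early-stop condition just saves traffic when the answer is already clear.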


I think this post summarizes it really well: http://bjk5.com/post/12829339471/ab-testing-still-works-sarc...

If you have any sort of decent dashboarding, the cost of a wrong decision is really not all that bad compared to the cost of being a purist.


Oh boy. Let's replace statistics with guesses off of graphs!

That visible "zone of stupidity" is based on how long it takes to make a conversion, which has everything to do with how your product works and nothing to do with how statistics works. There is absolutely no difference between a graph that leads to an accidental wrong decision and one that detects a correct difference; the patterns that you think you see don't mean what your brain decides they do.

And more importantly, if you stop in 1/4 of the time that you would have been willing to run the test, the worst error you can make is twice as large as the worst error you could have made by running the full test.
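
Back-of-the-envelope: the smallest difference a test can resolve shrinks like 1/sqrt(n), so a quarter of the data means twice the undetectable error. The baseline rate and sample sizes here are made up for illustration:

    import math

    p, z = 0.05, 2.0                 # baseline conversion rate, ~95% confidence
    for n in (40_000, 10_000):       # full budget vs. stopping at a quarter of it
        mde = z * math.sqrt(2 * p * (1 - p) / n)
        print(f"n = {n:>6}: smallest detectable difference ~ {mde:.4f}")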

Have you ever been in an organization that rolled out an innocuous looking change that killed 15% of the business? I have. Over the 10 months it took to find the offending subject line, there was, shall we say, "significant turnover in the executive team".

Math exists for a reason. Either learn it, or believe people who have learned it. Don't substitute pretty pictures that you don't understand, then call it an explanation.


I admit I don't know the math well so I am curious to know how to fix my intuition:

Let's say you want to figure out the unknown bias on two coins. You flip both continuously and plot the percentage of heads you see. By the law of large numbers these percentages will eventually converge to the true probabilities (which is how I am interpreting the graphs in that blog post).

The bad case is if the two coins are actually "flatlined" in the wrong order, so as a pattern-matching human you mistakenly believe the rates have converged prematurely. I don't know how to work out the math on this, but let's say a "flatline" is visually 100 points or so with no significant slope. Then this should be pretty rare, right?
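
For what it's worth, here's a rough way to check that by simulation (the bias, flip count, and window size are arbitrary guesses on my part):

    import random

    def looks_converged(p=0.5, flips=1_000, window=100):
        # Two coins with the SAME bias; record who leads after each flip.
        a = b = 0
        a_ahead = []
        for _ in range(flips):
            a += random.random() < p
            b += random.random() < p
            a_ahead.append(a > b)
        tail = a_ahead[-window:]
        return all(tail) or not any(tail)   # one coin led the whole window

    runs = 2_000
    rate = sum(looks_converged() for _ in range(runs)) / runs
    print(f"identical coins show a stable-looking gap {rate:.0%} of the time")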


Don't try to do this by visual pattern recognition. Do math. There are plenty of statistical tests that you can use, use them. Any of them is better than looking at a graph and guessing from the shape.

If you want to try to understand what is going on, learn the Central Limit Theorem. That will tell you how fast the convergence promised by the laws of large numbers happens. (There are two, the strong and the weak.)
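
For instance, a plain two-proportion z-test needs nothing beyond the standard library (the counts below are made-up examples, not data from anywhere):

    import math

    def two_proportion_z(conv_a, n_a, conv_b, n_b):
        # Standard pooled two-proportion z-test for a difference in rates.
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se
        p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
        return z, p_value

    z, p = two_proportion_z(conv_a=530, n_a=10_000, conv_b=480, n_b=10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")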


I take it you're the person who wrote the original to which your linked article responds?


Yes.

It took a lot of work to get down to the super simple version. :-)



