You have just pattern-matched to a response without understanding. And you're wrong.

See http://www.evanmiller.org/sequential-ab-testing.html for a very similar methodology presented by the very person you are citing. The difference is that I am aiming to always produce an answer from the test, even if it is a statistical coin flip, while he aims to only produce an answer roughly 5% of the time if there is no difference.
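Concretely, the contrast shows up under the null hypothesis. Here's a quick simulation (the conversion rate, N, and the z > 1.96 cutoff are illustrative assumptions, not numbers from either write-up):

    import math, random

    def run_null_experiment(p=0.05, n=10_000):
        # Both variants convert at the SAME rate p, so any "winner" is noise.
        a = sum(random.random() < p for _ in range(n))
        b = sum(random.random() < p for _ in range(n))
        pooled = (a + b) / (2 * n)
        se = math.sqrt(2 * pooled * (1 - pooled) / n)
        return abs(a - b) / n / se   # z-score of the observed gap

    zs = [run_null_experiment() for _ in range(2_000)]
    # "Always take the leader" declares a winner 100% of the time; a
    # significance rule answers only when the gap clears the cutoff,
    # which happens ~5% of the time under the null.
    print(f"significant at z > 1.96: {sum(z > 1.96 for z in zs) / len(zs):.1%}")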



Maybe I am not understanding your post, but aren't you just declaring a winner after N trials, even if that is not significant? That seems to be a critical distinction here.


Right. We are trying to make a business decision, not conduct science.

If you go with whichever is ahead, you'll tend to get the right answer even for wins that are too small to measure reliably. Yes, there will be mistakes, but the mistakes are guaranteed to be pretty small.

If you insist on a higher standard, you'll learn which choices you can be confident of, but now you need to figure out what to do when the statistics didn't give you a final answer.

I think that the first choice is more useful. Evan prefers to clearly distinguish your coin flips from solid results. But in the end you have to make a decision, and it isn't material what decision you make if the conversion rates are close.
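
A minimal sketch of the "ship whichever is ahead" rule makes the tradeoff visible. The true rates, N, and run count below are made-up assumptions:

    import random

    def pick_leader(p_a, p_b, n):
        # Show each variant to n visitors and ship whichever converted more.
        a = sum(random.random() < p_a for _ in range(n))
        b = sum(random.random() < p_b for _ in range(n))
        return 'A' if a >= b else 'B'

    p_a, p_b, n, runs = 0.050, 0.052, 10_000, 2_000
    wrong = sum(pick_leader(p_a, p_b, n) == 'A' for _ in range(runs)) / runs
    print(f"picked the worse variant {wrong:.0%} of the time, "
          f"but each mistake only costs {p_b - p_a:.3f} per visitor")

When the rates are this close you pick wrong fairly often, but by construction a wrong pick only ever costs you a tiny difference in conversion rate.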


I'm sorry, but I wouldn't call that framework A/B testing so much as running two experiments and seeing who's ahead after some arbitrary period of time.


Well, that is what A/B testing is, with a decision rule to try to make that determination in some sensible way.

Usually the decision rule is stated in terms of a statistical confidence test. Usually the confidence test is done poorly enough that it doesn't mean what people think it does.

And the stopping procedure isn't actually very arbitrary. You choose N based on the most data that you're willing to collect for this experiment. And stop when you're confident about which version will be ahead at that point.

So this procedure leaves you confident of having the best answer that you will get from the most extensive test that your organization is willing to commit to running. And the cost to the organization of running it is capped at sqrt(N) lost conversions.
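
A sketch of that procedure, with thresholds that are my assumptions rather than necessarily the linked article's: fix the largest per-variant sample N you're willing to collect, then stop early once one side's lead in conversions is too big to plausibly reverse by N.

    import math, random

    def sequential_ab(stream, n, z=2.0):
        """stream yields (variant, converted) pairs, variant in {'A', 'B'}."""
        wins = {'A': 0, 'B': 0}
        seen = 0
        for variant, converted in stream:
            wins[variant] += converted
            seen += 1
            lead = abs(wins['A'] - wins['B'])
            if lead >= z * math.sqrt(n) or seen >= 2 * n:
                break   # confident early, or budget exhausted: take the leader
        return max(wins, key=wins.get)

    def visitors(p_a, p_b):   # hypothetical traffic generator
        while True:
            variant = random.choice('AB')
            yield variant, random.random() < (p_a if variant == 'A' else p_b)

    print(sequential_ab(visitors(0.050, 0.060), n=10_000))

Either way the loop ends, you ship the current leader; the early-stop condition just saves traffic when the answer is already clear.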


I think this post summarizes it really well: http://bjk5.com/post/12829339471/ab-testing-still-works-sarc...

If you have any sort of decent dashboarding, the cost of a wrong decision is really not all that bad compared to the cost of being a purist.


Oh boy. Let's replace statistics with guesses off of graphs!

That visible "zone of stupidity" is based on how long it takes to make a conversion, which has everything to do with how your product works and nothing to do with how statistics works. There is absolutely no difference between a graph that leads to an accidental wrong decision and one that detects a correct difference; the patterns that you think you see don't mean what your brain decides they do.

And more importantly, if you stop in 1/4 of the time that you would have been willing to run the test, the worst error you can make is twice as large as the worst error you could have made by running the full test.
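
Back-of-the-envelope: the smallest difference a test can resolve shrinks like 1/sqrt(n), so a quarter of the data means twice the undetectable error. The baseline rate and sample sizes here are made up for illustration:

    import math

    p, z = 0.05, 2.0                 # baseline conversion rate, ~95% confidence
    for n in (40_000, 10_000):       # full budget vs. stopping at a quarter of it
        mde = z * math.sqrt(2 * p * (1 - p) / n)
        print(f"n = {n:>6}: smallest detectable difference ~ {mde:.4f}")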

Have you ever been in an organization that rolled out an innocuous looking change that killed 15% of the business? I have. Over the 10 months it took to find the offending subject line, there was, shall we say, "significant turnover in the executive team".

Math exists for a reason. Either learn it, or believe people who have learned it. Don't substitute pretty pictures that you don't understand, then call it an explanation.


I admit I don't know the math well so I am curious to know how to fix my intuition:

Let's say you want to figure out the unknown bias on two coins. You flip both continuously and plot the percentage of heads you see. By the law of large numbers these percentages will eventually converge to the true probabilities (which is how I am interpreting the graphs in that blog post).

The bad case is if the two coins are actually "flatlined" in the wrong order, so as a pattern-matching human you mistakenly believe the rates have converged prematurely. I don't know how to work out the math on this, but let's say a "flatline" is visually 100 points or so with no significant slope. Then this should be pretty rare, right?
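
For what it's worth, here's a rough way to check that by simulation (the bias, flip count, and window size are arbitrary guesses on my part):

    import random

    def looks_converged(p=0.5, flips=1_000, window=100):
        # Two coins with the SAME bias; record who leads after each flip.
        a = b = 0
        a_ahead = []
        for _ in range(flips):
            a += random.random() < p
            b += random.random() < p
            a_ahead.append(a > b)
        tail = a_ahead[-window:]
        return all(tail) or not any(tail)   # one coin led the whole window

    runs = 2_000
    rate = sum(looks_converged() for _ in range(runs)) / runs
    print(f"identical coins show a stable-looking gap {rate:.0%} of the time")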


Don't try to do this by visual pattern recognition. Do math. There are plenty of statistical tests that you can use, use them. Any of them is better than looking at a graph and guessing from the shape.

If you want to try to understand what is going on, learn the Central Limit Theorem. That will tell you how fast the convergence promised by the laws of large numbers happens. (There are two, the strong and the weak.)
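
For instance, a plain two-proportion z-test needs nothing beyond the standard library (the counts below are made-up examples, not data from anywhere):

    import math

    def two_proportion_z(conv_a, n_a, conv_b, n_b):
        # Standard pooled two-proportion z-test for a difference in rates.
        p_a, p_b = conv_a / n_a, conv_b / n_b
        pooled = (conv_a + conv_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_a - p_b) / se
        p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
        return z, p_value

    z, p = two_proportion_z(conv_a=530, n_a=10_000, conv_b=480, n_b=10_000)
    print(f"z = {z:.2f}, p = {p:.3f}")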


I take it you're the person who wrote the original to which your linked article responds?


Yes.

It took a lot of work to get down to the super simple version. :-)



