Here's a great article about how Optimizely gets it wrong: http://dataorigami.ne...

emcq · on Jan 9, 2016

The article you mention names RichRelevance, but there are others who implement Thompson Sampling or other forms of Bayesian Bandits, such as SigOpt [0] and Dynamic Yield [1]. Various adtech companies also use it under the hood.

[0] https://sigopt.com/

[1] https://www.dynamicyield.com/
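
For readers who have not seen it, here is a minimal sketch of Beta-Bernoulli Thompson Sampling. It is an illustration only, not how any of the vendors above implement it. The function name, the uniform Beta(1, 1) prior, and the conversion rates are assumptions made up for the example, and rewards are assumed to arrive immediately and independently.

```python
import random

def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling over arms with hypothetical conversion rates."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one sample from each arm's Beta(1 + successes, 1 + failures) posterior.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i])
                 for i in range(n_arms)]
        # Show the arm whose sampled conversion rate is highest.
        arm = max(range(n_arms), key=lambda i: draws[i])
        # Assumption: the reward is an immediate, IID Bernoulli outcome.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Hypothetical rates: arm 1 converts four times as often as arm 0.
successes, failures = thompson_sampling([0.05, 0.20], n_rounds=5000)
pulls = [s + f for s, f in zip(successes, failures)]
print("pulls per arm:", pulls)
```

Traffic shifts toward the better arm as its posterior sharpens, which is why the approach is attractive for maximizing yield.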

yummyfajitas · on Jan 10, 2016

Thompson Sampling is not a replacement for A/B tests. Unfortunately, the real world violates the assumptions of Thompson sampling virtually all the time.

https://www.chrisstucchio.com/blog/2015/dont_use_bandits.htm...

Bandit algorithms do have some important use cases (optimizing yield from a short-lived advertisement, e.g. "Valentine's Day Sale"), but they are not suitable for use as an A/B test replacement.

Also, I'd steer away from Dynamic Yield - I've found their descriptions of their statistics to take dangerous (i.e. totally wrong) shortcuts. For example, they count sessions instead of visitors as a way to avoid the delayed reaction problem and increase sample size (which also completely breaks the IID assumption).

emcq · on Jan 11, 2016

I love your posts, and completely agree that Bayesian Bandits are not replacements for A/B tests.

To be fair though, the real-world issues like nonstationarity and delayed feedback are also concerns for A/B tests (which you also bring up in your great post), and you can tweak the Bayesian bandits to handle these cases decently.

How does counting sessions instead of visitors avoid delayed feedback? I read your post [0] but don't remember anything about that. Is it just that they say that after a session is completed (which is somewhat nebulous to measure in many cases), then you have all the data you need from the visit?

[0] https://www.chrisstucchio.com/blog/2015/no_free_samples.html

yummyfajitas · on Jan 11, 2016

Absolutely agree you can tweak Thompson sampling to handle nonstationarity, periodicity and delayed feedback. I even published the math for one variation of nonstationarity: https://www.chrisstucchio.com/blog/2013/time_varying_convers...
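
To illustrate what such a tweak can look like, here is a sketch of one common idea: exponentially discounting old observations so the posterior forgets stale data. This is not necessarily the math in the linked post. The function name, the discount factor `gamma`, and the rate schedule are all hypothetical choices for the example.

```python
import random

def discounted_thompson(rate_fn, n_arms, n_rounds, gamma=0.99, seed=0):
    """Thompson Sampling with exponentially discounted Beta counts (a sketch)."""
    rng = random.Random(seed)
    successes = [0.0] * n_arms
    failures = [0.0] * n_arms
    choices = []
    for t in range(n_rounds):
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i])
                 for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: draws[i])
        converted = rng.random() < rate_fn(arm, t)
        # Decay every arm's counts so old evidence gradually fades out.
        successes = [gamma * s for s in successes]
        failures = [gamma * f for f in failures]
        if converted:
            successes[arm] += 1.0
        else:
            failures[arm] += 1.0
        choices.append(arm)
    return choices

def swapping_rates(arm, t):
    # Hypothetical nonstationarity: the better arm flips halfway through.
    rates = [0.3, 0.1] if t < 2000 else [0.1, 0.3]
    return rates[arm]

choices = discounted_thompson(swapping_rates, n_arms=2, n_rounds=4000)
late_share = sum(choices[-1000:]) / 1000
print(f"share of traffic on arm 1 in the last 1000 rounds: {late_share:.2f}")
```

The price of forgetting is permanent extra exploration: with `gamma = 0.99` the effective memory is roughly 100 observations, so the posterior never becomes very sharp.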

(I've also dealt with delayed reactions, but I've never published it, and probably won't publish until I launch it.)

Dynamic Yield has the delayed feedback problem because users might see a variation in session 1 but convert in session 2 (days later). They "solve" this by doing session-level tracking instead of visitor-level tracking - the delayed feedback is now only 20 minutes (same session) instead of days.

The problem is that session A and session B are now correlated since they are the same visitor. IID is now broken.
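
The effect of that correlation can be seen in a small simulation. This is a sketch under a made-up model: each visitor is assumed to have a personal conversion propensity drawn from a Beta(0.5, 4.5) distribution (mean 0.1) that is shared by all of that visitor's sessions. The function name and all numbers are hypothetical. The simulation compares the standard error you would compute by treating sessions as IID with the actual spread of the estimate across repeated experiments.

```python
import random
import statistics

def session_level_rate(rng, n_visitors, sessions_per_visitor):
    """Return (estimated rate, naive standard error) treating sessions as IID."""
    conversions = 0
    for _ in range(n_visitors):
        # Assumption: one propensity per visitor, reused across their sessions.
        # This shared propensity is what correlates the sessions.
        propensity = rng.betavariate(0.5, 4.5)
        for _ in range(sessions_per_visitor):
            if rng.random() < propensity:
                conversions += 1
    n_sessions = n_visitors * sessions_per_visitor
    rate = conversions / n_sessions
    # Binomial standard error, valid only if sessions were independent.
    naive_se = (rate * (1 - rate) / n_sessions) ** 0.5
    return rate, naive_se

rng = random.Random(0)
results = [session_level_rate(rng, n_visitors=200, sessions_per_visitor=5)
           for _ in range(500)]
actual_sd = statistics.stdev([rate for rate, _ in results])
mean_naive_se = statistics.mean([se for _, se in results])
print(f"naive SE: {mean_naive_se:.4f}, actual SD: {actual_sd:.4f}")
```

Under this model the naive standard error understates the real variability, so the extra "sample size" from counting sessions is partly an illusion.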

emcq · on Jan 11, 2016

Makes sense, thanks for the explanation!