In case anyone finds this interesting (as it is so glaringly apropos to this "Buy Now" blog post): I ran an A/B test recently of "Buy Now" vs "Purchase" (which is a drastically different and more subtle question than comparing whether people want to install a demo vs. make a purchase), and found almost no difference.
Unfortunately, because I used a multi-armed bandit algorithm (which attempts to "exploit" already learned knowledge so as not to lose sales during the test), my data is somewhat "skewed" (in that I have an order of magnitude more impressions for one of the hypotheses), but here are the raw results:
"Purchase": 35,715 sales from 3,255,882 impressions (1.097%)
"Buy Now": 4,042 sales from 376,227 impressions (1.074%)
If you compare the confidence intervals using a Beta distribution, it is difficult to feel comfortable claiming "Purchase" is the winner, but that small advantage is why the algorithm kept favoring it over the other option. Put differently: despite the large sample, I believe that tiny difference is not statistically significant.
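For anyone who wants to redo that comparison, here is a rough sketch using only the raw counts above. It assumes a uniform Beta(1, 1) prior and a normal approximation to each Beta posterior (reasonable at these sample sizes); both are assumptions made for illustration and not necessarily how the original comparison was done. The function names are made up for this example.

```python
import math

def beta_posterior(sales, impressions):
    """Mean and standard deviation of a Beta(sales + 1, misses + 1) posterior.

    The "+ 1" terms come from an assumed uniform Beta(1, 1) prior.
    """
    a = sales + 1
    b = impressions - sales + 1
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

def interval_95(mean, sd):
    """Approximate 95% interval, treating the posterior as normal."""
    return mean - 1.96 * sd, mean + 1.96 * sd

def prob_first_better(first, second):
    """Approximate P(rate of first > rate of second) under the normal approximation."""
    (m1, s1), (m2, s2) = first, second
    z = (m1 - m2) / math.sqrt(s1 ** 2 + s2 ** 2)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Raw counts from the comment above.
purchase = beta_posterior(35_715, 3_255_882)
buy_now = beta_posterior(4_042, 376_227)

p_lo, p_hi = interval_95(*purchase)
b_lo, b_hi = interval_95(*buy_now)
p_better = prob_first_better(purchase, buy_now)

print(f"Purchase: {purchase[0]:.5f}  (95% approx: {p_lo:.5f} to {p_hi:.5f})")
print(f"Buy Now:  {buy_now[0]:.5f}  (95% approx: {b_lo:.5f} to {b_hi:.5f})")
print(f"P(Purchase beats Buy Now) ~= {p_better:.3f}")
```

Under these assumptions the two intervals overlap, which matches the reading that the difference is too small to call.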
(Additionally, for completeness, and as this is important for anyone who might care about this experiment: my app had previously said "Purchase", so there are likely guides online that tell the user how to buy things using that label, or people may have remembered the old wording, either of which may have caused "Buy Now" to be ever so slightly more confusing.)
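To illustrate why a bandit produces such lopsided impression counts, here is a minimal simulation using Thompson sampling. This is only one common bandit strategy, picked for illustration; the comment does not say which algorithm was actually used. The conversion rates, round count, and seed below are hypothetical and exaggerated so the effect shows up quickly.

```python
import random

def thompson_simulation(true_rates, rounds, seed=0):
    """Simulate Thompson sampling over Bernoulli arms; return pulls per arm."""
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw one plausible conversion rate per arm from its Beta posterior
        # (uniform prior), then show the arm whose draw is highest.
        draws = [
            rng.betavariate(wins[i] + 1, pulls[i] - wins[i] + 1)
            for i in range(len(true_rates))
        ]
        arm = draws.index(max(draws))
        pulls[arm] += 1
        # Simulate whether this impression turned into a sale.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return pulls

# Hypothetical rates, far further apart than the real 1.097% vs 1.074%.
pulls = thompson_simulation(true_rates=[0.10, 0.05], rounds=20_000)
print(pulls)
```

The arm that looks better receives most of the traffic, so the weaker arm ends up with far fewer impressions and a correspondingly wider interval.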