
Hey I'm the original author!

How much data would you need for confidence? By my calculations this result is at 98% confidence. That feels like 'enough' to make a decision for my small startup.

I'm pretty explicit in the blog post that this isn't meant to be universally applicable - it's just what happened to us.



First of all, kudos for quantifying your results instead of hand-waving them. Yes, your results look like a ~60% improvement in conversion rate from the A variant to the B variant, with a p-value of 0.02 and a statistical power of around 80% for a two-tailed test. So that's good.
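
If you want to sanity-check that arithmetic, here's a quick sketch with statsmodels. The counts are made up to roughly match the reported rates (0.4% baseline, ~60% lift) - they're not the post's actual numbers:

    from statsmodels.stats.proportion import proportions_ztest

    # Hypothetical counts chosen to match the reported rates, not real data:
    # 40/10,000 conversions on A (0.40%) vs 64/10,000 on B (0.64%, a 60% lift).
    stat, pvalue = proportions_ztest(count=[40, 64], nobs=[10_000, 10_000],
                                     alternative='two-sided')
    print(stat, pvalue)  # with these made-up counts, p comes out around 0.02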

However, context is important - at this level of significance you'd expect to see a similarly strong, but ultimately spurious, effect about 1 in 50 times even when there's no real difference between A and B.
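
You can see that false positive rate directly by simulating A/A tests where both arms share the same true rate, so any "significant" difference is pure noise (the rate and sample sizes here are assumptions, not the post's data):

    import numpy as np
    from statsmodels.stats.proportion import proportions_ztest

    rng = np.random.default_rng(0)
    trials, false_positives = 2_000, 0
    for _ in range(trials):
        # Both arms convert at the same true 0.4% rate.
        a = rng.binomial(10_000, 0.004)
        b = rng.binomial(10_000, 0.004)
        _, p = proportions_ztest([a, b], [10_000, 10_000])
        false_positives += p < 0.02
    print(false_positives / trials)  # hovers around 0.02, i.e. ~1 in 50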

Since you're not working on something safety critical, that's probably an acceptable false positive rate for you. But generally speaking, and in particular here since the absolute numbers and changes are quite small, I would be wary of trusting such a result. It seems promising but inconclusive. Maybe run a few more tests with disjoint (or nearly so) samples of visitors?

There are a few other things that could possibly confound the result - off the top of my head, your screenshots look like different pages between the A and B tests. I'm not sure if that's how you ran the experiment or if you just happened to use two different page screenshots, but if the variants really were different pages, that would typically disqualify the result and require another test.


The way I'm seeing it: sure, the error bars are huge. But it's very unlikely to be a regression. And the team likes it better.


> screenshots look like different pages between the A and B test

I was also wondering about that.


Thanks for sharing. A couple things:

1) Why did the two groups have such different Ns? If it was intended to be run as a 50-50 split, a large delta would make me wonder if there was an exposure bias.

2) At the baseline rate (0.4%), this test is underpowered to detect even a 50% change, which means you will have a high false discovery rate - see the quick power calculation below.
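
For a rough sense of the sample sizes involved, here's a power calculation with statsmodels (baseline and lift taken from point 2 above; alpha and target power are standard defaults, not from the post):

    from statsmodels.stats.proportion import proportion_effectsize
    from statsmodels.stats.power import NormalIndPower

    # Detecting a 50% lift on a 0.4% baseline (0.4% -> 0.6%)
    # at alpha = 0.05 with 80% power:
    h = proportion_effectsize(0.006, 0.004)
    n = NormalIndPower().solve_power(effect_size=h, alpha=0.05,
                                     power=0.8, ratio=1.0)
    print(round(n))  # roughly 19,000+ visitors *per arm*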


I'm somewhat of a layman, but I'd wager A/B pages served 50:50 (keyed by IP, for instance) could lead to a rather solid conclusion if run long enough. On the other hand, eh, chat bubbles suck and you can quite confidently say they don't help, so you might as well keep it this way. On a personal note, I do feel like I'd be much more prone to click a chat request presented as another menu option than as a bubble.
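
For what it's worth, deterministic 50:50 bucketing is simple to sketch - hash a stable key so the same visitor always lands in the same variant. The function name and experiment label below are made up, and in practice a cookie or user ID is a better key than an IP, since NAT puts many visitors behind one address:

    import hashlib

    def assign_variant(visitor_key: str,
                       experiment: str = "chat-menu-vs-bubble") -> str:
        # Hash the key together with an experiment name so assignment is
        # stable per visitor but reshuffles between experiments.
        digest = hashlib.sha256(f"{experiment}:{visitor_key}".encode()).digest()
        return "A" if digest[0] % 2 == 0 else "B"

    print(assign_variant("203.0.113.7"))  # always the same variant for this key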



