
My experience is the opposite. Spend time simplifying your thinking, and it stays simple. But it is very, very easy to start overthinking things and then you go down a rabbit hole.

Consider this example. If you're testing per-session behavior, you can just use a session cookie. If you're testing logged-in behavior, you can use the login id. You've just covered most of the things you want to test.
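To make that concrete, here is a minimal sketch of what bucketing by session cookie or login id could look like. The function names, the `experiment` label and the two-variant split are all made up for illustration; the only idea taken from the comment is that the id you hash is whichever identity you are testing on.

```python
import hashlib


def unit_for(session_id, login_id=None):
    """Pick the identity to bucket on (hypothetical helper).

    Logged-in behavior is keyed on the login id, everything else
    on the session cookie value.
    """
    if login_id is not None:
        return f"user:{login_id}"
    return f"session:{session_id}"


def assign_variant(unit_id, experiment, variants=("A", "B")):
    """Deterministically map a unit to a variant by hashing.

    The same unit_id and experiment name always give the same variant,
    so no assignment table needs to be stored.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]


if __name__ == "__main__":
    # Anonymous visitor: bucketed on the session cookie.
    print(assign_variant(unit_for("sess-123"), "landing-page-test"))
    # Logged-in visitor: bucketed on the login id, whatever the session is.
    print(assign_variant(unit_for("sess-456", login_id="alice"), "landing-page-test"))
```

Hashing instead of storing a random draw is just one design choice; it keeps the assignment stable for as long as the id itself is stable, which is exactly where a session cookie and a login id differ.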

When you start worrying about cross-device behavior, both logged in and not, you are in a world of pain. So treat it as an identity problem, throw away all of the users you find questionable, and work with what is left. And yes, this is a pain, which is why you do it as seldom as you can!



So treat it as an identity problem, throw away all of the users you find questionable,...

And if questionability is correlated with the thing you are trying to measure, you've just added bias. For example, consider trying to measure engagement or something correlated with it. Are users who connect to your site from 3 different devices more or less engaged than normal? Great - you just threw out your most engaged users.
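A toy illustration of that bias, on purely synthetic data. The assumption baked in here (engagement grows with device count, and 3-device users are the "questionable" ones that get dropped) is invented to show the mechanism; it says nothing about any real site.

```python
import random
import statistics


def make_users(n=3000, seed=0):
    """Build synthetic (device_count, engaged_minutes) pairs.

    Assumption for illustration only: engagement rises with the
    number of devices a user connects from.
    """
    rng = random.Random(seed)
    users = []
    for i in range(n):
        devices = 1 + i % 3                      # 1, 2 or 3 devices, evenly split
        minutes = 10 * devices + rng.uniform(0, 5)
        users.append((devices, minutes))
    return users


users = make_users()

# Mean engagement over everyone.
all_mean = statistics.fmean([m for _, m in users])

# Mean engagement after discarding the "questionable" 3-device users.
kept_mean = statistics.fmean([m for d, m in users if d < 3])

print(f"all users:        {all_mean:.2f} minutes")
print(f"after discarding: {kept_mean:.2f} minutes")
```

Under these assumptions the filtered mean comes out lower than the true mean, because the discarded group is exactly the most engaged one.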

Similarly, you can't just use a session cookie to test per-session behavior. This introduces correlations between sessions, which violates the IID assumption in all the standard statistical tests.

https://www.chrisstucchio.com/blog/2015/no_free_samples.html

You can fix this if you want by using the weakly mixing central limit theorem or by explicitly putting the mixing into a Bayesian analysis. But that's probably a lot trickier than just using a long-term cookie.


You have to know the limitations of the approach you are using.

As for session cookies, no correlation is created if the A/B test behavior is tied to the session. The downside is that users can get different behaviors on different days, which may be a bad user experience. The upside is that it is quick and simple for things like landing pages.

In the end there is no solution that avoids actually understanding what your data really says.


There is absolutely correlation between sessions. If visitor 1 (corresponding to sessions 1,2,3) has a high conversion probability, while visitor 2 (corresponding to sessions 4,5,6) has a low conversion probability, then you've introduced correlation between sessions 1,2,3 and sessions 4,5,6. This breaks the CLT and all the usual independence assumptions.

If most of your visitors only have one session this may not matter...but then again with only session cookies you don't even have a way to know this.
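Here is a rough simulation of the visitor 1 / visitor 2 situation, with made-up numbers: each visitor is either a low converter (0.02) or a high converter (0.5) and has 3 sessions. It compares the spread of the session-level conversion rate across many simulated experiments with the variance you would compute if you treated every session as independent.

```python
import random
import statistics


def simulate_rate(rng, visitors, sessions_per_visitor):
    """Session-level conversion rate for one simulated experiment."""
    conversions = 0
    for _ in range(visitors):
        # Hypothetical mix: each visitor has their own conversion
        # probability, shared by all of that visitor's sessions.
        p = rng.choice((0.02, 0.5))
        for _ in range(sessions_per_visitor):
            conversions += rng.random() < p
    return conversions / (visitors * sessions_per_visitor)


def variance_inflation(visitors=200, sessions_per_visitor=3,
                       replications=1000, seed=0):
    """Ratio of the observed variance of the rate to the naive IID variance."""
    rng = random.Random(seed)
    rates = [simulate_rate(rng, visitors, sessions_per_visitor)
             for _ in range(replications)]
    n_sessions = visitors * sessions_per_visitor
    p_hat = statistics.fmean(rates)
    naive_var = p_hat * (1 - p_hat) / n_sessions   # assumes independent sessions
    return statistics.variance(rates) / naive_var


if __name__ == "__main__":
    print("3 sessions per visitor:", round(variance_inflation(), 2))
    print("1 session per visitor: ",
          round(variance_inflation(sessions_per_visitor=1), 2))
```

A ratio well above 1 means the naive standard error is too small, so a test that assumes independent sessions will look more certain than it should. With one session per visitor the ratio should sit close to 1, which matches the "may not matter" caveat.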



