Hi, and thanks! Well, there's no need to re-invent the wheel, and there already exist fast implementations of DL distance. However, Damerau-Levenshtein distance is just one piece of our string comparison evaluation system that is built in-house and is constantly under development and improvement.
For example, another important aspect of the comparison metrics is our in-house phonetics library that we've built to be sensitive to vowel context, syllabification, diphthongs, stemming and lemmatization, and other language phenomena, and we are fleshing it out to handle other languages including some Eastern European and CJK.
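For readers who haven't met the DL distance mentioned above, here is a minimal sketch of the textbook dynamic-programming approach. Note that this is the restricted "optimal string alignment" variant, not the full Damerau-Levenshtein distance and not our in-house code; the function name `osa_distance` is made up for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    adjacent characters, where no substring is edited more than once.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ab" -> "ba".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

The restriction matters on inputs like "ca" vs "abc": this variant gives 3, while the unrestricted distance is 2.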
Great article. Designing A/B systems always seems (relatively) simple at the start, but in my experience there are 1,000 things you don't think of until you have massive amounts of worthless results. Add this to the list of things to watch out for.
My experience is the opposite. Spend time simplifying your thinking, and it stays simple. But it is very, very easy to start overthinking things and then you go down a rabbit hole.
Consider this as an example. If you're testing per-session behavior, then you can just use a session cookie. If you're testing logged-in behavior, you can use the login id. You've just covered most of the things you want to test.
When you start worrying about cross-device behavior, both logged in and not, you are in a world of pain. So treat it as an identity problem, throw away all of the users you find questionable, and work with that. And yes, this is a pain, which is why you do it as seldom as you can!
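As a rough sketch of the simple case: whichever identifier you pick (session cookie value or login id), you can hash it to get a stable variant. The function name `assign_variant` and the experiment label are hypothetical, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib

def assign_variant(unit_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map an identifier to a variant.

    unit_id is whatever you randomize on: a session cookie value for
    per-session tests, a login id for logged-in tests (an assumption
    of this sketch, not a prescribed scheme).
    """
    # Salting with the experiment name keeps assignments independent
    # across experiments that share the same identifiers.
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

The same id always lands in the same variant, so nothing needs to be stored beyond the identifier itself.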
> So treat it as an identity problem, throw away all of the users you find questionable, ...
And if questionability is correlated with the thing you are trying to measure, you've just added bias. For example, consider trying to measure engagement or something correlated with it. Are users who connect to your site from 3 different devices more or less engaged than normal? Great - you just threw out your most engaged users.
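A toy simulation of that effect, on entirely synthetic data. The assumption that more engaged users are more likely to show up on extra devices is baked in by hand, and every name here is made up for illustration.

```python
import random
from statistics import mean

def simulate_users(n: int, seed: int = 0):
    """Return (engagement, device_count) pairs for n synthetic users."""
    rng = random.Random(seed)
    users = []
    for _ in range(n):
        engagement = rng.random()  # latent engagement in [0, 1)
        # ASSUMPTION: each of two extra devices is used with
        # probability equal to the user's engagement.
        devices = 1 + sum(rng.random() < engagement for _ in range(2))
        users.append((engagement, devices))
    return users

users = simulate_users(10_000)
mean_all = mean(e for e, _ in users)
# "Throw away the questionable users": keep single-device users only.
mean_kept = mean(e for e, d in users if d == 1)
print(f"all users: {mean_all:.2f}, single-device only: {mean_kept:.2f}")
```

Under this assumption the filtered population looks markedly less engaged than the full one, purely because of who got filtered out.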
Similarly, you can't just use a session cookie to test per-session behavior. This introduces correlations between sessions, which violates the IID assumption in all the standard statistical tests.
You can fix this if you want by using a central limit theorem for weakly mixing sequences, or by explicitly putting the mixing into a Bayesian analysis. But that's probably a lot trickier than just using a long-term cookie.
You have to know the limitations of the approach you are using.
Also, about session cookies: no correlation is created if the A/B test behavior is tied to the session. The downside is that the same user can get different behaviors on different days. This may be a bad user experience. The upside is that it is quick and simple for things like landing pages.
In the end there is no solution that avoids actually understanding what your data really says.
There is absolutely correlation between sessions. If visitor 1 (corresponding to sessions 1,2,3) has a high conversion probability, while visitor 2 (corresponding to sessions 4,5,6) has a low conversion probability, then you've introduced correlation between sessions 1,2,3 and sessions 4,5,6. This breaks the CLT and all the usual independence assumptions.
If most of your visitors only have one session, this may not matter. But then again, with only session cookies you don't even have a way to know this.
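Here is a small simulation sketch of the point. The visitor mix (10% of visitors converting at 40%, the rest at 2%) and the 10 sessions per visitor are made-up numbers chosen to make the effect visible, not measurements from anywhere.

```python
import math
import random
from statistics import mean, stdev

def simulate_conversion_rate(n_visitors, sessions_each, rng):
    """Overall per-session conversion rate for one simulated experiment arm."""
    conversions = 0
    for _ in range(n_visitors):
        # ASSUMPTION: a visitor's conversion probability is fixed
        # across all of their sessions.
        p = 0.40 if rng.random() < 0.10 else 0.02
        conversions += sum(rng.random() < p for _ in range(sessions_each))
    return conversions / (n_visitors * sessions_each)

rng = random.Random(42)
n_visitors, sessions_each = 200, 10
rates = [simulate_conversion_rate(n_visitors, sessions_each, rng)
         for _ in range(300)]

p_hat = mean(rates)
# Standard error you would compute if every session were IID.
naive_se = math.sqrt(p_hat * (1 - p_hat) / (n_visitors * sessions_each))
# Actual spread of the measured rate across repeated experiments.
actual_sd = stdev(rates)
print(f"naive SE: {naive_se:.4f}, actual SD: {actual_sd:.4f}")
```

The actual spread comes out noticeably larger than the naive standard error, which means a test that treats sessions as independent would be overconfident.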
I think using them together is crucial. Qualitative research gives you insight into what you should look for through quantitative research. It's important to start with some basic qualitative research (talking to a few customers) and then scale it through quantitative methods (analyzing logs).
I liked poking about think-aloud protocols (https://en.wikipedia.org/wiki/Think_aloud_protocol). Jeff Shrager always said to wait until they're swearing while they're doing whatever it is you're having them do, and then you know you've started doing a think-aloud protocol properly.
I like the part about arming others to help you. I've found this to be critical. The more your job specs reflect the excitement of working at your company, the more other people can evangelize that for you.
Shutterstock - New York, San Francisco, Remote, Visa
We're hiring all sorts of software engineers and data scientists. We've got some pretty fun problems -- image search, video search, storage scalability, tons of behavioral data to mine -- and an awesome team. We prefer folks to work in one of our offices, but are always willing to consider remote superstars.
I'm a Shutterstock employee, and one of the interesting things is the variety of technologies in use at Shutterstock: Node.js (my team), Ruby, Perl, Java ...
Shutterstock - New York, San Francisco, Berlin, Remote, Visa
We're hiring all sorts of software engineers and data scientists. We've got some pretty fun problems -- image search, video search, storage scalability, tons of behavioral data to mine -- and an awesome team. We prefer folks to work in one of our offices, but are always willing to consider remote superstars.
Shutterstock - New York, San Francisco, Berlin, Remote
We're hiring all sorts of software engineers and data scientists. We've got some pretty fun problems -- image search, video search, storage scalability -- and an awesome team.