
danmccorm · on July 18, 2017

Nice post. How do you customize the Damerau-Levenshtein algorithm? Did you write a new one from scratch?

gmossessian · on July 18, 2017

Hi, and thanks! Well, there's no need to re-invent the wheel, and there already exist fast implementations of DL distance. However, Damerau-Levenshtein distance is just one piece of our string comparison evaluation system that is built in-house and is constantly under development and improvement.

For example, another important aspect of the comparison metrics is our in-house phonetics library that we've built to be sensitive to vowel context, syllabification, diphthongs, stemming and lemmatization, and other language phenomena, and we are fleshing it out to handle other languages including some Eastern European and CJK.
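For readers who haven't seen the Damerau-Levenshtein distance mentioned above, here is a minimal Python sketch of its restricted form (often called optimal string alignment). It is only an illustration of the textbook algorithm, not the in-house system described in this comment, and the function name `osa_distance` is made up for this example.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters. Illustrative sketch only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

Under this restricted variant a swapped pair such as "ca" versus "ac" costs one edit, whereas plain Levenshtein distance would charge two. The unrestricted Damerau-Levenshtein distance can differ on some inputs because it allows further edits around a transposed pair.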

danmccorm · on Dec 1, 2015

Great article. Designing a/b systems always seems (relatively) simple at the start, but in my experience there are 1,000 things you don't think of until you have massive amounts of worthless results. Add this to the list of things to watch out for.

btilly · on Dec 1, 2015

My experience is the opposite. Spend time simplifying your thinking, and it stays simple. But it is very, very easy to start overthinking things and then you go down a rabbit hole.

Consider this for an example. If you're testing per session behavior, then you can just use a session cookie. If you're testing logged in behavior, you can use the login id. You've just covered most of the things you want to test.

When you start worrying about cross-device both logged in and not, then you have a world of pain. So treat it as an identity problem, throw away all of the users you find questionable, and work with that. And yes, this is a pain, which is why you do it as seldom as you can!
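As a rough sketch of the simple case, variant assignment can be a deterministic hash of whatever identifier you have settled on (a session cookie value or a login id). The function name `assign_variant`, the experiment names, and the id strings below are all hypothetical, and this is one common approach rather than a description of any particular system.

```python
import hashlib


def assign_variant(unit_id: str, experiment: str, variants=("A", "B")) -> str:
    """Map an identifier (session cookie value or login id) to a variant.

    The same (experiment, unit_id) pair always lands in the same variant,
    so no assignment table has to be stored. Hypothetical helper.
    """
    key = f"{experiment}:{unit_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer bucket
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Per-session test: key on the session cookie value
print(assign_variant("session-123", "landing-page"))
# Logged-in test: key on the login id
print(assign_variant("user-42", "checkout"))
```

Including the experiment name in the hash keeps assignments in one test from lining up with assignments in another.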

yummyfajitas · on Dec 1, 2015

> So treat it as an identity problem, throw away all of the users you find questionable,...

And if questionability is correlated with the thing you are trying to measure, you've just added bias. For example, consider trying to measure engagement or something correlated with it. Are users who connect to your site from 3 different devices more or less engaged than normal? Great - you just threw out your most engaged users.

Similarly, you can't just use a session cookie to test per-session behavior. This introduces correlations between sessions, which violates the IID assumption in all the standard statistical tests.

https://www.chrisstucchio.com/blog/2015/no_free_samples.html

You can fix this if you want by using the weakly mixing central limit theorem or just explicitly putting the mixing into a Bayesian analysis. But that's probably a lot trickier than just using a long term cookie.

btilly · on Dec 3, 2015

You have to know the limitations of the approach you are using.

Also about session cookies, there is no correlation created if the A/B test behavior is tied to the session. The downside is that the same user can get different behaviors on different days. This may be a bad user experience. The upside is that it is quick and simple for things like landing pages.

In the end there is no solution that avoids actually understanding what your data really says.

yummyfajitas · on Dec 3, 2015

There is absolutely correlation between sessions. If visitor 1 (corresponding to sessions 1,2,3) has a high conversion probability, while visitor 2 (corresponding to sessions 4,5,6) has a low conversion probability, then you've introduced correlation between sessions 1,2,3 and sessions 4,5,6. This breaks the CLT and all the usual independence assumptions.

If most of your visitors only have one session this may not matter...but then again with only session cookies you don't even have a way to know this.
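A small simulation can make the point concrete. The sketch below assumes a made-up population in which each visitor has either a 5% or a 50% conversion probability and contributes five sessions; all numbers and function names are illustrative. It compares the standard error you would compute by treating sessions as independent with the spread actually observed across repeated experiments.

```python
import random
import statistics


def simulate_rate(rng, visitors, sessions_per_visitor):
    """Session-level conversion rate for one simulated experiment."""
    conversions = 0
    for _ in range(visitors):
        # Each visitor has their own conversion probability (assumed values)
        p = rng.choice((0.05, 0.50))
        for _ in range(sessions_per_visitor):
            conversions += rng.random() < p
    return conversions / (visitors * sessions_per_visitor)


def compare_standard_errors(trials=300, visitors=200,
                            sessions_per_visitor=5, seed=0):
    """Return (naive_se, empirical_se) for the session conversion rate."""
    rng = random.Random(seed)
    rates = [simulate_rate(rng, visitors, sessions_per_visitor)
             for _ in range(trials)]
    mean_rate = statistics.mean(rates)
    n_sessions = visitors * sessions_per_visitor
    # Binomial standard error, valid only if sessions were independent
    naive_se = (mean_rate * (1 - mean_rate) / n_sessions) ** 0.5
    # Actual spread of the rate across repeated experiments
    empirical_se = statistics.stdev(rates)
    return naive_se, empirical_se


naive, empirical = compare_standard_errors()
print(f"naive SE: {naive:.4f}, empirical SE: {empirical:.4f}")
```

With these assumed parameters the empirical spread comes out noticeably larger than the naive figure, which is the sense in which treating correlated sessions as independent samples overstates your confidence. With one session per visitor the two would roughly agree.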

danmccorm · on Oct 14, 2015

I think using them together is crucial. Qualitative research gives you insight into what you should look for through quantitative research. It's important to start with some basic qualitative research (talking to a few customers) and then scale it through quantitative methods (analyzing logs).

curuinor · on Oct 14, 2015

i liked poking about think-aloud protocols (https://en.wikipedia.org/wiki/Think_aloud_protocol) - jeff shrager always said to wait until they're swearing while they're doing whatever it is you're having them do and then you know you've started doing a think-aloud protocol properly.

danmccorm · on Oct 14, 2015

Cool beans! I like that.

danmccorm · on Feb 6, 2015

Phabricator's awesome, but this list is more for hosted services.

danmccorm · on Aug 13, 2014

I like the part about arming others to help you. I've found this to be critical. The more your job specs reflect the excitement of working at your company, the more other people can evangelize that for you.

danmccorm · on Aug 1, 2014

Shutterstock - New York, San Francisco, Remote, Visa

We're hiring all sorts of software engineers and data scientists. We've got some pretty fun problems -- image search, video search, storage scalability, tons of behavioral data to mine -- and an awesome team. We prefer folks to work in one of our offices, but are always willing to consider remote superstars.

Take a peek at http://www.shutterstock.com/jobs

elliotf · on Aug 1, 2014

I'm a Shutterstock employee, and one of the interesting things is the variety of technologies in use at Shutterstock: node.js (my team), ruby, perl, java ...

freeramonisgood · on Aug 6, 2014

Are you guys open to international remote workers as well, or only US citizens?

danmccorm · on July 1, 2014

Shutterstock - New York, San Francisco, Berlin, Remote, Visa

We're hiring all sorts of software engineers and data scientists. We've got some pretty fun problems -- image search, video search, storage scalability, tons of behavioral data to mine -- and an awesome team. We prefer folks to work in one of our offices, but are always willing to consider remote superstars.

Take a peek at http://www.shutterstock.com/jobs

bambax · on July 1, 2014

> New York, San Francisco, Berlin

There are exactly zero engineering positions outside of the US on http://www.shutterstock.com/jobs/listings

It should read: "New York, San Francisco, Seattle (but mostly New York)".

laxatives · on July 1, 2014

I'd like to apply but your website is telling me that my pdf resume file type is not supported. Can I email you?

danmccorm · on June 1, 2014

Shutterstock - New York, San Francisco, Berlin, Remote

We're hiring all sorts of software engineers and data scientists. We've got some pretty fun problems -- image search, video search, storage scalability, tons of behavioral data to mine -- and an awesome team. We prefer folks to work in one of our offices, but are always willing to consider remote superstars.

Take a peek at http://www.shutterstock.com/jobs

danmccorm · on May 1, 2014

Shutterstock - New York, San Francisco, Berlin, Remote

We're hiring all sorts of software engineers and data scientists. We've got some pretty fun problems -- image search, video search, storage scalability -- and an awesome team.

Take a peek at http://www.shutterstock.com/jobs

canadiancreed · on May 13, 2014

I was looking over your available positions, but none of them mentioned remote. Are all of them potentially open to remote work?

danmccorm · on Dec 12, 2013

Spotify has a great article on this, if you haven't seen it: http://ucvox.files.wordpress.com/2012/11/113617905-scaling-a...

nrs26 · on Dec 15, 2013

Thanks for sharing!