Data Science Challenges at Instacart

minimaxir · on Feb 17, 2016

I'm all for data science, but I think this article romanticizes the field a bit too much (the random pictures of employees being thoughtful do not help).

Listing the problems Instacart has which can be solved through applied statistics is one thing. The other, more important thing, is how exactly data science works to solve problems, with technical detail, as opposed to data science being some mystical unexplained power. (Especially since the intended audience for this post is data scientists Instacart wants to hire, who are presumably already knowledgeable in data science.)

jeremystan · on Feb 17, 2016

I find that great data scientists care a tremendous amount about the problems they tackle and the impact they have, and less so the specific methods used. But we will definitely write more in the future on details. Here I wanted to share the range of problems, how we organize (not commonly discussed) and what we look for in candidates. That can be useful to other startups looking to build data science teams.

randycupertino · on Feb 17, 2016

My friend works at Instacart, she says it totally sucks. Mainly because there's no guaranteed hourly pay and what she gets paid depends entirely on what the customers decide to give out as a tip. So for example sometimes she spends an hour grocery shopping for someone, then 30 minutes to drive to their house, and then the person can just arbitrarily decide to tip her $10. So she makes $10 for 90 minutes of work? That's below minimum wage. She says MOST people will tip $20 or $25 however not all. So all it takes is one cheapskate to not understand how long it takes for you to go and pick out and deliver all their groceries and you get totally screwed.

Apparently SF proper is the only place she works that has a guaranteed wage; in all surrounding areas you are at the mercy of what people decide to tip. Sounds outrageous, all it takes is one cheap idiot to completely ruin your shift and make it not worth it to work there.

callmeed · on Feb 18, 2016

See, this is the problem with the entire business model. Your friend has now put the onus of paying her living wage on the consumer. I don't want to hire a personal delivery person for $10/hour. Unless I've got a $100+ dinner bill, tipping $25 sounds crazy to me. Calling people "cheapskates" for not doing so is completely unfair–especially when these people are already paying a delivery fee. Do you honestly think families in Omaha and Birmingham are gonna pay to have groceries delivered AND want to tip 20%?

To be clear, most of the blame falls on Instacart. But your friend should find a new job. A tip-only contractor job is as bad as a 100% commission telemarketer job (maybe worse).

jsprogrammer · on Feb 18, 2016

The onus of a living wage is always on the consumer. Even if Instacart were paying a living wage to the people who perform deliveries, they would need to reflect at least those wages and other expenses in whatever they charge the consumer (unless deliveries are subsidized by VCs or other interested parties).

bmm6o · on Feb 18, 2016

> The onus of a living wage is always on the consumer

In the sense that the money has to come from somewhere, sure. But wages are paid by employers, and it's shitty to underpay your employees under the premise that the customer will make up the difference in tips. If a single customer failing to tip $20 pushes the worker under minimum wage for the day, then the system is broken and the worker is getting screwed, as usual. And make no mistake, this is how it was designed to work - if they actually cared about their workers, they would charge a reasonable delivery fee to the customer and give all of it to the delivery person.

Jtsummers · on Feb 18, 2016

In the restaurant world, the employer would have to make up the difference and ensure their employees make minimum wage if tips aren't enough. While that's still not satisfactory (to me, at least), it at least guarantees that a slow shift, or a lousy-tipping, long-staying, large-group table won't push you down to $2.13/hour level income (though minimum wage doesn't get you much more).

jsprogrammer · on Feb 18, 2016

Of course, the pricing of resources should be complete and incorporate all costs. It is certainly unacceptable for tips to be required to make up a living wage.

However, no one should be under the delusion that the consumer is not responsible for those costs.

bmm6o · on Feb 19, 2016

I'm not sure who in this scenario you think is delusional, or if you're conflating being delusional with being uninformed of the specific employment arrangement between the people you interact with and their employers. Excuse me, they aren't employers anymore, it's a contractor/"logistics match-making company" relationship now.

jsprogrammer · on Feb 19, 2016

It is neither.

Certainly a business should follow the rules of its market, but I think that we agree that the expectation of a consumer is that the price they pay covers all expenses + profit.

bluetidepro · on Feb 17, 2016

As others have mentioned, I blame Instacart and not the consumer. I use and love Instacart's service. I had no idea the shoppers relied on tips as much as you say they do. Instacart makes it seem like it's just a little extra bonus, no big deal. You can't blame the consumer for not knowing that.

randycupertino · on Feb 17, 2016

Completely fair. You're right- my ire should be at the company not the users.

I was just shocked, SHOCKED how little she was actually making vs how much they told her she was making when she took the job. She thought she was making $25 an hour, but after I helped her take a hard look at the math, she was really only making $5-$7 an hour, if that.

ffumarola · on Feb 17, 2016

I definitely understand your frustration, but calling the people cheap idiots seems a bit harsh. Does Instacart give tipping guidance? As a user, I would just assume they pay reasonable wages and a tip should be just that... a tip, not a living wage.

jsprogrammer · on Feb 17, 2016

Indeed, employee expenses (car, fuel, tolls, etc) and time (wage) should be accounted for in the service fee.

Tips should be at the customer's discretion.

bunderbunder · on Feb 18, 2016

As Instacart users, we've experienced some confusion on this front. They don't make it at all clear to their customers how the pickers are paid or how much they might or might not depend on tips.

I want to believe this is something Instacart employees could easily clear up by disseminating some information on the subject rather than hoping that Instacart will handle it.

Pyxl101 · on Feb 18, 2016

This isn't something that a user should have to know. We're going down the wrong track by implying that they should. Instacart shouldn't be expected to share their compensation model for their employees or contractors publicly or with their customers, and users shouldn't be expected to know or care about it in order to make fair decisions. This is a business relationship of customer to service provider. Paying the price you're charged with an optional tip for good service is entirely appropriate.

nightski · on Feb 17, 2016

You call them idiots. But they had items delivered to them for free. In my mind that is smart.

The real problem here is Instacart's model. But is it really a problem if your friend still works for them? Why does he/she do so? Maybe time to quit?

randycupertino · on Feb 17, 2016

Ha, good points. You're right. Once I helped her crunch the numbers and it illuminated how little she was actually making ($5-$7 an hour after expenses) she decided to go back to walking dogs. Well, right now she's on a "mental health" break but I think she will be starting up walking dogs again in a few weeks.

imo it seems like these gig companies really benefit from exploiting underpaid workers who don't know any better and aren't connected/smart enough/empowered to fight back. She really believed that she was going to bring in $25 an hour like they told her she would be, didn't realize she had to account for taxes, calculating all the expenses etc.

nightski · on Feb 17, 2016

Yea it's true and is rather unfortunate. Hope everything works out.

iamse7en · on Feb 17, 2016

Then she should quit. There are certainly other people out there willing to do this type of simple work for a low wage. Instacart is basically passing it to the consumers so that the economics of the business work.

randycupertino · on Feb 17, 2016

Yeah, I'm pretty sure she is going to, I know she and the other people that do it hate it. Shouldn't they be guaranteed minimum wage? I mean, if they don't make minimum wage they can sue, or report them to the Department of Labor, right?

dlgeek · on Feb 18, 2016

Not if they're classified as contractors.

(Whether the contractor classification is reasonable is a whole other ball of wax)

dang · on Feb 17, 2016

Your comment implies that Instacart pays their employees $0. That can't be true.

randycupertino · on Feb 17, 2016

They're not "employees," they're "private contractors"... but from what she told me, there is absolutely zero base pay unless you're working in the city of SF. Everywhere else (Oakland, the Peninsula) it's $0 an hour and your only income is from the tips.

I really drilled her on this because at first I couldn't believe it and she's not the most savvy person so I wanted to make sure she realized how after gas expenses, wear and tear on her car and the additional 7.65% in FICA that she's going to have to pay (the half normally covered by employers), she's really only making about $5-$7 an hour, if that. Also the job really eats up all the data on her phone because she has to be on her phone for Instacart to scan and check off every grocery item she puts in the basket, then use their GPS system for the deliveries, and it's not like they reimburse her for any of that. Instacart sold her on the job promising $20-$25 an hour and she's absolutely not been making anywhere NEAR that.

By way of comparison, she makes a guaranteed $20 an hour under the table walking dogs, so I think she is going to go back to doing that.

This article seems to verify:

> Instacart declined to confirm whether it offers base pay, and some Instacart workers told HuffPost they are not offered an hourly guaranteed wage.

> “It’s a really strange job, and there are many weeks where you’re just sitting in the car waiting for orders and hoping something comes in, not being paid to be there,” one of Instacart’s personal shoppers

http://www.huffingtonpost.com/2015/02/02/instacart-workers_n...
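For anyone who wants to redo the back-of-the-envelope math from this comment, here is a rough Python sketch. The $10 tip and 90 minutes come from the example earlier in the thread, and the 7.65% is the extra FICA share mentioned above. The $1.50 expense figure is purely a hypothetical placeholder, the function name is made up for illustration, and applying the rate straight to gross pay is a simplification.

```python
def effective_hourly_wage(gross_pay, minutes_worked, expenses=0.0,
                          extra_fica_rate=0.0765):
    """Hourly income after out-of-pocket expenses and the extra FICA share.

    extra_fica_rate is the 7.65% half normally covered by an employer.
    Simplification (assumption): the rate is applied to gross pay directly.
    """
    if minutes_worked <= 0:
        raise ValueError("minutes_worked must be positive")
    net = gross_pay - expenses - gross_pay * extra_fica_rate
    return net / (minutes_worked / 60.0)


# The $10 tip for 90 minutes of work described earlier in the thread,
# before any costs are taken out.
before_costs = effective_hourly_wage(10.0, 90, extra_fica_rate=0.0)

# Same order with a hypothetical $1.50 of gas and wear, plus the extra FICA.
after_costs = effective_hourly_wage(10.0, 90, expenses=1.50)

print(f"before costs: ${before_costs:.2f}/hour")
print(f"after costs:  ${after_costs:.2f}/hour")
```

Plugging in a real week of tips, minutes and mileage instead of the placeholder numbers is the only way to know which side of minimum wage a shift lands on.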

JasonCEC · on Feb 17, 2016

On this note: My team and I at Analytical Flavor Systems[1] wrote a blog post on how we go about hiring data science interns[2]. It's heavier on the technical details, and suffers from less... romanticism....

[1] www.gastrograph.com

[2] https://gastrograph.com/blogs/gastronexus/interviewing-data-...

minimaxir · on Feb 17, 2016

There is definitely more technical detail and statistical process in that post, although I strongly question the use of a hiring test that intricate and time consuming for an internship.

disgruntledphd2 · on Feb 18, 2016

That is very true. However, I'm all for these kinds of tests in general. I read the assignment (and after lol'ing at the specification of methods), found it reasonably interesting. I reckon it would take me an hour or two to complete (as long as I didn't get sucked in to exploring the data) :).

Then again, I'm not looking for an internship.

JasonCEC · on Feb 18, 2016

You should consider applying for a full time position ;)

JasonCEC · on Feb 17, 2016

It's a 3 month paid internship, and 99% of the students have been from Princeton.

Did you see the work we linked to? That's intern work here - we treat our interns as full members of the team, and they've delivered.

asdfologist · on Feb 18, 2016

So you had 100 interns and 99 of them were from Princeton??

JasonCEC · on Feb 18, 2016

That's not how percentages work....

grayclhn · on Feb 18, 2016

I'm really curious what numerator and denominator you have in mind, then... are you somehow using fractional internship units? :)

(Ordinarily this would be off topic, but....)

asdfologist · on Feb 18, 2016

Yes it does, because I'm pretty sure your company has not hired 200+ interns.
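The numerator/denominator question in this subthread is easy to check by brute force. A small sketch, assuming interns come in whole numbers and that not all of them were from Princeton; the function name is hypothetical:

```python
def smallest_cohort_for_99_percent(exact, max_size=1000):
    """Return (n, k) for the smallest cohort size n where some whole number
    of interns k < n gives 99%.

    exact=True  -> k/n is exactly 99%
    exact=False -> k/n rounds to 99%, i.e. 98.5% <= k/n < 99.5%
    Integer arithmetic only, so there are no floating point surprises.
    """
    for n in range(1, max_size + 1):
        for k in range(n):
            if exact:
                hit = 100 * k == 99 * n
            else:
                hit = 985 * n <= 1000 * k < 995 * n
            if hit:
                return n, k
    return None


print(smallest_cohort_for_99_percent(exact=True))
print(smallest_cohort_for_99_percent(exact=False))
```

An exact 99% needs a cohort that is a multiple of 100, and even a figure that merely rounds to 99% needs dozens of interns, so a literal reading only works for a very large program.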

hathym · on Feb 17, 2016

we are hiring would have been enough.

grayclhn · on Feb 18, 2016

> Many of our best data science ideas have come from Instacart employees in the field – working directly with our shoppers in our stores, or interacting directly with our customers.

I have no idea what this could mean. Either you're getting algorithm suggestions from your shoppers and customers, or (more likely) "data science" means "user interface."

jeremystan · on Feb 18, 2016

Agreed, that wasn't worded well - I'll try to explain. By 'employees in the field' we mean people who work in operations and management roles in the cities we operate in. They work with shoppers in stores and respond to shopper and customer feedback. They have ideas about how to improve our logistics and our apps - and while many ideas will be about user interfaces, many others will relate to how the algorithms operate behind the UIs.

CPLX · on Feb 18, 2016

So we no longer use the word scientist to describe people who do science? What a shame, I think science is really neat and scientists deserve unique respect.

As far as I can tell what these people do every day is called "business" or maybe "logistics"

yummyfajitas · on Feb 18, 2016

What do you believe distinguishes "science" from "data science"? I.e., why can't "logistics" be a subset of "science"?

bmm6o · on Feb 18, 2016

There's the old quip that if your discipline has "science" in the name, it probably isn't really a science. More seriously, scientists follow the scientific method, formulating hypotheses and designing experiments to test them. And that's a broad enough definition that it includes A/B testing, so it must apply to some of what they do. But science typically goes another step, generalizing observations and hypotheses into theories; I would be surprised if there was a lot of that going on at Instacart.

Which isn't a slight against them or the field in the least, it's just a debate about definitions.

CPLX · on Feb 18, 2016

Because science is an academic pursuit designed to create and test generalizable hypotheses and add to our collective knowledge, while the people in the article are trying to figure out how to optimize the act of underpaying someone to go grab some cans off a supermarket shelf and bring them to me.

They're not scientists, they're engineers perhaps, or business analysts.

yummyfajitas · on Feb 18, 2016

So a person doing basic biology research for Monsanto isn't a scientist? And whether or not Kantorovich qualifies as a scientist depends on whether he was working for the military or a university at the time he came up with linear programming?

That's an interesting definition.

CPLX · on Feb 18, 2016

No, basic biology research is science of course.

It doesn't depend on where the person works (though sure that's a relevant sign) it matters what they're doing.

Analyzing business-related data and optimizing KPIs isn't science. At best it's applied science, which we have names for, such as engineering or statistics or financial analysis.

SixSigma · on Feb 18, 2016

People like shopping

jeremystan · on Feb 17, 2016

Behind the scenes, Instacart is a revolutionary new e-commerce marketplace, an incredibly sophisticated last-mile logistics engine, and a dynamic source of work for thousands of personal shoppers. Each of these aspects of Instacart could be a whole company elsewhere, and data science plays a key role in our success in each endeavor.

In this article, I (our VP Data Science) highlight some challenges the data science team is tackling at Instacart ranging from logistics to personalization. I also go into detail on how we have organized data science to have maximal impact and what we look for when recruiting data scientists.

x0x0 · on Feb 17, 2016

fyi I interviewed for this team and 2 notes:

1 - I've been doing ad optimization / user classification / propensity scoring / product recommendations, warned them that I took a couple stochastic processes classes but haven't used them for a decade, and that I was entirely unsuited for OR type problems. They said that was ok and they were hiring for things I was suited for. Great. My in-person interview was primarily an OR problem best solved with stochastic processes.

2 - they were very responsive at first but after the interview, went radio silent for a week. After promising a response in a day. This was particularly annoying since I told them I had a written offer that I was pushing off for them. My guess is they were waiting to see if another candidate would accept. Which is fine, but the recruiter should have been honest with me. They ignored me for 5 business days after the interview -- 4 after their promised response -- before finally telling me no thanks. I'm not grumpy about being told no -- that's definitely happened before -- but about their crappy behavior. Fortunately I'd already accepted the other offer after reading between the lines, but still, the experience left me grumpy.

I debated posting this for a while, but bluntly, I kind of felt like they wasted my time and was really not happy their internal recruiter blew me off after repeated promises otherwise. I'm sure they'll be along to say your experience will be different (and it may well be!) but here's a data point for your consideration. I'm just sharing my experience.

randycupertino · on Feb 17, 2016

> they were very responsive at first but after the interview, went radio silent for a week. After promising a response in a day. This was particularly annoying since I told them I had a written offer that I was pushing off for them.

ugh, I hate that. I was jerked around by an a-list firm like that this fall and it was totally frustrating, especially because I turned down another offer while I was waiting to hear back. Really screwed me over and left me very bitter/annoyed. There's no respect and everyone's looking out for #1.

jeremystan · on Feb 17, 2016

We work hard to get back to candidates quickly - in some cases the same day as they interview, but at least the day after if not. We know the market is very competitive and want candidates to make the best decisions they can - so it's in our best interest to act quickly! We are always working to improve how we screen, interview and respond to candidates in hiring, so will take this feedback to heart. Thank you for providing it.

Regarding the focus on OR, that was definitely the case for our first few years, and while it's still important, we have definitely expanded our focus beyond it.

gk1 · on Feb 17, 2016

Not sure if you realize but all your responses here seem terribly canned. That's probably also why your first comment in this thread is being downvoted.

jeremystan · on Feb 17, 2016

re-reading what I wrote I can see how it comes off as canned - thanks for the feedback

jsprogrammer · on Feb 17, 2016

>“We will take full ownership of our projects. We take pride in our work and relentlessly execute to get things completely finished.”

Does this mean that as an employee, or ex-employee, I can take my owned projects with me and use them for my own purposes?

jeremystan · on Feb 17, 2016

We've worked hard to open source projects whenever we think they'll be useful broadly: https://www.instacart.com/opensource. There is definitely more of this we can (and I hope will) do in the future.

cballard · on Feb 18, 2016

Why don't you pay your workers?

https://news.ycombinator.com/item?id=11121092

sandGorgon · on Feb 18, 2016

How do you guys run R in production? Just getting started with R based datascience and it has been a struggle to figure out how to build a production data science stack.

Do you snapshot the computed models as RData and stream them to S3, etc.?

jeremystan · on Feb 18, 2016

We use R in production in two ways:

1. For batch processes that run daily, hourly or minutely, where the models are rebuilt on every run, and outputs (often predictions) are written to a database

2. For computation of coefficients in large sparse regularized models, where the coefficients are written to a database and scoring is done in another language in real-time

For situations where we want real-time predictions, recommendations or optimizations, we tend to set up Python services instead. For batch processes, you can definitely store models in S3 to re-use them, and I've done that at other companies. But in general I've found it better to rebuild models frequently and cache them for short periods of time only if they are cost-prohibitive to rebuild.
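The "rebuild often, cache only briefly" idea can be sketched roughly like this in Python. This is only an illustration under assumptions: the class name, the `build_model` callable and the TTL value are all hypothetical, and nothing here describes how Instacart actually does it.

```python
import time


class ShortLivedModelCache:
    """Keep a fitted model for at most ttl_seconds, then rebuild it."""

    def __init__(self, build_model, ttl_seconds, clock=time.monotonic):
        self._build_model = build_model  # expensive fit, supplied by caller
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so it can be tested
        self._model = None
        self._built_at = None

    def get(self):
        now = self._clock()
        expired = self._built_at is None or now - self._built_at >= self._ttl
        if expired:
            # Rebuild from fresh data instead of reusing a stale model.
            self._model = self._build_model()
            self._built_at = now
        return self._model


# Usage with a stand-in "model": a dict of made-up coefficients.
cache = ShortLivedModelCache(lambda: {"intercept": 0.1}, ttl_seconds=300)
model = cache.get()   # first call builds the model
same = cache.get()    # within the TTL the cached object is reused
```

If rebuilding is cheap, the cache can be dropped entirely and the model refit on every batch run.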

sandGorgon · on Feb 18, 2016

@jeremy - that's helpful! Could you hint at the way you persist sparse models to the DB in R? Especially if you are changing your variables pretty frequently. Do you use something like Postgres JSONB (which is funky in R)?

Also about scoring in another language - is this really worthwhile for you? I have often debated just throwing 128GB of RAM on an R machine and calling it a day. As I figure, your "real time" requirements are probably seconds or even minutes (similar to mine).

jeremystan · on Feb 18, 2016

When persisting sparse models to the DB, especially if you use L1 regularization (like Lasso), many coefficients will be 0 and don't need to be stored or processed. Insofar as you store coefficients and features in a "tall" format (e.g., user, feature_key, feature_value), space is conserved. Scoring can be done in the DB with joins and group by, or in another language with similar operations.
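A minimal sketch of the tall-format idea, using Python's built-in sqlite3 in place of a production database. The table names, column names and feature keys are made up for illustration (the user column is called `user_id` here), and an intercept term is left out for brevity.

```python
import sqlite3


def score_users(coefficients, features):
    """Score users with a sparse linear model via JOIN + GROUP BY.

    coefficients: {feature_key: coef}
    features:     [(user_id, feature_key, feature_value), ...] in tall format
    Returns {user_id: sum of feature_value * coef}.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE coefficients (feature_key TEXT PRIMARY KEY, coef REAL)")
    conn.execute(
        "CREATE TABLE features "
        "(user_id TEXT, feature_key TEXT, feature_value REAL)")
    # L1 regularization zeroes out many coefficients; store only the rest.
    conn.executemany(
        "INSERT INTO coefficients VALUES (?, ?)",
        [(key, coef) for key, coef in coefficients.items() if coef != 0.0])
    conn.executemany("INSERT INTO features VALUES (?, ?, ?)", features)
    rows = conn.execute("""
        SELECT f.user_id, SUM(f.feature_value * c.coef)
        FROM features AS f
        JOIN coefficients AS c ON c.feature_key = f.feature_key
        GROUP BY f.user_id
    """).fetchall()
    conn.close()
    return dict(rows)


# Hypothetical coefficients and features, purely for illustration.
coefficients = {"bought_milk": 0.5, "bought_kale": 0.0,
                "orders_last_week": -0.25}
features = [
    ("u1", "bought_milk", 1.0),
    ("u1", "orders_last_week", 2.0),
    ("u2", "bought_kale", 1.0),
    ("u2", "bought_milk", 1.0),
]
scores = score_users(coefficients, features)
print(scores)
```

One thing to watch with the inner join: a user whose features all have zero coefficients drops out of the result entirely, so a LEFT JOIN from a users table may be needed if every user must get a score.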

Changing variables frequently can be versioned in the feature and model coefficients tables, but takes care.

I haven't used Postgres JSONB, but if you have problems with JSON in R check out the tidyjson package (which I wrote when dealing with Mongo data previously).

Scoring in another language is best avoided if you can. But supporting R "real time" services will also come with many complications. Hence, we use Python when we really want that.

SparkR was completely unreliable when I first tested it over a year ago, but may have improved. Though the Spark Python API has some limitations compared to Scala, so I would guess the latest SparkR is even further behind, but we haven't tested it. Long term I'd love for that to be the answer to these questions.

disgruntledphd2 · on Feb 18, 2016

If I were running R in production, then I'd probably fit models on some kind of batch process and then serve up the predictions/output from a DB or something.

In general, R is not well-suited for DB-backed websites in real-time, but you can certainly use the outputs in production.

You can do it, but I'm not sure it's worth the effort. You could probably provide a predict() interface in real-time if it was reasonably quick.

sandGorgon · on Feb 18, 2016

So I have seen a couple of large data science driven startups (like consumer finance) throw R on 128GB machines and call it a day. That's reasonably going to be my plan except that I can't make it work very well.

I really wish pandas had a "save workspace" feature - R does that very well. No point in saving to DB if you're going to need the data set in memory anyway.... Or use Hadoop.

sgt101 · on Feb 18, 2016

We run udfs in Hive to invoke R models, which is fine for compiling dashboards and reports but I wouldn't run it for something that needed instant responses.

ves · on Feb 18, 2016

That's what my company does

sandGorgon · on Feb 18, 2016

could you talk a bit more about your production setup? Any multi-threading problems?