Data Scientist: The Sexiest Job of the 21st Century

dude_abides · on Sept 19, 2012

Interestingly, just yesterday, I found out that LinkedIn Friend Suggest uses, among other things, co-logins from the same IP address as a signal. On my test account that I created at work, it eerily showed me all my co-workers in the Friend Suggest list. Later, as soon as I logged in from home, it added my wife to my Friend Suggest list.

I wonder if one of the goals of a good Data Scientist is also to be not too accurate, lest the product create an eerie feeling among users! (remember the Target pregnant girl incident?!)
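For illustration only, here is a toy sketch of what a "co-login from the same IP" signal could look like. Nothing here reflects how LinkedIn actually implements Friend Suggest: the function name `suggest_from_shared_ips`, the event format, the account names, and the IP addresses are all made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def suggest_from_shared_ips(login_events):
    """Map each account to the other accounts seen logging in from the same IP.

    login_events is an iterable of (account, ip) pairs. This is a hypothetical
    data shape chosen for the sketch, not any real service's log format.
    """
    accounts_by_ip = defaultdict(set)
    for account, ip in login_events:
        accounts_by_ip[ip].add(account)

    suggestions = defaultdict(set)
    for accounts in accounts_by_ip.values():
        # Every pair of accounts sharing an IP suggests each other.
        for a, b in combinations(sorted(accounts), 2):
            suggestions[a].add(b)
            suggestions[b].add(a)
    return dict(suggestions)


# Hypothetical events: a test account used at the office, then at home.
events = [
    ("test_account", "203.0.113.7"),    # office IP (documentation range)
    ("coworker_1", "203.0.113.7"),
    ("coworker_2", "203.0.113.7"),
    ("test_account", "198.51.100.23"),  # home IP (documentation range)
    ("wife", "198.51.100.23"),
]
suggestions = suggest_from_shared_ips(events)
print(sorted(suggestions["test_account"]))
```

A real system would presumably weight this against many other signals and discount shared IPs such as coffee shops or carrier NAT; the sketch only shows why a fresh account with no profile data can still get accurate suggestions.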

bravura · on Sept 20, 2012

I think avoiding creepiness requires using intuition from human interaction.

In particular, it's creepy if someone knows something about you and you don't know why.

So if you don't want data mining to be creepy, you have two options:

a) Explain why you know something.

b) Wait until data mining is so commonplace that people take it for granted, and standards of etiquette shift.

siganakis · on Sept 19, 2012

I think that there is a tension between "creepiness" and "effective marketing". This I feel is one of Facebook's core problems, where for them to maximize the value of their dataset, their ad targeting becomes incredibly creepy.

One issue is that from an end user's perspective it makes it obvious how much information is being captured about them. While most people are aware that their information is being captured, seeing it plastered all over their Facebook feed makes them confront it.

Worse than that are the questions that come with these ads - "Why am I seeing ads for baldness cures?" Is it because I'm a 30+ male, or is it because they have analysed photos I'm tagged in and detected my thinning hair? Sometimes it just feels mean!

This is primarily a challenge of data science working in a marketing environment and doesn't really permeate through all areas of data science, however it is the form of data science that is most visible. Therefore much of data science and the big data we work with gets lumped in with sleazy marketing.

makmanalp · on Sept 19, 2012

Great point! Rather, it is to gather and analyse data as accurately as possible, and then apply it as inaccurately as required :)

001sky · on Sept 19, 2012

Definitely creepy, when travelling. I've seen it, too.

comlag · on Sept 19, 2012

An uncanny valley for data. Really interesting and certainly something I have felt but never could put a finger on.

neutronicus · on Sept 19, 2012

I was pretty weirded out when the fake account I made specifically to use Spotify (and which has absolutely no information about me in the profile), got a friend request from someone from my grad program.

petercooper · on Sept 20, 2012

I'm pretty sure Twitter does this too. I've signed up for accounts in other browsers and then had my other accounts suggested to me as people I should follow.

samstave · on Sept 20, 2012

What is the Target pregnant girl incident?

GuiA · on Sept 20, 2012

http://www.forbes.com/sites/kashmirhill/2012/02/16/how-targe...

Evbn · on Sept 20, 2012

Goals should be to avoid privacy violations by leaking private user data like IP to other users. LinkedIn has no respect for users, though.

gaius · on Sept 19, 2012

Heh, I wonder if back in the 60s, HBR said Business Analyst or Statistician were the sexy jobs of the 20th century.

Because that's all a "data scientist" is... but without the experience to realize there's already a job title for what they do.

disgruntledphd2 · on Sept 19, 2012

This is so very, very true. The major change appears to be one of scale, rather than any qualitative change. Funnily enough, since I put predictive analytics (what does that term even mean, anyway?) on my CV I've gotten much more attention from recruiters and employers. I guess it sounds so much sexier than statistics.

More seriously though, the requirements to be able to hack up a prototype and talk to people are probably what hold back a lot of people who otherwise have the skills to be good "data scientists", or just scientists.

My current employers told me at interview that they had no data, and in the three months I've been there I've been slowly discovering that they have loads of it, unfortunately in multiple incompatible forms and jealously guarded by different departments. It is rather funny, though a little sad that they were essentially drowning in data and didn't realise it.

enos_feedler · on Sept 19, 2012

I agree that being able to hack up a prototype could really make someone stand out as a data scientist. The Insight Data Fellows program mentioned in the article has a 6 week program where the focus is on learning enough software development to hack a prototype by the end of the program. That could be a good way to go.

001sky · on Sept 19, 2012

It's like a "growth hacker", re-branded in a corporate way

3pt14159 · on Sept 19, 2012

gaius, I've been appreciating your comments for a long, long time now. But you are wrong here.

There is a difference. It isn't a difference in fundamentals so much as it is a difference in focus.

Business Analysts give reports to CEOs about customer segments or the projected amounts of signups. They aren't even close to DS or statisticians.

Statisticians tell you about how a drug reacted with a control group or how likely it is that a population feels a certain way given the results of a survey or trial.

Data scientists harness data. They impact every user on a site. "Watch this video" "Follow this user" (recommendations) or "Silently ignore this user's impact on the algorithms that manage where this piece of content should go" (graph analysis) or "What exactly is in this photo" (object recognition) or "What combination of widgets leads to the maximal amount of engagement" (optimization) or "I have this paper that I really like, show me more that are just like it" (recommendations, document classifications, NLP).

It is different. The focus is on users and what they will do or should do or should see. To call them statisticians leads to much less understanding of the value that DS bring. Put me in a room with an actuary from an insurance company. Neither of us could possibly do each other's jobs. Neither of us has the other's skill set.

Now, both of us could learn and get up to speed on how the other works, but a sys admin and a web developer could swap roles more easily than an actuary and a DS. Yet nobody is complaining that we call devs and sys admins different titles.

timr · on Sept 19, 2012

That's not a counterargument to his point. You're parsing job titles down to the atom, and concluding that "data scientist" is different than "scientist" is different than "statistician", is different than "analyst". Gaius is saying that this job responsibility has been around for a long time, but that people are reaching to find reasons to give it a new name -- exactly what you're doing.

If you ask me, the phrase "data scientist" is recruiter-speak. I have all of the skills required of a "data scientist". I've done the job of a "data scientist". And other than object recognition, I've developed all of the different product features you mention in your comment. You know how I got the skills necessary to do those things? I was trained as a scientist, and there's no such thing as a scientist without data. A person properly trained to analyze data should be able to effectively and fluidly transfer those skills between domains -- otherwise, they're not actually good at it. There's nothing special about internet products that precludes competent people from doing effective data mining on their logs.

I suspect that the real problem here is that "data science" is Internet Hipster for: "someone who has already worked at an internet company, and knows some statistics". Because when it comes right down to it, your average statistician, chemist or physicist is more skilled at data analysis than 99.9% of the "data scientist" types you meet, but they don't easily press the comfort button for hiring managers at consumer internet companies. Why hire the "risky" ex-scientist, when you can hire the guy who claims to be a designer, a software engineer and a statistician?

gruseom · on Sept 20, 2012

Now that you mention it, "data scientist" is almost a tautology.

gammarator · on Sept 20, 2012

Tell that to the string theorists.

adwf · on Sept 20, 2012

I agree that data scientist does smell a lot like recruiter/marketing speak. But on the other hand, just because it's a new title doesn't mean it isn't valid. Reducing everyone down to "Scientist" is no more helpful than saying a physicist isn't really a separate job, but just a specialised branch of mathematics. Or for that matter, CS is just a narrow branch of mathematics.

Eventually you have to distinguish new fields from the old, even if they have a lot of commonalities.

timr · on Sept 20, 2012

Well, yeah...when there's specialized knowledge required for the job (like, say, "physics"), it's obviously a good idea to change the job title.

The problem here is that "data scientist" adds no semantic value above and beyond "scientist". A scientist of data, you say? However will we find such exotic creatures!?

jedbrown · on Sept 20, 2012

You have a comically narrow definition of statistics, cf. "Statistics is the study of the collection, organization, analysis, interpretation, and presentation of data." (http://en.wikipedia.org/wiki/Statistics)

gaius · on Sept 20, 2012

Right, but you are taking a very web centric view there. What would you call the guy whose work impacted every shopper in a supermarket? There were people doing what is now called "data science" with supermarket loyalty cards, credit cards, frequent flyer programmes, etc looooong before there were "data scientists".

carlsednaoui · on Sept 19, 2012

For those that prefer to read the article in one single page: http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the...

javert · on Sept 19, 2012

IMHO HN stories should always be posted in this format.

tzs · on Sept 19, 2012

In general, that is objectively bad, although for this particular site it is not as bad as it could be.

Here are the three general problems with submitting print views:

1. For most sites, the print view results in a small font and lines that extend all the way across the page. This makes them hard to read. Sometimes, on a desktop, with a bit of fiddling they can actually be made legible to those of us who are older than 40. On mobile, they are often simply not possible for many of us to read.

This particular site is OK in this regard, as they appear to have actually set the line width and the font size so that it comes out reasonable on the screen. In fact, their print view is quite pleasant to read.

2. The print view often omits comments, sidebar links to related stories, links for sharing, and so on. Some people actually might want to use those.

3. There is often no evident link from the print view back to the normal view. Sometimes you can figure it out by playing with the URL, but sometimes the relationship between the print URL and the normal URL is hard to figure out if all you have is the print URL to work with. Note that the normal page, on the other hand, does generally have a link to the print page, so those who prefer the print page can easily go to it.

For these reasons, in almost all cases the submission should be to the normal page, not the print page. Ideally, the submitter can add a comment that gives the print URL to save time for those who do prefer it.

Note that some sites have an "all on one page" option, that puts the whole thing on one page, but leaves comments, social links, and such. That's the best to use if available.

seanconaty · on Sept 19, 2012

But you lose out on all those precious ad impressions!

carlsednaoui · on Sept 19, 2012

Totally agree with you - much easier/faster to read.

tryitnow · on Sept 19, 2012

There's a recommendation I give to people who are writing their online personals ad: If you're sexy, there's no reason to say that you're sexy.

I think the same applies here.

A data scientist is a fancy way of saying a "statistician who can code (should be required in stats programs now anyhow) and who can communicate effectively"

gaius · on Sept 19, 2012

I'd be surprised if it was even possible to graduate in stats these days and not know R at least, and probably NumPy too.

majormajor · on Sept 19, 2012

This was ISyE, not stats, and it was 5 years ago, but I was amazed by how much extra work some people would do to avoid having to learn anything but Excel (meanwhile, I was messing around with R and whipping programs up to get better results in less time). This was at a highly ranked engineering program, too.

Based on a few people I've kept in touch with, it seems like it hasn't changed all that much at the undergrad level. The grad level was where the problem sizes and difficulty really forced you to use better tools.

crntaylor · on Sept 20, 2012

In case anyone else is wondering, ISyE == Industrial and Systems Engineering.

brianto2010 · on Sept 20, 2012

At RIT at least, R and NumPy aren't in the core curriculum. Instead, there is a "Statistical Computing" class which covers SAS. Most students either use Minitab or Excel. Surprisingly (or not), a lot of in-class work is done using a graphing calculator. Of course, that also carries into a lot of the homework.

I wonder, do statisticians actually use graphing calculators to do stats?

crntaylor · on Sept 20, 2012

I think of my job as essentially that of a statistician (data miner/data scientist/trader/research analyst/whatever you want to call it). I have never used a graphing calculator in my life. If I need to do a quick calculation I open a terminal and boot up R, Python or GHCi, depending on how I'm feeling and how complicated what I need to achieve is.

jwoah12 · on Sept 19, 2012

I love the fact that this was posted a half hour after this: http://d.gould.in/blog/2012/09/18/your-job-is-not-sexy/

donretag · on Sept 19, 2012

It was actually posted much sooner than that, but gained no traction: http://news.ycombinator.com/item?id=4542383

bearmf · on Sept 19, 2012

I remember reading at least 10 articles with nearly the same content during the year. Why are authors so eager to convince everyone of big data's sexiness? Results should speak for themselves. So far LinkedIn's Friend Suggest is one of the biggest success stories.

rm999 · on Sept 19, 2012

As a "data scientist" I found this article had much more meaningful content than most articles I've read in the past year. It's not just repeating how data science will be big in the next decade, it discusses who data scientists are and how to hire them.

>So far LinkedIn's Friend Suggest is one of the biggest success stories.

I don't agree with this. Google is basically a big data sciences company. 'Data science' may be a new term, but it describes something companies have been doing for decades.

nostrademons · on Sept 19, 2012

Yeah, big data crunching pervades basically everything Google does. I joined Google as a UI SWE (basically a webdev), and find that most of my daily work nowadays involves processing large amounts of data to come up with new features. I suppose I made a conscious effort to move back in the stack to more algorithmic back-end work, but even if you stick with UI work, the launch process is so data-driven that you almost need to have a basic familiarity with statistics & data processing.

bearmf · on Sept 19, 2012

I agree that this article is better than average.

>'Data science' may be a new term, but it describes something companies have been doing for decades.

This is not what most articles say. They actually try to frame it as something "new and sexy".

confluence · on Sept 19, 2012

Anything that isn't "normal" news is a PR piece for someone or something - http://paulgraham.com/submarine.html

Looks like a tech company list looking to hire data scientists - essentially a sneaky job advertisement wrapped up in a fluffy HBR (aren't they all?) article written by a consultant who probably wants to get in on the new new thing.

gaius · on Sept 20, 2012

Or to attract a lot of people to the field and drive down salaries.

elchief · on Sept 19, 2012

I teach data mining at a top grad school, and am a data scientist at a startup.

I got one call from a recruiter who thought I was in a different city. Ain't so sexy from where I'm sitting.

binarysolo · on Sept 19, 2012

You probably just need better buzzwords (and ideally the background to back it up) -- NoSQL, big data, MongoDB, Hadoop, etc.

I consulted for a client that used those technologies, updated my LinkedIn profile afterwards, and the amount of incoming requests from recruiters and principals has been nothing short of phenomenal. (Anecdotally, 20 InMails in 10 days, of which 14 of them converted into a phone interview with the principal.)

baltcode · on Sept 19, 2012

> You probably just need better buzzwords (and ideally the background to back it up) -- NoSQL, big data, MongoDB, Hadoop, etc.

Are there as many data scientists who don't work on Big Data?

binarysolo · on Sept 19, 2012

To be pretty honest, prior to my life as a data scientist (and grad school) I was a business analyst. We mined data and threw 10M-100M entries into a MySQL database w/ Rails dashboard and for our non-RT analysis purposes it was tolerable.

There are plenty of data problems out there already warehoused by small-cap and mid-cap firms; I honestly don't see a need to go Web-Scale and all that jazz for its own sake if your use case doesn't need it. There's also shortcuts like sampling to kick the can down the road, but that's another discussion in and of itself.
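To make the sampling shortcut concrete, here is a minimal sketch of reservoir sampling (the classic "Algorithm R"), which draws a fixed-size uniform sample in one pass without loading everything into memory. This is my own illustration, not something we ran at the time; the function name `reservoir_sample` and its parameters are assumptions for the example.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return up to k rows chosen uniformly at random from an iterable.

    Works in a single pass with O(k) memory, so the input can be a file,
    a database cursor, or any stream too large to hold at once.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Row i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


# Usage: take 1,000 rows out of a stream of a million.
subset = reservoir_sample(range(1_000_000), 1000, seed=42)
print(len(subset))
```

Whether a sample is good enough depends on the question: it tends to work for averages and rates, and tends to fail for rare events or anything where the long tail matters.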

_delirium · on Sept 19, 2012

I think the keyword "big data" ends up being used in even a lot of smaller cases, because everyone thinks what they have is "big data", I'm guessing because they do all genuinely have much more data than they might have a decade ago. But that still varies widely in size; what some companies think is "big data" is still perfectly analyzable, for non-realtime purposes, on one beefy workstation. Yet, because they'd never seen data with tens of millions of rows before, and it breaks whatever system they were previously using to analyze stuff (SPSS, etc.), what they want to hire is a "big data" person.

binarysolo · on Sept 19, 2012

COMPLETELY agree. In companies that don't have data as a core competency, "big data" ends up being this business buzzword thrown about because their data is too big for their current set of tools... whether it's R or even Excel or what not.

As a math/stats guy who picked up more programming along the way, I personally think it's MUCH easier to teach a DB guy some business sense than it is for a business analyst to have Hadoop drilled into them. Of course, the downsides of a coder without sufficient savvy are harder to detect than a numbers guy who can't make his program work, and therein lies your problem.

baltcode · on Sept 19, 2012

I agree. I wonder if it is possible to get hired as a data scientist as easily if you haven't worked on big data before.

Or, I guess programmers and engineers could start using the big data tools even though they are not needed. Has anyone run Hadoop on a single (multi-core) machine for this purpose?

keefe · on Sept 20, 2012

I doubt it, simply because it's so easy to find big datasets to work on. It doesn't have to be professional, that's the nice thing about a data driven profession.

Check out http://www.kdnuggets.com/ for links to large data sets to work on and there are also some on amazon.

Also, yes you can certainly run Hadoop on a single instance, but once you get into "real big" sizes you'll need a cluster to demonstrate expertise, be it on your local machines at your house or on a set of VPS or EC2 or whatever.

shblt · on Sept 20, 2012

The Cloudera packages make it extremely easy to get Hadoop up and running, as well as processing sample data available on the web.

elchief · on Sept 19, 2012

Oh, I have buzz-words aplenty. I live in Canada though...maybe that's it.

binarysolo · on Sept 19, 2012

Yeah, H1 season just passed -- that's prolly why.

manoDev · on Sept 20, 2012

Let's keep reinventing job titles to pretend they are new and sexy.

- Business Analyst: Data Scientist

- Systems Analyst: Growth Hacker

- Public Relations: Social Media Evangelist

What else?

suyash · on Sept 19, 2012

ASK HN? : I'm a little confused, can someone please shed some light on this so we can all get a clearer picture. What is the difference between Data Scientist vs Big Data Expert vs Analytics Engineer (Statistics, metrics etc) vs Hadoop Architect vs Machine Learning Expert ? Thanks a lot HN people!

rm999 · on Sept 20, 2012

Every data scientist has to:

* be very good at working with large datasets with computational tools (Hadoop is an example)

* be a decent programmer, scripter, and hacker

* have a decent background in statistics

A good data scientist:

* has a good intuition and business sense

* can explain insights to non-technical people (usually through visualization and plotting)

* knows machine learning and predictive analytics

It's a vague term, but purposefully so. There's tons of stuff you can do with data; a data scientist knows what to do and how to do it.

suyash · on Sept 20, 2012

Thanks rm999 :)

mmcdan · on Sept 20, 2012

The Insight Data Science fellows program looks awesome, but it is disappointing that only PhD candidates and post-docs can apply. There is some irony in the fact that the cover of their brochure uses the Facebook friendship visualization done by Paul Butler, who was an undergraduate intern at Facebook when he made it.

jboggan · on Sept 19, 2012

I've found that a lot of companies are looking for data scientists but many of them have very different ideas of what that means. This makes for some interesting interviews.

I recently moved to SF and am currently interviewing for data science positions - particularly ones involving social networks and applied graph theory - so drop me a line if you know anyone who is dealing with that problem space.

binarysolo · on Sept 19, 2012

Just checked out your LI profile (fellow data science guy here) -- I think you basically need a bit more work experience or some github code to show yourself off. The big data guys like Google who have best practices, brand, and provide great onboarding should be your focus IMHO.

tejaswiy · on Sept 20, 2012

Quick question: What do you classify as work ex? I do mostly iOS programming, but I've been playing with Hadoop + the commoncrawl.org crawl data. Basically, I guess, what level of stats do you need to be comfortable with to call yourself a data scientist?

ahuibers · on Sept 20, 2012

Following Gladwell's 10000 hour rule, I would say you could probably call yourself a data scientist after 1000+ hours of experience working with datasets successfully. As far as the math goes you should be able to do regression analysis; you don't need to know tons of stats but you do need to know stats and probability essentials (first few classes at a good school) deeply. I like this Wikipedia entry on "mathematical maturity": http://en.wikipedia.org/wiki/Mathematical_maturity; apart from writing proofs, it is very relevant.

binarysolo · on Sept 20, 2012

At the end of the day, analytics is measured by effectiveness and appropriateness, not complexity. Simple regressions will do fine, but the "art" is to choose the right questions to ask. Typically if you're in a business setting that boils down to efficiency problems and maximizing time/money/happiness/etc. Dealing with these real-world problems = work exp.

jboggan · on Sept 20, 2012

Thanks, I'm trying to find relevant ways to get that experience and working on some more sample code as well. What would you consider an "entry level" data science job?

bitwize · on Sept 19, 2012

Sorry, I don't think data science is going to topple the quadfecta of sexiness: porn star, rock star, sports star, and movie star.

philip1209 · on Sept 19, 2012

Question for HN: I'm graduating this Spring with majors in Systems Engineering and Physics, and I want to work as a data scientist, preferably at a startup. What can I do to position myself for such a job? If any of you work in the field and are willing to provide some 1-on-1 advice, please shoot me an email - mail@philipithomas.com

chubot · on Sept 20, 2012

From my experience, you will spend most of your time finding, collecting and cleaning data. Doing anything super algorithmically interesting will be rare. Get familiar with the Unix shell + Python (or similar language). The shell will save you tons of code. awk/sed/cut and friends are very fast for cleaning data (in development time and runtime). And shell scripts are good for grabbing things from different systems.
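As a rough idea of what the Python side of that cleaning work tends to look like, here is a small sketch that does the kind of thing you would otherwise chain together with cut, sed, and sort -u: pick columns, trim and normalise them, drop incomplete rows, and de-duplicate. The function name `clean_rows`, the column choice, and the sample data are all made up for the example.

```python
import csv
import io


def clean_rows(lines, delimiter="\t", keep_columns=(0, 2)):
    """Yield normalised, de-duplicated tuples of the selected columns.

    Rows that are too short, or that have an empty selected field after
    trimming, are skipped rather than repaired.
    """
    seen = set()
    for fields in csv.reader(lines, delimiter=delimiter):
        if len(fields) <= max(keep_columns):
            continue  # blank or malformed line
        row = tuple(fields[i].strip().lower() for i in keep_columns)
        if not all(row) or row in seen:
            continue  # missing value or duplicate
        seen.add(row)
        yield row


# Hypothetical messy input: stray whitespace, a missing city,
# a blank line, and a duplicate that differs only by case.
raw = "Alice \t30\tNYC\nbob\t25\t\n\nalice\t31\tnyc\nCarol\t41\tToronto\n"
cleaned = list(clean_rows(io.StringIO(raw)))
print(cleaned)
```

For a one-off job the shell pipeline is usually quicker to write; something like this earns its keep once the rules get complicated enough that you want to test them.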

3pt14159 · on Sept 19, 2012

1. Know programming.

2. Be smart with an eye for economics (there is way more overlap than people give it credit for).

3. Start by talking to people and telling them what you want to do. Most founders want to help people reach their dream.

If you have a github account, email me and maybe you can start with us over at 500px here in Toronto.

jvm · on Sept 20, 2012

What's the Toronto scene like? I'm finishing up a PhD at NYU this year and was planning on breaking into the field after graduating. NYC is obviously a great place to be but for relationship reasons I was thinking of moving to Toronto (which is honestly a nicer city anyway much as I love NY), but a cursory inspection suggests a lot less demand for data-loving jobs. I would love to be mistaken though; am I?

nachteilig · on Sept 19, 2012

I know I should be excited for the positive innovations data science will bring us, but am I alone in mostly still finding it creepy?