
I teach data mining at a top grad school, and am a data scientist at a startup.

I got one call from a recruiter who thought I was in a different city. Ain't so sexy from where I'm sitting.



You probably just need better buzzwords (and ideally the background to back it up) -- NoSQL, big data, MongoDB, Hadoop, etc.

I consulted for a client that used those technologies, updated my LinkedIn profile afterwards, and the number of incoming requests from recruiters and principals has been nothing short of phenomenal. (Anecdotally, 20 InMails in 10 days, of which 14 converted into a phone interview with the principal.)


> You probably just need better buzzwords (and ideally the background to back it up) -- NoSQL, big data, MongoDB, Hadoop, etc.

Are there many data scientists who don't work on Big Data?


To be pretty honest, prior to my life as a data scientist (and grad school) I was a business analyst. We mined data and threw 10M-100M entries into a MySQL database w/ Rails dashboard and for our non-RT analysis purposes it was tolerable.

There are plenty of data problems out there already warehoused by small-cap and mid-cap firms; I honestly don't see a need to go Web-Scale and all that jazz for its own sake if your use case doesn't need it. There are also shortcuts like sampling to kick the can down the road, but that's another discussion in and of itself.
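
For what it's worth, here is a rough sketch of the sampling shortcut, assuming the rows arrive as a stream too large to hold in memory. It uses reservoir sampling (Algorithm R), which is just one common way to do this and not necessarily what any particular shop runs. The function name `reservoir_sample` and its parameters are made up for illustration.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return up to k rows chosen uniformly at random from an iterable.

    Makes a single pass and keeps only k rows in memory, so the total
    number of rows does not need to be known in advance.
    """
    rng = random.Random(seed)  # seed is optional, for repeatable runs
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Pick a slot in [0, i]; the new row survives with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


if __name__ == "__main__":
    # Hypothetical stand-in for a large table: one million integer "rows".
    subset = reservoir_sample(range(1_000_000), 1000, seed=42)
    print(len(subset))
```

You would then run the analysis on `subset` instead of the full table. Whether a uniform sample is good enough depends on the question being asked; rare events can easily get sampled away.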


I think the keyword "big data" ends up being used in even a lot of smaller cases, because everyone thinks what they have is "big data", I'm guessing because they all genuinely do have much more data than they had a decade ago. But that still varies widely in size; what some companies think is "big data" is still perfectly analyzable, for non-realtime purposes, on one beefy workstation. Yet, because they'd never seen data with tens of millions of rows before, and it breaks whatever system they were previously using to analyze stuff (SPSS, etc.), what they want to hire is a "big data" person.


COMPLETELY agree. In companies that don't have data as a core competency, "big data" ends up being this business buzzword thrown about because their data is too big for their current set of tools... whether it's R or even Excel or what not.

As a math/stats guy who picked up more programming along the way, I personally think it's MUCH easier to teach a DB guy some business sense than it is for a business analyst to have Hadoop drilled into them. Of course, the downsides of a coder without sufficient savvy are harder to detect than a numbers guy who can't make his program work, and therein lies your problem.


I agree. I wonder if it is possible to get hired as a data scientist as easily if you haven't worked on big data before.

Or, I guess programmers and engineers could start using the big data tools even though they are not needed. Has anyone run Hadoop on a single (multi-core) machine for this purpose?


I doubt it, simply because it's so easy to find big datasets to work on. It doesn't have to be professional, that's the nice thing about a data driven profession.

Check out http://www.kdnuggets.com/ for links to large data sets to work on, and there are also some on Amazon.

Also, yes, you can certainly run Hadoop on a single instance, but once you get into "real big" sizes you'll need a cluster to demonstrate expertise, be it on your local machines at your house or on a set of VPS or EC2 or whatever.


The Cloudera packages make it extremely easy to get Hadoop up and running, as well as processing sample data available on the web.


Oh, I have buzz-words aplenty. I live in Canada though...maybe that's it.


Yeah, H1 season just passed -- that's prolly why.



