
If I may give you a piece of advice: Python, R and deep learning are sexy, but the most important skill to start in data science is SQL. It will help you get your first data role and will be your main tool to solve 97% of the problems you will ever face.

Bonus point: it is very easy to learn.



> If I may give you a piece of advice: Python, R and deep learning are sexy, but the most important skill to start in data science is SQL. It will help you get your first data role and will be your main tool to solve 97% of the problems you will ever face.

That's a bit of an exaggeration but I will say: most of data science is not the fun stuff. Everyone goes into the field thinking it's all about doing machine learning to uncover astounding insights that will fundamentally transform the business.

In reality, about 5% of most data scientists' time is spent doing that. Maybe less. The bulk of the work is getting the data and cleaning it up, doing a tiny bit of the sexy stuff, then writing it up into a report or a presentation to give to people who will either not believe it or scoff because they already knew it.


I work as a data scientist and every statement in your post sounds incorrect to me. SQL is useful but it's far from the most important tool. It's certainly unlikely to land you a job. It definitely won't solve 97% of problems. You either have a very skewed perspective of what data science is or you spend a lot of time on LinkedIn/Medium, where bad advice like this is parroted a lot.

Think about this - there are a bunch of other software developers who know SQL very well. If your advice was true, then every backend developer would be able to immediately land a data science job and do great at it without having to learn a bunch of math, ML-specific stuff and a whole other tech stack.


A surprisingly vast contingent of software developers do not know SQL, let alone know it very well. And as others have mentioned in this thread, data science looks different at different companies.

For many, the math and "ML-specific stuff" ends up being a very small part of the process. For them, data quality and data cleaning take up the overwhelming majority of hours in a given project, and SQL chops will take you much farther in that kind of an environment.

Plus, SQL is not going anywhere anytime soon. So worst case scenario, OP will learn a skill that's not likely to be dated in a few more ticks of the hype cycle.


I think you're both right.

I find it hard to imagine successful data scientists who don't know SQL.

OTOH, I find it hard to imagine (even though I've met some) successful data scientists who only know SQL.

I suppose it's necessary but not sufficient.


I don’t think it’s that surprising. Most web dev is just doing mundane CRUD, and mostly through ORMs or other db abstractions. If you don’t practice something how can you be expected to be good at it?


My job title is data scientist even though you may not think of me as a 'true' data scientist. Just to give you some context I work in an ecommerce startup. Depending on your industry and the size of your company things may be very different.

I maintain one machine learning model that is very core to our business, but doing 'machine learning' is a very small portion of my job.

> Think about this - there are a bunch of other software developers who know SQL very well. If your advice was true, then every backend developer would be able to immediately land a data science job and do great at it without having to learn a bunch of math, ML-specific stuff and a whole other tech stack.

In some companies data scientists are very software-development oriented, but that is not the case everywhere. Think about this: software developers who know SQL very well usually don't like cleaning data, they don't necessarily have the interpersonal skills required to solve business problems, they are not necessarily interested in solving business problems, and they may tend to think that more software is the solution to all problems.


> there are a bunch of other software developers who know SQL very well. If your advice was true, then every backend developer would be able to immediately land a data science job

I fully disagree. Most backend developers don't know SQL beyond their ORM library or CRUD statements. The business intelligence world has utilized SQL to analyze data and make effective business decisions for 40+ years.

ML is 90% hype to check a box for investors, and the actual business problems could be solved by a semi-competent analyst armed with Excel or SQL, not a bunch of overpaid "scientists" who completed a few Andrew Ng courses.


Totally agree. I believe the set-theory thinking one gains with SQL helps in dealing with tables (databases) in any framework.

SQL can become super tricky as well (depending on the context). Say you want to get the list of users who were active for 'n' consecutive days from a dataset that has daily user activity for a year. It's not very difficult but needs some effort.

However, for a data science beginner, SQL is the best place to start.
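
For anyone curious what the 'n' consecutive days query might look like, here is a minimal sketch using Python's built-in sqlite3 module. The `activity` table, its `user_id` and `day` columns, and the sample rows are all made up for illustration, and it assumes `day` is stored as an ISO `YYYY-MM-DD` string. It uses a plain self-join rather than window functions: for each (user, start day) it counts the distinct active days in the n-day window starting there, and keeps users where that count equals n. This is one of several ways to do it, not necessarily the fastest on a large table.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of (user_id, day) rows. All data is made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (user_id TEXT, day TEXT)")
    rows = [
        ("alice", "2024-01-01"), ("alice", "2024-01-02"),
        ("alice", "2024-01-03"), ("alice", "2024-01-05"),
        ("bob", "2024-01-01"), ("bob", "2024-01-03"), ("bob", "2024-01-05"),
        ("carol", "2024-01-10"), ("carol", "2024-01-11"),
    ]
    conn.executemany("INSERT INTO activity VALUES (?, ?)", rows)
    return conn


def users_active_n_consecutive_days(conn, n):
    """Return the sorted user ids that have at least n consecutive active days."""
    query = """
        SELECT DISTINCT a.user_id
        FROM activity AS a
        JOIN activity AS b
          ON b.user_id = a.user_id
         -- window of n days starting at a.day (inclusive on both ends)
         AND b.day BETWEEN a.day AND date(a.day, '+' || ? || ' days')
        GROUP BY a.user_id, a.day
        -- the window is fully covered only if it holds n distinct days
        HAVING COUNT(DISTINCT b.day) = ?
        ORDER BY a.user_id
    """
    rows = conn.execute(query, (n - 1, n)).fetchall()
    return [user_id for (user_id,) in rows]


conn = build_demo_db()
print(users_active_n_consecutive_days(conn, 3))
print(users_active_n_consecutive_days(conn, 2))
```

With the sample rows above, only alice has a run of three days, while alice and carol both have a run of two. On a database with window functions, the usual alternative is the "gaps and islands" approach with ROW_NUMBER().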


"However, for a data science beginner, SQL is the best place to start."

I totally agree with that statement. Being a beginner myself in the DS field, I'm living through this right now in my job. And, as a plus, working with SQL every day is also giving me different perspectives on handling Python/Pandas DataFrames.


I'd agree with this. I'm not a data scientist but work closely with a few and the jump from the academic world to business/govt seems to have been jarring.

I think previously they'd been used to consuming data from exports and CSVs, scraping websites and plugging into APIs directly. Having to navigate (often messy) database schemas wasn't what they imagined they'd be doing!


SQL is almost never used in algorithmic trading by data scientists.

Learn Python, it is used universally.

Don't learn R.


Why not learn R? In the last year I've spent around 80% of my time working with R, coming from the last five years almost exclusively with Python, and there are some great reasons to use R. Although if the tidyverse didn't exist I'm not sure I'd be saying that. I find that suite of packages together to be a very cohesive set of tools for doing data science.


Three reasons (I know both languages well): (1) R is used much less in the data science industry; (2) Python is a more universally useful language, so if he learns it for data science then he can easily write utility scripts, build a back-end, etc.; and (3) the overlap between R and Python capabilities is so significant that it would be a waste to start with R. I would only suggest picking it up if he needs some niche package that he can't get in Python.



