Data Science Interview Study Guide (coriers.com)
141 points by dataguy12 on May 24, 2019 | 19 comments


It bothers me that so many of these guides are aimed at passing the job interview instead of teaching the topic at hand.

- I want to be excited about the possibility of becoming better at the topic, not fantasize about crushing an interview

- If there is a major difference between the skills needed for an interview and the actual job (as the concept of interview preparation suggests), that's bad in all kinds of ways

- The interview prep articles often present the material in an unattractive way, as a list of requirements that need to be fulfilled, without an explanation of how it all fits together.

How about leading the student along, from a small sample project, showing the pitfalls of not understanding statistics properly, to a larger project where the data is put into a database, motivating the study of SQL, to doing some programming related to the toy problem at hand, motivating a deeper dive into programming fundamentals? How about inspiring the student about the beauty of the topic, about the possibilities and powers they get with each new little skill that they acquire?


Welcome to the wonderful world of interviewing in tech! And people wonder why most software is shit.

https://twitter.com/mxcl/status/608682016205344768?s=19


> If there is a major difference between the skills needed for an interview and the actual job...

The problem is that the exact skills required for a job cannot be measured within the timespan of an interview, so closely related skills are measured instead. Until somebody solves this problem, interview preparation makes a lot of sense.


I think evaluating the skills is not difficult within the timespan of an interview, if done right.

Let's sit down with a specific problem I am working on. Let me tell you what I'm doing with the tools that I have. Your past work was in some way similar, and you can relate to what I'm doing in some way. Either you can jump right in and take over the keyboard, or you can tell me how you solved a similar problem with your tools and frameworks. I can show you how it's done with my setup, and teach you a few things in the process - which you will easily pick up as you have related experience. Either you have exciting questions, maybe challenging how I am doing it, or maybe you teach me just a keyboard shortcut I didn't know, showing the time and effort you have spent on this. We are both excited to learn a lot in a short amount of time and I know I want you to work with us...


Wouldn't your approach be biased against people who need more time to come up with a solution, and against people who are not used to discussing their solutions with others in real time? Your approach also doesn't include an objective measure of performance: your judgement of the candidate could be highly dependent on their personality, instead of their actual skills. If I recall correctly, the book "Thinking, Fast and Slow" shows how easily interviewers are influenced by personality traits. Technical questions like "how do you invert a binary tree" or "what is KL-divergence" are much less personality dependent, and the quality of the answer is measurable.

Here is what I think are some important skills in data science:

Can you understand the principles behind existing solutions and build a new solution from those principles?

(How fast) can you understand a new approach and apply it to a problem?

How long does it take you to translate a solution to working code? How optimal, readable and reusable is it?


My proposal hopefully doesn't put the candidate on the spot, but allows for any and all useful contributions throughout the process, be it fast or slow thinking or anything in between. Hopefully in that hour or so where we sit down, there is enough time to overcome most of the bias that we had when the candidate walked in.

Looking at your list of important skills in data science it seems we mostly agree on what is important...


To me, what you mentioned is pretty obvious. A short conversation with someone can usually suss out if they're making things up or reading a script they've memorized.

It's far easier for the interviewer and the interviewee. Instead, it's all about covering yourself as the interviewer so you adopt the new garbage industry standard and pretend those are sound metrics. When the metrics are not that telling if a new hire is bad, you have an out that it's the process, not your mistake.


> It bothers me that so many of these guides are aimed at passing the job interview instead of teaching the topic at hand.

This is another manifestation of Goodhart's Law.

https://en.wikipedia.org/wiki/Goodhart%27s_law


People come to data science from a mix of CompSci, Business, and Statistics/Research backgrounds, often with some, but not all, of the skillset. From what I've seen, it can be hard for people from any of those camps to look into the other two camps and discern which of the skills are expected/important/relevant to a data science role.

This kind of article can be valuable as an efficient diagnostic for prioritizing which skills are worth learning.

E.g. I'm from a business background, but regularly use SQL and Python. I know I'd need those to work productively in data science. But do I also need to know Git? Best practices to avoid SQL injection? What a monad is? How to implement Dijkstra's?

I'm interested to learn all of those things, but I also have a personal life and an entire second list of know-I-don't-knows in statistics. It can be helpful to get confirmation that "yeah, you know enough SQL, but it's worth learning a little bit about graph traversal."


I’m not in data science, but my question is what CS fundamentals like that have to do with DS. I thought it would have to do with data cleansing/storage/access etc, statistics/ML, and then the stuff specifically related to computing statistical or “machine learning” stuff (optimization, how do you get a gradient, basic numerical stuff). And then maybe stuff like parallelization at large scale. And then just knowing how to use your libraries of choice.


This is a great question! Again, this depends on the company and even the team you are interviewing with. Some data scientists are asked questions about linked lists because the team might need them not only to research models but also to implement them.

Some data scientists take on more of a data engineering role, others that of a data analyst or research scientist. It really depends on what the company wants. The beginning of the post discusses various companies and what they might expect, based on both personal experience and Glassdoor.


> It bothers me that so many of these guides are aimed at passing the job interview instead of teaching the topic at hand.

Trust me, not many of us fantasize about crushing an interview. The purpose of the guide was to help people use their time efficiently to study for an interview so they could get on to the important stuff.

I think most tech people hate the interview process in general.

I think this interview process does a few things that companies like.

1. It creates a deterrent to leaving. Once you finally get that job... you don't want to have to go through that process again any time soon.

2. It creates exclusivity, like colleges do: by creating a difficult process and reducing acceptance rates, when you do make it, it feels like you really earned it.

3. It's an easy way to fit people through a square hole, like the GRE, SAT or ACT.

Do we like it...no


You have a point, but I think there is more to it.

I have failed twice in the last year to be offered a DS position after going through all the interviews, references and HR personality tests. I would say that ~50% of the issue was me (= not nailing every question) and the other 50% was the employer not understanding how to screen for the right people (= them lacking answers to questions they didn't ask).

I think there are two distinct skills: the skills you need for the job, and the skills you need to get the job. Maybe these guides cater to the latter.


There is, based on what I have seen, a decoupling between tech screens and the ability to perform in a job. I'm a big fan of a take-home assessment if you're really worried about someone's skill, but these days when I am screening teammates for my own team I focus my questions on problem solving, troubleshooting, and analysis. Between assessing those skills and a candidate's basic interpersonal communication, I have had incredible results and have a wonderful team as a result.


That's because these guides are specifically designed to help someone pass an interview. What you're describing is the experience that one would get from taking an online data science course.


At Kyso (https://kyso.io) we see a lot of people get hired into data-science jobs, and the biggest success factor that I've personally seen is having some examples of projects that the candidate has worked on. This can be either public projects online (that's what we started Kyso to help with!) or a description of a project they worked on while studying/working.

Something I've noticed about data-science candidates is that they are very happy to jump into the technical details of an implemented model, but sometimes struggle to communicate the reasons for the model in the first place and how it can help the company/research project. A lot of data-science projects are smaller ad-hoc jobs where the data scientist is trying to answer some business question, and here communication is a vital skill.


“Statistics is a broad concept so don’t get too bogged down in the details of each of these videos. Instead, just make sure you can explain each of these concepts at the surface level.”

Lol, well now, there’s a recipe for success right there.


That's essentially how modern society functions and the type of behavior it rewards, so it really is a recipe for an economic measure of success. Most humans these days tend to equate economic success with success in life as well.

I don't agree with the notion but that's where we are. It's one extreme or the other: extreme generalization with no depth or extreme depth with no generalization. That's what pays and there's seemingly no reward for the in-between.


I think people are realizing that data scientists without domain knowledge cannot create valuable insights. Enterprises actually seem to be hiring fewer data scientists, and are instead trying to raise their employees' data skills. I think that's the cause of the growth of self-service analytics tools. Below are examples of them.

1. Metatron Discovery: https://metatron.app

2. Metabase: https://metabase.com/



