
In general, domain knowledge is incredibly valuable, and someone who has it will usually beat someone with stronger technical skills who doesn't know the field. I'd add that a lot of larger companies--I heard someone from Shell talking about this a couple of months ago--are training up people who know the business with "citizen data scientist" skill sets.

Without knowing all the details of your situation, it seems like a much lower-risk path, at least, to acquire some data science skills--maybe your company will even pay for it--that you can pair with your existing domain knowledge.



Your comment will forever be underrated due to how spot on it is. It's a fundamental issue in tech that people don't appreciate. While yes, some problems are universal and general tech can solve them without industry experience, I'm pretty sure most people agree that those problems are either solved or there's an army of devs already on it. These days it's not really enough to just be a dev or data sci. You need to know a field and apply your tech knowledge in ways the uninitiated would never imagine.


The problem is that real world, physical business data is taken from what one can get, not what one would want. And usually "what the business was collecting for (unrelated purpose)."

This means it has nearly infinite caveats and assumptions. A specification doc or readout will never sufficiently express all of these. Especially if humans were involved in generating the data.

Consequently, the most useful data products are going to turn on whether or not you (did this small thing) to (correct for this obvious bias or flaw that anyone familiar with the industry knows).


>The problem is that real world, physical business data is taken from what one can get, not what one would want.

Yea, sorry, but part of your job in data sci is to collect the right data. Data doesn't magically exist and we are not stuck with what's out there. A data sci's job is to figure this stuff out. Tech has a weird culture of not doing their job. Kind of like the Zip Recruiter ads. "Working as a hiring manager, hiring new people is the worst part of my job." Bitch, that IS your job. If you don't do that, what's the point in keeping you around? Bee keepers collect honey. Yea it's not exactly easy if you're not careful, but they don't bitch about it because they knew what they signed up for.

Data sci/analysis is about collecting and analyzing data in ways that aren't straightforward. Because if it were easy and didn't require any effort, why would data scientists be needed?


You're both right. Sometimes we have to work with the data we have. Other times we have to create or buy the data we need.

Some companies aren't experienced at building infra to collect data and don't know how to do it. Or their environment is too complex or expensive to sample data from. The data scientist's job in such cases is to do their best with what exists, show success and make a business case for investing resources into data collection infrastructure.

In other cases, when the required sensors don't exist and the information is critical to decision making, you can either buy the data or work with an engineering group or external vendor to integrate and build out the sensors needed. Need foot traffic data? You can buy from a data marketplace like https://datarade.ai, where there exist various vendors (like SafeGraph -- which was recently used in a COVID-19 study published in Nature) aggregating foot traffic data from cell phones. There are datasets that can be used as inferential proxies (so-called "alternative data") for the actual data one needs.

Need to collect in-store data? I was at the NRF conference (the world's largest retail tech conference) in NYC back in January and there were a boatload of vendors hawking different types of retail analytics sensors.

In certain small-scale operations, you can even engage field operations and get the in-store retail staff to help collect data and upload it manually. (You'll need a good relationship with the field supervisor, of course.)

Sometimes the data does exist but is inaccessible, say in the ERP or in some proprietary format -- then you have to negotiate with certain business groups or with OEM vendors in order to get the data out.

It all boils down to whether the data has value that exceeds (by a margin) the cost of collecting it. If the answer is yes, there's often a way to do it (albeit sometimes imperfectly).

Is it part of the data scientist's job description to create or participate in creating data collection infrastructure? I guess this depends on the company but for many companies the answer is yes.


I agree with you too. I think it's a mix of execs and management not fully grasping the job and its implications if you shortcut too much. At the same time, too many data sci are in it for the keyword/sexiness of the job and are not of the personality type to take hardline stands. Inexperience leads to a lack of trust from higher ups. A lack of backbone from the experienced results in more incompetent work. Which results in more lack of trust. Experienced personnel leave, more inexperienced people come in and do things the cheap, shortcut, buy-bad-data way, plus there's no backbone to push back against this when it's seen... and you see how this can spiral into a shitshow that I've been noticing in some consulting projects I've been in.

But yes, data collection should be part of their job. I'm having a hard time understanding why the person who analyzes the data shouldn't at least have a good say in what data is collected.


Have you collected data from and deployed products to a 2000+ store environment?


Okay, how is my argument changed if I answer yes or no? Is what you're talking about a data sci's responsibility or not? If collecting, analyzing and deploying data in reports or db is too difficult for you, data sci isn't for you. I'm not telling them HOW to do their job. I'm clarifying that you have to DO the job if you signed up for it. Don't like it? Get out. We all screwed up by taking jobs we didn't like. Nothing wrong with that. Get out of the kitchen if you don't like the heat or the smell.


I'm curious about the relevant experience you're speaking from.

You're trying to make a point about the fact that you need to push the business and/or obtain data yourself, but I'm saying that can be a vastly more difficult problem than you think (or just flat impossible) at scale.


Why should a single data scientist be responsible for obtaining data from 2000 stores? Data at scale requires people at scale. Data engineers would do this job.


There is a book "Range: Why Generalists Triumph in a Specialized World" which claims there are domain-specific problems that are more likely to be solved by people originally outside that domain.


I don't doubt there are examples where a fresh set of eyes and lack of knowledge about what can and can't be done can break out from "the way we've always done things." But it's probably not the way to bet in the general case.


The book claims that "generalists" are the rule, at least if we look at the very top of certain fields, e.g., Nobel laureates.



