Hacker News

Who are these data scientists? Most statisticians I know would just use linear regression unless they needed a neural network for marketing purposes. Statisticians spend years studying linear regression and its variations in graduate school. I would have thought it's the CS people who would be more fascinated with neural networks.


Well, there are some fairly distinct camps forming in data science. You are correct that those coming from a statistics background would generally prefer simpler, more parsimonious models. There is a not-insignificant group that seems to be coming into the field via other channels (CS, boot camps, self-teaching, etc.) who view statistics as a bit of a dinosaur and therefore see the statistician mindset as backwards. To them, simpler models aren't a good thing, they are a bad thing. Any amount of increased complexity is worth even a small improvement in performance.

I think some of this is exacerbated by modern pillars of machine learning and data science. Competition sites like Kaggle are entirely based on maximizing test set accuracy, so winning submissions these days are huge morasses of ensemble methods trained for days or weeks on GPUs, yet in the end they are often only marginally better than some of the fairly basic standard approaches. And when companies like Google build their bots for Go or StarCraft, they use cutting-edge techniques. When people see that and get inspired to get into data science, that's what they want to do, even though the majority of problems are more rooted in data quality, thoughtful understanding of the problem, and more rudimentary methods.

It's also the result of some of the rhetoric of important figures in the field. Yann LeCun has pushed back strongly in the past on criticisms of modern machine learning's occasional lack of concern with introspection and model understanding. Judea Pearl, a Turing Award winner for his work on probabilistic and causal reasoning in AI, devotes large portions of his pop-sci book The Book of Why to attacking the field of statistics as a whole, and goes after historically influential figures in the field with such ferocity that it borders on character assassination. He has even rebuffed modern critics, such as the very widely respected Andrew Gelman, by saying they are "lacking courage" in failing to accept his "revolutionary" causal inference methods over the traditional ones used in statistics.

The attitude is driven largely by the people and institutions at the top, and as someone in the field, I unfortunately encounter this kind of thinking far too often.


Thanks for sharing your expertise. It was very interesting to hear your perspective.



