I think courses can give you some needed grounding (you should always take a good linear regression class, for example). But that is about as far as they get you: a theoretical base.
Honestly, the issue is that most ML programs are taught as if ML were an additive skill set: the more courses you take the better, or picking the right mix of courses gets you somewhere.
In reality:
1. Most real-world problems are also about subtraction: knowing what not to try and why it might not work. When I ask people about recommendation engines for recommending colocated things, they pile on embeddings; in reality it is about finding good false negatives for the training data and calibrating the classifier output, and those are really hard problems. Embeddings may be necessary, but they are the least of your worries.
2. Most companies will not teach you the fundamentals of stats; you will be lucky to find a mentor in a company who has both the theoretical rigour and the practical implementation skill to solve problems.
3. Most ML problems require engineering to work as well. For example, you can't use Bayesian MCMC for most things at scale; it's why topic models that relied on simulating the posterior were crazy expensive on large datasets.
4. Models are taught as an end in themselves, but courses don't teach you to mix them for debugging, even though in practice they are usually a means to an end. Say you are using decision trees and your model is acting up: you could still try debugging techniques from linear regression, like residual analysis or plotting each variable against Y, before jumping to Shapley values.
The reason is not that Shapley values are bad (they are great), but that you can get a lot of insight from base models that are simpler to debug.
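A quick sketch of point 4 above: borrowing residual analysis from the linear-regression toolbox to debug a tree model. The data and models here are illustrative stand-ins (scikit-learn), not a recipe from the original post:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=2000)

worst = {}
for depth in (2, 8):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X, y)
    resid = y - tree.predict(X)
    # Bin residuals along the feature: a healthy fit leaves ~zero mean in every
    # bin, while large per-bin means reveal structure the model failed to capture.
    bins = np.digitize(X[:, 0], np.linspace(-3, 3, 11))
    worst[depth] = max(abs(resid[bins == b].mean()) for b in range(1, 11) if (bins == b).any())
    print(f"max_depth={depth}: worst binned residual = {worst[depth]:.3f}")
```

The deliberately shallow tree leaves obvious structure in its residuals; the deeper one doesn't. Same diagnostic you'd run on a linear model, no Shapley values required.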
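And a sketch of the calibration problem from point 1: a classifier's raw scores are not trustworthy probabilities until you check them against observed frequencies. The dataset and model below are toy stand-ins of my own choosing (scikit-learn), just to show the reliability check:

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "raw": GaussianNB().fit(X_tr, y_tr),  # naive Bayes scores are often overconfident
    "calibrated": CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5).fit(X_tr, y_tr),
}

gaps = {}
for name, model in models.items():
    # Reliability curve: within each score bin, compare the mean predicted
    # probability to the fraction of actual positives; calibrated => they match.
    frac_pos, mean_pred = calibration_curve(y_te, model.predict_proba(X_te)[:, 1], n_bins=10)
    gaps[name] = float(max(abs(frac_pos - mean_pred)))
    print(f"{name}: max reliability gap = {gaps[name]:.3f}")
```

The hard part in practice isn't running this check, it's deciding what to do when the gap is large and your negatives were never sampled well in the first place.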
I think this comes from a misalignment that is common in plenty of other subjects as well. You know how, once you've gained expertise in something, it becomes difficult to explain because it all seems so obvious? That is roughly what is happening in education. Let me explain.
The reason so much theoretical basis is taught is that you need the skills to learn why things work, when to use them, when they fail, when not to use them, and __most importantly__ their limitations. The problem is, most of this isn't explained explicitly. Maybe it is just this process running for a few decades plus momentum, or that teaching isn't a priority and so no one tries to fix it. (There are exceptions: you've all probably met professors who are outstanding and make boring things seem fascinating.)
But what you're talking about is part of this "when to use, what to use" part. It is also why those classes are so boring: they aren't properly motivated. And it is why we're running into so many problems: evaluation is fucking hard. You see models perform really well in research papers but not in the real world, yet you'll also see researchers evaluating papers purely on single benchmarks. In reality you're forced to come to terms with the limitations of datasets: datasets are just proxies, and what you care about is actual generalization. But if we're not discussing and evaluating actual generalization in research, we get this dichotomy.
There are definitely more efficient (tractable) posterior estimators that work at large scale, but a lot of this isn't really known unless you're in that niche yourself. Statistics is often taught as "here's a bunch of tools and when to use them" rather than "here are the problems, our assumptions, and the main tool we use to solve them; it looks different in different settings, but it is actually the same thing." So it is kind of problematic, but then again, getting there requires a lot more work, and most people aren't going to bother with things like measure theory. So a middle-ground approach is taken and it gets jumbled.
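To make the scaling point concrete, here is a toy sketch of my own (plain NumPy, a deliberately simple conjugate model, not anyone's production setup): a random-walk Metropolis sampler pays a full pass over the data at every step, while the conjugate closed form for this model needs one pass. Model: y_i ~ N(mu, 1) with prior mu ~ N(0, 1).

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=50_000)
n = len(y)

# Conjugate posterior mean: n*ybar/(n+1) -- a single O(n) pass over the data.
post_mean = n * y.mean() / (n + 1)

def log_post(mu):
    # O(n) per evaluation: this full-data pass happens at EVERY MCMC step,
    # which is exactly why naive MCMC gets crazy expensive on large datasets.
    return -0.5 * mu**2 - 0.5 * np.sum((y - mu) ** 2)

mu, lp, samples = 0.0, log_post(0.0), []
for _ in range(2000):                      # 2000 steps => 2000 full-data passes
    prop = mu + rng.normal(0, 0.02)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept/reject
        mu, lp = prop, lp_prop
    samples.append(mu)

mcmc_mean = float(np.mean(samples[1000:]))  # discard burn-in
print(f"conjugate: {post_mean:.3f}  mcmc: {mcmc_mean:.3f}")
```

Both land in the same place here, but one did 2000x the work; subsampled or variational estimators exist precisely to break that per-step full-data cost.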
The people with experience also aren't necessarily the ones who end up teaching. That isn't to say the same information can't be conveyed (good academics keep up to date with what industry is doing), but there is a powerful focus that practical experimentation brings.
I know. I love to teach too. But I’d never take it up as a profession. The vast majority of successful people who have a yearning to pass on their knowledge hand-select a few protégés or write a book.
I have had one teacher who was like that. He’d been involved in the development of nuclear weapons before he retired. Incredibly smart guy. Unfortunately, he couldn’t teach physics worth a dime. He had the highest dropout rate of any physics teacher at my college.
Those two scenarios cover the vast majority of cases.
> most real world problems are also about subtraction knowing what not to try and why it might not work
This is true in most fields. I view school as giving you a broad overview of everything that you might need in your field, but for any given problem it will be on you to narrow it down to the solutions you actually need and then to learn that specific set of solutions well enough to apply it.
People fresh out of college will usually try to apply everything all at once until they learn—either from a mentor or their own hard experience—to filter it down. It might be that ML has it worse than other fields right now not because it's taught wrong but because it's new enough that there aren't enough mentors with decades of war stories.
I don't know about ML, but if you want to learn applied stats I would look up Andrew Gelman's books, or one of the newer books on Bayesian inference using Stan, and work through them cover to cover.