Off topic, but it's very interesting to observe the ratio of upvotes / comments.
On any given chatGPT topic, there are hundreds of comments usually. Here, so far 100 upvotes, only 7 comments.
The book looks great - and given the authors, it almost certainly is (I will buy it for sure). It makes me think, though, about the state of 'ML / AI / Data Science' - and the cynic in me thinks that this upvotes / comments ratio reflects the fact that most people drawn to the AI hype have not really touched the underlying concepts and don't have any deeper understanding of the maths / stats behind them.
PS. That being said, I didn't make a meaningful comment on the linked topic either.
Books like this are key to getting started with Machine Learning/AI, and this particular book is a very good one. I started my own ML journey with it.
There is a lot of hype around AI and it is going to be like the dotcom bubble.
Unlike crypto, AI has real uses right now, and I say this without even counting any LLM products. But there is also a lot of hype and wishful thinking, and this bubble is going to burst and hurt a lot of people. That won't stop people from making real money in the short term, though, and many of the hypers I know understand this well.
But AI is here to stay. And even after the bubble bursts, there will be real uses of AI all around us.
> and this bubble is going to burst and hurt a lot of people
> But AI is here to stay. And even after the bubble bursts, there will be real uses of AI all around us.
This is, as an ML researcher, exactly where I'm at (in belief). The utility is high -- or rather, sufficient -- but so is the noise. The danger of ML is not so much X-risk or malevolent AGI, but dumb ML being used inappropriately: using ML without understanding its limitations and without checks to ensure that when hallucinations happen, they don't cause major problems. But we're headed in a direction where we're becoming more reliant on these systems, and once we have a few big hallucination failures the bubble will burst, which could set us back a lot in our progress toward creating AGI. Previous winters were caused by a lack of timely progress, but the next winter will happen because we shoot ourselves in the foot. Unfortunately, the more people you give guns to, the more likely this is to happen -- especially when there's no safety training or even acknowledgement of the danger (or worse, the only discussion is about being shot by others).
Well if you care more about the study than the money (which is nice -- though I'm at the tail of grad school), it makes more sense to chase knowledge than chase metrics. Then again, I don't come from a computer science background, so maybe not having that momentum helps.
This and Elements of Statistical Learning were the intro to ML for me as well.
I understand your sentiment, but we also have to accept that for a lot of ML use cases, just calling the ChatGPT API is a 100x better approach than creating your own ML model, and thus there is really no need to understand any of the math.
As an example, I am building an AI nutrition-counting app, and I use ChatGPT function calling. I can just add a field that asks for, say, an emoji of the food, and it automatically classifies any food to the right emoji. There is absolutely no need to know gradient descent or any fundamental property to be able to do that.
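For concreteness, a minimal sketch of what that pattern looks like with the openai Python package (pre-1.0 style, mid-2023); the function schema, field names, and model choice here are my own illustration, not the app's actual code:

```python
# Sketch: classifying food to an emoji via ChatGPT function calling.
# Schema, field names, and model are illustrative assumptions, not the app's code.
import json
import openai

functions = [{
    "name": "log_food",
    "description": "Log a food item with basic nutrition info",
    "parameters": {
        "type": "object",
        "properties": {
            "food": {"type": "string", "description": "Name of the food"},
            "calories": {"type": "number", "description": "Estimated calories"},
            "emoji": {"type": "string",
                      "description": "Single emoji that best represents the food"},
        },
        "required": ["food", "calories", "emoji"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "I had a bowl of ramen for lunch"}],
    functions=functions,
    function_call={"name": "log_food"},  # force the structured call
)

args = json.loads(response["choices"][0]["message"]["function_call"]["arguments"])
print(args["emoji"])  # the model does the "classification" for you
```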
As an ML engineer, I look forward to the day OpenAI ups their prices 5x and companies hire people like me as a consultant to replace their expensive API calls with an SVM or random forest that can be run off a smartphone.
That intern who figured out how to make a POST request and accidentally committed the API keys to public GitHub? Long gone. The rise and grind manager who discovered ChatGPT in April? Retired. But we will be there, ready to cut your costs by 95% because people couldn’t be bothered to understand the basics of what they’re using, in exchange for a sizable consulting fee of course.
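For the curious, the kind of replacement being joked about here is often only a few lines of scikit-learn. A generic sketch -- the texts and labels are placeholders, and the right model depends entirely on what the API call was doing:

```python
# Generic sketch: replacing an LLM text-classification call with a small
# local model. Dataset and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["refund please", "love the product", "app keeps crashing"]
labels = ["billing", "praise", "bug"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

print(model.predict(["it crashes on startup"]))  # ['bug'], no API call needed
```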
There's a meme showing someone stepping over all the steps for learning how to think about and analyze data, straight to BERT. Well, now people are stepping past BERT to Stable Diffusion and ChatGPT. It's been like this for years, and most work environments suffer from it in a bad way. I don't envy practicing data scientists managing expectations.
Interesting seeing job postings wanting 5+ years of LLM experience. Like, unless you were at OpenAI working on GPT-1 or at Google on BERT, there's no one else in the world with that much experience -- and your shitty startup/Fortune 500 company can't afford them anyway.
>Interesting seeing job postings wanting 5+ years of LLM experience.
Not really interesting.
Similar to seeing job postings some years back (and even recently), wanting n+ years of Rails experience when DHH had created it significantly less than n years before.
As an ML researcher, I don't think you're far off the mark.
w.r.t HN, it's almost all hype and no "science". People have strong convictions but not strong evidence. They happily cite papers, but only read the abstracts and miss the essential nuance -- especially in a field where acknowledging limitations puts you at high risk of rejection (reviewers just copy-paste your limitations section and thank you for doing their work).
w.r.t academia, it is a bit better, but I find that in general there are a lot of researchers missing math fundamentals. I know or have met people at top universities and top labs who don't know the difference between likelihood and probability, or who don't understand probability density -- some of them working on diffusion. I will say that, in general, the most prominent researchers do have these skills, but you'll notice they aren't publishing as fast and their work might not be as popular. A lot of research right now goes into parameter tuning and throwing compute at the problem. I've been a bit vocal about this, mostly because it's a barrier to other types of research (I'll admit the tuning is needed, but we need to be honest that it isn't high innovation either, and it's hard to prove these models are better given that we haven't tuned other models/architectures to the same degree).
tldr: You're pretty spot on. There's a shit ton of noise in ML/AI. Especially on HN
Edit:
I thought I should also suggest Richard McElreath's Statistical Rethinking (https://xcelab.net/rm/statistical-rethinking/), which is a more enjoyable read than ISLR and will also introduce you to Bayesian stats (Lectures are also on youtube). I'd also suggest Gelman's Regression and Other Stories (https://avehtari.github.io/ROS-Examples/).
>don't know the difference between likelihood and probability. Similarly ones that don't understand probability density.
I'm a PhD student at a "top university", in a research group primarily focused on data science (NLP, LLMs, blah blah blah). I'm 100% sure I am the only person in the group of ~25 (including profs/postdocs) who knows the difference between f(θ|x) and f(x|θ). In fact, I'm pretty sure I'm the only person who has ever even seen f(θ|x) (because I took a stats sequence out of Casella and Berger). This group puts out dozens of papers a year. My research focus is not data science (it's compilers).
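For anyone following along, the distinction in question is the standard one; nothing here is specific to the thread:

```latex
% Posterior vs. likelihood (standard definitions):
%   f(theta | x) is the posterior density of the parameter given the data;
%   the likelihood is f(x | theta) read as a function of theta with x fixed.
\[
  f(\theta \mid x) \;=\; \frac{f(x \mid \theta)\, f(\theta)}{f(x)},
  \qquad
  \mathcal{L}(\theta \mid x) \;:=\; f(x \mid \theta).
\]
% f(x | theta) integrates to 1 over x; the likelihood generally does NOT
% integrate to 1 over theta, which is one reason the two get confused.
```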
Well your name says something about who you are (and might mean you can guess at the roots of mine :). I often find PL people are more likely to have good math chops because it is taken seriously in their field.
Fwiw, at CVPR last year I asked every author of a diffusion paper about likelihood or score, and only 2 gave me meaningful answers (one compared their model's density against the data's density, which was estimated through an explicit-density method -- yeah, parametric vs parametric, but diffusion is not a tractable-density method). It is really impressive that people who work with probability and likelihood every day do not understand the difference (many assume they are the same, rather than merely not knowing the difference).
>and might mean you can guess at the roots of mine :)
your services are required on the busy beaver thread!
>It is really impressive that people who are working with probability and likelihood every day do not understand the difference
i think i come away from the whole experience (the phd, even though i'm not done yet) with a deep skepticism/cynicism of very many things. but it's probably not what you expect. i just don't think the math is at all relevant/important epistemically as long as you can run the experiments efficiently. which is exactly what you see happening -- people with access to gobs of compute develop good intuition that leads them towards breakthroughs, and people who don't have access to compute struggle and make do with the formalisms. it's not much different in physics, where the good experimentalists aren't born that way, they're made in the well-funded labs.
i firmly believe that in ML, the math does not matter at all, beyond the tiny bit of calculus and linear algebra you need to kind of understand the forward and backward passes. of course every time i say this on here i'm skewered/debated to death, as if i don't know what i'm talking about :shrug:
Bayesians and frequentists should bury the hatchet. Both are useful, and the best tool depends on the problem/environment. There's no one-size-fits-all in statistics.
> your services are required on the busy beaver thread!
Lol I didn't even see it. I'm assuming this is w.r.t Mutual Information's video?
> the phd, even though i'm not done yet
I'm at about a similar point (last year). Most of my cynicism though is around academia and publishing. Making conferences the de facto target for publishing was a mistake. Zero-shot submissions in a zero-sum game environment? Can't see how that would go wrong...
> it's not much different in physics, where the good experimentalists aren't born that way, they're made in the well-funded labs.
Coming from the experimental physics side (my undergrad), there is one big factor though. Generally, the experimentalists who were good at the math and could learn to intuit it (especially the uncertainty) did better. But you're absolutely correct about the __well funded__ part being a big indicator. When I worked at gov labs I didn't notice a difference in intellect between peers from different schools (across a wide range of prestige); what stood out was simply experience. The no-name-school physicists could pick up the skills fast, but they just never had the opportunities the prestigious-school students did. It didn't make too big a difference, but it is an interesting note, especially since it tells us how to make more of those higher-status researchers...
> i firmly believe that in ML, the math does not matter at all
My opinion is that this is highly context dependent. Most research right now is about optimization and tuning, and with respect to that, I fully agree. I'm including in that even some architecture search, such as "replace CNN with Transformer" and the like; this you can do pretty much empirically. The one big point I'll push on here is that people do not understand the limitations of their metrics (especially parametric metrics), the biases of their datasets, or the biases of their architectures, so it creates a really weird environment where we aren't comparing things fairly. (It is also why what works in research doesn't always work out well in industry.)

But if we're talking about interpretability, understanding, novel architecture design, evaluation methods, and so on, then I do think the math matters. There's a lot we can actually understand about ML models -- how they work and how they form answers -- that isn't discussed, not because it hasn't been researched but because the research has a higher barrier to entry and people don't understand the results. It isn't uncommon to see a top-tier paper empirically find what a theoretical paper (with experiments, but lower compute) found 5-10 years back, with the recent work unaware of the prior work. Where the higher-level math really helps is in being able to read deeper and evaluate deeper.

Fwiw, every time I make a "math is necessary" argument, I get a lot of pushback. I think this is because both sides have two camps. On the pro-math side there are people who legitimately believe it (like me) -- who usually talk about high-dimensional statistics and other things well past calculus -- and people who say it to make themselves feel smart, who often think calculus is high-level math or say "linear algebra" as if it were just what's in David Lay's book. On the anti-math side there are the hype people who just don't care, and the people who are doing other things and don't really end up using it. The latter, I think, still benefit a lot from the intuition about these systems that they gained from those math courses. But then again, the classic thing in math education is that you struggle while learning it, and once you know it, it's trivial.
For research, I firmly believe you need both the high math and the "knob turners." I just think academia and conferences should be focused around the former and industry around the latter. The problem is that we have these people operating in the exact same space, comparing works that use 100+ years of compute hours against works that have a month or two. That isn't a great way to tell whether one architecture is better than another, since hyperparameters matter so much. It's just making for bad research and railroading.
There is, naturally, a reason for this. GPT is effectively a nice wrapper around an otherwise complicated set of issues. I almost think of it as a GUI instead of a console: yeah, you lose some functionality and control, but a lot of people will take it and run with it simply because it is so much easier.
Case in point: Google's AML AI, which promises to do away with pesky model validation and such (because it will do everything in a closed box you will have no reason to investigate). I am already looking forward to the conversations with regulators.
A lot of people do not need to know about underlying concepts in AI - they just use it. Though it is interesting to see that even on this website this seems to be the case.
This is pure gatekeeping. The math behind LLMs -- that is, the math behind neural nets -- is undergrad freshman-level calculus and some linear algebra. Not really complex at all. Can you deal with derivatives, the chain rule, and matrix multiplication? Great, you know all the "math" behind deep learning.
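To make the claim concrete, here is roughly that entire toolkit in one place: a one-hidden-layer network trained with hand-written backprop, using nothing beyond matrix multiplication and the chain rule. A toy sketch on synthetic data, not production code:

```python
# Toy sketch: the calculus behind a one-hidden-layer net really is just
# matrix multiplication plus the chain rule.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                  # 64 samples, 3 features
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0  # toy binary target

W1 = rng.normal(size=(3, 8)) * 0.1
W2 = rng.normal(size=(8, 1)) * 0.1

for _ in range(500):
    h = np.tanh(X @ W1)                    # forward: matmul + nonlinearity
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))    # forward: matmul + sigmoid
    dlogits = (p - y) / len(X)             # chain rule through CE loss + sigmoid
    dW2 = h.T @ dlogits                    # chain rule through second matmul
    dh = dlogits @ W2.T
    dW1 = X.T @ (dh * (1 - h ** 2))        # tanh'(z) = 1 - tanh(z)^2
    W1 -= 1.0 * dW1                        # plain gradient descent
    W2 -= 1.0 * dW2

print(((p > 0.5) == y).mean())             # accuracy; near 1.0 on this toy data
```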
That's the beginning of the math, but definitely not "the math behind LLMs." That includes probability theory, measure theory, topology, and more. Most people don't even acknowledge this -- but then again, unless you're deep in a subject, you don't really know its complexities. Red flags should go off whenever anyone says "it's just <x>" or calls something simple. It's like the professor saying the proof is trivial when that's the hardest part of the entire problem.
People like you are hilarious. You're sitting high in your ivory tower thinking that no one without a PhD in CS from Stanford/Berkeley/MIT can do what you do. Meanwhile, people will take the Fast.ai course and be training full LLMs from scratch in 6 months, all while you moan that "they don't even understand the REAL math." Yawn.
People like you are funny because you don't realize I'm actually also calling out a lot of ivory tower people.
Also, I'm not saying you need the math to train a model. I'm not sure you even need linear algebra for that, mostly just programming. That's why I called it the beginning. I was directly responding to your claim that this is all the math there is __to understand__. Because let's be real, you don't need to know backprop (and thus derivatives and the chain rule) to train models. The math is about how to analyze your models -- you know, specifically what academia is supposed to be doing. Research and engineering overlap, but they aren't necessarily the same thing.
Besides, math education is notorious for being essentially free. Who needs Stanford/Berkeley/MIT when textbooks are widely available? (Btw, CS doesn't typically produce mathematicians.)
If you want, you can get all of that math at any Top500 university, even in undergrad. I agree that people will be having impact and deploying these models without that understanding, but you don't need to be at Stanford to gain that understanding and it's something desirable if you want to do research instead of deployment.
This is an update to a very popular text which was originally in R. Professors Hastie and Tibshirani are leading educators in statistical learning, and they also have a video course following these notes on Stanford Online. Very highly recommended if you're learning the theoretical aspects of classical ML.
ISL is the best intro-level textbook of classic ML methods. It's theory-oriented yet simple enough to appeal to a wide audience of students (with basic knowledge in stats, linear algebra, and coding).
Having the examples only in R was a pain when teaching with it while using Python. I hope they'll now turn this into a series of Jupyter notebooks and distribute it through Colab or similar.
I understand some may say "classical" ML, but to me those few-parameter methods are very helpful in many cases and much easier to interpret than RNNs :-)
I’ve been meaning to do a comparison of lab zero between the two.
I’ve only had the chance to look over the Python lab for a few minutes, but compared to what I remember from the R labs, it is much, much more involved and longer.
I know HN likes to complain about how difficult and confusing R is, but I think it is an easier language for beginners or stat-inclined people to start doing statistical work in.
Python is more natural for programmers learning statistics/ML. R is more natural for statisticians learning programming. Which is not too surprising, since those were the audiences each language was intended for. I think it's good to be accepting of both, and use the one that works better for a given task.
It's worth noting that neither of those books contains any code at all.
I suppose that's what makes ISL being translated such a big deal. A sufficiently advanced student in ML/statistical modeling doesn't really need code at all, since it should be fairly trivial to translate the mathematical models into computational ones -- and the ability to do so is a prerequisite to understanding these models in the first place.
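As a trivial instance of that translation: the ridge regression estimator from ISL's regularization chapter, beta_hat = (X'X + lambda*I)^(-1) X'y, goes into NumPy almost symbol for symbol (toy data purely for illustration):

```python
# Ridge regression: beta_hat = (X'X + lambda*I)^{-1} X'y,
# translated from the formula almost verbatim.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

lam = 0.1
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
print(np.round(beta_hat, 2))  # close to beta_true
```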
Recommended Textbooks:
Pattern Recognition and Machine Learning, Christopher Bishop
Machine Learning: A probabilistic perspective, Kevin Murphy
[2] University of Toronto CSC 311: Introduction to Machine Learning
Suggested readings are optional; they are resources we recommend to help you understand the course material. All of the textbooks listed below are freely available online.
Bishop = Pattern Recognition and Machine Learning, by Chris Bishop
ESL = The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman.
[3] EPFL CS-433 Machine Learning:
Textbooks (not mandatory)
Gilbert Strang, Linear Algebra and Learning from Data
Christopher Bishop, Pattern Recognition and Machine Learning
[4] University of Washington CSE 446: Machine Learning
The required textbook for the course is:
[Murphy] Machine Learning: A Probabilistic Perspective, Kevin Murphy.
The following three texts are also excellent and their PDFs are available for free online.
[B] Pattern Recognition and Machine Learning, Christopher Bishop.
[HTF] The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, Jerome Friedman.
[5] Cornell University ECE4950: Machine Learning and Pattern Recognition
Materials
We will take materials from various sources. Some books are:
Pattern Recognition and Machine Learning, Christopher Bishop
Machine Learning: a Probabilistic Perspective, Kevin Murphy
[6] Princeton University COS 324: Introduction to Machine Learning
Optional Machine Learning Books
[Murphy] Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press.
[Bishop] Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer.
[7] ETH Zurich Introduction to Machine Learning (2023)
Other Resources
K. Murphy. Machine Learning: a Probabilistic Perspective. MIT Press, 2012.
C. Bishop. Pattern Recognition and Machine Learning. Springer, 2007.
[8] TUM (Technical University of Munich) Machine Learning
This award-winning introductory Machine Learning lecture teaches the foundations of and concepts behind a wide range of common machine learning models.
Literature
Pattern Recognition and Machine Learning. Christopher Bishop. Springer-Verlag New York. 2006.
Machine Learning: A Probabilistic Perspective. Kevin Murphy. MIT Press. 2012
[9] MIT Introduction To Machine Learning:
Books: No textbook is required for this class, but students may find it helpful to purchase one of the following books. Bishop's book is much easier to read, whereas Murphy's book has substantially more depth and coverage (and is up to date).
Machine Learning: a Probabilistic Perspective, by Kevin Murphy (2012).
Pattern Recognition and Machine Learning, by Chris Bishop (2006).
[10] UC Berkeley CS-194-10: Introduction to Machine Learning:
Reading List (Preliminary Draft)
The first two books are very helpful, and are available online, so those (in addition to AIMA) will be the primary sources. Bishop has a wide range of solid mathematical derivations, while Witten and Frank focus much more on the practical side of applied machine learning and on the Weka package (a Java library and interface for machine learning).
Trevor Hastie, Rob Tibshirani, and Jerry Friedman, Elements of Statistical Learning, Second Edition, Springer, 2009. (Full pdf available for download.)
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective. Unpublished. Access information will be provided.
Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Third Edition, Prentice Hall, 2010.
Christopher Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
Ian Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, Morgan Kaufmann, 2011.
ISL is a more introductory book than Bishop or Murphy. There's no reason not to read all of them, they're all excellent books that cover different topics. I'd also throw in Elements of Statistical Learning from the same authors as ISL(R/P). I've read ISL, ESL, and Bishop, started Murphy but didn't finish it (no real reason, just lost track of it when I got busy). I highly recommend any and all of these texts.
I've heard good things about Bishop. However, I am an SE who would like to know more about what the ML team is doing and maybe work on some ML side projects. Would you recommend Bishop here, or is it considered too theoretical for such a case?
Bishop is going to be more theoretical than ISL. It is true that Bishop is taught as an introduction to ML in many universities, but if you want something more hands-on to start with, ISL is an excellent option. There is another text called "Elements of Statistical Learning" that pairs well with ISL for a more theoretical treatment. I haven't looked at ESL in a long time; the only concern I'd have is whether these books cover introductory deep learning topics. Most of ISL, ESL, and Bishop is traditional machine learning, covering a wide variety of algorithms, so bear that in mind.
Sidney Siegel and N. John Castellan, Jr., Nonparametric Statistics for the Behavioral Sciences, Second Edition, McGraw-Hill, New York, 1988. ISBN 0-07-057357-3.
So, "nonparametric" means make no assumptions about a probability distribution based on parameters. Or, call the material distribution-free.
E.g., you get to see resampling plans -- tiny assumptions, really simple, darned clever, quite generally useful, and especially appropriate for computing. Might use resampling to get more information from the data in A/B tests.
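For instance, a bootstrap confidence interval for the difference in means in an A/B test takes a few lines and assumes almost nothing about the underlying distributions. An illustrative sketch with made-up data:

```python
# Bootstrap CI for the difference in means of an A/B test:
# resample each group with replacement; no distributional assumptions.
import numpy as np

rng = np.random.default_rng(0)
a = rng.exponential(scale=1.0, size=200)   # placeholder group A data
b = rng.exponential(scale=1.2, size=200)   # placeholder group B data

diffs = np.array([
    rng.choice(b, size=len(b)).mean() - rng.choice(a, size=len(a)).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"95% CI for mean(B) - mean(A): [{lo:.3f}, {hi:.3f}]")
```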
Huh? I have that book, and it's nothing like ISLR, at all. It's a good book, but ISLR covers topics such as gradient boosted trees, survival analysis, GLMs, etc. Nothing at all like the book you mentioned. If forced, you could say ISLR is more focused on prediction, not inference or hypothesis testing.
Nonparametric Statistics for the Behavioral Sciences should make a good contribution to statistical learning. Some of the techniques are so robust, i.e., need such meager assumptions, that they should be especially welcome in automatically applied AI (artificial intelligence) applications.
Actually the book ISLR, Introduction to Statistical Learning, does claim to cover
It can be watched without the book. The coding parts can be skipped. It has some insights missing in the book, and they've got an amazing mix of incredible technical talent and a great ability to distill and explain concepts.
I don't think most people realize this but the "old" stuff often works better, has less churn, and has far lower overhead costs for deployment than the "new" stuff. Depends on the domain and the goal.
To your point, I replaced an LSTM that required ~$100k of infrastructure with an XGBoost model that required no additional infrastructure (we created and used the model at query time on infrastructure we already had for query loads) and only lost about 2% accuracy (LSTM: 98%, XGBoost: 96%). That was two years ago and it's still in use.
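(For anyone who hasn't used it, the gradient-boosted-trees side of a swap like that is a few lines via xgboost's scikit-learn API; the data and hyperparameters below are placeholders, not the system described above:)

```python
# Minimal gradient-boosted-trees classifier with xgboost's sklearn API.
# Data and hyperparameters are placeholders, not the system described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on held-out data
```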
The Python version is great news. I get asked fairly frequently to recommend an intro ML book, and I would have suggested this one, except the asker usually only knew Python and not R. Now it's a perfect first book!
I used to introduce people new to machine learning with a Python-converted version of ISL that I was developing. I never finished converting all of ISLR, so this is very welcome!