The author uses "Model-Based" in place of "Bayesian."
I transitioned from using Bayesian models in academia to using machine learning models in industry. One of the core differences in the two paradigms is the "feel" when constructing models. For a Bayesian model, you feel like you're constructing the model from first principles. You set your conditional probabilities and priors and see if it fits the data. I'm sure probabilistic programming languages facilitated that feeling. For machine learning models, it feels like you're starting from the loss function and working back to get the best configuration.
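A toy sketch of that contrast (my own made-up coin-flip example, not from the article): the Bayesian route states a prior and likelihood and updates them, while the loss-first route picks a loss and minimizes it.

    import numpy as np

    # Made-up data: estimate a coin's bias theta from flips (1 = heads).
    flips = np.array([1, 1, 0, 1, 0, 1, 1, 1])
    heads, tails = flips.sum(), len(flips) - flips.sum()

    # Model-first (Bayesian): Bernoulli likelihood with a Beta(a, b) prior;
    # the prior pseudo-counts below are an arbitrary choice for illustration.
    a, b = 2.0, 2.0
    posterior_mean = (a + heads) / (a + b + len(flips))

    # Loss-first: choose a loss (negative log-likelihood) and minimize it
    # over a grid of candidate thetas; this recovers the empirical frequency.
    thetas = np.linspace(1e-3, 1 - 1e-3, 999)
    nll = -(heads * np.log(thetas) + tails * np.log(1 - thetas))
    theta_hat = thetas[np.argmin(nll)]

    print(posterior_mean, theta_hat)  # ~0.67 vs. ~0.75

Same likelihood underneath, but the two workflows start from opposite ends.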
Much of the underlying machinery behind Bayesian vs. machine learning models is the same. Hidden Markov Models are Hidden Markov Models whether they have a prior or not. But this difference in feel influences how you build models and hence, the results.
Now that optimization algos for Bayesian models are catching up, Bayesian ML might become a thing.
The blog post author (Daniel Emaasit) wasn't the first to use the "model-based machine learning" phrase. He cites "Model-based machine learning" by Christopher M. Bishop.
Think Bayesian vs. frequentist. Obviously, from the perspective of "optimize against a loss function" they're both special cases, but one is mathematically and philosophically parsimonious; the other... less so.
The author's initial feelings about ML are similar to mine. It's such a broad subject that it feels like you could read and study for years and not cover everything. Worse still, it's ever changing.
I don't know about that. It might appear that way if you only read news articles on ML, since these are usually announcements of the state of the art.
However, compare the topics covered in Duda and Hart's text (1st edition, 1973; 2nd edition, 2000) to the more recent text by Hastie et al. (current edition ~2013). There isn't a huge difference in subject matter. The latter is slightly more advanced and has more of a statistics perspective, but the foundations are there: Bayes decision theory, linear methods for classification/regression, naive Bayes, neural networks, decision trees, ensembles, clustering, etc.
There is a range of topics that are foundational to ML, and thus, relatively stable. These topics are built upon the even more solid foundations of probability theory and statistics. The biggest advances in ML in the last decade (I would argue) were not due to advances in theory.
You just described all fields worth doing research in.
If we knew the right answers we wouldn't waste our time implementing the wrong ones. It's not a bad thing: it means there is room for you to discover something genuinely new and understand something, however small, that nobody else ever completely understood.
Try Pedro Domingos' "The Master Algorithm". It's a good high-level overview of the various "schools" of machine learning. I'm not sure it identifies which problems are best solved by which approach, though; it's more the history of how they have taken turns as the most successful paradigm.
What is a good textbook that will take me from zero to practical proficiency with Bayesian Nets, if I have experience with regression models, both hand-built and machine-learned?
Do what the author did: take Koller's course and follow the research that interests you. In this case, propagation through factor graphs is also a good analogy for when you start looking at backpropagation in NNs and the autoencoder.
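To make the factor-graph side of that analogy concrete, here is a tiny sum-product example (my own toy numbers, not from the course) on a two-variable chain:

    import numpy as np

    # Chain factor graph:  f1(x1) -- x1 -- f12(x1, x2) -- x2 -- f2(x2)
    # All factor tables below are made up for illustration.
    f1 = np.array([0.6, 0.4])              # unary factor over x1 (2 states)
    f12 = np.array([[0.9, 0.1],
                    [0.2, 0.8]])           # pairwise factor f12(x1, x2)
    f2 = np.array([0.5, 0.5])              # unary factor over x2

    # Sum-product message from x1 toward x2: sum out x1 from f1(x1) * f12(x1, x2).
    msg_to_x2 = f1 @ f12

    # The marginal of x2 combines the incoming message with its local factor.
    p_x2 = msg_to_x2 * f2
    p_x2 /= p_x2.sum()
    print(p_x2)  # normalized marginal over x2's two states

The local, table-by-table flow of messages here is what gets loosely likened to the layer-by-layer flow of gradients in backprop.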
That is a very fragmented approach, and it's very frustrating to go down that road. Every field is filled with hype and people pushing their own agenda and publicity to further their careers. Speaking from experience with self-learning discriminative modeling, the overwhelming majority of books are either superficial or needlessly complicated. In both cases they are badly written as well.
Finding a book that hit the sweet spot for regressions wasn't easy but was doable. I was hoping there would be something similar with Bayesian Nets/Generative Models.
Regression Modeling Strategies. You need at least some notion of what regressions and probability are all about, but if you have the basics covered, this book will take you through 80% of the journey and the rest is some googling to figure out some concepts that might be murky.
This is a book that emphasizes practical applications without getting too hung up on the math details. If, on the other hand, you are a math whiz, The Elements of Statistical Learning is THE book, but it expects you to be very proficient in math.
Both books are seriously underrated, which is kind of funny to say because you will find only praise for them, but they deserve even more.
The math in The Elements of Statistical Learning is quite basic, if occasionally tedious. Any college junior in STEM should have taken enough calculus, linear algebra, and probability to work through it.
I disagree re: ESL and RMS, but Harrell's book is superb. It's really more aimed at biostatisticians and clinicians, though.
If the math in ESL gives you trouble, you might prefer http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.p... (and I'm not just saying this because the first author was one of my advisers, although I do think that he and Daniela are particularly gifted teachers).
If the math in ESL is too trivial for you, there's always https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS... , which covers some graphical modeling strategies in later chapters and even kicks the tires of the autoencoder (imho perhaps the greatest recent advance in neural networks for practitioners) along the way.
Koller's course and Ng's course are also good.
Ultimately I feel like you have to get the math right or you'll never acquire the intuition that helps you design your own approaches. But you also have to put in the work.
That reminds me, Tibshirani's Stanford course (it accompanies ISL and ESL) is terrific. Better than those other two, actually. I wish Harrell would offer one.
>Ultimately I feel like you have to get the math right or you'll never acquire the intuition that helps you design your own approaches.
But what do you mean by that? Do I really need it if my applications are not as demanding as Netflix's? I feel like many people consider anything less than PhD-level understanding lol-worthy, which is simply not true. The majority of analysts out there are doing just fine with canned procedures. Is there something like canned procedures for Bayesian Nets?
I disagree that RMS is necessarily better than ESL in some meaningful way. The two are complementary. It's like saying a gravel truck is better than a motorcycle: it all depends on what you want to do with it.
Re: "do I really need it?": hell if I know, I'm not you. But my assertion was specifically that if you want to design your own methods (i.e. do research) you need to understand what they are doing. This doesn't seem like a controversial position; an expert is simply a master of the fundamentals.
But I thought I made it clear that I am not looking to do research. I'm not even looking for state of the art performance. I am looking for that 30% of the skills which allow me to do 70% of the tasks. Like RMS. Because outside of Google/Microsoft/Amazon et al, domain knowledge beats superior math skills 10 out of 10 times.
You should do whatever you like, but remember that domain knowledge and math skills are not mutually exclusive. If the problem you need to solve for a major customer or project happens to be in that 30%, it may come in handy.
Linear algebra and calculus (to a lesser degree) are foundational for a great many things. Got missing data? K-NN or nuclear norm matrix completion (or marginalizing over the rest) can help. Systems of differential equations? Use a matrix exponential.
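For concreteness, a small sketch of both (the library choices are mine, assuming scikit-learn and SciPy are acceptable here):

    import numpy as np
    from scipy.linalg import expm
    from sklearn.impute import KNNImputer

    # Missing data: k-NN imputation fills the NaN using the nearest complete rows.
    X = np.array([[1.0, 2.0],
                  [2.0, np.nan],
                  [3.0, 6.0],
                  [4.0, 8.0]])
    X_filled = KNNImputer(n_neighbors=2).fit_transform(X)

    # Linear ODE system dx/dt = A x has the closed-form solution x(t) = expm(A t) x(0).
    A = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])
    x0 = np.array([1.0, 0.0])
    x_half = expm(A * 0.5) @ x0  # state at t = 0.5

    print(X_filled)
    print(x_half)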
You are free to do whatever you like. A bus driver doesn't need to know how to rebuild an engine. But if you want to race cars you'll get a lot further if you do know how.
If I have missing values, I can use multiple imputation or even simple averaging and the hit I will take will be negligible. I simply do not work in the remaining 30%. I can tell if the work is going to be above my head and refuse the project in those cases.
So after all this back and forth, I still don't know if there is a book similar to RMS in scope, for Bayesian Nets.
Just to touch on one part of your question -- did you see the case study section? There are five conclusions:
> 1. This approach provides a systematic process of developing bespoke models tailored to our specific problem.
> 2. It provides transparency to our model as we explicitly defined our model assumptions by leveraging prior knowledge about traffic congestion.
> 3. The approach allows handling of uncertainty in a principled manner using probability theory.
> 4. It does not suffer from overfitting as the model parameters are learned using Bayesian inference and not optimization.
> 5. Finally, MBML separates the model development from inference which allows us to build several models and use the same inference algorithm to learn the model parameters. This in turn helps to quickly compare several alternative models and select the best model that is explained by the observed data.
It's just a blog article clearly titled "An Introduction to Model-Based Machine Learning". While the questions you ask are certainly useful, there was no indication the answers would be found in the article.
Cool stuff.