Probably Approximately Correct – A Formal Theory of Learning (2014) (jeremykun.com)
116 points by lnyan on July 5, 2021 | 14 comments


This has been one of my favorite names for a mathematical concept ever since I first heard of it. I can't help but imagine a conversation:

"So, you've got this model here then? Very good! I can count on it as being correct, right?"

"Err... well... no sir."

"Oh my. Well, that's disappointment. Is it at least probably correct?"

"Umm... I do think that would be overstating the matter, sir."

"You can't say that? Is it perhaps approximately correct? I could do with a model that's approximately correct, that's still jolly useful."

"Ummm... strictly speaking sir, no, I can't claim that the model is approximately correct either."

"My goodness, what is this thing actually good for then? What would you claim this model is?"

"Well, sir, it's... um... probably, approximately correct."

beat "Probably, approximately, correct."

"Yes sir."

"So you're telling me this model is approximately correct."

"Quite probably, sir."

"And it's probably correct."

"Only approximately, sir."

"I think perhaps you don't fully understand why I've been paying you those big Machine Learning dollars these past several months."


Humour is such an underappreciated method of imparting fragments of insight. Your script sounds like a refreshed tech-focused version of "Yes, Minister"[1]. I really miss that show.

[1] https://en.wikipedia.org/wiki/Yes_Minister


It feels like there is the premise of an incredible TV show in there somewhere.

Unfortunately, at first glance, there doesn't appear to be much overlap between the cultures of Silicon Valley and Westminster, which we might call "two nations divided by a common language", or the difference between "move fast and break things" and "move slowly and break things".

Having said that, if the Dominic Cummings plan for a "British ARPA" goes ahead, that could prove to be a comedy gold mine for some good satirists. Such a TV series would only need to take some inspiration from "The IT Crowd" and the "Q" character from James Bond, and it would become an instant hit.


A fun anecdote from the time I took a Statistical Learning Theory course: for my course project I presented on a paper that described "Probably Maybe Approximately Correct" learning.


There's a popular book about it by Leslie Valiant, the guy who invented it: https://www.basicbooks.com/titles/leslie-valiant/probably-ap...


Fun fact that I had to prove for my master's thesis: if you have a procedure that (independent from other estimates) estimates a mean with absolute or relative error eps with probability 1/2 + g, then you can boost that to an arbitrary probability 1 - d using the median of O(log(1/d) / g^2) estimates. So repeated often enough, "probably approximately correct" can become "almost surely approximately correct", with an overhead factor linear in the number of zeroes you want in the failure probability.
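
A minimal sketch of that median trick in Python (the toy estimator, the function names, and the constant inside k are mine; the tight constant comes from the Chernoff-bound argument in the actual proof):

    import math
    import random
    import statistics

    def weak_estimate(true_mean, eps, g):
        """Toy weak estimator: lands within eps of true_mean with
        probability 1/2 + g, otherwise returns a gross outlier."""
        if random.random() < 0.5 + g:
            return true_mean + random.uniform(-eps, eps)
        return true_mean + random.choice([-1.0, 1.0]) * 100 * eps

    def boosted_estimate(true_mean, eps, g, d):
        """Median trick: with k = O(log(1/d) / g^2) independent estimates,
        more than half land within eps of the mean except with probability
        at most d (Hoeffding bound), and whenever more than half are good,
        the median itself must be good."""
        k = math.ceil(2 * math.log(1 / d) / g**2)  # illustrative constant
        return statistics.median(weak_estimate(true_mean, eps, g)
                                 for _ in range(k))

    random.seed(0)
    hits = sum(abs(boosted_estimate(3.0, 0.1, 0.1, 1e-3) - 3.0) <= 0.1
               for _ in range(200))
    print(hits / 200)  # should print 1.0, or extremely close to it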


> (independent from other estimates)

That little phrase is doing a lot of work in your theorem :)


When working with real-world statistics? Absolutely, there is no reason to believe that this assumption holds.

But the context is randomized (quantum) algorithms, where this is trivial to guarantee.


PAC looks, imho, similar to convergence in probability [1]. I don't know if this similarity is exact. The contribution of PAC learning, to me, seems to be in showing that a large class of problems is PAC learnable, even though they are not exactly learnable.

[1] https://en.wikipedia.org/wiki/Convergence_of_random_variable...
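
For reference, a paraphrase of the two definitions being compared (standard statements, my notation):

    % PAC learnability: a finite sample size m(eps, delta) suffices
    % uniformly over all data distributions D (non-asymptotic).
    \forall \varepsilon, \delta > 0 \;\; \exists\, m(\varepsilon, \delta) :
    \quad \Pr_{S \sim D^{m}}\bigl[\operatorname{err}_D(A(S)) > \varepsilon\bigr]
    \le \delta \quad \text{for every distribution } D

    % Convergence in probability: asymptotic, with no rate attached.
    X_n \xrightarrow{P} X \iff \forall \varepsilon > 0 :
    \lim_{n \to \infty} \Pr\bigl[\,\lvert X_n - X \rvert > \varepsilon\,\bigr] = 0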


PAC implies convergence in probability but is more powerful. PAC results are non-asymptotic (and hold in supremum over the class), so for a fixed n you can choose a degree of uncertainty and get the corresponding bound. Usual convergence results in statistics give you the asymptotic rate of convergence at best, nothing else.

That's why PAC / VC results are so useful for ML: you get results for the useful case (a limited, fixed number of samples, and an estimator that isn't even in the same class of functions as the true function), not the theoretical case where you somehow managed to get infinitely many observations.

Of course the price to pay for hypotheses that are so relaxed is weaker results.
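
To make the fixed-n point concrete, here's the textbook realizable-case bound for a finite hypothesis class (a sketch; the function name and the example numbers are mine):

    import math

    def pac_sample_size(h_size, eps, delta):
        """Realizable-case PAC bound for a finite class H: with
        n >= (ln|H| + ln(1/delta)) / eps i.i.d. samples, any hypothesis
        consistent with the data has true error <= eps with probability
        at least 1 - delta, a guarantee at a fixed, finite n."""
        return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

    # e.g. 2^20 hypotheses, 5% error, 99% confidence:
    print(pac_sample_size(2**20, eps=0.05, delta=0.01))  # prints 370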


PAC is great in some ways because it is one of the best ways of proving things about what you can do with finite samples. But in practice the bounds will mostly be too weak for an application, and empirically measured error rates will usually be much better than the PAC bounds.

There are two other good features of a PAC analysis of a problem that often get overlooked:

* you need to precisely define a model for how your data is being generated. This helps you reason about the data source a little better and quantify what you expect to see. You can turn this into anomaly detection by identifying highly improbable input data to your model (see the sketch after this list).

* doing a PAC analysis gives you a principled way of ranking different methods for modeling the same data. Without anything else to go on, an ML algorithm with a better PAC bound is probably a better first choice than an algorithm with a weaker or no PAC bound.
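
A toy illustration of the anomaly-detection point above, under the crudest possible generative model (a single Gaussian; the function and the z-score threshold are my inventions, not anything prescribed by PAC theory):

    import statistics

    def flag_improbable(train, inputs, z_threshold=4.0):
        """Posit a data-generating model (here: one Gaussian fitted to
        the training data) and flag inputs that the model considers
        highly improbable."""
        mu = statistics.fmean(train)
        sigma = statistics.stdev(train)
        return [x for x in inputs if abs(x - mu) / sigma > z_threshold]

    print(flag_improbable([9.8, 10.1, 10.0, 9.9, 10.2], [10.05, 42.0]))  # [42.0]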

All of this provides a better methodology for approaching a new model than the typical one of building random deep learning architectures and then pulling the slot machine arm to see if you hit a jackpot.


PAC is a very general concept. PAC learnability can be characterized equivalently using VC dimension or sample compressibility.


Almost everything with a real-number answer in quantum computing is "probably approximately correct". That is, with probability at least 1 - delta, the answer is no more than epsilon away from the correct one.


PAC is an amazingly insightful idea, but the math that makes it work is head-spinning.



