How Bayesian Inference Works (2016) (brohrer.github.io)
290 points by m1245 on May 27, 2017 | 47 comments


> Bayesian inference is a way to get sharper predictions from your data.

Funny, if I had to summarize it in one sentence I'd describe it in the opposite way: Bayesian inference is a way of making less sharp predictions from your data, with quantified uncertainty.


OK, I would counter-propose (as I tend to work in a dynamic world):

Bayesian inference is an efficient way to track your estimates and uncertainties as you accumulate data.
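That framing fits in a few lines. A minimal sketch, assuming Gaussian observations with known noise variance (the conjugate normal-normal case), where each new data point tightens the running estimate:

```python
# Conjugate normal-normal updating: track a mean and variance as data
# arrive, assuming Gaussian observations with known noise variance.

def update(prior_mean, prior_var, obs, noise_var):
    """One Bayesian update: precisions (1/variance) add, means are
    precision-weighted."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mean = post_var * (prior_mean / prior_var + obs / noise_var)
    return post_mean, post_var

mean, var = 0.0, 100.0                  # vague prior
for x in [4.9, 5.2, 5.1, 4.8]:          # data stream
    mean, var = update(mean, var, x, noise_var=1.0)
# The estimate moves toward ~5 and the variance shrinks with every point.
```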


Often people avoid a fully Bayesian treatment of a problem in order to make the computation more efficient. A full Bayesian treatment can be much less efficient than a "shortcut" approach such as empirical Bayes.


Given those priors, Bayesian inference quantifies your ignorance.


Perhaps they meant "data efficient" as opposed to "computationally efficient".


Quantified error bounds can be added to most machine learning algorithms using Conformal Prediction and similar ideas; see [1] and [2].

[1] - https://scottlocklin.wordpress.com/category/tools/machine-le...

[2] - https://www.amazon.com/Algorithmic-Learning-Random-World-Vla...
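A rough sketch of the split conformal recipe described in those references, assuming only exchangeable data; the "fitted model" here is a stand-in for any trained regressor:

```python
# Split conformal prediction: wrap any point predictor with a
# finite-sample prediction interval, assuming only exchangeable data.
import math
import random

random.seed(0)

# Toy data: y = 2x + Gaussian noise. The "model" below is the true line,
# standing in for any regressor fit on the training half.
data = [(i / 10, 2 * (i / 10) + random.gauss(0, 1)) for i in range(200)]
train, calib = data[:100], data[100:]

def predict(x):
    return 2 * x  # stand-in for a model trained on `train`

# Calibration scores: absolute residuals on held-out points.
scores = sorted(abs(y - predict(x)) for x, y in calib)

alpha = 0.1  # target 90% coverage
k = math.ceil((len(scores) + 1) * (1 - alpha)) - 1  # conformal quantile index
q = scores[min(k, len(scores) - 1)]

# Interval for a new input: the point prediction plus/minus q.
lo, hi = predict(5.0) - q, predict(5.0) + q
```

The interval width q is learned from the calibration residuals alone, which is what makes the wrapper model-agnostic.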


Sometimes you are very certain prior to updating your beliefs (in the form of a posterior), which can lead to very sharp predictions from (and possibly in spite of) your data.


If your data have noise/randomness (and they most likely do), and you use your statistics to reduce the variance of that noise, then the predictions would be sharper, would they not?


Quite so. There is a difference between creating uncertainty versus revealing uncertainty.


Or: Bayesian inference trades certainty for speed, i.e., how we work in real life.


Sure, if having a sharp prediction means ignoring uncertainty.


That's a very strong statement. There are domains where machine learning tends to have better predictive performance than (Bayesian) statistics, but the converse is true in many other domains.

I would summarize it as Bayesian methods work best in areas where there's often not enough data, there exists significant expert knowledge, and you can properly specify a model. And yes, they do quantify uncertainty.


There are lots of discussions and explanations of what it means to be "Bayesian," but I think the best thing to do is jump in and start building models. That is how I came to understand the utility of Bayes.

If you're looking for a place to start I'd go to Andrew Gelman's introduction for the Stan Language: https://www.youtube.com/watch?v=T1gYvX5c2sM

There are Stan implementations in R, Python, Julia or you can run it in C++ since it's written in C++. I think this has greater potential to change how we deal with the unknown than AI or other machine learning.
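To get a feel for the mechanics before installing anything, here is a hand-rolled grid approximation of a coin-bias posterior. This is a toy illustration of the kind of model a few lines of Stan express, not Stan's actual algorithm (which is Hamiltonian Monte Carlo):

```python
# Grid approximation of a coin-bias posterior: flat prior, binomial-style
# likelihood for 7 heads and 3 tails, posterior by brute-force normalization.
grid = [i / 1000 for i in range(1, 1000)]             # candidate biases p
prior = [1.0] * len(grid)                             # flat prior
heads, tails = 7, 3
like = [p ** heads * (1 - p) ** tails for p in grid]  # likelihood of the data
unnorm = [pr * l for pr, l in zip(prior, like)]
z = sum(unnorm)
post = [u / z for u in unnorm]                        # normalized posterior

# Posterior mean; analytically this is Beta(8, 4), whose mean is 8/12.
post_mean = sum(p * w for p, w in zip(grid, post))
```

Grid approximation breaks down past a few parameters, which is exactly the regime where Stan's sampler earns its keep.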


>"There are lots of discussions and explanations of what it means to be "Bayesian," but I think the best thing to do is jump in and start building models."

I strongly agree: just play with Stan or JAGS and you will figure it out. Prose descriptions just cannot convey the power and flexibility of Bayesian stats.

PS, you shouldn't be trying to do a "bayesian t-test" or anything like that. That whole way of thinking about research (asking "is there an effect?") is flawed and can't go away soon enough.


Those are interfaces to Stan in R, Python, et al., not implementations of it. The whole Stan thing is C++, FWIW.


I wonder how many people have reinvented Bayesian inference without knowing it.


At least one guy that I know did this for risk scoring in a payment gateway. He had no clue it even had a name, it just seemed the most obvious way to solve the problem.


Loosely speaking, I've seen Bayesian inference described as a way to update your knowledge when you receive new evidence. In that sense, it's been re-invented since the time of the ancient Greeks.


The ancient Greeks -- do you have anything in particular in mind that they did? They certainly knew combinatorics at a level not guessed at until pretty recently (e.g. Schröder–Hipparchus numbers) but I haven't heard of any evidence for probability.


I was only thinking of the generalizations I've read, that Bayesianism is a "way of thinking," not about the math.


OK. Every now and then I hear of some old Greek fragment that was "ahead of its time", and it's gotten to be a hobby to collect them.


I think it's actually interesting to figure out what things the Greeks hadn't figured out in their time -- possibly a much smaller set. They didn't have algebra (e.g., the quadratic formula eluded them), and something must have happened after their time, to bring us modern empirical science.

Still it remains impressive to me what they were able to accomplish without modern tools.


That is exactly what it is: conditional probability with a feedback loop.
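A toy version of that loop, using two hypothetical hypotheses about a coin, where each posterior is fed back in as the next prior:

```python
# Bayes rule in a loop: the posterior after each coin flip becomes the
# prior for the next. Two hypothetical hypotheses: a fair coin and one
# that lands heads 90% of the time.
p_fair = 0.5                                # prior P(coin is fair)
p_heads = {"fair": 0.5, "biased": 0.9}

for flip in "HHHTHHHH":                     # observed flips
    lf = p_heads["fair"] if flip == "H" else 1 - p_heads["fair"]
    lb = p_heads["biased"] if flip == "H" else 1 - p_heads["biased"]
    num = lf * p_fair
    p_fair = num / (num + lb * (1 - p_fair))  # condition, then feed back
# Seven heads in eight flips: the fair hypothesis is now unlikely (~0.08).
```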


OT question: I am merging (calculating the mean of) 16 short exposures of a night photo with high ISO in order to remove noise and get wonderful night shots.

Now I'm just averaging: pixelvalue = (photo1.pixelvalue + photo2.pixelvalue + ... + photoN.pixelvalue) / numPhotos

Is there a way to make this smarter with a Bayesian approach? I'm thinking it could make a smarter guess at what the actual pixel value should be, rather than just taking the average.

Any ideas would be appreciated!
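For concreteness, one Bayesian refinement of plain averaging is a precision-weighted blend of the frame mean with a per-pixel prior. The prior source below is hypothetical (it could be a reference frame or a smoothed neighborhood), and with a very wide prior this reduces to the plain average:

```python
# Precision-weighted blend of the frame mean with a per-pixel prior,
# assuming Gaussian noise. The prior would come from somewhere else
# (a reference frame, a smoothed neighborhood); here it's hypothetical.
# As prior_var grows this reduces to the plain average.

def bayes_pixel(frames, prior_mean, prior_var, noise_var):
    n = len(frames)
    sample_mean = sum(frames) / n
    post_prec = 1.0 / prior_var + n / noise_var          # precisions add
    return (prior_mean / prior_var + n * sample_mean / noise_var) / post_prec

# 16 noisy exposures of one pixel whose true value is around 120
frames = [118, 122, 121, 119, 140, 120, 118, 121,
          119, 122, 120, 118, 121, 119, 120, 122]        # one outlier (140)
est = bayes_pixel(frames, prior_mean=120.0, prior_var=25.0, noise_var=4.0)
```

A plain median is a simpler guard against outlier frames like the 140 above; the Bayesian gain shows up when the noise model or the prior carries real information.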


See https://scholar.google.com/scholar?hl=en&q=COMPARAMETRIC+IMA...

That paper discusses exactly the scenario you describe.


Median would probably be better because outlier pixels could distort the mean.

But how would you use a Bayesian approach? What exactly are you trying to predict? What are the inputs? What is the model?


But how do I incorporate my level of confidence in my prior? I haven't seen any treatment of this question, even though it is a quite essential one: priors that you are not so sure about should be given less weight than priors that you are very certain about.


This is handled by choosing a smoother, higher entropy prior. If you have uncertainty about your prior then basic factorization tells us it's equivalent to integrating over your various priors with respect to the probability you assign each of them.
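A discrete toy version of that idea: the effective prior is the trust-weighted mixture of your candidate priors. The grids and trust weights below are made up for illustration:

```python
# Uncertainty about the prior itself is handled by mixing: integrate
# (here, sum) over candidate priors weighted by how much you trust each.
# Grids and trust weights below are made up for illustration.
grid = [0.1, 0.3, 0.5, 0.7, 0.9]            # candidate coin biases
prior_a = [0.02, 0.08, 0.80, 0.08, 0.02]    # confident: coin is near fair
prior_b = [0.20, 0.20, 0.20, 0.20, 0.20]    # knows nothing
trust = {"a": 0.7, "b": 0.3}                # confidence in each prior

# The effective prior is the trust-weighted mixture: smoother and
# higher-entropy than the confident prior alone.
mixed = [trust["a"] * a + trust["b"] * b for a, b in zip(prior_a, prior_b)]
```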


Silly me, he actually treats this question (towards the end of the article, using his dog as an example).


In case anyone is wondering how Bayesian inference works on non-trivial problems:

https://arxiv.org/pdf/1701.02434.pdf



OK, but this "inference" is not a valid substitute for a logical inference because it produces a different type of result - probabilistic, not certain.

The crucial difference is that statistical inference does not consider any causation; its domain is observations only, and observations alone cannot establish causation in principle.

Correlation is not causation. Substituting Bayesian inference for logical inference should result in a Type Error (where are all these static typing zealots when we need them?).

This is, by the way, one of the most important principles: the universe exists; probabilities and numbers do not. Every causation in the universe is due to its laws and related structures and processes. Causation has nothing to do with numbers or observations. This is why so much of modern "science" is a non-reproducible pile of crap.

Any observer is a product of the universe. The Bayesian sect is trying to make it the other way around. Mathematical Tantras of the digital age.


Logical inference is just a special case of the more general Bayesian inference. Anything you can do with logical inference you can do with Bayesian inference: just imagine the probabilities are 0 and 1. Here's an entire book on the subject: http://bayes.wustl.edu/etj/prob/book.pdf

But true logical inference doesn't exist in the real world, because you can never be 100% sure of anything, not even mathematical facts. It's just an approximation of superior Bayesian inference: http://lesswrong.com/lw/mp/0_and_1_are_not_probabilities/

You can even deduce causation from purely observational data. And here's how: http://lesswrong.com/lw/ev3/causal_diagrams_and_causal_model...


> Logical inference is just a special case of more general bayesian inference

NO. This is fucking sectarian crap. Logic of any kind is possible only because the Universe has its laws and structure. It always comes first. Logic is an uninterrupted chain of steps of induction which must be validated by tracing back the whole chain to some validated, fundamental principle. It is a universal process. Inductive steps and premises are domain-specific.

Logic can be applied to abstractions like numbers as an exception, because numbers represent valid aspects or properties of reality, not the other way around. Numbers are imaginary; the universe is real. It makes an observer possible but does not require one, which means that an observer and all his inferences could be excluded completely from the "mechanics" of what is. Time, for example. And numbers, of course.


You are speaking a bunch of incoherent nonsense. Logical inference is just a simplification of proper Bayesian inference. There are no problems you can model with logic that can't be modeled by Bayesian logic. You just take a set of logical premises and set their probabilities to be 0 or 1.

But this is only an approximation, and is fundamentally wrong in principle for any real world problem. You can never be 100% certain of anything, even mathematical proofs. After all, errors are found in published mathematical proofs all the time. And people regularly make mistakes doing even simple arithmetic.

That's the thing: we live in an uncertain world and can never have true certainty about anything, especially in most real-world problems that we care about. All forms of reasoning and inference are part of the mind, not reality. Reality doesn't have to respect your axioms or logical inferences. At any time reality can bite back and say your logic was wrong. And you must change your map, not argue that the territory is incorrect.

Bayesian inference is the process of drawing maps of a territory. And realizing that they are just maps. That we can make more and more accurate maps, but we can never have maps that are 100% perfect and certain. Reality doesn't grant us certainty, and that's ok.


> There are no problems you can model with logic that can't be modeled by Bayesian logic.

"Socrates is a man, therefore Socrates is mortal". Please explain to us, speaking a bunch of incoherent nonsense, how Bayesian logic will prove the necessary "all men are mortal". Notice, that just saying "100% of a sample died" proves nothing. I am not asking about why the sun will rice in the east next morning.

> You can never be 100% certain of anything, even mathematical proofs.

This is some pseudo-intellectual hipster's bullshit, I am sorry to say. One can be 100% certain that DNA is the genetic material, and of a bunch of other things: that for an external observer one and another one constitute a structure, a pair, and that a pair introduces the notion of an ordering, etc. This is the good-enough basis of the DNA encoding (and a Lisp). Notice that the DNA encoding relies only on exact pattern matching on concrete physical structures; there are no numbers anywhere. Mother Nature does not count. And this is logic, my friend.

Now take your canonical map-territory metaphor a bit further. The structure of a brain which makes mind possible, and all the other body's organs, of course, including an eye, reflects the physical environment it has been evolved within. A brain is an "implicit map" of the territory, it reflects what is, like a print, or using modern terminology a trained neural network.

The mind is bound by the brain and its sensory and evolutionary conditioning, which is bound by the environment (no matter what idealists, humanists and theologians would say). Everything the mind is capable of, including valid reasoning (and excluding socially constructed bullshit and sectarian beliefs for a moment), is bound by the structure of the brain, which is a representation of reality, so to speak a "map" of the territory. Consulting this map makes logic (and intuitions!) possible, the very same way a correctly trained model could give reasonable predictions. It is just a form of pattern matching.

This kind of map is more "valid" than any Bayesian map. There is no objection to the uncertainty part as long as it refers to a process of "unfolding" of reality.


>"Socrates is a man, therefore Socrates is mortal". Please explain to us, speaking a bunch of incoherent nonsense, how Bayesian logic will prove the necessary "all men are mortal".

Exactly the same way regular logic does! You can have logical statements like "For all x, 'x is a man', implies 'x is mortal'". Bayesian logic doesn't take anything away from regular logic, it adds to it. It gives you the option of adding probabilities to statements. So you can do:

Socrates is a man, 99.999%

Men are mortal, 99%

->

Socrates is mortal, 98.99901%
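For the record, that composite figure is just the product of the two premise probabilities, assuming they are independent (a sketch of the arithmetic, not a full probabilistic-logic treatment):

```python
# 0.99999 * 0.99 = 0.9899901, i.e. 98.99901%.
p_man = 0.99999                  # P(Socrates is a man)
p_mortal_given_man = 0.99        # P(mortal | man)
p_socrates_mortal = p_man * p_mortal_given_man
```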

>One can be 100% certain that DNA is the genetic material

No, you can't. Scientists could discover something completely different tomorrow. I'll grant you that it's very unlikely, but not literally impossible. It's a common mistake to confuse the two, but they are not the same.

>The Mother Nature does not count.

Take two apples, add two more apples, you have four apples. Nature definitely counts.

>The structure of a brain which makes mind possible, and all the other body's organs, of course, including an eye, reflects the physical environment it has been evolved within. A brain is an "implicit map" of the territory, it reflects what is, like a print, or using modern terminology a trained neural network.

I don't disagree. And what does this have to do with anything? The brain is (approximately) bayesian and weighs different probabilities. The brain is never 100% certain of anything. It can never know reality completely, just become a better map.


> Take two apples, add two more apples, you have four apples.

To whom? To other apples? An intelligent observer, which is required to relate absolutely unrelated apples to each other, is a most recent innovation. Atomic structures, to the contrary, are self-sufficient and can be matched without any observer whatsoever. Do you realize the subtle difference?

Molecular biology does not count, has no timers, and obviously does not compute probabilities. It relies on pattern matching and message passing, so to speak, and feedback loops. It is an analog universe, like a clock.

One more time: there is no way to establish proper causation from mere observations without a proper rigorous scientific method. The whole of human knowledge is based on this statement. Religions have been overthrown by it. This is the most important achievement of the whole of human philosophy. And Bayesianism is just a sect. ;)


A friend of mine asked me to clarify a bit and to cut out my silly jokes.

The main question of Eastern philosophy (What is real? What is?) is way too far away from being answered adequately. One very old and very naive view is that nothing is real, everything is constructed by the mind. The question is: what is mental and what is real, and how to tell them apart.

Out of this come a few simple notions, such that, while math and probabilities in particular could be used to produce a model of what is, they nevertheless cannot be the causes of phenomena, because math and probabilities do not exist outside people's minds.

Of course there are certain physical constants, such as the angle between atoms in a water molecule, but there is no way a cell could measure it. It happens that other molecules assume certain positions in a water solution, but there is no notion of an angle anywhere. It requires an intelligent observer, which isn't there.

The same logic applies to numbers. Yes, of course, two apples and two apples would be four apples, but there is no one to notice this at a molecular level. So cells do not count. They pattern match, because that requires no observer and interpreter.

These notions can be generalised to a simple rule of thumb: do not try to establish causation with mere abstractions of the mind, for they aren't there. Numbers, let alone probabilities, are abstractions. Out of abstractions one constructs simulations. But a simulation is not reality, the same way a map is not the territory.

Now about logic. It is a path from what is real to what is real, each step of which is validated by all the previous steps. It is the result of a domain-specific, heuristic-guided search process, where the heuristic does not choose the next step but validates the current position by tracing it back to what is real.

Nothing much to see here. Just applied Eastern philosophy. To arrive at what is, an observer with all his mental constructs has to be removed, similar to the removal of the illusory self which obstructs reality. It is an ancient hack.


> Socrates is a man, 99.999%

> Men are mortal, 99%

> Socrates is mortal, 98.99901%

Where did these numbers come from, and why?


Bayesian inference is just an extension of logical inference to probability distributions. The "correlation is not causation" reaction is off base here.


No, it is not. Logic of any kind is a step-by-step process of induction, based on valid premises at each step (just one false premise is enough to refute the whole thing, and each must always be verified), while all the premises must be traced (validated) back to one of the fundamental laws of the universe, to what is. Otherwise it is just a form of bullshit.

Socrates is mortal not because he is a man, but because biology is bound by physics of this universe (in which compound structures are impermanent and what we call energy is eternal, being transformed from one form into another) and because a man is a biological process, he is impermanent or mortal. (Ancient Buddhists got it right, by the way).

Logic is possible because the given Universe has certain stable (eternal) properties, so processes and structures are possible. Abstract logic is bullshit (hello, Hegelians!).

Mathematical logic works because each one inductive step rests on the valid premises due to the fact that numbers as an abstraction capture a valid aspect of reality revealed to an observer. But it cannot be applied to anything but numbers. Different types of premises and inductions are required for different domains (while the process is fundamentally the same).

The sun will rise in the sky tomorrow not because of some probabilistic inference but because such a physical process cannot be quickly changed. Causes are always real (physical), not imaginary (mathematical).


Fails to mention the implicit assumption of conditional independence in the measurements (weighings).


There are some simple tricks that allow you to measure correlation between separate sets of evidence.

This you can then adjust for.


Priors go in, posteriors come out. Can't explain that!


What you actually want in this context is some code that generates random deviates of probability distributions chosen randomly and a "guesser agent" that tries to guess which distribution was chosen. Then you can ask questions like,

> given some condition on a distribution of distributions, when do we feel that a guesser is taking too long to make a choice?

This is like a person who is taking too long to identify a color, or a baby making a decision about what kind of food it wants while you wait for it to do so. For a certain interval, it makes sense, but after a point it becomes pathological.

So for example if we have two distributions,

> uniform distribution on the unit interval [0,1]; uniform distribution on the interval [1,2]

then we get impatient with a guesser who takes longer than a single guess, since we know (with probability 1) that a single guess will do.

Now, if we have two distributions that overlap, say the uniform distribution on [1,3] and [0,2], then we can quantify how long it will take before we know the choice with probability 1, but we can't say for sure how many observations will be required before any agent capable of processing positive feedback in a neural network can say for certain which one it is. As soon as an observation leaves the interval (1,2) the guesser can state the answer.
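That stopping rule can be sketched directly (the chosen distribution and the seed are illustrative):

```python
# Two overlapping uniforms: A = uniform[1,3], B = uniform[0,2]. The guesser
# waits for a draw outside the overlap (1,2), which settles it for certain.
import random

random.seed(1)                    # illustrative seed
true_dist = "A"                   # chosen secretly in the real game

def draw():
    return random.uniform(1, 3) if true_dist == "A" else random.uniform(0, 2)

guess, steps = None, 0
while guess is None:
    x = draw()
    steps += 1
    if x >= 2:
        guess = "A"               # only uniform[1,3] reaches [2,3]
    elif x <= 1:
        guess = "B"               # only uniform[0,2] reaches [0,1]
    # draws strictly inside (1,2) leave both hypotheses alive
```

Each draw lands in the decisive region with probability 1/2, so the waiting time is geometric and the guesser terminates with probability 1.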

Now, things can get more interesting when the distributions are arranged in a hierarchy, say the uniform distribution on finite disjoint unions of disjoint intervals (a,b) where a < b are two dyadic rationals with the same denominator when written in lowest terms.

If a guesser is forced to guess early, before becoming certain of the result, then we can compare ways to guess by computing how often they get the right answer.

Observations now give two types of information: certain distributions can be eliminated with complete confidence (because there exists a positive epsilon such that the probability of obtaining an observation in the epsilon ball is zero) while for the others, Bayes theorem can be used to update a distribution of distributions or several distributions of distributions that are used to drive a guessing algorithm. A guess is a statement of the form "all observations are taken from the uniform distribution on subset ___ of the unit interval".

Example: take the distributions on the unit interval given by the probability density functions 2x and 2-2x. Given a sequence of observations, we can ask: what is the probability that the first distribution was chosen?

The answers to these questions can be found in a book like Probability: Theory and Examples.


Did he just assume their gender? Did I just assume his gender?



