1. Your list of references proves that Bayesian statisticians have been writing papers across a variety of disciplines.
2. Bayesian methods are not readily computed with today's hardware and software; my own desk is a case in point.
2.1 Last time I fitted a Bayesian model, I had 600 processors with infinibandy things joining them up and left it for a week.
2.2 None of the software I use does Bayesian by default.
3. In practice, a lot of statistics in industry can be done by barcharts. I know that that is hard to hear. It is a big leap from there to Bayesian; bigger by far than from there to chi-sq tests. Data are so sparse, expert judgement so rich, and time so short...
4. Priors introduce subjectivity - no doubt about it. However, so do utility functions, which pretty much cancels the objection out in my opinion. It is inappropriate to use p-values as a decision-making framework for various reasons, but a lot of scientific papers are about recording experimental observations, not decision-making. Policy decisions should use utility functions and priors; but I am happy for my science papers to stay frequentist.
Absolutely. Not all applications of Bayesian analysis are computationally intensive. In some cases (for example, finding single-nucleotide polymorphisms (SNPs) in next-gen sequencing data), Bayesian analysis comes down to multiplying the prior probability of a SNP (for humans, 0.001 per genome position) by a few other numbers from the data itself to obtain the posterior probability, which can be done in linear time, in a few minutes, on tens of gigabytes of NGS data. And the best part is, no Bonferroni adjustment bullshit!
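To make that concrete, here is a minimal sketch of that kind of per-position calculation. The 0.001 prior is the one quoted above; the 1% error rate and the simple binomial read model are invented for illustration, not any particular caller's likelihood.

```python
# Toy sketch of per-position Bayesian SNP calling, as described above.
# The prior (0.001) comes from the comment; the error rate and the simple
# binomial read model are illustrative assumptions.
from math import comb

def snp_posterior(n_reads, n_alt, prior=0.001, error_rate=0.01):
    """Posterior probability of a heterozygous SNP at one genome position."""
    # Likelihood of seeing n_alt non-reference bases out of n_reads:
    # under "no SNP" each alt base is a sequencing error,
    # under "het SNP" each read shows the alt allele with probability ~0.5.
    def binom(p):
        return comb(n_reads, n_alt) * p**n_alt * (1 - p)**(n_reads - n_alt)

    like_no_snp = binom(error_rate)
    like_snp = binom(0.5)

    # Bayes' rule: prior x likelihood, renormalised over the two hypotheses.
    num = prior * like_snp
    den = num + (1 - prior) * like_no_snp
    return num / den

print(snp_posterior(n_reads=30, n_alt=14))  # ~1.0: strong evidence of a SNP
print(snp_posterior(n_reads=30, n_alt=1))   # ~0.0: consistent with sequencing error
```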
5. Bayesian analysis is too easy. You don't have to transform the data in any convoluted way, you just describe your model and crank the handle. Publishers will no longer be able to sort the sheep from the goats.
Ha ha. Yes, it is a bit like that. If somebody could just replace that pesky MCMC convergence crap with something that worked, or give me a quantum computer, then I would convert my models to Bayesian overnight.
I think at this point someone should link to [share likelihood ratios, not posterior beliefs](http://www.overcomingbias.com/2009/02/share-likelihood-ratio...)... the summary is that you can do your Bayesian analysis without specifying the prior and just report the resulting likelihood ratios, telling everyone, "Here are the likelihood ratios, update your beliefs appropriately." Though that may lack some practicality.
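The idea is just the odds form of Bayes' theorem, with the analyst reporting the middle factor and each reader supplying their own prior odds (notation here is mine):

\[
\underbrace{\frac{P(H_1 \mid D)}{P(H_0 \mid D)}}_{\text{posterior odds}}
\;=\;
\underbrace{\frac{P(D \mid H_1)}{P(D \mid H_0)}}_{\text{reported likelihood ratio}}
\;\times\;
\underbrace{\frac{P(H_1)}{P(H_0)}}_{\text{reader's prior odds}}
\]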
Also, this is true, but I think it doesn't disagree with my point.
I just dread the length of my Own Risk and Solvency Assessment (ORSA) after Solvency II (the regulations for insurance companies in Europe) takes hold next year, if I have to explain the origins of my priors.
I actually think we could do some fuck-awesome work in evaluating our risk capital requirements using priors on all of our inputs, and the computational requirements would not be 'that' frightening, given the valuations are all Monte Carlo anyway. It's not happening yet, though.
1. The OP knows this, but implicitly believes his audience differs from the cited fields often enough to make point (1) relevant. Moreover, it's perfectly correct to say that many journal reviewers are not interested in Bayesian methods.
2.1. This is highly anecdotal and not at all a strong point. I'm sure Efron has knocked out a 600 core cluster doing frequentist bootstrapping (1). I'm also certain that many, many Bayesian methods run near instantly on modern hardware. I'll concede that there are fewer closed form results, though.
2.2 This is very true. Entrepreneurs?
3. "Data so sparse, expert judgement so rich" is exactly where Bayesian analysis is most pertinent. Use a prior to clarify and quantify your expert opinion and then demonstrate that indeed your few observations are worthy to change someone's opinion.
4. Choice of frequentist testing regime introduces subjectivity, too! Moreover, since these have been heralded as "objective" for so long, it's pretty difficult to get people to recognize as much. Oftentimes, a frequentist method will be equivalent to a Bayesian method under a maximally uninformative prior. This is still a subjective assumption (though there are benefits of such a prior).
---
Frequentist tests are oftentimes very necessary. They have already been highly optimized in many cases and thus are available on low-resource computing platforms. They are definitely an important engineering solution! That said, Bayesian methods do a far better job of being clear in their assumptions and simple in their logic.
There is certainly room for better software (free or otherwise) to replace BUGS/JAGS/whatever for the largest use cases of statistics in many fields. Also, another point you make about Bayesian methods making life difficult during certification and publication is exactly right, and probably the largest (unspoken) reason why they're not going to be used in core scientific fields for a long while.
But both of those reasons are distinctly practical and unscientific. Bayesian methods do a better job of using your data. They do this by allowing expert knowledge to enter into statistics in a sensible fashion. Finally, they introduce an easily understood interpretation of the answers to your statistical questions.
You might not personally want to use them today for practical reasons, but the author of this article is very much in the right to try to encourage more scientists in more fields to take a look.
(1) Sorry, I'm actually not at all sure if this is the case. Bootstrapping is still more computationally efficient than MCMC, I think. I just used the example because I think it's ridiculous to make either point.
3. I agree with your point in some instances - but in others there is a huge risk of building a complicated model that everybody trusts that is just a pile of twaddle; in which case the expert judgement would better stay in the experts' heads, or on the odd bar-chart. I have not noted this point in many statistics courses, but it is all over the actuarial courses.
1. And so the journal reviewers shouldn't be. They are interested in the results, and act like the financial or insurance regulator in making sure those results have been calculated in an unbiased manner.
4. Yes, choice of model introduces subjectivity. That two beers a day are alright doesn't imply I should have ten.
p.s.
Does it not start life as an 'attempt at rebuttal'? 'Distinctly practical and unscientific'? 'Ridiculous'? I know it's the internet - but...
Apologies there, I was reacting to the exaggerated choice of anecdote, but actually meant the "practical" line favorably. Science and practicality can be at odds, but in the end, practicality wins by definition, right?
3. I agree that complex model building is fertile ground for hand-waving, this-is-too-complex-to-understand-so-just-accept-it false justification. I don't think large models are endemic to either Bayesian or Frequentist methods, though Bayesian methods do allow them to be built more easily. In both cases, I think simple sanity checks form a foundation that is necessary for the presentation of complex models.
1. I think the idea of being "interested in results" is a false hope. Experimentation does not always produce results, and when it does you probably don't even need statistics because the results are so obvious. In all other cases, uncertainty has become a major factor and demanding clarity is foolish. Tempting, but foolish.
4. I think under-modeling is dangerous, too. At the very least, it can rob you of the ability to quantify educated conclusions short of doing meta-analysis. At least with priors you're telling everyone you're a bit tipsy instead of stamping a bar chart down and having people swallow assumptions under the stamp of 'objectivity'.
It's kind of like Python 'import this'. Explicit is better.
MCMC is what takes the time, more than the size of the model.
Definitely, large models seem more likely to me in the Bayesian setting, because it is so much neater to build them.
I notice that you are very fervent in your Bayesianism, but please correct me if I read that wrongly. I was once at a conference where a similar debate was ongoing at the lectern. Dr Cox was the main guest of the conference, and when he was asked where he stood he said (and I probably misquote awfully from poor memory) that it was a bit silly arguing about it, because you just used whichever was appropriate to the task at hand. I thought that was pretty cool.
I like Bayesianism because I think it's mathematically and philosophically cleaner. Then again, I think MCMC seems to show that it isn't so computationally clean. In realtime systems I am more than happy to use Frequentist models for their speed.
I suppose I believe in a world where Bayesian methods are the primary didactic statistics useful in the sciences and in communication, and Frequentist methods are used when closed-form estimators are necessary and coherent with Bayesian estimates.
Or maybe an even better world where we have closed form, useful estimates from both camps.
Just think: quantum computing. You could write down everything you know and every piece of data you have, eliciting your utility and priors by hand, and then press go on the MCMC. Awesome.
I agree with the general theme of the OP. Bayesian methods are technically better than frequentist methods.
However, like Betamax vs. VHS, there is more than just the technical correctness. Your point in 2.2 is the big one -- if there was a simple way to switch to Bayesian methods in existing statistics software like SPSS, that would be quite revolutionary. Right now, null hypothesis testing is too easy to do and widely accepted, even though the results may be completely wrong.
Frequentist Statistics.
1. Start with a model that you think describes your data.
2. Find the parameters of the model that make it fit the observed data most closely.
3. If the model really does not seem to fit the data, reject it. The criterion for rejection is usually based on the p-value, which is the probability of observing data at least as unlikely as yours if the model is actually correct. If the p-value is less than .05 (by an arbitrary and accidental custom) the model is rejected. Wrap your head around that, if you dare.
4. If the model is not rejected, use it to predict future outcomes with the best-fit parameter values, ideally allowing for some uncertainty in those values; decisions are made on this basis. The uncertainty in the parameter values is quite difficult to allow for in practice. (A toy sketch of this recipe in code follows below.)
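Here is a minimal sketch of those four steps, assuming a normal model with a hypothesised mean; the data and the 0.05 threshold are invented for illustration.

```python
# Minimal sketch of the frequentist recipe above: assume a normal model,
# fit the free parameter by maximum likelihood, test the model, then predict.
import numpy as np
from scipy import stats

data = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 5.2])

# 1-2. Model: "these measurements are normal with mean 5.0"; fit the only
#      free parameter (the standard deviation) by maximum likelihood.
mu0 = 5.0
sigma_hat = data.std(ddof=0)
print("ML fit:", {"mu": mu0, "sigma": round(sigma_hat, 3)})

# 3. Test the model: one-sample t-test of "mean = 5.0"; reject if p < 0.05.
t_stat, p_value = stats.ttest_1samp(data, popmean=mu0)
print(f"p = {p_value:.3f}", "-> reject model" if p_value < 0.05 else "-> keep model")

# 4. If kept, predict with the best-fit values; parameter uncertainty
#    (here a confidence interval for the mean) is handled separately.
ci = stats.t.interval(0.95, len(data) - 1, loc=data.mean(), scale=stats.sem(data))
print("best-fit mean:", data.mean(), "95% CI:", ci)
```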
Bayesian Statistics.
1. Start with a model that you think describes the data.
2. Then specify models for the parameters of your model, called prior models (or just priors), describing your uncertainty about those parameters. You get the initial parameters for the prior models from your own head, or the head of a field expert.
3. Use the observed data to refine your estimates of the parameters for the prior models. So you started with educated guesses of these 'prior' parameters, and then you made the guesses better using your observations. In practice, your guesses can become irrelevant very quickly as you add more data.
4. Predict future outcomes using your model, where the uncertainty in the model parameters is modelled explicitly using your prior models as described in 3.
5. Put those predictions of future outcomes into a utility function to make decisions.
Summary:
The main thing is that Bayesian statistics allows you to specify models for your parameter uncertainty, provided you are okay with the educated guesses. (A toy sketch of this recipe in code follows below.)
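Here is a toy sketch of the five steps for a single success probability (a conversion rate, say), using a conjugate Beta prior so the update is trivial; every number, including the utility values, is invented for illustration.

```python
# Toy sketch of the Bayesian recipe above, for one success probability theta.
import numpy as np

# 1. Model: each trial succeeds with unknown probability theta.
# 2. Prior on theta from an expert's head: "roughly 30%, give or take" ->
#    Beta(3, 7) has mean 0.3 and is suitably vague.
prior_a, prior_b = 3, 7

# 3. Refine with observed data: 18 successes in 40 trials.
#    (Beta is conjugate to the binomial, so the update is just addition.)
successes, trials = 18, 40
post_a = prior_a + successes
post_b = prior_b + (trials - successes)

# 4. Predict future outcomes, parameter uncertainty included, by simulation.
rng = np.random.default_rng(0)
theta = rng.beta(post_a, post_b, size=100_000)
future_successes = rng.binomial(20, theta)   # the next 20 trials

# 5. Decide via a (made-up) utility function: launching pays off only if
#    at least 8 of the next 20 trials succeed.
utility = np.where(future_successes >= 8, 100.0, -40.0)
print("posterior mean of theta:", theta.mean())
print("expected utility of launching:", utility.mean())
```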
Because hessenwolf and I are philosophically opposed here, I'll give my take. Balance your impressions between us as you choose.
---
Frequentist statistics attempts to answer the question "How will my experimentation appear knowing that there is some hidden, unknown truth to the world which generates it?" The methods then proceed to use a variety of clever arguments to show that seeing a certain experimental result (considering all possible experimental results) constrains the possible underlying reality and gives you a good guess at to what it is (and allows you to estimate how much it might vary).
Bayesian statistics asks the very different question "How does this observation I'm making affect my current knowledge of the world?" It is pretty difficult to look at the methods without seeing an interpretive nod toward the process of learning. To do this update step, Bayesians consider the relative likelihood of all possible underlying realities given that they've seen said experiment.
It's not clear to me that these two methods are at all asking the same question. In particular, they each consider (marginalize, integrate) vastly different properties and their results have different interpretations. However, since both of them fit into the space of quantifying the effect of observation on the parameters of a model of the world they end up in constant conflict.
Moreover, it's easy to construct Bayesian arguments which correspond to exactly the same algorithms as some Frequentist arguments. Bayesians then argue that their path to reaching that algorithm is more interpretable and clear, especially to non-mathematicians. This method collision serves to further conflate the two methods as enemies.
---
tl;dr: Bayesian statistics averages over possible realities, Frequentist statistics averages over possible experimental outcomes. It's not clear that these are comparable at all, but since they often try to answer the same questions, we compare them anyway.
Nope. I agree with what you are saying; and would like my research presented as summary statistics under different models (of which the p-value is one) for sciency stuff, and expected utility values under different utility functions and priors for decision making. I think the priors actually really become a moot point after you tack on the utility function.
I think that choice of model (even nonparametric or empirical distributions) and choice of priors are linked. Both are assumptions based on prior knowledge and analytical approach. Both are overwhelmed by the data in a fertile experiment.
Utility functions are a different beast though. They don't have an update procedure and can wildly affect your decision. I'm also convinced they're the best tool we've got so far, so I take it as an illustration that making informed decisions is just hard.
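A tiny invented illustration of that last point: the same posterior probability, two different utility functions, opposite decisions.

```python
# Same posterior, two made-up utility functions, opposite decisions.
p_success = 0.7   # posterior probability the intervention works, say

# Utility A: modest downside -> expected utility of acting is positive.
eu_a = p_success * 10 + (1 - p_success) * (-5)    # = 5.5

# Utility B: catastrophic downside -> expected utility of acting is negative.
eu_b = p_success * 10 + (1 - p_success) * (-50)   # = -8.0

print("act under A?", eu_a > 0, "| act under B?", eu_b > 0)
```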
Presentation of summary statistics is fine. I prefer presentation of full, untransformed, unpruned data as well when feasible. It's, of course, often not feasible. I also demand justification for why you think those summary statistics are meaningful and under what kinds of situations they would fail to capture the conclusion presented. Not saying that this isn't done in a frequentist setting, but I think it's harder.
Honestly, really, truly, honestly, I never bothered with p-values as a statistician except in two cases. The first is when performing a test for somebody else to go into a standard article format. The second is when automating reports on complex data.
P-values are for people whom you do not trust to make decisions. Graphs and arrays of summary statistics from several different models are for statisticians.
Also, I disagree that model choice will be overwhelmed by the data.
Hah, I should really write these with more care. I'd feel entitled to demand, but more meekly expect that there's a bit more trust and convention in scientific publication. Though that can be taken too far.
You're right that model choice can still break your analysis given large amounts of data. I was thinking more in terms of a whole inquiry where large amounts of data will help you to locate a model that extracts the maximal information from your observations. If we're able to keep experimenting forever, we pretty much assume we'll eventually get highly accurate maps of the world.
The primary difference is with utility functions, where no matter how long you experiment they remain exogenous and static.
Priors aren't essential in some models when you're looking for an unbiased estimator and you have a complete, sufficient statistic. Please don't ask me to tell you when that will happen.
Utility functions are necessary if you want to make a decision based off your knowledge. If your goal is simply to state "given model M, parameter A most likely takes this value based on experimental data" then you don't need a utility function.
I think hessenwolf's point is that priors and utility functions are both largely unconstrained functions over the state space of parameters that need to be specified based on the experimenter/reviewer/reader's beliefs and values (respectively). Formulating them and making everybody happy is still an open research topic.
You seem to be conflating quite a few axes of variation. What you're describing as "frequentist" is descriptive statistics using parametric models, which is only one possible way of using a frequentist interpretation of probability.
Nonparametric statistics is probably the biggest active area of frequentist-statistics research, in which case fitting parameters of models isn't exactly what you're doing (though there is some sort of model-building process, and some arguing about what constitutes a parameter).
In addition, predictive statistics is a quite large area of frequentist statistics, and it does precisely what you call "Bayesian" steps #4 and #5, except within a frequentist framework: you fit a predictive model, which may include uncertainty estimates in its predictions, and then feed that model's output through something decision-theoretic, like a risk function.
Bayesian decision theory and frequentist decision theory do look different, but it's not as if frequentists don't have a decision theory (and a real one, not just "use the model if it has a low enough p-value")...
Conflating - yes, hell yes, I am simplifying as much as possible. I do use steps #4 and #5 within the predictive model, including uncertainty estimates; my frequentist step 4 refers to this. The steps aren't completely aligned.
My comment is a description of the difference as it affects me, and not necessarily capturing all of the effects on others. Please do expand...
I suppose to me it's more of a difference of decision-theoretic approaches, which come up with decision rules that make "best" decisions given the data, under certain definitions of "best", versus descriptive-statistics approaches, which aim to summarize the data, test hypotheses, report significant correlations, etc. I can buy many of the arguments for decision-theoretic approaches (especially if you are in fact making decisions), but that doesn't necessarily tell me why I should use a specifically Bayesian decision-theoretic approach.
If the p-value is less than 0.05 you reject the null hypothesis, because it means there was very little chance of observing the data under the model. Then you end up with your alternative hypothesis, usually implying you can fit an extra parameter and go play.
For the sake of explanation, I have made it as if one goes to fit the null hypothesis and can fail. Mine is certainly not a fine example of explication, but it entertained me to write it.
Hold on a second, you originally wrote, "if the p-value is less than .05 (by an arbitrary and accidental custom) the model is rejected." Usually by "model" we mean an alternative to the null hypothesis. So essentially you said: if p < .05, the alternative model is rejected and the null hypothesis is accepted. Well, that's a contradiction of what you just stated in your second post (and of what we both agree on).
Nah, you fit the null hypothesis, and if it fails you reject it. You never, ever accept the null hypothesis; there might just not be enough power in the test.
My goof, of course you don't accept the null model (you generally never accept any models -- you only eliminate ones that are worse at explaining the data).
In basic frequentist stats, you usually have two models: a simpler one (usually called the null hypothesis), and a more complex one called the alternative model (what makes it more complex is usually one or more extra parameters). You are usually interested in testing whether the more complex model holds up when compared to the simpler one. You do this most often by a likelihood ratio test: you divide the probability of the data given the null hypothesis by the probability of the data given the alternative model, and then you compare twice the negative log of the resulting ratio to the expected distribution of that statistic assuming the extra parameters add nothing, taking into account the degrees of freedom (how many more parameters the alternative model contains). If, under the null hypothesis, the probability of the ratio statistic being larger than or equal to the one at hand is <= 0.05, the null hypothesis is rejected. The alternative hypothesis is not automatically accepted, but it is said to explain the data better than the null model.
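A minimal sketch of that recipe, for nested normal models (null: mean fixed at zero, alternative: mean free, i.e. one extra parameter), using the usual chi-square approximation; the data are simulated for illustration.

```python
# Minimal sketch of the likelihood-ratio test described above, for nested
# normal models: null "mean = 0" vs alternative "mean free".
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.4, scale=1.0, size=50)

def normal_loglik(data, mu):
    sigma = np.sqrt(np.mean((data - mu) ** 2))     # ML estimate of sigma given mu
    return np.sum(stats.norm.logpdf(data, loc=mu, scale=sigma))

ll_null = normal_loglik(x, mu=0.0)        # mean fixed by the null
ll_alt  = normal_loglik(x, mu=x.mean())   # mean fitted freely

# Wilks: twice the log-likelihood ratio is ~ chi-square with df = 1
# (the one extra parameter), assuming the null is true.
lr_stat = 2 * (ll_alt - ll_null)
p_value = stats.chi2.sf(lr_stat, df=1)
print(f"LR statistic = {lr_stat:.2f}, p = {p_value:.4f}")
# p <= 0.05 -> reject the null; the extra parameter earns its keep.
```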
Basically everything you wrote is correct; it's just that I misinterpreted what you referred to as "model" to mean "alternative model," while you actually meant "null hypothesis" or "null model." You should have been clearer on that, though, otherwise you can confuse people new to this subject (you already confused me!).
Ha ha. I used to repeat the p value interpretation over and over in different ways to the class in the hopes of eventually explaining what it is not. I did dare you to wrap your head around it, so you were warned!! It is an upside down concept.
You know the value is 0.05 because Fisher had a tabulation of the zeta function lying around with 0.05, 0.025, and 0.01 in it when he was writing the paper?
The problem is as follows. I give you a sequence of coin flips, for example TTTHHTTT, and your task is to determine whether this coin is fair. Obviously you can't answer this question with certainty.
The frequentist approach seems rather ridiculous to me. They pretend to be able to answer this question without knowing anything about coins. Instead of making the assumptions up front, they hide the assumptions in the method of determining whether the coin is fair. For example one would think that to decide whether a coin I give you is fair, you'd need some idea about what kind of coins I will give you. If we were talking about reality, you would be able to say "no it's not fair" without even looking at the data sequence, because in reality there are no fair coins. If we are in a different universe where fair coins do exist, you'd need some idea how many are fair. So obviously whether a coin is fair depends on which universe you live in, but the frequentist method is not parameterized by the universe. The assumptions are implicitly made inside the method.
The Bayesian approach asks you first to state your prior ideas about coins. Then you give the Bayesian your data, and he will compute for you the probability that this coin is fair. This cleanly separates the correct mathematical derivation from the subjective assumptions, instead of hiding those assumptions inside the method.
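For the TTTHHTTT example, that computation might look like the sketch below. The 50/50 prior on fairness and the "unknown bias, uniform on [0,1]" alternative are illustrative assumptions, not the only reasonable ones.

```python
# Sketch of the coin example: posterior probability the coin is fair after
# seeing TTTHHTTT, under an illustrative prior P(fair) = 0.5 and a uniform
# prior on the bias under the "not fair" hypothesis.
from math import factorial

flips = "TTTHHTTT"
h = flips.count("H")          # 2 heads
t = flips.count("T")          # 6 tails

prior_fair = 0.5

# Likelihood of this exact sequence if the coin is fair:
like_fair = 0.5 ** (h + t)

# Likelihood if the bias p is unknown with a uniform prior:
# integral of p^h (1-p)^t dp = Beta(h+1, t+1) = h! t! / (h+t+1)!
like_unfair = factorial(h) * factorial(t) / factorial(h + t + 1)

posterior_fair = (prior_fair * like_fair) / (
    prior_fair * like_fair + (1 - prior_fair) * like_unfair
)
print(posterior_fair)   # about 0.496 -- eight flips barely move the needle
```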
Moreover, the Bayesian approach is automatic, in principle. Once you make your assumptions clear, the rest is just mechanical derivation. The frequentist approach requires divine inspiration, and is very easy to get wrong. For example, you're not allowed to look at the data before formulating your hypothesis. Of course nobody does this right in practice; I've often seen people cherry-pick a frequentist statistical test that proves their hypothesis. Given any data and any hypothesis, I can devise a correct frequentist statistical test that proves the hypothesis with arbitrarily large confidence.
Would that the world were about flipping coins. What's your prior distribution on the volatility parameter of a model of the 3-month implied volatility of the S&P 500 stock index? Now, justify it. What is your prior distribution on the rate at which bugs are found in a software system that runs a nuclear plant? Now, justify it. Personally, my priors are usually crap.
I agree. But with frequentist methods it's not even clear what kind of assumptions you're making. Explicitly stating your assumptions is better than implicitly hiding them in your method.
"We can see that the bayesian approach won't work well here, so lets use frequentist methods because we cannot see that they won't work well."
The problem with the nuclear plant is that we don't have much data on nuclear plant software bugs. However you could still get an idea by using bug rates in other software as your priors, and acknowledge that your priors are not perfect.
OR you could use frequentist methods to make up for the lack of data, and know you will be able to devise your method in such a way that it proves your hypothesis. Or if you use the method correctly (that is don't look at your data, and pick a statistical test beforehand) make no useful predictions at all.
"But with frequentist methods it's not even clear what kind of assumptions you're making."
I don't understand what you mean by this.
For a simple, concrete example: take the problem of fitting a distribution to a sample of real random variables. It seems that a Frequentist would make the following assumptions:
1. The data comes iid from some unknown but fixed distribution.
2. This true distribution is included within some set of distributions: eg, the set of normal distributions with real mean and non-negative real variance.
They would then use some estimator to estimate the parameters. The choice of estimator is perhaps justified by some theoretical properties, eg, consistency etc. This gives another assumption.
3. The best estimator to use is the one which is optimal with regard to properties x,y,z.
If it is not clear what assumptions I am making and I have explicitly stated some, presumably there are others which I have left unstated. Could you explain why you think the above collection is insufficient?
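To make those assumptions concrete for the normal case, a small invented example: the data are assumed iid normal (assumptions 1 and 2), and assumption 3 shows up in the choice between two variance estimators that are each "best" under a different criterion.

```python
# Concrete version of the assumptions above for the normal case.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=3.0, size=25)   # assumptions 1-2: iid normal

mean_hat = x.mean()                 # MLE of the mean (also unbiased)
var_mle = x.var(ddof=0)             # MLE of the variance (biased downward)
var_unbiased = x.var(ddof=1)        # unbiased estimator (divides by n - 1)

print(mean_hat, var_mle, var_unbiased)
# Which one is "the" estimate depends on which optimality property
# (maximum likelihood vs unbiasedness) you decided to care about: assumption 3.
```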
I could not solve the following problem from the article.
Suppose that 40% of the eggs are painted blue, 5/13 of the eggs containing pearls are painted blue, and 20% of the eggs are both empty and painted red. What is the probability that an egg painted blue contains a pearl?
        | blue | red  |
--------|------|------|-----
 pearls |      |      |
--------|------|------|-----
 empty  |      |      |
--------|------|------|-----
        |      |      | 100
We know that 40% of the eggs are blue, so we fill in 40 at the bottom row in the blue column; we now also know that 60% are red. Similarly, 20% of eggs are empty & red, so we fill in 20 there. Now we can calculate that 60% - 20% = 40% are pearls & red:

        | blue | red  |
--------|------|------|-----
 pearls |      |  40  |
--------|------|------|-----
 empty  |      |  20  |
--------|------|------|-----
        |  40  |  60  | 100

Now we can answer the question: what is the probability that an egg painted blue contains a pearl? Since 5/13 of the pearl eggs are blue, the 40 pearl & red eggs make up 8/13 of all pearl eggs, so there are 40 × 5/8 = 25 pearl & blue eggs. For every 40 blue eggs, 25 contain pearls, so the answer is 25/40 = 5/8 = 0.625.
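A quick check of that arithmetic in code (per 100 eggs):

```python
# Quick check of the egg arithmetic above, per 100 eggs.
from fractions import Fraction

blue = Fraction(40)                       # 40% of eggs are blue
red = 100 - blue                          # 60
empty_red = Fraction(20)                  # 20% are empty and red
pearl_red = red - empty_red               # 40 eggs are pearl and red

# 5/13 of pearl eggs are blue, so 8/13 of pearl eggs are red:
pearls = pearl_red / Fraction(8, 13)      # 65 pearl eggs in total
pearl_blue = pearls * Fraction(5, 13)     # 25 of them are blue

print(pearl_blue / blue)                  # 5/8 = 0.625
```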
It requires a little bit of indirection. You'll find that you could easily solve the equation if you only knew one more quantity. So call the unknown "X" and keep going. You'll be able to form an equation that constrains X to a single value.
FWIW, the answer I get has three non-zero decimal digits.
I more-or-less agree with the OP: I'm a neuroscientist, and it would be nice if I could use Bayesian analyses in my papers without it being a point of contention with reviewers.
But it seems like one of the big advances in frequentist statistics in the last fifty years is the introduction of nonparametric methods, which don't require you to make strong assumptions about the distribution of your data. My understanding is that the field of Bayesian nonparametric inference is still in its infancy.
While I like Bayesian statistics it is NOT a substitute for maximum likelihood in many situations.
Additionally, some of his criticism of NHST is unfair, because he criticizes weaknesses Bayesian methods also have. In the part where he gives the example of the pollster and how uncertain the p-value would be because you have to incorporate the sample design: well, you have to do that in Bayesian stats too.
Statistics is a big field. Obviously people should expand their tool set, and maybe Bayesian is underused, but that doesn't mean Bayesian stats are right for every experiment and experimental design.
Maximum likelihood estimation falls out of Bayesian statistics with a flat prior and the right utility function. In practice, though, that's often not the prior or utility you want, hence general Bayesian statistics.
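One standard way of seeing this, assuming a flat prior and estimating by maximising the posterior:

\[
\hat{\theta}_{\mathrm{MAP}}
= \arg\max_{\theta}\, p(\theta \mid x)
= \arg\max_{\theta}\, p(x \mid \theta)\, p(\theta)
= \arg\max_{\theta}\, p(x \mid \theta)
= \hat{\theta}_{\mathrm{MLE}}
\quad \text{when } p(\theta) \propto \text{const.}
\]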