Statistical Inference for Everyone (bryant.edu)
162 points by jeffmax on Nov 20, 2014 | 21 comments



Here are some more interactive and effective ways to learn stats online. The Open Learning Initiative's open Probability & Statistics Course out of Carnegie Mellon might just be the most researched and carefully designed course out there: http://oli.cmu.edu/courses/free-open/statistics-course-detai... Students learn more statistics concepts in half the time of a traditional stats course: http://oli.cmu.edu/get-to-know-oli/see-our-proven-results/

The Statistics Online Computational Resource (SOCR) site is also amazing for actually learning and playing with common statistical tests and tools: http://www.socr.ucla.edu/

Collaborative Statistics is a free and interactive statistics textbook: https://www.kno.com/book/details/productId/txt9780983804905

You can also run Sage, R, Python, Octave (Matlab clone) and other tools right in the browser now: https://cloud.sagemath.com/


What do you think of the CMU course vs. the textbook in question?


The CMU course is very traditional. It covers basic exploratory data analysis (summary statistics, plotting data), basic probability, and hypothesis testing and estimation. There's no programming, nothing Bayesian, and only brief discussion of regression.

(I taught 36-201, the intro stats course that was used to build the OLI course, this summer.)

Statistical Inference, on the other hand, seems to take a Bayesian perspective and is very much not your standard intro stats class. It looks interesting and I'll have to skim through some of it.


This book is interesting because it forgoes the traditional approach of most mathematical statistics books. The preface states that this is done to avoid the "cookbook" approach taken by many statistics students. That makes it ironic that "Bayes' Recipe" appears 15 times in the text, that page 131 gives a five-step algorithm for parameter estimation, and, my favourite, that the oft-repeated, never-explained recipe "n > 30, you'll be fine" appears throughout. There is no mention of the CLT, MLE, method of moments estimation, bias of estimators, convergence in probability, how sampling distributions arise, or any of the theory of distributions that underpins all of the inferential procedures detailed in the book. I think that excluding these topics actually increases the cookbooky-ness of the text.
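For readers who haven't met the omitted ideas, here's a minimal sketch (my own illustration, not from the book) of how a sampling distribution arises and what the CLT buys you: means of even quite skewed samples end up approximately normal.

    import numpy as np

    # draw 10,000 samples of size 30 from a skewed (exponential) distribution
    rng = np.random.default_rng(0)
    sample_means = rng.exponential(scale=1.0, size=(10_000, 30)).mean(axis=1)

    # the exponential is heavily skewed, but the distribution of the sample
    # means is close to normal, centred near 1 with sd near 1/sqrt(30) ~ 0.18
    print(sample_means.mean(), sample_means.std())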

It is important that students understand the provenance of the inferential techniques they use, so that they don't end up doing bogus science (which hurts the world) by not knowing the failure modes of those techniques. Of course, not all students of statistics know the requisite mathematics to understand it all; at the very least, put the failure modes into cookbook form.

For the sake of science, please don't ever do any inferential statistics without knowing when the method you're using works and when it breaks, what it is robust to, and what assumptions it makes. Statistics is really easy to break when used naively. The mathematics of statistics is not easy, and the results are often highly counter-intuitive.


"There is no mention of the CLT, MLE, method of moments estimation, biasedness of estimators, convergence in probability, how sampling distributions arise, or any of the theory of distributions that underpin all of the inferential procedures detailed in the book."

Lots of good criticisms in this thread, which I'll have to look at. This one, however, is not. :) How many intro stats books, of the traditional kind, mention MLE, method of moments, biased vs unbiased estimators, etc.? None that I've seen. So, you're right, it becomes more "cookbooky" as a result; however, I would argue that all Bayes analysis follows the same recipe, whereas frequentist analysis typically follows many recipes that are not obviously connected. It is that part that I criticize, not the fact that there is a recipe for doing things.
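To make the "same recipe" claim concrete, here's a minimal sketch (the hypothesis names and numbers are made up for illustration): every Bayesian analysis is posterior proportional to likelihood times prior, normalized over the hypotheses.

    # Bayes' theorem over a discrete set of hypotheses
    priors      = {"H1": 0.5, "H2": 0.5}    # P(H)
    likelihoods = {"H1": 0.8, "H2": 0.3}    # P(data | H)

    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnorm.values())          # P(data)
    posterior = {h: p / evidence for h, p in unnorm.items()}
    print(posterior)                         # {'H1': 0.727..., 'H2': 0.272...}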


>> how many intro stats books, of the traditional kind, mention MLE, method of moments, biased vs unbiased estimators, etc.? None that I've seen

Oh - there are quite a few. Here's a small sample (no pun intended):

- Probability and Statistical Inference by Hogg & Tanis (we used this in my stats course)

- Modern Mathematical Statistics with Applications by Devore & Berk

- Probability and Statistics by DeGroot & Schervish


Ah, yes. I concede the point. What I find interesting in all this is that the term "introduction" is used in so many ways. When looking, for instance, for an intro Bayes book you get things like Lee and Bolstad, which for some count as intro. However, if you tried to teach med students or business students from them it would be a disaster.

Personally, I see MLE as just an approximation of MAP, which is superior. Biased vs unbiased also doesn't play into probability theory as logic, except as a consequence of those parameters that maximize the posterior.
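A toy illustration of that point (my example, not from the thread): for k successes in n Bernoulli trials with a Beta(a, b) prior, the MAP estimate is (k + a - 1) / (n + a + b - 2); with a flat Beta(1, 1) prior this reduces exactly to the MLE k/n, which is the sense in which MLE is MAP with an implicit uniform prior.

    def mle(k, n):
        return k / n

    def map_est(k, n, a=1.0, b=1.0):
        # mode of the Beta(a + k, b + n - k) posterior
        return (k + a - 1) / (n + a + b - 2)

    k, n = 7, 10
    print(mle(k, n))                   # 0.7
    print(map_est(k, n))               # 0.7   (flat prior recovers the MLE)
    print(map_est(k, n, a=2, b=2))     # 0.666... (informative prior shrinks it)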


n greater than 30:

A quantity that follows a normal distribution has two things to estimate: the variability of the quantity (the standard deviation) and the mean. Both of these are estimated with uncertainty from a series of observations of the quantity (the data). The t-distribution lets us make predictions that take both sources of uncertainty into account for a normally distributed thing.

However, as the number of observations increases towards thirty, the estimate of the standard deviation gets really, really good, so you can happily ignore the uncertainty in it. Then you just need the normal distribution.
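A quick way to see the heuristic (a sketch using scipy, which is my choice of tool here, not the thread's): compare t critical values to the normal's as n grows.

    from scipy import stats

    z = stats.norm.ppf(0.975)   # normal critical value for a 95% interval
    for n in (5, 10, 30, 100):
        t = stats.t.ppf(0.975, df=n - 1)
        print(f"n={n:4d}  t={t:.3f}  z={z:.3f}  ratio={t/z:.3f}")

    # by n = 30 the t critical value (~2.045) is within ~4% of z (~1.960)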


That is hugely misleading. It's only reasonable if the data are actually independent draws from a normal distribution. IRL, they're not.
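A quick simulation of that failure mode (the lognormal choice is mine, just to make the point): with skewed data, the nominal 95% t-interval for the mean under-covers even at n = 30.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, trials = 30, 10_000
    true_mean = np.exp(0.5)              # mean of lognormal(0, 1) is e^(1/2)
    tcrit = stats.t.ppf(0.975, df=n - 1)

    covered = 0
    for _ in range(trials):
        x = rng.lognormal(0.0, 1.0, size=n)
        half = tcrit * x.std(ddof=1) / np.sqrt(n)
        covered += abs(x.mean() - true_mean) <= half
    print(covered / trials)              # noticeably below the nominal 0.95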


What books do you recommend?


I haven't looked at it carefully, but it's hard to think of a setting where I'd want to teach from this book: it's aimed at stats 101 students, but uses Python as the programming language (a great language, but far beyond what I'd expect a typical intro stats student to be able to handle); it advocates Bayesian statistics, which is a reasonable decision, but seems to take it to such an extreme that "hypothesis test" never appears in the table of contents...

But it's obviously a labor of love, and it's an interesting take on intro stats. And, from skimming it, I don't see anything in it that's wrong. So this might be a good intro to Bayesian stats for most HN readers.

edit: the quality of the graphs varies widely, though. Some look great, but some (the histograms especially) are... unappealing. And the formatting of the code sections is quite at odds with the style of the rest of the book. Those are minor points, though.

second edit: not to start a license flamewar, but can this book actually be redistributed? It's licensed under either CC or the GNU FDL, but I don't see a way to get the source. So anyone hosting a copy would also need to license it under the FDL (since they can't remove the FDL licensing from the PDF), which they would then be violating. Am I understanding things correctly, or am I wrong?


The best book I've found about statistical inference is this one: http://www.amazon.com/exec/obidos/ASIN/188652923X/ref=nosim/... It comes with the bonus that you can take the full course (video lectures, recitations, assignments, and quizzes) on MIT OCW: http://ocw.mit.edu/courses/electrical-engineering-and-comput...


Thanks for the links. I've been meaning to kick the tires and read up on some probability/stats stuff, and this seems like the perfect way to ease back into it. Bookmarked!


it advocates Bayesian statistics, which is a reasonable decision, but seems to take it to such an extreme that "hypothesis test" never appears in the table of contents...

That's not very unusual. It seems to follow the "logic of science" approach from Jaynes. Hypothesis testing is covered in chapters 4 and 6. Other books (MacKay, Jaynes, Murphy) only cover frequentist hypothesis testing to argue against it, so this is rather refreshing.


Whether the textbook author wants to preach the Way of Bayes or not, the students, provided they actually become empirical scientists, are going to face journal and conference reviewers who want to see p-values. Failing to teach them how to construct credible intervals and perform Bayesian significance testing based on posterior distributions is failing to teach them skills necessary for our profession.
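For what it's worth, those Bayesian analogues are short to demonstrate. A hedged sketch (the Beta(1, 1) prior and the numbers are my assumptions for illustration) of a credible interval and a posterior tail probability for a binomial proportion:

    from scipy import stats

    k, n = 34, 50                              # e.g. 34 successes in 50 trials
    posterior = stats.beta(1 + k, 1 + n - k)   # Beta(1,1) prior -> Beta(35, 17)

    lo, hi = posterior.ppf([0.025, 0.975])
    print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
    print(f"P(theta > 0.5 | data) = {posterior.sf(0.5):.3f}")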


It's very unusual not to cover hypothesis testing in an introduction to statistics class. The students are going to see "testing" again. The passage you quoted was about teaching from the book, not using it for self-study.


Hopefully they can fix the embarrassing typo: "Monte Hall problem" should be "Monty." Not sure how that could have escaped notice. Maybe they were thinking about Monte Carlo simulations when writing that bit, but someone should have caught this.


Proofreading is one of the great unsolved technological problems. Human attention is a fantastically limited resource, and even multiple layers of checking frequently let what subsequently appear to be "obvious" errors slip through.

The recent "cite crappy Whoever paper here" goof in a peer-reviewed journal is a typical example, and is notable only in that it is so egregious that it was caught and publicized. It is essentially certain that a large fraction of published papers contain at least one significant typo. I know of one case where two figures in a paper were identical (figure 2 was duplicated in figure 3) and it was missed by the co-authors (one of whom was fanatically careful) the journal editors and the referees.

We are never directly aware of our own inattentiveness, by definition, so the reality of how inattentive we are comes as a constant surprise.

To twist this vaguely back on topic: as well as being attentionally blind, we are also probability blind. I liken this to colour-blindness: we simply do not see probability distributions and have a terrible time thinking about them, yet we are completely immersed in them every day.

Between these two things (attentional blindness and probability blindness) we frequently end up interacting with the universe in ways that make little or no sense, as we behave as if we a) notice everything and b) live in a world of certain outcomes. The modern revolution of treating probability theory as logic is a huge deal, and people who adopt it are likely to have a considerable advantage in the years ahead. For one thing, it makes dealing with our attentional blindness easier, because it helps us understand, and represent in our reasoning, our imperfect attentional capabilities.


A clear link to the LaTeX source file on GitHub would solve a large part of the proofreading problem, especially since the book is nominally licensed under the GNU FDL.


If you read the citation, they're following the original source, which isn't the common usage either, but at least it doesn't seem as glaring now.

http://www.jstor.org/discover/10.2307/2683689?uid=2&uid=4&si...


The various spellings arise because the television host's real name was Monte Halparin, and he used the stage name "Monty Hall" [0].

0. http://en.m.wikipedia.org/wiki/Monty_Hall



