
You brought a smile to my face. I came here to post this same point.

The piece makes a fundamental measurement mistake: it assumes that all variability is meaningful variability.

There are ways of making the argument they're trying to make, but they're not doing that.

Also, sometimes a single overall score is useful. A better analogy than their cockpit one is clothing sizing. Yes, tailored shirts, cut from detailed measurements of your body, fit great, but for many people small, medium, large, x-large, and so forth suffice.

I think there's a lesson here about reinventing the wheel.

I appreciate the goals of the company and wish them the best, but they need a psychometrician or assessment psychologist on board.



I do agree that applying psychometrics would be great, but it's not as simple as it sounds -- the vast majority of psychometric work is on multiple-choice or binary correct/incorrect items. There is some on free response, but much less.

We aren't trying to make a rigorous statement here -- we're trying to draw attention to the fact that the most common metrics do not give much insight into what a student has actually shown mastery of. This is especially important when you consider that the weightings of particular questions are often fairly arbitrary.

I certainly agree that all variability is not meaningful variability, but I'd push back a bit and say that there's meaningful variability in what's shown here. We'll go into more depth and hopefully have something interesting to report.

I've also seen a fair number of comments stating that this is not a surprising result. I'd agree (if you've thought about it), but if you look at what's happening in practice, it's clear that either many people are surprised by this, or they are at least unable to act on it. We're hoping to help with the latter.


IRT modeling doesn't care much whether an item is free response or not, just the scale on which it's scored: binary or polytomous scoring calls for an IRT model, continuous scoring for factor analysis.

If by mentioning free response you mean students are unlikely to guess the correct answer when they don't know it, that calls for a 2-parameter IRT model rather than a 3-parameter one (the third parameter models guessing).
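For concreteness, here is a minimal sketch of the difference (my illustration, not from the thread): the 3PL model is the 2PL model plus a lower asymptote c for guessing, so removing the possibility of guessing is exactly dropping that third parameter.

```python
import math

def p_2pl(theta, a, b):
    # 2PL: probability of a correct response given ability theta,
    # item discrimination a, and item difficulty b
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    # 3PL adds a lower asymptote c: even very low-ability examinees
    # answer correctly with probability at least c (guessing)
    return c + (1.0 - c) * p_2pl(theta, a, b)

# At very low ability, the 2PL probability approaches 0,
# while the 3PL probability floors at the guessing parameter c.
print(round(p_2pl(-5.0, 1.0, 0.0), 3))        # near 0
print(round(p_3pl(-5.0, 1.0, 0.0, 0.25), 3))  # near 0.25
```

With free-response items, guessing is implausible, so c is effectively 0 and the two models coincide.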

Best of luck! :)



