
You brought a smile to my face. I came here to post this same point.

The piece makes a fundamental measurement mistake: it assumes that all variability is meaningful variability.

There are ways of making the argument they're trying to make, but they're not doing that.

Also, sometimes a single overall score is useful. A better analogy than their cockpit one is clothing sizing. Yes, tailored shirts, cut from detailed measurements of your body, fit great, but for many people small, medium, large, x-large, and so forth suffice.

I think there's a lesson here about reinventing the wheel.

I appreciate the goals of the company and wish them the best, but they need a psychometrician or assessment psychologist on board.



I do agree that applying psychometrics would be great, but it's not as simple as it sounds -- the vast majority of psychometric work is on multiple-choice or binary correct/incorrect items. There is some on free response, but much less.

We aren't trying to make a rigorous statement here -- we're trying to draw attention to the fact that the most common metrics do not give much insight into what a student has actually shown mastery of. This is especially important when you consider that the weightings of particular questions are often fairly arbitrary.

I certainly agree that all variability is not meaningful variability, but I'd push back a bit and say that there's meaningful variability in what's shown here. We'll go into more depth and hopefully have something interesting to report.

I've also seen a fair number of comments stating that this is not a surprising result. I'd agree (if you've thought about it), but if you look at what's happening in practice, it's clear that either many people are surprised by this, or they are at least unable to act on it. We're hoping to help with the latter.


IRT modeling doesn't care much whether an item is free response or not, just the scale on which it's scored: binary or polytomous scoring calls for an IRT model, continuous scoring for factor analysis.

If by mentioning free response you mean students are unlikely to guess the correct answer when they don't know it, that calls for a 2-parameter IRT model rather than a 3-parameter one (the third parameter models guessing).
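For concreteness, here is a minimal sketch of the difference (my illustration, not from the thread): the 3PL model is the 2PL model plus a lower asymptote c for guessing, so removing the possibility of guessing is exactly dropping that third parameter.

```python
import math

def p_2pl(theta, a, b):
    # 2PL: probability of a correct response given ability theta,
    # item discrimination a, and item difficulty b
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    # 3PL adds a lower asymptote c: even very low-ability examinees
    # answer correctly with probability at least c (guessing)
    return c + (1.0 - c) * p_2pl(theta, a, b)

# At very low ability, the 2PL probability approaches 0,
# while the 3PL probability floors at the guessing parameter c.
print(round(p_2pl(-5.0, 1.0, 0.0), 3))        # near 0
print(round(p_3pl(-5.0, 1.0, 0.0, 0.25), 3))  # near 0.25
```

With free-response items, guessing is implausible, so c is effectively 0 and the two models coincide.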

Best of luck! :)



