Very interesting read--the immediate topic and the intertwined mental health discussion.
We generally approach science research as separate from the people making it (except for questions of reputation). There is a Scientific Method and if one follows it, Science results.
Of course, hundreds of small decisions are made by scientists all the time, each potentially introducing a bias (I find the catalog of cognitive biases alone impressive). Even the decision to pursue a particular hypothesis betrays a bias (albeit unavoidable if we want to rationally make use of past learning and accomplish anything).
But I hadn't considered mental health as a key factor--such things are quite rarely discussed (nor do they have a standard reporting mechanism in scientific papers). Apart from the general stigma around mental health, I suspect it's also because it undermines the image of the scientist as a dispassionate seeker of truth. How much more interesting, nuanced and complex would our results be if that changed?
I'd be interested to see the results of applying the test repeatedly at n-week intervals throughout the course.
One interpretation of the original finding that everyone who passes the test passes the course, but not everyone who fails the test fails the course, is that some people who did not have a consistent mental model before the course develop one during it (and nobody who has developed such a model loses it). That should show up in a longitudinal test: you would expect that as people develop a model, they switch from the failing to passing groups.
This would be consistent with the sequential learning model. If you did a basic class, and didn't have a model by the end of it, then when you do a more advanced class, you have even less of a chance of learning anything from it, and less of a chance of developing a model during it.
I wonder if a way of structuring a course would be to start with a sort of mental model bootcamp, where the teaching was aimed specifically at developing a model, and where nobody would progress to the rest of the course without doing so. That way, students who take longer to get ready are not confronted with later stages of the course that they will not be able to make use of, and have a progressively greater share of the teaching resources in the first stage to help them do so.
> [...] asked some statistician colleagues if they could help us recover more information from his data.
It's a shame more organisations don't have access to statistician helpers to ensure that they are being accurate and honest when seeking, interpreting, and presenting data. Perhaps this is another consequence of the dominance of Excel: people have collections of numbers, and you can pummel them into a spreadsheet and produce some nice charts and graphs, but that encourages over-interpreting the data.
> After a lot of work, the answers were, by and large, that we couldn’t see any such differences in our data.
This is surprising to me. I remember reading the blogs around the time and it seemed like a sensible claim. I can't remember anyone digging into the data and pointing out flaws. Did they?
I think I believed it because I feel "unteachable".
EDIT: I freaking love this paper because of its discussion of a mistake made during a phase of mental ill health, and the recovery journey afterwards.
> This is surprising to me. I remember reading the blogs around the time and it seemed like a sensible claim. I can't remember anyone digging into the data and pointing out flaws. Did they?
I'm not sure that was possible. I haven't re-read the original 2006 paper, but it sounds like the claims in the 2006 paper may simply have been false:
> I did a number of very silly things whilst on the SSRI and some more in the immediate aftermath, amongst them writing “The camel has two humps”. I’m fairly sure that I believed, at the time, that there were people who couldn’t learn to program and that Dehnadi had proved it. Perhaps I wanted to believe it because it would explain why I’d so often failed to teach them. The paper doesn’t exactly make that claim, but it comes pretty close. It was an absurd claim because I didn’t have the extraordinary evidence needed to support it. I no longer believe it’s true. I also claimed, in an email to PPIG, that Dehnadi had discovered a “100% accurate” aptitude test (that claim is quoted in (Caspersen et al., 2007)). It’s notable evidence of my level of derangement: it was a palpably false claim, as Dehnadi’s data at the time showed.
> 1.5.1.2 For people with moderate or severe depression, provide a combination of antidepressant medication and a high-intensity psychological intervention (CBT or IPT).
So, for moderate or severe depression, the standard initial treatment is an SSRI and therapy.
For less severe depression, though, the guidance is to start with non-pharmaceutical options, and only move to drugs if those don't work.
I'm not saying that 2014 treatment is magical. But there are some important differences:
There's now a recognition of "subthreshold depressive symptoms" - which are troubling and unpleasant but which either would have been missed in the past or would have been treated solely with medication.
Other stuff is much more important now. "A wide range of biological, psychological and social factors, which are not captured well by current diagnostic systems, have a significant impact on the course of depression and the response to treatment."
We're using DSM-IV, not ICD-10, which "also makes it less likely that a diagnosis of depression will be based solely on symptom counting".
To get the therapy in the UK the person would self-refer to an IAPT (Improving Access to Psychological Therapies) style course. That would carry some kind of assessment of need, and so the person would have another check (the first being the GP) on whether they need specialist secondary care.
The important stuff here for the OP is much more concentration on therapy not just medication; and much more concentration on how the person is coping with life not just counting symptoms. Of course, some places do this much better than others.
That's the list of models in various combinations.
C0 means C[0]. If you are consistent with a single model in your answers on more than 80% of the test then you are C[0]. If, say, you split between model 1 and 2 and in total they were 80% of your answers, then you'd be C[1]. If you were split between 2 and 3 then you'd be C[2], since that's the first time those are grouped together.
The test doesn't alert() you to what model you are. You have to set a breakpoint and poke around the code. When I took it I was consistent across all 12 questions with model 2.
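For what it's worth, here is roughly how I read that rule, as a sketch in JavaScript. This is not the test's own code, and the model numbers and pairings below are placeholders for the actual list of combinations:

function consistencyGroup(answers) {
  // answers: for each of the 12 questions, the model its answer matched, e.g. [2, 2, 2, ...]
  var threshold = 0.8 * answers.length;
  var counts = {};
  answers.forEach(function (m) { counts[m] = (counts[m] || 0) + 1; });

  // C[0]: more than 80% of the answers agree with one single model.
  var best = Object.keys(counts).reduce(function (a, b) {
    return counts[a] >= counts[b] ? a : b;
  });
  if (counts[best] > threshold) return 'C0';

  // Otherwise walk the list of model combinations in order and report the
  // first one that covers the threshold (the pairs here are placeholders).
  var combinations = [[1, 2], [2, 3] /* , ... */];
  for (var i = 0; i < combinations.length; i++) {
    var covered = combinations[i].reduce(function (sum, m) {
      return sum + (counts[m] || 0);
    }, 0);
    if (covered >= threshold) return 'C' + (i + 1);
  }
  return null;  // too inconsistent to place in any group
}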
It's kind of weird that one of the most logical answers to the first question is not in the answer sheets. I bet that if it were there, 100% of non-programmers would tick it, and they wouldn't be wrong.
The question:
a = 10;
b = 20;
a = b;
The logical answer of course being that the computer should throw an error or return false, because a does not equal b. 14 years of schooling should have hammered that in quite thoroughly.
If you really want to test non-programmers' native skill for working with computers, you should at least briefly explain how the computer will read these statements, i.e. that the computer interprets the statements sequentially and reads the '=' symbol as 'becomes', not as 'equals'.
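For what it's worth, here is what that sequential reading gives in plain JavaScript (the console.log is only there to show the final values):

var a = 10;  // a becomes 10
var b = 20;  // b becomes 20
a = b;       // a becomes the current value of b
console.log(a, b);  // prints "20 20": no error, and nothing evaluates to false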
Responding with "false" as the result seems logically incoherent, as it assumes that the first two lines of the three-line program are "true" when there is no reason to assume that is the case.
Without any previous understanding of what computer programming is, what it does, or how it works, and relying solely on elementary mathematics, is there a particular reason that one would assume that the first two lines are directives and the third line is what we are being asked to validate? I am too far down the rabbit hole to intuitively know if that is the case; can someone else suggest whether this is a plausible conjecture?
It's been maybe 15 or so years, so I'm similarly pretty far down the rabbit hole, but I definitely remember having a lot of trouble with:
x = x + 1
At the time, it seemed patently obvious that it was a false statement, because there is no single value of x for which this is true.
If the situations are analogous, my guess would be that you would assume that each of these statements is an assertion, and that at least one of them must be false. Intuitively, I'd guess that it's the last one that people would assume would be false, because as you're reading from top to bottom, you've already "accepted" the first two.
When teaching JavaScript to kids around 10 years old, I tend to use x += 1 over x = x + 1.
It seems to be much easier for people to attach the new construct += to a new idea. I don't bring in the x = x + 1 form until they have had plenty of practice assigning other expressions to x. Kids don't seem to have any problem with x = y + 1; they just need a little time for that idea to set properly before they start mixing things up.
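For reference, the progression I mean, in plain JavaScript:

var y = 5;
var x = y + 1;  // 6: assigning an expression to x, no self-reference yet
x += 1;         // 7: "add 1 to x" as its own construct
x = x + 1;      // 8: same effect, but now '=' visibly means "becomes"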
If you take it to mean "The cardinality of an infinite set", "X + 1" to mean "The set X with one more element added to it", and "X = Y" to mean "X and Y have the same cardinality", then "X = X + 1" is entirely true.
Mathematics, like programming, is ultimately founded on definitions.
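To spell out one instance of that reading, with X = |\mathbb{N}|:

X + 1 = |\mathbb{N} \cup \{\star\}| = \aleph_0 = |\mathbb{N}| = X

witnessed by the bijection that sends \star to 0 and each n to n + 1.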
> is there a particular reason that one would assume that the first two lines are directives and the third line is what we are being asked to validate?
A much less narrow assumption is required to reach answers of "false". Think of each example as a system of simultaneous equations.
Since this test was seemingly designed relying on the idea of destructive updates, none of the given examples have satisfying assignments. But of course standout easy-pick answers of "no solution" would ruin the test. I'd really like to see a similar study of psychologies that took into account different programming bases. Perhaps such a test would even be a good way to sort students into separate intro classes that used a language suited to their preexisting mental model.
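A minimal sketch of that reading in JavaScript (a toy check I made up, not anything from the actual test):

var constraints = [
  function (env) { return env.a === 10; },   // a = 10
  function (env) { return env.b === 20; },   // b = 20
  function (env) { return env.a === env.b; } // a = b
];
function satisfies(env) {
  return constraints.every(function (c) { return c(env); });
}
// The first two constraints force a = 10 and b = 20, while the third needs
// a = b, so no assignment of values satisfies all three at once:
console.log(satisfies({ a: 10, b: 20 })); // false
console.log(satisfies({ a: 20, b: 20 })); // false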
Several older languages, including Pascal and the Algol family, use the := operator for assignment, on the grounds that assignment is a fundamentally asymmetric operation. In the ML family of languages, there are immutable definitions and mutable reference cells, with different operators for each:
(* multiple bindings, so the inner x shadows the
* outer x---indeed, this code would give you a
* warning about the unused outer x *)
let x = 5 in
let x = 6 in x
(* y is a mutable reference cell being modified *)
let y = ref 5 in
y := 6; !y  (* dereference y to get its current value, 6 *)
Haskell makes a distinction between name-binding and name-binding-with-possible-side-effects, but still reserves = to signify an immutable definition rather than assignment:
-- this defines x to be one more than itself---which
-- causes an infinite loop of additions when x is used
let x = x + 1 in x
-- a different assignment operator is used in a context
-- in which side-effects might occur, and can be
-- interspersed with non-side-effectful bindings:
do { x <- readLn :: IO Int  -- does IO (reads an Int from stdin)
   ; let y = 1              -- doesn't do IO
   ; return (x + y)
   }
> "The logical answer of course being that the computer should throw an error or return false"
But that's not the question that was asked. The question is "what are the final values of a and b?". Someone adopting your interpretation would say "a=10, b=20". Otherwise, why else would you be claiming "a does not equal b"?
> "If you really want to test non-programmers native skill for working with computer, you should at least briefly explain how the computer will read this statements. i.e. the computer interprets the statements sequentially, and reads the '=' symbol as 'becomes', not as 'equals'."
That would somewhat defeat the point of the test, which is to gauge what mental models (if any) people have before they've been told anything about programming.
That article didn't seem very coherent. E.g. on the original test he says:
"His test is not a very good predictor: most of those who
appear to use a model in the pre-course test pass the end-of-course exam but so do many of those who do
not."
How many pass, and how many do not? Those are the numbers needed to evaluate a predictive test.
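To make that concrete with numbers invented purely for illustration (they are not from the paper or the data):

// Suppose 30 students "pass" the pre-test and 30 "fail" it.
var passTestPassCourse = 30;  // everyone who passes the test passes the course
var passTestFailCourse = 0;
var failTestPassCourse = 18;  // but many who fail the test pass the course too
var failTestFailCourse = 12;

// How often a test "pass" predicts a course pass:
var ppv = passTestPassCourse / (passTestPassCourse + passTestFailCourse);  // 1.0
// How often students pass the course regardless of the test:
var baseRate = (passTestPassCourse + failTestPassCourse) / 60;             // 0.8
// Without both counts you can't tell how much the test adds over the base rate.
console.log(ppv, baseRate);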
Reading between the lines, I would guess that he got in trouble because his original article wasn't politically correct. One line in the retraction reads "We hadn’t shown that nature trumps nurture. Just a phenomenon and a prediction." I would guess he came under political pressure, and felt pressured to write this article in order to avoid further problems.
... comes the following quote from the author of the camel-humps paper:
"I presented our latest results about 18 months ago at a PPIG workshop/conference in the UK. I felt it was helpful, since the claims I made had provoked hostility to the work, to retract those claims verbally. It had a dramatic effect, to the good. But I found (how I know is the confidential bit) that there are people who didn’t hear that retraction, and who are still hostile; and that hostility is doing harm. So I decided to retract more publicly.
"Interestingly, one person who I would have counted previously as hostile heard (indirectly) of the verbal retraction, and this summer was more than supportive. Research inspired by our work is going forward. So the retraction was worthwhile."
So, confirmed: there was an angry backlash, and this retraction is an attempt to calm people down.
You're right that this article is a bit hard to read in places. But as for your "reading between the lines", give me some reason to believe you're not just projecting your pre-existing assumptions onto this article. What did he actually say that let you read between the lines?
He has no real criticism of his original article. What he says about it is incoherent, not merely hard to read, as I point out in my comment.
And he makes reference to nature vs nurture, which is completely irrelevant to the actual issue, and only makes sense in the context of a political apology.
Where does he say that specifically? He refers to statisticians helping to do further analysis on the data, but I don't see where the statisticians refuted the main claim (that the test is predictive of success in the course).