An Eye Tracking Study on camelCase and under_score Identifier Styles

nerdzero · on July 25, 2013

This is really interesting. Nice find. Here's a very brief summary: Participants were shown a few words and then four different identifiers, one of which matched the words and three of which did not. The example in the paper is:

phrase: "full pathname"

identifiers: fill_pathname, full_mathname, full_pathnum, full_pathname
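To make the task concrete, here is a minimal Python sketch of the matching step. It is only an illustration, not the software used in the study: the helper names `to_under_score`, `to_camel_case`, and `matching_candidates` are made up for this example, and it assumes the target identifier is simply the phrase rendered in one of the two styles.

```python
def to_under_score(phrase: str) -> str:
    """Render a phrase in under_score style, e.g. 'full pathname' -> 'full_pathname'."""
    return "_".join(phrase.lower().split())


def to_camel_case(phrase: str) -> str:
    """Render a phrase in camelCase style, e.g. 'full pathname' -> 'fullPathname'."""
    words = phrase.lower().split()
    if not words:
        return ""
    # First word stays lowercase; each later word gets a leading capital.
    return words[0] + "".join(w.capitalize() for w in words[1:])


def matching_candidates(phrase, candidates, style):
    """Return the candidates that exactly equal the phrase rendered in the given style."""
    target = style(phrase)
    return [c for c in candidates if c == target]


# The example from the paper: three distractors and one correct identifier.
candidates = ["fill_pathname", "full_mathname", "full_pathnum", "full_pathname"]
print(matching_candidates("full pathname", candidates, to_under_score))
```

A program does this with an exact string comparison, of course; the study measured how long the same comparison takes a human reader.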

From the conclusions section: "Although, no difference was found between identifier styles with respect to accuracy, results indicate a significant improvement in time and lower visual effort with the underscore style. The interaction of Experience with Style indicates that novices benefit twice as much with respect to time, with the underscore style. This implies that with experience or training, the performance difference between styles is reduced. These results add to the findings of Binkley et al.’s study"

So, finding the correct identifier was just as likely regardless of style, but the correct identifier was found faster when it used underscores rather than camelCase.

I think the underscores acted as "whitespace" in this study which in my experience is the most important factor in how quickly you can visually parse code. However, from a subjective/aesthetic perspective, I still prefer using camelCase over underscores.

nkurz · on July 25, 2013

Your summary is accurate, but I'm not sure why they chose to measure what they did. Rather than matching "full pathname" to 4 identifiers that are not the same (due to the significance of the underscore), it seems to me more appropriate to ask the subjects to match "full_pathname" to "full_pathname".

I consider the underscore and capitalization as part of the identifier, and _want_ to be able to distinguish the (most likely) typo "full pathname" from the desired identifier. Trying their example, most of my time was spent rescanning to find "none of the above", then rereading the directions.

Suggested alternative that I think would be a better study:

Which of the following is the same as "fullPathname":

  1) "fullpathname"
  2) "fullpathname"
  3) "FullPathname"
  4) "fullPathname"
  5) None of the above

Which of the following is the same as "full_pathname":

  1) "Full_Pathname"
  2) "Full Pathname"
  3) "full_pathname"
  4) "full pathname"
  5) None of the above

thristian · on July 25, 2013

If I've understood the abstract and conclusion correctly, the message here is that people who are used to under_score identifiers read under_score identifiers more quickly than camelCase identifiers, and this is probably due to training.

I guess it's good we've got that established, at least.

exelib · on July 25, 2013

I've worked with both. Conclusion: there is no difference. But in Python, for instance, underscores are preferred because of the convention for protected/private members.
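For anyone who hasn't seen the Python convention, a small sketch follows. The class `PathInfo` and its attribute names are invented for illustration. A single leading underscore is only a convention meaning "internal, don't touch", while a double leading underscore triggers name mangling inside the class body.

```python
class PathInfo:
    def __init__(self, full_pathname):
        self.full_pathname = full_pathname  # public attribute
        self._cached_parts = None           # "protected": internal by convention only
        self.__raw = full_pathname          # "private": stored as _PathInfo__raw

    def raw(self):
        # Inside the class, self.__raw is rewritten to self._PathInfo__raw.
        return self.__raw


info = PathInfo("/tmp/example")
print(info._cached_parts)       # still accessible; the underscore is just a hint
print(hasattr(info, "__raw"))   # False: no attribute with that literal name exists
print(info._PathInfo__raw)      # the mangled name is reachable if you insist
```

Neither form is enforced access control, so the leading underscore is mostly a signal to other readers of the code.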