CodeThatDocumentsItselfSoWellItDoesNotNeedComments

IsaacSchlueter · on Sept 23, 2009

There's a lot of "short vs long" going on in the comments here. That seems silly to me.

Code should be written so as to completely describe the program's functionality to human readers, and only incidentally to be interpreted by computers. We have a hard time remembering short names for a long time, and we have a hard time looking at long names over and over again in a row. Additionally, short names carry a higher likelihood of collisions (since the search space is smaller), but are easier to "hold onto" for short periods of reading.

Thus, our conventions for naming things should take into consideration the limitations of the human brain. The length of a variable's name should be proportional to the distance between its definition and its use, and inversely proportional to its frequency of use.

Global config setting that gets specified once and used in 4 places throughout the program? 10-20 characters is probably appropriate. Might wanna go with UPPER_SNAKE_CASE to make it stand out a bit more, even.

Iterator variable that you define in a 3-line for loop and then never see again outside of it? Call it "i".
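A rough sketch of those two cases side by side, in Java. Every name and value here is invented for illustration; none of it comes from a real codebase.

```java
// Hypothetical example: the class, constant, and method names are all made up.
final class NameLengthSketch {
    // Defined once and read far from its definition, so a long, loud name earns its keep.
    static final int MAX_CONNECTION_RETRY_COUNT = 4;

    // Returns the index of the first successful attempt within the retry limit, or -1 if none.
    static int firstSuccessfulAttempt(boolean[] attemptSucceeded) {
        // The loop variable lives and dies inside these few lines, so "i" is enough.
        for (int i = 0; i < attemptSucceeded.length && i < MAX_CONNECTION_RETRY_COUNT; i++) {
            if (attemptSucceeded[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstSuccessfulAttempt(new boolean[] {false, false, true}));
    }
}
```

The constant is the stranger you get introduced to by full name; the loop counter is the person standing right next to you.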

Another way to look at this: The first time you meet someone, you learn their full name. When discussing them with someone else who knows them, you use just a single name. If they're standing right there, you don't bother using their name, but just make eye contact, and maybe a "Hey". Should be the same way with variables.

grandpa · on Sept 24, 2009

> Another way to look at this: The first time you meet someone, you learn their full name. When discussing them with someone else who knows them, you use just a single name. If they're standing right there, you don't bother using their name, but just make eye contact, and maybe a "Hey". Should be the same way with variables.

What an awesome way to think about this. Thank you.

plaes · on Oct 3, 2009

My approach is basically the same:

Long uppercase names for anything global, settings, or security-related (e.g. COOKIE_NAME, SECRET_HASH)

For dummy loop variables: i, j, k

For everything else I use common sense.

dratman · on Oct 3, 2009

So obviously variables should be able to have more than one name: a long one and a short one. SQL offers the ability to define an alias name.

Unfortunately that would be a kind of kitchen sink feature that no standards committee is likely to accept. As a substitute, one could create a variable with a long name, then put the value into another variable with a short name, but that could be confusing unless very clearly noted in a comment.
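For what it's worth, the substitute might look something like this in Java. This is only a sketch: the method and parameter names are hypothetical, and the formula is the standard fixed-payment loan (annuity) formula, picked just because it reads better with short symbols. Note that `r` is derived from the long-named value rather than being a pure alias.

```java
// Hypothetical example: names are invented to illustrate long-name/short-alias pairs.
final class AliasSketch {
    static double monthlyPayment(double outstandingLoanPrincipal,
                                 double annualInterestRatePercent,
                                 int numberOfMonthlyPayments) {
        // Short aliases for the formula below. Each is final, so it cannot drift
        // away from the long-named value it stands for.
        final double p = outstandingLoanPrincipal;
        final double r = annualInterestRatePercent / 100.0 / 12.0; // monthly rate as a fraction
        final int n = numberOfMonthlyPayments;

        if (r == 0.0) {
            return p / n; // no interest: just split the principal evenly
        }
        return p * r / (1.0 - Math.pow(1.0 + r, -n));
    }

    public static void main(String[] args) {
        System.out.println(monthlyPayment(1200.0, 12.0, 12));
    }
}
```

Marking the aliases `final` and keeping them within a few lines of the formula does most of the work the clarifying comment would otherwise have to do.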

philluminati · on Oct 3, 2009

The same point was made in the book "Clean Code"

edw519 · on Sept 23, 2009

Papa Bear:

  for(i=ii;i<iii;i++){
    for(j=jj;j<jjj;j++){
      for(k=kk;k<kkk;k++){
        doSomething();
      }
    }
  }

Mama Bear:

  for(YearCounter=FirstYearOfCycle;YearCounter<LastYearOfCycle;YearCounter++){
    for(MonthCounter=FirstMonthOfYear;MonthCounter<LastMonthOfYear;MonthCounter++){
      for(DayCounter=FirstDayOfMonth;DayCounter<LastDayOfMonth;DayCounter++){
        doSomething();
      }
    }
  }

Baby Bear:

  for(Year=FromYear;Year<ThruYear;Year++){
    for(Month=FromMonth;Month<ThruMonth;Month++){
      for(Day=FromDay;Day<ThruDay;Day++){
        doSomething();
      }
    }
  }

mkyc · on Sept 24, 2009

The moral of the story is not to use shorter names, but to write better code.

  Date start = ;
  Date end = ;
  Iterator i = new DayIterator(start, end);
  while (i.hasNext()) {
    doSomething(i.next());
  }

Your Papa Bear example iterates over a cube in a 3d array. Your Mama Bear example is some seriously broken date iteration code (so you change lastDay from within doSomething?). The Baby Bear is equally broken 'time period' iteration code.
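One guess at what a DayIterator like the one above might look like, sketched with java.time.LocalDate (Java 8 or later). The class body and the half-open [start, end) range are assumptions; the comment above names the class but does not define it.

```java
import java.time.LocalDate;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical stand-in for the DayIterator mentioned above.
// Assumption: it yields every day from start (inclusive) up to end (exclusive).
final class DayIterator implements Iterator<LocalDate> {
    private LocalDate cursor;
    private final LocalDate end;

    DayIterator(LocalDate start, LocalDate end) {
        this.cursor = start;
        this.end = end;
    }

    @Override
    public boolean hasNext() {
        return cursor.isBefore(end);
    }

    @Override
    public LocalDate next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        LocalDate current = cursor;
        cursor = cursor.plusDays(1); // LocalDate handles month lengths and leap years
        return current;
    }

    // Counts the days the iterator yields, in the same while-loop shape as above.
    static int countDays(LocalDate start, LocalDate end) {
        int days = 0;
        Iterator<LocalDate> i = new DayIterator(start, end);
        while (i.hasNext()) {
            i.next();
            days++;
        }
        return days;
    }

    public static void main(String[] args) {
        System.out.println(countDays(LocalDate.of(2008, 2, 1), LocalDate.of(2008, 3, 1)));
    }
}
```

Because the calendar arithmetic lives in one place, there are no per-month bounds to get wrong in a nested loop.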

Writing self documenting code isn't about giving everything complete and proper names. It's about choosing comprehensible and trackable names. You introduce Mr. Joseph Harrison as such, but you may thereafter call him Joe.

  BufferedInputStream bufferedInputStream = ... // bad, too formal
  BufferedInputStream bis = ... // bad, too casual
  BufferedInputStream stream = ... // ok
  BufferedInputStream input = ... // good

The Baby Bear code's naming is a bit off too. Is it inclusive or exclusive? I would not be bothered if I saw:

  for (y=firstY; y<=lastY; y++) {

The letter 'y' is obvious (yy/mm/dd), and its scope is small.

edw519 · on Sept 24, 2009

> The moral of the story is not to use shorter names, but to write better code.

The scope of the post was variable naming. I used dates and "The Three Bears" for a light example of variable naming that almost anyone could understand. I did not even begin to address "better code". I'll save that for another (or hundreds of other) posts.

> Your Mama Bear example is some seriously broken date iteration code (so you change lastDay from within doSomething?).

The code is fine. It only becomes "seriously broken" if you change the loop variable within the iteration, in which case you seriously broke it with poor practice.

> The Baby Bear code's naming is a bit off too. Is it inclusive or exclusive?

The naming is fine. So is the code. Your question of inclusivity or exclusivity is meaningless without understanding the assignment of the iteration variables, which again, was outside the scope of the post.

> The letter 'y' is obvious

No it's not obvious. That's the whole point. It's extremely poor practice (See Papa Bear example.) You cannot assume the poor sucker who maintains your code will know what "y" means. And Heaven help him if he tries to do a global search for that variable in 2,000 lines of code. Which may lead him to reuse it inside the iteration. Which is just about the only way to "seriously break" it as in your Mama Bear complaint.

Your revisions will cause exactly the problem I was demonstrating to avoid.

mkyc · on Sept 24, 2009

My point is that naming verbosity depends on the code. This is vague but better than saying "cf. Goldilocks" ("choose anything between two crazy extremes"). My other point is that everyday use of names should be a guide for naming in code. The reason Mama Bear is wrong is that she's too formal. She gives a full explanation of the variable each time, but the 'full explanation' is the responsibility of the var setters, not of the name.

Your Mama Bear either misses the last days of several months, or counts extra days. It also fails to count the LastMonthOfYear. Your Baby Bear is nothing like your Mama Bear, though you seem to be refactoring. Mama counts entire months (look at the variable names, last day of month), Baby does something like "sum the totals from the first five days of Feb-May". These are naming bugs; they aren't irrelevant.

The naming depends on the code. If we're iterating a 3d array and the vars are set directly above, Papa Bear is readable and I wouldn't dick around with it (it might be a convention in the codebase).

  // baby bear
  for(y=firstY; y<=lastY; y++)
    for(m=firstM; m<=lastM; m++)
      for(d=firstD; d<=lastD; d++)
        sum += getDailyTotals(y,m,d); // line 4

I'm not writing tutorials here. If you don't understand y, m, d in lines 1-3 (especially having seen line 4), you probably haven't been working with dates enough to be messing with this code anyway.

I'm not saying you should choose single-letter variable names, I'm saying they're sometimes ok, and that it's not as simple as "choose names with 4-8 characters".

joe_the_user · on Sept 24, 2009

I'm shocked the above viewpoint has not gotten the majority of votes; it should.

petewarden · on Sept 23, 2009

I'll come out of the closet and admit that I like descriptive names. There's a point they get ludicrous, but that's also a very clear sign the concept they're representing has become confusing and unwieldy too. If you can't come up with a name that's both clear and short, maybe the function's purpose is also unclear.

After many years of maintaining large codebases written by other people, I've found the comments are very seldom useful, and often actively misleading, thanks to code changes over time. Coders seem a lot more reluctant to change a function so it no longer does what the name implies than they are to modify code without updating the comments.

rapind · on Sept 23, 2009

I'm with you all the way on descriptive names.

As for comments though, in my experience they can be extremely useful. And I don't mean just one-liners but a couple of sentences here and there explaining what you're doing and why you're doing it.

Agree with you also, though, that often a coder working with someone else's codebase won't take the time to do it. I think this is because we naturally take less pride in maintenance work than we do in the creation of an application.

abstractbill · on Sept 23, 2009

The only comments I ever find useful are the "unprofessional" ones - things like "WTF, FooCorp are complete dicks and didn't implement the Blah spec properly, so now we have to work around their shit here". Without those kinds of comments it can be hard to understand the motivation behind broken-looking code that is actually broken by necessity.

redcap · on Sept 24, 2009

Just as long as they say how they have to work around the FooCorp fuck up rather than saying something like "Dunno how we did it, it just works".

pmichaud · on Sept 23, 2009

I would rather support that, than most of the code I've actually been asked to support.

joe_the_user · on Sept 24, 2009

I bet the real code had all the other problems of real code PLUS the absurdly long names.... At that rate, I'd worry about carpal tunnel syndrome.

tetha · on Sept 24, 2009

A few letters plus completion (Ctrl+Space in Eclipse, <C-p> or <C-n> in Vim's insert mode) beats CTS, even with long names.

rapind · on Sept 23, 2009

Is it really all that difficult to throw in a comment explaining what it is you're trying to do when it's not obvious to someone other than yourself?

hughprime · on Sept 23, 2009

Eliminating _all_ comments is a little extreme, but if faced with the choice between excessively long names and insufficiently descriptive names, I'll go with excessively long every time.

I've spent way too many months digging around in Fortran code where every variable and function name was less than eight characters (actually the eight-character limit has been gone since Fortran 77, but some people still insist on writing Fortran 90 as if it were Fortran 4).

visitor4rmindia · on Sept 23, 2009

Was the code commented in enough detail? At work, we comment heavily and tend to use short variable names (especially local variables) so I'd like to know of outside experiences with similar code.

hughprime · on Sept 23, 2009

If your coding style works for your company, then I won't complain about it.

The code I'm talking about was incomprehensible on many levels -- most of the cryptically-named variables were global, and might have a comment explaining their meaning somewhere in the 200,000+ lines of code spread across 200 or so files, but it was often a struggle to find it. Other times, there would be no explanation at all. Even once you'd found out what it meant it was a struggle to keep it in your head and not confuse it with any of another bunch of similarly named variables (woe betide anyone who gets confused between nks, nqs and nkqs!)

This is all pretty typical of large scientific codes, though. Scientists are, as a class, the worst programmers on Earth.

kscaldef · on Sept 23, 2009

It appears to me that it's synchronizing inventory change records between a store front and the physical warehouse. I could be wrong, but I imagine it's something pretty similar to that. And, as much as I do shudder at this code, the names did give me a pretty good idea of the intent.

tetha · on Sept 24, 2009

This is however a major problem I have with comments: If it is obvious to me, my mind never ever spends a conscious second thinking about this. For me, the problem does not exist at all. So, tell me, how am I supposed to comment something which might be not obvious to someone else if I don't know it is there?

andrewljohnson · on Sept 23, 2009

Here here.

EDIT: Hear, hear.

hughprime · on Sept 23, 2009

a) http://en.wikipedia.org/wiki/Hear,_hear

b) The little triangular "upmod" button is perfectly sufficient to express your agreement if you don't have anything else to add.

mr_dbr · on Sept 23, 2009

They should just switch to Objective-C/Cocoa and that code would be perfectly normal!

gruseom · on Sept 23, 2009

Though the names are long, I don't think the naming style is all that bad. The real problem with this code is its non-orthogonal logic. It has duplication and crossed wires all over the place.

mildweed · on Sept 23, 2009

This example is only really annoying because the items were almost full sentences. I prefer small 2-3 word functions.

But really, like pmichaud said, I'd rather support this than, say, reverse engineer Google Analytics code http://www.google-analytics.com/ga.js

NathanKP · on Sept 23, 2009

Obviously the Google Analytics people didn't actually write that terrible mess. They used code compression software to obfuscate it and reduce its size.

But seriously, two to three word functions and variables aren't bad. Technically though C++ variables have a max size of 255 characters. At least that is what I seem to remember. Am I right?

known · on Sept 24, 2009

Check this http://99-bottles-of-beer.net/language-perl-737.html

teeja · on Sept 23, 2009

Blech! I'd rather stare at cockroaches. IfYaKnowWhatIMeanAndImPrettySureThatYouDoItWasADarkAndStormyNight

83457 · on Sept 23, 2009

Great idea for an April Fools Day prank

ilyak · on Sept 23, 2009

Pictures of adorable kittens on my news.yc.

NathanKP · on Sept 23, 2009

Umm... spam? I don't get it.

raganwald · on Sept 23, 2009

Quite possibly an arch way of saying that DailyWTF postings are a little lowbrow for HN?

NathanKP · on Sept 23, 2009

Ah.... true.

Just a second....

"arch" adj - deliberately or affectedly playful and teasing.

Upvoted for teaching me a new word usage. ;)

DannoHung · on Sept 23, 2009

Well, you can tell the developer of that was Indian.

DannoHung · on Sept 23, 2009

I actually wasn't trying to be derogatory. "Do the Same" is a pretty common Indian English phrase.

Just like "Do the Needful".

redcap · on Sept 24, 2009

And if you squint just a little bit you can understand what they're trying to say instead of getting on a rant horse about proper English grammar.