"prefers working code over comprehensive documentation" does not mean "don't do ...

JohnBooty · on Aug 19, 2019

    Documentation is essential. How things work is an important thing to 
    document. Ideally it should be in version control and be generated 
    from the code, because then it's less likely to go out of date.

My solution to this is old and fairly unpopular, but I stand by it: anything in the codebase that's not obvious to a new maintainer should have a brief, explanatory code comment.

Generally, this falls into two categories.

1. Hacks/kludges to get around bugs in hardware, external services, or included libraries. These manifest in code as incomprehensible, ugly bits of code that are difficult to distinguish from code that is simply "sloppy" or uninformed. More importantly, they represent hard-won knowledge. It often takes many programmer-hours to discover that knowledge, and therefore many dollars. Why throw it away? (Tip: include the version of the dependency in the comment, e.g.)

    # work around bug in libfoo 2.3, see blahblahblah.com/issues/libfoo/48987 for info
    # should go away once we can upgrade to libfoo 3..
    reset_buffer() if error_code == 42

...so that future programmers (including you) can more easily judge whether the kludge is still needed in the future.

2. Business logic. This too is difficult/impossible to discern from looking at code. Often, one's git commit history is sufficient. But there are any number of scenarios where version control history can become divorced from the code, or require a fair bit of git/hg/svn/whatever spelunking to access. And this of course becomes increasingly onerous as a module grows. If there are 200 lines of code in a given module, it is a significant time investment to go git spelunking for the origins of all 200 lines of code. Some concise internal documentation in the form of code comments can save an order of magnitude or two of effort.

    It still has problems (What do you do when the code and the 
    documentation disagree? Which is correct?), but they're not as 
    severe as the problems that arise when there is no documentation at all.

This is pretty easy to enforce at code review time, prior to merging.

In the first place, only a true maniac would intentionally update

    # no sales tax in Kerplakistan on Mondays
    return nil if country_code==56 and day_of_week==1

...without updating the associated comment. If they do neglect to update it, that's an easy catch at review time.

hnick · on Aug 19, 2019

Count me in as another old timer who agrees. I once had a friend throw the "code should be self-documenting" line at me, and it still upsets me. That only really applies to code that is so simple it writes itself and never has any gotchas hiding (and which useful project is like that?).

Leaning towards commenting "why" not "what" is another good general rule. "Self-documenting code" with sensible function and variable names and logical flow already covers the "what" fairly well.

irishsultan · on Aug 19, 2019

While I still would add a comment about the why, your last bit of code probably should be written without magic constants.

    # Some countries have sales tax rules dependent on the day of the week
    return nil if country_code==KERPLAKISTAN and day_of_week==MONDAY

The exact comment here could probably be more specific (e.g. where do you find these rules), but it also most likely shouldn't repeat the code (and the code should make clear what it represents).

mikekchar · on Aug 19, 2019

If you do the substitution as you suggest and then add a unit test, then you have something ;-) Something along the lines of "describe countries with sales tax dependent upon the days of the week => Kerplakistan doesn't have sales tax on Mondays". So now it's self-documented and self-testing.

But I agree with your statement that there should be a pointer to the business rules somewhere. Otherwise it's difficult to have a meeting with the business side and ask, "Has anything here changed?" I think that's the biggest thing people miss out -- It's not that hard to find the thing in the code if things change. It's super hard to make sure you are on top of all the business requirement changes.
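
A minimal sketch of the substitution-plus-test idea discussed above. The numeric codes 56 and 1 come from the example upthread; the method name, the flat 10% rate, and the wording of the rule pointer are assumptions made up for illustration. In a real project the checks would live in a test framework rather than a plain hash.

```ruby
# Named constants replace the magic numbers from the original example.
KERPLAKISTAN = 56
MONDAY = 1

# Assumed flat rate, purely illustrative.
DEFAULT_RATE_PERCENT = 10

# Returns nil when no sales tax applies, otherwise the tax amount.
# Business rule pointer (placeholder): link the agreed tax rules document here.
def sales_tax(amount, country_code:, day_of_week:)
  # No sales tax in Kerplakistan on Mondays.
  return nil if country_code == KERPLAKISTAN && day_of_week == MONDAY

  amount * DEFAULT_RATE_PERCENT / 100
end

# Spec-style checks, each named after the business rule it pins down.
SPECS = {
  "Kerplakistan doesn't have sales tax on Mondays" =>
    -> { sales_tax(100, country_code: KERPLAKISTAN, day_of_week: MONDAY).nil? },
  "Kerplakistan has sales tax on other days" =>
    -> { !sales_tax(100, country_code: KERPLAKISTAN, day_of_week: 2).nil? }
}

SPECS.each do |description, check|
  puts "#{check.call ? 'PASS' : 'FAIL'}: #{description}"
end
```

The check descriptions double as a list of rules that can be read out in a meeting with the business side.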

arethuza · on Aug 19, 2019

But don't do what one memorably awful project I had to maintain did - to use that example they would have done:

    country_code==FIFTY_FIVE and day_of_week==ONE

spuz · on Aug 19, 2019

But what if the definition of 55 changes? You'll be glad to have your table of constants then.

arethuza · on Aug 19, 2019

The project also defined HTTP, COLON, SLASH, WWW and DOT so that you would have:

    string url = HTTP + COLON + SLASH + SLASH + WWW + DOT ...

I swear I'm not making this up....

chessturk · on Aug 19, 2019

Reminds me of http://pk.org/rutgers/notes/pikestyle.html

> "There is a famously bad comment style: ...

Don't laugh now, wait until you see it in real life."

rpmisms · on Aug 19, 2019

Sounds like a PHP codebase I'm currently working in. I shit you not, $LI = '<li>' is in the functions file, along with $LI_END.

arethuza · on Aug 19, 2019

It was a very enterprisey Java codebase from the bad old days of J2EE - it had somewhere over 30 layers of abstractions between the code in a JSP and a web service call.

[NB 30 isn't an exaggeration - I think the vast team who wrote it were paid by the abstraction or something].

snovv_crash · on Aug 19, 2019

Well, at least a typo would give a compile-time error for some subset of typos.

But the trade-off in code readability was probably the cause of many other mistakes, so they probably ended up further behind.

nitrogen · on Aug 19, 2019

But how else will your compiler tell you that you mistyped 55? /s

JohnBooty · on Aug 19, 2019

That's a very good point. Avoiding magic numbers would have removed the need for an explanatory comment in my example.

richardwhiuk · on Aug 19, 2019

Comments are often a code smell. In lots of cases, better variable naming, breaking something out into a function, or introducing constants reduces the need for a code comment.

kickopotomus · on Aug 19, 2019

I disagree. Code ages and people move on. 2 years down the line some new guys are maintaining the code base. Some new guy is testing the system and notices that sales tax values seem to be "strange" for Kerplakistan on certain days of the week so they create a ticket for it. Then that goes through the typical pipeline. Another member of the team gets assigned the issue and looks into it. They come across the line:

    return nil if country_code==KERPLAKISTAN and day_of_week==MONDAY

Hmm.. Well that's strange. I don't have a background in Kerplakistan monetary policy so I don't know why we aren't assessing sales tax on Monday. Perhaps Kerplakistan is a special case. Is that being handled somewhere downstream? Then 1-2 hours later, after shuffling through source and eventually just Googling Kerplakistan sales taxes, you discover what someone found out 2 years ago when they wrote that line. Now you resolve the ticket and move on with your day but you just wasted a couple man-hours on a non-issue that could have been resolved instantly from a code comment.

Comments are as much for the next guy as they are for you.

richardwhiuk · on Aug 20, 2019

Without a more concrete example, it's difficult to suggest what the better fix would be.

Code smell doesn't mean you should never do it, just that often there's a better way.

JohnBooty · on Aug 20, 2019

Here's a more real-world example.

I worked on an enterprisey line of business app that assigned sales leads to salespeople.

The algorithm to do this was a multi-step process that was (1) rather complex, (2) constantly being tweaked, (3) very successful, and (4) full of weighting factors that were utterly arbitrary even to veterans of this app.

It was full of `if country_code==KERPLAKISTAN && day_of_week==MONDAY`-style weighting factors. Each represented some hard-won experience. And when I say "hard-won" I mean "expensive" -- generating leads is expensive business.

We had a strong culture of informative commit messages, but this file had hundreds if not thousands of commits over the years.

It was the kind of code that resisted serious refactoring or a more streamlined design because it was a recipient of frequent change requests.

A few human-readable comments here and there went a loooong way toward taming the insanity and allowing that module to be worked on by developers besides the original author.

Knowing the why for many of these rules made it much easier to work with, and also allowed developers to be educated about the business itself.

collyw · on Aug 20, 2019

I agree. The most obvious place to find an explanation of a piece of code is right beside that code, not hidden away in some git commit message or nested away in Confluence.

0xffff2 · on Aug 19, 2019

>anything in the codebase that's not obvious to a new maintainer should have a brief, explanatory code comment

I'm not at all convinced that this is unpopular, but I think it's a whole lot harder than you're letting on. Unless you have a constant stream of new people coming in and you can convince them to give honest feedback, you don't actually know what's not obvious.

base698 · on Aug 19, 2019

Why not:

    return nil if country_code==KERPLAKISTAN and day_of_week==MONDAY

Then you don't need comments and the sync problem goes away?

TeMPOraL · on Aug 19, 2019

Except this doesn't retain the crucial information: why? It looks arbitrary. The thought that "some countries have sales tax rules dependent on the day of the week" may or may not be obvious from the context. At the very least, the comment pins a point in the space of all possible reasons for that piece of code - with it, you know it's related to sales tax and week days, and isn't e.g. a workaround for the bug with NaNs in tax rates that you saw on the issue tracker last week.

JohnBooty · on Aug 20, 2019

This is admittedly a trivial example, but ideally you want developers who understand why we're doing this.

Is this a quick thing somebody hacked in for a special, one-off, tax-free month in Kerplakistan as the country celebrates the birth of a princess?

Is this a permanent thing? Will there eventually be more weirdo tax rules for this country? Will there be others for other countries?

Knowing the "why" would help a developer understand the business, and reason about how best to work with this bit of code... should we just leave this ugly little special case in place? Should we have a more robust, extracted tax code module, etc.?

Commit messages help to accomplish this too, and can offer richer context than inline comments. Each has its place. But sifting through hundreds of commit messages in a frequently-updated module is not a great way to learn about the current state of the module, as the majority of those commit messages may well be utterly stale.

Ultimately the cost of having some concise inline comments is rather low, and the potential payoff is very large.

Remember that the longer term goal (besides the success of the business) of software is to have your developers gain institutional knowledge so that they can make more informed engineering decisions in the future.

jbverschoor · on Aug 19, 2019

Yup. This + some diagrams for models and infrastructure is plenty

ptero · on Aug 19, 2019

> Documentation is essential. How things work is an important thing to document.

I agree with this 100%. However, to be useful it needs to hit the right level of crudity. For most projects, a short (<10 pages) description of goals, design principles, architecture and an overview of interfaces is sufficient.

It is best when this exists as a standalone document that is required reading for any new developer. After this they can look at module descriptions, function docs, code, etc. and understand how to make sense of it and how to add their code without breaking general principles of the project.

> Ideally it should be in version control and be generated from the code, because then it's less likely to go out of date.

With this, I have some beef. In my experience the best documentation is the one that complements the code. Usually this means a short description by a human that explains what this chunk of code does and assumptions or limitations (e.g., "tested only for points A and B in troposphere") and IME most useful information is not derivable automatically. Auto-generated docs are very useful, but cannot replace clean explanations written by a human. My 2c.
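
As a rough illustration of documentation that complements the code, here is a sketch of a doc comment that records the "why" and the known limitations next to the generated-docs boilerplate. The function, the scenario, and the limitation text are all hypothetical; the tags follow the YARD convention for Ruby doc comments.

```ruby
# Linearly interpolates between two samples.
#
# Why (hypothetical scenario): the upstream feed is sparse, so values
# between two readings are estimated rather than measured.
#
# Limitation (hypothetical): only meaningful between two adjacent samples;
# extrapolating outside them is rejected rather than silently guessed.
#
# @param a [Numeric] value at t = 0
# @param b [Numeric] value at t = 1
# @param t [Float] position between the samples, expected in 0.0..1.0
# @return [Float] the interpolated value
# @raise [ArgumentError] if t is outside 0.0..1.0
def lerp(a, b, t)
  raise ArgumentError, "t must be within 0.0..1.0" unless (0.0..1.0).cover?(t)

  a + (b - a) * t
end

puts lerp(0, 10, 0.5)
```

A doc generator can derive the parameter list on its own; the "Why" and "Limitation" lines are the part only a human can supply.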

0xffff2 · on Aug 19, 2019

I think there's a lot of ambiguity in the phrase "generated from the code". When I hear it, I think of docs generated from doc-comments embedded in the code, which are clean explanations written by a human. They just have the advantage of being right next to the code, so they're a lot more likely to be updated when the code changes than an entirely external document.

"Documentation" that is nothing more than the interface definitions in HTML form is worse than useless. I can get all of that from just reading the code.

roland35 · on Aug 19, 2019

I think there is room for this to be two documents if a project is large enough - one which resides inside source control which explains the design and how it works, and one which is external (sometimes managed by corporate level document control) which explains the "so what", including top level requirements and so forth.

These could be just one document if the project is small enough.

abdullahkhalids · on Aug 19, 2019

> Ideally it should be in version control and be generated from the code, because then it's less likely to go out of date

Interestingly, this has been a big point of discussion in the Dota 2 playerbase. Dota 2 is one of the most complex games ever created and it rapidly changes on the order of days or weeks. At one point, the in-game descriptions of spells were months or years out of date because they were being updated manually. After much hue and cry from the community, the developers finally made the tooltips get generated from the same code that determined the spells' effects. Things are a bit better now.

There is still quite a way to go, though, in terms of generating documentation for all the other mechanics in the game, which are crucial for gaining competency in the game, but which are only available due to third-party community efforts (often via people reading the game's codebase to understand subtleties), instead of being available inside the game.
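
A small sketch of the "tooltip generated from the same code" idea. This is not how Dota 2 does it; the `Spell` structure, its fields, and the numbers are made up to show one source of truth feeding both the game logic and the description.

```ruby
# Hypothetical spell definition: one data structure is the single source of
# truth for both the effect and the tooltip text.
Spell = Struct.new(:name, :damage, :cooldown_seconds, keyword_init: true) do
  # Game logic reads the numbers directly; hit points never drop below zero.
  def apply_to(target_hp)
    [target_hp - damage, 0].max
  end

  # The tooltip is rendered from the same fields, so it cannot drift
  # out of date when a balance patch changes the numbers.
  def tooltip
    "#{name}: deals #{damage} damage. Cooldown: #{cooldown_seconds}s."
  end
end

fireball = Spell.new(name: "Fireball", damage: 120, cooldown_seconds: 8)
puts fireball.tooltip
puts fireball.apply_to(500)
```

Changing `damage` in one place updates both the effect and the text the player sees.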

vharuck · on Aug 19, 2019

It's surprising that wasn't being done in the first place. I used the Warcraft 3 map editor, and it was simple to include references to attribute values in an object's description. Don't know why the DotA2 team didn't port that feature over when moving to the new engine.

roland35 · on Aug 19, 2019

This is a good example of a general rule of thumb I learned: if you need to do something once or twice, do it by hand, but if you do something three or more times, make it a function! Looks like Dota 2 updated their spells a few more than 3 times ;)

bcrosby95 · on Aug 19, 2019

I use this rule for introducing abstraction: don't do it unless you have at least 3 different use cases you're abstracting, and the test suite doesn't count.

madeofpalk · on Aug 19, 2019

> Ideally it should be in version control and be generated from the code, because then it's less likely to go out of date

Not always - when you want to document the requirements (in whatever format), having them be separate from the code is often a plus. The code might implement the requirements incorrectly, so being able to recognise that is important.

I find this very similar to writing tests that are separate from your implementation. In fact, Cucumber/BDD tests try to make product requirements executable to validate the software has been written correctly to meet the requirements.

mrmonkeyman · on Aug 19, 2019

I never understood why generated API docs are "documentation". That is source, trivial technical info which is easy to find in the source anyway.

I never got documentation about the thought processes, the iterations, the design meeting, the considerations, etc. Which is way, way more important to understanding a system in context than knowing "convertLinear" takes 2 unsigned ints.

collyw · on Aug 20, 2019

> Writing a few hundred pages of specification and handing it over to the dev team is waterfall, and it is _this_ that the Agile manifesto signatories were interested in making clear

That doesn't sound too bad from a dev point of view, better than the opposite - half arsed specifications with no thought given to the important details. Though I can imagine a lot depends on what exactly you are trying to build.

clumsysmurf · on Aug 19, 2019

Thanks for the thoughtful response, this is helpful.

> Ideally it should be in version control and be generated from the code ..

May I ask if you have suggestions for tooling to capture the high-level documentation? We use javadoc a little, but it seems best for lower-level reference. Also, for diagrams like sequence diagrams and/or state machines, how do you capture this?

Thanks.

konstmonst · on Aug 19, 2019

Use graphviz (dot tool for example) for state machines. It is a text format where you list state machine transitions and it generates a visual representation.

Or better yet: generate your state machines from the same format you would use to generate the visual representation.
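
One way this could look, as a sketch: a single transition table drives both the runtime lookup and the Graphviz DOT text. The order workflow, its states, and its events are invented for illustration.

```ruby
# Hypothetical order workflow; the table is the single source of truth.
TRANSITIONS = [
  # [from,    event,   to]
  [:pending, :pay,    :paid],
  [:paid,    :ship,   :shipped],
  [:pending, :cancel, :cancelled],
  [:paid,    :refund, :cancelled]
].freeze

# Runtime behaviour: look up the next state in the table.
def next_state(state, event)
  row = TRANSITIONS.find { |from, ev, _to| from == state && ev == event }
  raise ArgumentError, "no transition for #{event} from #{state}" unless row

  row[2]
end

# Documentation: render the same table as Graphviz DOT text.
def to_dot(name = "order")
  edges = TRANSITIONS.map do |from, event, to|
    "  #{from} -> #{to} [label=\"#{event}\"];"
  end
  (["digraph #{name} {"] + edges + ["}"]).join("\n")
end

puts to_dot
```

Piping the output through `dot -Tsvg -o order.svg` would produce the diagram, so the picture and the behaviour cannot disagree.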

PaulRobinson · on Aug 19, 2019

Don't be afraid of building a product specification, and doing it in Markdown and auto-generating a mini-website out of it.

Just build a product specification for how the product works (which is useful documentation), not how the product will work (which is waterfall).

We're experimenting with this a little, and I'm getting into document-driven development a little: if the product spec is in Markdown, why not create a pull request on it as part of your story/project planning that shows the changes that would happen as a consequence of your work? Once the story is done, you can merge the pull request, even. We're not quite there with this yet, but I'm optimistic.

Putting design assets into your repo is also acceptable, and also paying time and attention to commit messages can be really, really helpful. I love this talk, for example: https://brightonruby.com/2018/a-branch-in-time-tekin-suleyma...

somtum · on Aug 19, 2019

How does Millet and Tune's DDD book compare with Eric Evans? Are they both worth reading?