[dupe] Small Functions considered Harmful (medium.com/copyconstruct)
24 points by okket on Aug 26, 2017 | 23 comments




It occurs to me that the author refutes one dogma (make functions as small as reasonably possible) only to propose another dogma (small functions are harmful). No thanks, I will continue improving my style so that it matches my abilities, and hopefully also the abilities of the people reading my code.

Since I'm in a position to use Swift, I often define small functions WITHIN functions, in order to give the behavior of a code block a name. I'm also using LOTS of immediately executed closures (is that the correct term?) to reduce scope and benefit from slightly easier control flow. And I don't see how this would be bad.
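A minimal sketch of what I mean, with invented names (not from any real codebase):

    import Foundation

    func greeting(for user: String, isAdmin: Bool) -> String {
        // Nested function: gives a block of behavior a name, visible only in here.
        func normalized(_ name: String) -> String {
            return name.trimmingCharacters(in: .whitespaces).capitalized
        }

        // Immediately executed closure: the branching stays scoped to this one
        // assignment, and `prefix` stays a constant.
        let prefix: String = {
            if isAdmin { return "Welcome back" }
            return "Hello"
        }()

        return "\(prefix), \(normalized(user))!"
    }

    // greeting(for: "  ada  ", isAdmin: false)  ==  "Hello, Ada!"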

Meta: I thought we were over this dogma thing. I watched the TDD thing become popular, and then fall from grace - why do people still need to pretend they know better than me, the reader? Do they want to become gurus, or are they soothing their own feeling of "doing it wrong"?


> why do people still need to pretend they know better than me, the reader

Perhaps they do? Or perhaps they just see a trend starting up in their day-to-day work and want to try and nip it in the bud?

I see this exact same trend forming: in my last code review I saw a function which had exactly one line of code in it - a string format operation. It was called twice, and was unlikely ever to be called anywhere else.
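Roughly the shape of it, sketched in Swift (names invented, not the actual code):

    // A one-line helper wrapping a single string-format operation:
    func userLabel(_ name: String, _ id: Int) -> String {
        return "\(name) (#\(id))"
    }

    let title  = userLabel("Ada", 1)
    let footer = userLabel("Ada", 1)
    // ...versus just writing the interpolation inline at both call sites.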

That was the most egregious, but not the only, example.

> And I don't see how [lots of small functions] would be bad.

Well, I am not sure about Swift's implementation, but function calls typically carry overhead. They require pushing values onto the stack, transferring control, executing the code, popping values off the stack, transferring control back, etc. That's a lot of CPU operations and memory manipulation (not to mention cache misses, etc.). There's also the possibility of blowing out your stack if you store too many values on it, or nest calls too deeply.

From a cognitive point of view, you have to interrupt your current flow of reading and move to a different spot in the current file (or another file entirely) to track down the behavior of a given function. As that function's behavior mutates over time, its name is rarely updated to match, which creates confusion and requires even more time to work through the corner cases.


I know exactly where you're coming from...

I worked at one place where, during pairing, there was pressure to put a single line of code in its own function, just so the function name could be coerced into being a sort of comment, since comments themselves, anywhere, under any circumstances, were "bad practice".

I put it down at the time to Uncle Bob cargo cultism, but found it a bit irritating nonetheless. It was similarly annoying that the linter would reject a commit because a function had more than 14 lines of code in it. Under any circumstances.

However, I've recently been delving into some style guides, and thought I'd refactor a web app server according to the principles in one of them. Now the first pageful is more of an Executive Summary of the module, and everything happens in functions below the fold. Since nothing was nested more than a couple of levels deep, I was quite pleased with the results, unlike the crazy nesting on the previous project, where I was doing as much branching as the CPU just in order to read the stuff.
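A rough sketch of that layout, in Swift since that's what's discussed upthread (all names made up):

    struct Request  { let body: String }
    struct Response { let body: String }

    // "Executive summary": the whole flow reads in a few lines at the top.
    func handle(_ request: Request) -> Response {
        let fields = parse(request)
        let report = summarize(fields)
        return Response(body: report)
    }

    // Details live below the fold, one call deep rather than nested inline.
    func parse(_ request: Request) -> [String] {
        return request.body.split(separator: ",").map(String.init)
    }

    func summarize(_ fields: [String]) -> String {
        return "Received \(fields.count) field(s): \(fields.joined(separator: ", "))"
    }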

I suppose it's the same old, same old; a little is great, a lot is harmful.


You are not going to blow your stack without recursion, even with a fairly small stack size. IIRC Swift has huge default stacks for Objective-C interop, so even that shouldn't be a problem.

Overhead is a fair point for JIT/interpreted languages, if you want to write performance-critical code in one for some reason. Tiny functions are almost certainly going to be inlined if the compiler does its job, though, so it isn't a huge deal for compiled languages.

And the point is that a function can be so tiny that you would just replace it with a new function rather than change it. Then you can reason locally about behavior. This is a very functional approach, though, and can break horribly in imperative languages, so there definitely is a balance.


"Almost certainly" means that there will be a number of times where they "almost certainly" will not.

Inlining is hard (especially in languages which allow overriding functions), and it is not always done perfectly (or at all). The LLVM toolchain in particular has a lot of properties which can inhibit inlining; in some cases you even have to explicitly tell the compiler when it may inline a function.
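For what it's worth, a hedged Swift-side illustration (Swift also compiles through LLVM): these attributes influence inlining, but as far as I know none of them is a hard guarantee.

    // @inlinable exposes the body across module boundaries so the optimizer
    // *can* inline it at call sites in other modules.
    @inlinable
    public func squared(_ x: Int) -> Int {
        return x * x
    }

    // @inline(__always) is a stronger, underscored (i.e. not officially
    // supported) hint to inline wherever possible.
    @inline(__always)
    func offset(_ x: Int, by delta: Int) -> Int {
        return x + delta
    }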


> Since I'm in a position to use Swift, I often define small functions WITHIN functions, in order to give the behavior of a code block a name. I'm also using LOTS of immediately executed closures (is that the correct term?) to reduce scope and benefit from slightly easier control flow. And I don't see how this would be bad.

You don't even need a language with closures. Many C-style languages [1] allow you to add curly braces around a couple of statements without any other control structure (if, for, while, etc.), just to force variables declared inside the block to go out of scope at its end. I've used this in complex functions that do not lend themselves to useful modularization, to indicate which variables are short-lived and which are carried forward to later parts of the function.

[1] e.g. C++, Perl; not sure about C itself
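To keep with the Swift examples elsewhere in the thread, a bare do { } block (no catch) gives the same effect there; in C++ a plain pair of braces does it. A made-up sketch:

    func buildReport(from records: [String]) -> String {
        var header = ""
        do {
            // rawTitle and count go out of scope at the closing brace,
            // marking them as short-lived scratch values.
            let rawTitle = records.first ?? "untitled"
            let count = records.count
            header = "\(rawTitle.uppercased()) (\(count) records)"
        }
        // Only header and records are carried forward from here on.
        return header + "\n" + records.joined(separator: "\n")
    }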


The author is really overthinking this. Small functions, big functions, long variables, short variables... none of this matters. The goal of programming is to express a solution to a domain-specific problem, using a general programming language. The job of a programmer is to arrange a set of logical building blocks (the general programming language) into the domain-specific solution. Good code will naturally express its domain-specific logic in a way that is both readable and maintainable.

Readability is most strongly related to the ease of following all possible code paths from a given entry point. That is, can I start reading in main() and iteratively follow the possible code paths given a specific input?

The answer to this question is more related to separation of scopes than it is to function size. If a single scope contains many possible branches and code paths, then it will be difficult for a reader to follow a specific code path amongst the noise of irrelevant functions (short or long).

All else being equal, the length of a function is primarily a matter of style and preference. The priority of code should be expressiveness and clarity. If small functions enhance expressiveness, then they are the tool for the job. If big functions add clarity or aid in separating scopes, then they are also a tool for the job. Sometimes you need both.

Worrying about anything other than readability, maintainability and correctness is needless dogma.


Here's the problem - dogma already exists, and it exists in the name of readability, maintainability, and correctness. DRY is the particular bit of dogma which has been taken to the extreme.

And this is the thing about extremes - to get back to the center, you have to push towards the opposite extreme (and hope your push has the right magnitude to not overshoot the proper location). It's why the title is extreme, and the article is not. "Small Functions Considered Harmful" is easy to remember, and a sound bite which can compete on the same level as "Don't Repeat Yourself".


Exactly. The problem is not Dogma A or B, but rather the use of any dogma as a crutch in determining if code is readable.

Here's the thing... code is either readable and maintainable, or it's not. You don't evaluate the readability of code with some kind of dogmatic checklist. You just read it and see if it makes sense.

Dogmas are good for rules of thumb, which are mental shortcuts with safe error margins. Shortcuts are helpful in that they provide cover for knowledge gaps. But they are a short term solution. The long term solution is to develop adaptable instincts that provide better guidance than any dogma or rule of thumb.

Looking at dogma this way, as a shortcut to cover knowledge gaps, it's no surprise that adherence to dogmatism appears to be inversely correlated with programming experience. Newbies who haven't seen many contexts will cling to dogmas. But as they gain experience, they find situations where dogmas don't necessarily yield the best solution. Slowly that experience develops into instincts, which replace the crude dogmatic ways of thinking. But instincts only come with experience, and nobody is an expert in every domain, so dogmas have their place. But reliance on them should be a temporary solution, not a long-term guiding principle.


That's just too bad, because such pushes into the other extreme practically always overshoot, and because the author can no longer push the line that dogmatic programmers are not good programmers.


It's like politics all over again.


"You have to lean into the wind to stand up straight"


> The author is really overthinking this.

Isn't the author trying to correct a dogma, just as you are?


One criterion for decomposition into functions is to mimic the structure of the problem domain. This makes it easier to check against the domain, to reason about one in terms of the other, and to extend. It's especially helpful for interactions with problem domain experts. A "domain" includes math. Efficiency may require a different structure - note though that when efficiency truly is paramount, it licenses outright barbarism.

Another approach is "disproportionate simplicity", which works out similarly to an abstraction, something of a DSL. If you can modularize functionality such that that aspect becomes much simpler, do it. Note that it might not solve the whole problem (the core of git is a "stupid content tracker": it solves the part it does tackle with simplicity disproportionate to the alternatives, but that simple core doesn't solve everything). It's about where to draw boundaries (between modules, with functions being one kind of module).

Yet another is the traditional criterion, based on likelihood of change: boundaries between modules [functions] should be unlikely to change; the stuff that is likely to change should be "hidden" within a module [function]. Unfortunately, I've found prediction to be tricky, particularly when it concerns the future of programs.
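A tiny, invented Swift sketch of that last criterion: the save/lookup boundary is unlikely to change, while the storage format, which is likely to change, stays hidden behind it.

    struct SessionStore {
        // Likely to change (a dictionary today, a file or database tomorrow),
        // so it is hidden inside the module.
        private var tokensByUser: [String: String] = [:]

        // Unlikely to change: callers depend only on these two signatures.
        mutating func save(token: String, for user: String) {
            tokensByUser[user] = token
        }

        func token(for user: String) -> String? {
            return tokensByUser[user]
        }
    }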


The main point of small functions is that they tend to be more readable. If they are too small, then you get into code golf territory and lose readability. It's a balancing act; you need to follow the Goldilocks principle when writing algorithms. Also, the author states at the end that small functions aren't bad, so the title is misleading.


Well, there are small functions (that encapsulate a particular bit of behavior), and small functions (that encapsulate a repeated bit of code). The OP wants to get rid of the second, not the first. The problem is, they're both "small functions".


IMHO if you look at the history of how computers were programmed, then you will see that functions/subroutines/procedures/callable blocks/etc., as well as other abstractions, were basically invented to serve a very obvious and utilitarian purpose: to reduce duplication of code; and anything beyond this is dubious.

I find "straight-line" code, perhaps in one long function, with comments to provide commentary and "navigation", far easier to read than the dozens-of-tiny-and-verbosely-named-functions style too. The "readability" argument is commonly used, but it's focusing on the wrong thing: a single-line function is certainly going to be easier to understand than a 512-line function, but understanding a single-line function does not make it any easier to understand the system/algorithm/etc. as a whole. The latter is extremely important, because not knowing "the big picture" can lead to very bad decisions overall; I've seen many cases where bugs or accidental and severe inefficiencies (e.g. unnecessary allocations, high-polynomial complexity, multiply duplicate accesses to data, etc.) were created because the author of the code only focused on a tiny piece and neglected to consider its application in the whole.

There are some very insightful posts by an APL(!) programmer here, discussing the topic of complexity overall vs. complexity in parts:

https://news.ycombinator.com/item?id=13565743

https://news.ycombinator.com/item?id=13797797

I suspect part of the motivation for producing "microfunctions" may have come from a misunderstanding of the "decompose the problem" principle --- which is intended to mean that you, as a programmer, decompose the problem into simpler steps, not that each step necessarily warrants its own function.

The same problem and principles apply to other levels of organisation: classes, structures, files, etc. --- they are intended to reduce duplication and simplify code, but will have the opposite effect if used to excess.


Writing code has many analogs to writing prose. If you were writing a letter and only used short sentences, it'd be irritating to read and probably difficult to follow.

Rather than dogmatically applying "no small functions" or "break functions down as much as possible", it's more useful to look at how the code communicates the idea (and achieves the goal).

If the function is calling 30 things before it does the 4 things that logically map to the function name, maybe consider refactoring out all or parts of the preceding 30 steps.


Functions and other forms of structural and semantic composition are among the (many) things that make writing prose entirely unanalogous to writing code. You can't sprinkle prose with "descriptiveParagraph(targets, descriptors)" or "dialogScene(characters, lines)".


This would be 10x more useful with some code examples.


I haven't read the whole thing, but I'd briefly say: true, there are many codebases where you have to jump through a million strangely named one-liner functions (plus 5 lines for parameters, braces, modifiers, etc., sometimes as both declaration and definition) to find out what that thing actually does. It pollutes the namespace, fragments the code like hell, fills up a lot of screen space, and fills up the stack and the mind. A crash yields a call hierarchy that fills a roll of toilet paper.

Hope that's a good tl;dr for the article ;)


The problem with this industry, software development, is that it hasn't yet grasped that programmers remain beginners for a very long time. Like a human child, it takes 20 years of programming experience to mature a programmer, and before that he's just a child.

Let's assume I'm right here for a minute. If most programmers are then intermediate programmers, what kind of advice most needs to be given, knowing businesses need to run? The kind that does "damage control". There is no issue with that; it's OK to do damage control.

Let's take an example: "readable code". Yes, for an intermediate programmer it is a good thing if the code is not too dense, because dense code tends to drain all the programmer's energy when he tries to make sense of it. But as the programmer gains experience, he can read denser and denser code, until he can grasp at a glance what looks like obfuscated code to others. For a mature programmer, the important point is not readability; it is for the code to be as small as possible, which generally tends to produce very dense code. The mature programmer draws on his ability to understand obfuscated code to write even better code.

"write a test first" (beside the fact you're here actually asking a beginner to solve his code problems by writing more code which is at best controversial to me). A mature programmer don't write a test first, he thinks that way already - and yes he delivers code that has less than a percent failure rate. Now, some code may require unit tests, I'm not saying this is wrong in every situation, it's just not right in every situation.

Anyway, what I'm trying to say is: I wish we had a way to finally make a clear distinction between good advice for intermediate programmers and what mature programming actually is, because it's very different - the mature programmer is going to make the function as big as it makes sense to be; he isn't going to artificially break up its body into small functions to make it more readable: he can read the code already.



