Also, it would be a good idea to add [2010] to the title, because some of these optimizations may be available in the latest GCC version. (I'd like to read an update, or a year-by-year comparison.)
There don't seem to be many differences. I've used http://gcc.godbolt.org/, where one can also use Clang and compare the assembly it produces with GCC's. At least one optimization is done by Clang 3.4.1, but not by GCC 4.2.1 or 4.9.0.
The article is interesting. But for many articles the HN comments are also insightful and interesting, so I like to read them again. That's why I usually link to the most-voted/most-commented resubmission.
Tail-recursions are the exact equivalent of loops.
We've lost sight of what "recursion" means. There's a difference between recursive functions and recursive algorithms: the latter, by definition, need a stack, and there are lots of such problems that can't be optimized to run in constant space. Whether that stack is the call stack or a manually managed stack (within a loop or a tail recursion) doesn't matter; what matters is that the space requirements are definitely not constant.
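To make that concrete, here's a minimal C sketch (my own illustration, not from the article) of an algorithm that is recursive in this stronger sense. The first recursive call is not in tail position, so the pending state has to live somewhere, and no compiler can make it run in constant space:

    #include <stddef.h>

    struct node { int value; struct node *left, *right; };

    /* Sums a binary tree. The call on n->left is not a tail call:
       its result is still needed afterwards, so O(tree height)
       pending frames must be kept, whether on the call stack or
       on a manually managed one. */
    int tree_sum(const struct node *n) {
        if (n == NULL)
            return 0;
        return tree_sum(n->left) + n->value + tree_sum(n->right);
    }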
It's nice that GCC can optimize that Fibonacci sample. However, you're not getting a guarantee that it can optimize every function that is optimizable. I'm also pretty sure the compiler is extremely limited in what it can do here. And that is dangerous, as running in constant space is often a matter of correctness, not of optimization. I would rather have the compiler raise an error when it sees non-tail-recursive calls than try to be smart sometimes.
Scala handles this in quite a nice way: the compiler will optimize tail-recursive functions without needing any special effort, but if you want to ensure a particular function is tail-recursive as a matter of correctness, you annotate it with @tailrec and the compiler will error out if it can't optimize that function.
A recursive algorithm is one that computes a result by way of the same algorithm applied to a smaller input. (Smaller, or else the algorithm will diverge.)
The distinction between "stack" and "parameter" is a minor one. A simple factorial function has an ever-growing (accumulating) parameter.
Tail-recursive algorithms can be translated into loop algorithms. They are still recursive.
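For instance, in C (a minimal sketch of my own, since the article's examples are in C; the names are mine): the accumulating-parameter factorial is tail-recursive, and its mechanical loop translation computes exactly the same thing:

    /* Tail-recursive: the recursive call is the last thing done,
       so the accumulator carries all of the pending state. */
    unsigned long fact_acc(unsigned n, unsigned long acc) {
        if (n <= 1)
            return acc;
        return fact_acc(n - 1, acc * n);   /* tail call */
    }

    /* Mechanical loop translation: parameters become loop
       variables, and the tail call becomes update-and-repeat. */
    unsigned long fact_loop(unsigned n) {
        unsigned long acc = 1;
        while (n > 1) {
            acc *= n;
            n -= 1;
        }
        return acc;
    }

The algorithm is the same either way; only the notation changes.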
f 1 = [ whatever ]
f x | x `mod` 2 == 0 = f (x `div` 2)
f x | x `mod` 2 == 1 = f (3*x + 1)
is perfectly well defined on positive integers `x`, although many recursive calls actually apply `f` to larger values.
(To be fair, you didn't actually say what 'smaller' meant. I think it's one of the standard term-rewriting theorems that, given the freedom to define 'smaller' appropriately, any total recursive function does make only smaller recursive calls.)
> I'm also pretty sure the compiler is extremely limited in what it can do here. And that is dangerous, as running in constant space is often a matter of correctness, not of optimization. I would rather have the compiler raise an error when it sees non-tail-recursive calls than try to be smart sometimes.
I've sometimes wondered why OCaml has "let" and "let rec", but not "let tail rec". It seems that something like this would be good for documentation and avoiding errors.
> Tail-recursions are the exact equivalent of loops.
Not necessarily. In the common case of a procedure calling itself, a loop does wind up being equivalent. OTOH, with mutually recursive procedures (two or more procedures that call each other), there is no straightforward way to convert the tail calls into a loop.
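The classic instance (my own minimal sketch) looks like this:

    int is_odd(unsigned n);

    /* Mutually tail-recursive: each call is in tail position, but
       neither function simply jumps back to its own top, so the
       naive "turn the tail call into a goto" rewrite doesn't apply. */
    int is_even(unsigned n) { return n == 0 ? 1 : is_odd(n - 1); }
    int is_odd(unsigned n)  { return n == 0 ? 0 : is_even(n - 1); }

That said, if I remember correctly, GCC's sibling-call optimization (-foptimize-sibling-calls, enabled at -O2) can still compile such pairs into plain jumps; it just isn't the simple single-procedure loop rewrite.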
Your example assumes that you can inline the calls to a and b. If you can't (for example, if one of them calls itself), then your optimization won't work.
This is really only essential in languages which provide recursion as a primitive and not iteration. Scheme is the prototypical example and was, I believe, the first to require tail call elimination in its standard for this very reason. In C it really doesn't matter that much because you can just write the iteration explicitly and then you know you won't blow out the stack since there is no recursive call in the first place. As a bonus, many programmers find it easier to understand.
The advice about enabling -ffast-math is dangerous. Be advised that the fpclassify() family of functions (e.g. for checking for NaN or infinity) no longer works as expected. If you really want to go into that territory, be sure to read the compiler documentation: -ffast-math is an umbrella option that in fact enables a number of other flags, and you may want to be a bit more picky about which ones you choose.
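A quick way to see the hazard (a minimal sketch; the exact behavior depends on the GCC version and flags):

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        volatile double zero = 0.0;    /* volatile defeats constant folding */
        double x = zero / zero;        /* NaN at run time */
        /* -ffast-math implies -ffinite-math-only, which permits the
           compiler to assume x is never NaN and fold this to 0. */
        printf("isnan(x) = %d\n", isnan(x));
        return 0;
    }

Built with plain -O2 this prints 1; with -O2 -ffast-math it may well print 0.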
The first example is really dangerous. We recently had an issue with GCC not optimizing a tail recursion in a relatively large function with parameters passed in as references to std collections. The result: it ran out of stack space at large scale. So if you make a habit of using tail recursion extensively, GCC can bite you where you don't expect it.
Of course, adding a unit test for every function saved the day as usual; still, it is trivial to convert a tail recursion into a loop and not worry about stack overflow.
It's nice to be able to annotate functions that should have some optimization applied to them, and then have the program fail to compile if the compiler isn't able to perform the optimization.
I've used it when implementing a game "loop" (just continually asking the user for input to the game). It's nice to be able to use recursion when you want to, and be sure that the call stack won't grow just from playing the game for several rounds.
I wonder how that 2nd example optimisation would behave with multiple threads, with one thread reading the string and another thread appending to it. Or would that be considered undefined behaviour? Probably an edge case (and likely to be a race condition bug), but still worth thinking about...
That would be UB. In fact, C does not know much about threading at all; the result of a function should be "equivalent" to what it would be if the statements were executed in order, unless you use some synchronization primitives.
But if you call a function, even without supplying s as a parameter, instead of simply doing result += s[i], then that function could have side effects on the string, since the char *s is not "restrict".
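In other words (my own illustration, modeled on the article's loop, with a hypothetical observe() standing in for the opaque call):

    #include <string.h>

    void observe(char c);   /* opaque: defined in some other file */

    int sum(const char *s) {
        int result = 0;
        /* strlen(s) cannot be hoisted out of this loop: observe()
           might modify the bytes s points to through another alias,
           because s is not declared restrict. */
        for (size_t i = 0; i < strlen(s); i++) {
            observe(s[i]);
            result += s[i];
        }
        return result;
    }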
I'm surprised by the if-chain one. I tried it using cremno's gcc.godbolt.org suggestion, and the newest GCC doesn't turn it into a jump table as the author suggests, but Clang does.
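The chain in question looks roughly like this (my reconstruction, not the article's exact code):

    /* A chain of comparisons against small consecutive constants;
       a compiler may lower it to a switch and then to a jump table
       or a lookup table. */
    int classify(int c) {
        if (c == 0) return 10;
        else if (c == 1) return 20;
        else if (c == 2) return 30;
        else if (c == 3) return 40;
        else if (c == 4) return 50;
        return -1;
    }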
Now I think back many years to when I used bit shifts in C++ to 'optimise' some multiplications and realise I probably didn't measure twice before cutting. These days I hope I'm a lot saner!
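(For the record, every optimizing compiler I'm aware of emits identical code for the two forms below at -O1 and above, so the shift buys nothing:)

    unsigned times8_mul(unsigned x)   { return x * 8u; }
    unsigned times8_shift(unsigned x) { return x << 3; }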
It feels like I have decent intuitive hunches about what compilers can do; I mostly just ask the question, "is the problem conceptually simple?", and go from there. Of course, I'm missing all of the edge cases and subtleties when I'm doing that. And I don't actively try to do it when it comes to actual code; mostly for quizzes like this.
My intuition seems to fail me, though, when we get to things like escape analysis in languages like Java. It seems like a conceptually simple problem: that is, stack-allocating or "unpacking" objects (like value classes/objects) that are __obviously__ not going to escape the scope of a method. I don't mean it's simple to determine this for all objects. And the __obvious__ ones seem much more common than the non-obvious ones in regular code. But then this seems to be a fairly recent implementation, which might suggest that it isn't so simple to implement?
The JVM is... hairy, and poorly understood. Some of its code has not changed since 1996, and much of it has been very heavily hand-optimized; the people who work on it today are not the people who originally wrote it. Implementing anything in the JVM is not so simple; it's not really a reflection on the complexity of the thing you're implementing.
(It might have got better now - I know the OpenJDK folks were making some efforts in that direction)
Most comments: https://news.ycombinator.com/item?id=1540567 (357 points, 1555 days ago, 60 comments)
Latest: https://news.ycombinator.com/item?id=6823768 (2 points, 329 days ago, 3 comments)