Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Yep, that's exactly the right takeaway. :) I have verified this on ARM64 also, but architectures that pass parameters on the stack will make this technique not really worth it.

Re: PHP, Ruby, etc. yes, computed gotos help some, but I think tail calls would help much more. That was our experience at least. I wrote more about this here: https://gcc.gnu.org/pipermail/gcc/2021-April/235891.html

Yes, there are portability issues with the tail call approach, so there would need to be a fallback on non-x64/ARM64 platorms. This would add complexity. But it's exciting to think that it could unlock significant performance improvements.



If you like tail calls, look into CPS. Many forms of (pure) code can be rewritten in such a way.

Everyone who writes Haskell quickly learns to write their recursive functions so that they are (mutually) tail recursive.


It's a bit of a weird case for Haskell. Because of laziness, you don't always want tail recursion (in fact sometimes you definitely don't want the tail recursive version). The classic example of this is foldl vs foldr.

  -- Tail-recursive so must be good right?
  -- That way lies thunk accumulation and *non-constant* 
  -- additional memory usage for a lot of functions f
  foldl f acc []     = acc
  foldl f acc (x:xs) = foldl f (f acc x) xs

  -- In contrast, this definition often results in constant 
  -- additional space usage despite the non-tail call.
  foldr f acc []     = acc
  foldr f acc (x:xs) = f x (foldr f acc xs)
Tail recursion is an unalloyed good only for strict languages.

Although I'm not sure why you put mutual in parentheses and maybe you meant to address laziness with that?


You are right indeed regarding that there are cases where trail recursion does not work. But in this case I would say that the culprit is the laziness of f.

I put mutual in between parentheses because not all languages support optimising mutually tail recursive functions.


On the matter of portability, I wonder if it would be possible to use some macro magic and/or a code generator to convert the tail-call version back into a more traditional while/switch loop, for the stack-based architectures.

While the tail call version is more architecture dependent, it's nevertheless more portable than assembly language. It's still C.


I haven't implemented the fallback yet, but my thought was to keep most of the structure the same (dispatch function tail calls to the function for an individual operation) but then have the individual operation just return instead of tail calling back to dispatch.

I think this would be portable while still avoiding some of the performance problems associated with a traditional switch (like the bounds check).


I'm curious how this approach would fair for microcontrollers, well cortex m0-m4's actually. They're arm32 bit and may not have enough useful call registers. Still parsing protobuf's or interpreting WASM faster would result directly in better battery performance, etc.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: