Yep, that's exactly the right takeaway. :) I have verified this on ARM64 also, b...

spockz · on April 25, 2021

If you like tail calls, look into CPS. Many forms of (pure) code can be rewritten in such a way.

Everyone who writes Haskell quickly learns to write their recursive functions so that they are (mutually) tail recursive.

dwohnitmok · on April 26, 2021

It's a bit of a weird case for Haskell. Because of laziness, you don't always want tail recursion (in fact sometimes you definitely don't want the tail recursive version). The classic example of this is foldl vs foldr.

  -- Tail-recursive so must be good right?
  -- That way lies thunk accumulation and *non-constant* 
  -- additional memory usage for a lot of functions f
  foldl f acc []     = acc
  foldl f acc (x:xs) = foldl f (f acc x) xs

  -- In contrast, this definition often results in constant 
  -- additional space usage despite the non-tail call.
  foldr f acc []     = acc
  foldr f acc (x:xs) = f x (foldr f acc xs)

Tail recursion is an unalloyed good only for strict languages.

Although I'm not sure why you put mutual in parentheses and maybe you meant to address laziness with that?

spockz · on April 26, 2021

You are right indeed regarding that there are cases where trail recursion does not work. But in this case I would say that the culprit is the laziness of f.

I put mutual in between parentheses because not all languages support optimising mutually tail recursive functions.

ufo · on April 25, 2021

On the matter of portability, I wonder if it would be possible to use some macro magic and/or a code generator to convert the tail-call version back into a more traditional while/switch loop, for the stack-based architectures.

While the tail call version is more architecture dependent, it's nevertheless more portable than assembly language. It's still C.

haberman · on April 25, 2021

I haven't implemented the fallback yet, but my thought was to keep most of the structure the same (dispatch function tail calls to the function for an individual operation) but then have the individual operation just return instead of tail calling back to dispatch.

I think this would be portable while still avoiding some of the performance problems associated with a traditional switch (like the bounds check).

elcritch · on April 25, 2021

I'm curious how this approach would fair for microcontrollers, well cortex m0-m4's actually. They're arm32 bit and may not have enough useful call registers. Still parsing protobuf's or interpreting WASM faster would result directly in better battery performance, etc.