Hm. I've heard arguments that C# or Java is slow for multiple reasons, but never because of the minuscule overhead of a virtual method dispatch when using objects behind interfaces (kind of similar to trait objects).
It's interesting that this is seen as significant here. Are we dealing with much shorter timescales, or just being eager to optimise everything?
Virtual dispatch per se is not terribly slow, as long as the branch is predictable by the CPU. The problem is that virtual dispatch prevents the sort of aggressive inlining and interprocedural optimizations that C++ compilers are known for. C# and Java JIT compilers get around that via runtime analysis and speculative inlining, but that work happens at runtime and eats into the precious little time available for optimisations.
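To make that concrete, here's a minimal Rust sketch (the `Shape`/`Circle` types are made up for illustration): the `dyn` version goes through a vtable on every call, which blocks inlining across the call, while the generic version is monomorphized so the compiler can inline `area` into the loop and optimize the whole thing.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Circle { r: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
}

// Dynamic dispatch: the call goes through a vtable, so the compiler
// generally cannot inline `area` or optimize across the call boundary.
fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Static dispatch: monomorphized per concrete type, so `area` can be
// inlined and the loop optimized as a whole.
fn total_area_generic<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let boxed: Vec<Box<dyn Shape>> = vec![Box::new(Circle { r: 1.0 })];
    let plain = vec![Circle { r: 1.0 }];
    println!("{} {}", total_area_dyn(&boxed), total_area_generic(&plain));
}
```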
The cost of a branch misprediction is tens of CPU cycles, on a processor clocked in gigahertz (10^9 cycles per second), so on the order of 10^-8 seconds.
The time to turn around a web request, if you're very lucky and have done the work, is mostly spent fetching a value from an in-memory cache and comes to a few milliseconds, i.e. on the order of 10^-3 seconds.
If you're not lucky, it's tens or hundreds of milliseconds to generate the response.
Even in the best case, the second duration is roughly five orders of magnitude longer than the first. I would not sweat the first one.
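For concreteness, a quick back-of-envelope check of that ratio, using made-up but plausible figures (20 cycles, a 3 GHz core, a 3 ms cache-hit request):

```rust
fn main() {
    // Rough illustrative figures, not measurements.
    let mispredict_cycles = 20.0;        // "tens of cycles"
    let clock_hz = 3.0e9;                // ~3 GHz core
    let mispredict_s = mispredict_cycles / clock_hz; // ~7e-9 s

    let cache_hit_request_s = 3.0e-3;    // "a few milliseconds"

    println!(
        "misprediction: {:.1e} s, request: {:.1e} s, ratio: ~{:.0e}",
        mispredict_s,
        cache_hit_request_s,
        cache_hit_request_s / mispredict_s // roughly 10^5
    );
}
```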
As an example, many real-time systems are a giant ball of messy asynchronous code and state machines. Futures can help with that, although lately I have found that sometimes the best, cleanest way to implement a state machine is to make it explicit.
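To illustrate what I mean by explicit (purely a sketch, with a made-up connection protocol): put every state and event in an enum and all transitions in one function, instead of scattering them across futures and callbacks.

```rust
// Hypothetical connection state machine, made explicit as enums plus a
// single transition function rather than hidden in nested futures.
#[derive(Debug)]
enum State {
    Idle,
    Connecting { retries: u8 },
    Connected,
    Closed,
}

#[derive(Debug)]
enum Event {
    Connect,
    Established,
    Timeout,
    Close,
}

// All transitions live in one match, so every (state, event) pair is
// visible at a glance.
fn step(state: State, event: Event) -> State {
    use State::*;
    use Event::*;
    match (state, event) {
        (Idle, Connect) => Connecting { retries: 0 },
        (Connecting { retries }, Timeout) if retries < 3 => Connecting { retries: retries + 1 },
        (Connecting { .. }, Timeout) => Closed,
        (Connecting { .. }, Established) => Connected,
        (_, Close) => Closed,
        (state, _) => state, // ignore events that don't apply
    }
}

fn main() {
    let mut s = State::Idle;
    for e in [Event::Connect, Event::Timeout, Event::Established, Event::Close] {
        s = step(s, e);
        println!("{:?}", s);
    }
}
```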
How much do you attribute that to the benefit of creating a high barrier to entry for modifying that code? Could this be summarized as: code that inexperienced devs can't understand, stays performant because they can't figure out how to change it?
I assume at least part of it is that "zero-cost abstractions" is a fairly objective, yes-or-no property to check. "Is this performance impact significant enough to worry about?" would probably result in a lot more bikeshedding.
A tremendous amount of effort has gone into the CLR to optimize interface dispatch, because at one time it was slow. Interface dispatches are cached at the call site to avoid a real virtual (vtable) dispatch, much as a Smalltalk or JavaScript VM would do.
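Loosely, the shape of such a call-site (inline) cache looks like the sketch below. This only mimics the control flow in Rust (the names `Speak`, `CallSiteCache`, etc. are invented), not how the CLR actually patches its dispatch stubs, and the "fast path" here still pays Rust's own dynamic dispatch.

```rust
// Toy "interface": each implementation reports a small class id, standing
// in for the runtime type information a VM would have at hand.
trait Speak {
    fn class_id(&self) -> u32;
    fn speak(&self) -> &'static str;
}

struct Dog;
struct Cat;

impl Speak for Dog {
    fn class_id(&self) -> u32 { 1 }
    fn speak(&self) -> &'static str { "woof" }
}
impl Speak for Cat {
    fn class_id(&self) -> u32 { 2 }
    fn speak(&self) -> &'static str { "meow" }
}

// In this toy every lookup resolves to the same target; a real VM would
// walk an interface map here and find a per-class entry point.
fn slow_lookup(_obj: &dyn Speak) -> fn(&dyn Speak) -> &'static str {
    fn target(o: &dyn Speak) -> &'static str { o.speak() }
    target
}

// One call site's cache: the last class id seen plus the resolved target.
// Real VMs patch this into generated code rather than using a struct.
struct CallSiteCache {
    cached_class: Option<u32>,
    cached_fn: fn(&dyn Speak) -> &'static str,
}

impl CallSiteCache {
    fn call(&mut self, obj: &dyn Speak) -> &'static str {
        let class = obj.class_id();
        if self.cached_class != Some(class) {
            // Slow path: resolve once, remember for next time.
            self.cached_fn = slow_lookup(obj);
            self.cached_class = Some(class);
        }
        // Fast path: a cheap, well-predicted compare plus a direct call.
        (self.cached_fn)(obj)
    }
}

fn main() {
    let mut site = CallSiteCache { cached_class: None, cached_fn: slow_lookup(&Dog) };
    let animals: Vec<Box<dyn Speak>> = vec![Box::new(Dog), Box::new(Dog), Box::new(Cat)];
    for a in &animals {
        println!("{}", site.call(a));
    }
}
```

The point is that after the first call a given site usually keeps seeing the same class, so the check is a well-predicted branch and the target is a direct call the JIT can even consider inlining.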