TurboFan (TF) can inline, but if that optimization ran, the benchmark would be folded into a constant, so it would no longer be measuring the call overhead of the way arguments are passed.
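A rough sketch of that point, using a made-up micro-benchmark (the names `add`, `benchWithCalls`, and `benchAfterInlining` are hypothetical and not taken from the article's benchmark). The second function is hand-written to show what an optimizer could in principle reduce the first one to once the callee is inlined; it is not actual TurboFan output.

```typescript
// Hypothetical callee: trivial body, so almost all of the cost is the call itself.
function add(a: number, b: number): number {
  return a + b;
}

// Intended to measure call overhead: one call to add() per iteration,
// provided the compiler does NOT inline add().
function benchWithCalls(iterations: number): number {
  let sum = 0;
  for (let i = 0; i < iterations; i++) {
    sum = add(sum, 1);
  }
  return sum;
}

// Hand-written illustration of what the loop above could collapse to
// once add() is inlined and the loop is folded: no calls are left,
// and with a constant iteration count the whole thing is a constant.
function benchAfterInlining(iterations: number): number {
  return iterations > 0 ? iterations : 0;
}
```

Both functions return the same value, but only the first one still contains calls, so timing the second would say nothing about how arguments are passed.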
I would think inlining also eliminates stack frames. So comparing an optimization that gets rid of a complicated stack frame against a compiler that has inlining turned off isn't a comparison I would consider ideal.
Their numbers seem cherry-picked. Or is TurboFan actually unable to inline code?