> I assume that's for good use of the CPU's instruction and microcode caches.

I don't think that is the reason. These are microbenchmark results, where realistically all the code will be hot in caches anyway.

The problem is that a compiler optimizes an entire function as a whole. If you have slow paths in the same function as fast paths, it can cause the fast paths to get worse code, even if the slow paths are never executed!

You might hope that marking the if statements with __builtin_expect() (usually wrapped in LIKELY()/UNLIKELY() macros) would help. It does help somewhat, but not as much as moving the slow paths into separate functions entirely.
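To make the two approaches concrete, here is a minimal sketch of a fast path that uses both. It assumes GCC or Clang: `__builtin_expect` and the `noinline`/`cold` attributes are compiler extensions, and the LIKELY()/UNLIKELY() definitions are just the common convention, not a standard API. The `struct buf`, `buf_push()` and `buf_push_slow()` names are hypothetical, chosen for illustration. The point is structural: the growth/realloc logic lives in its own out-of-line function, so it cannot influence how the compiler allocates registers or lays out the one-compare-one-store fast path. This is not a benchmark and does not by itself demonstrate the performance difference.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Common convention for branch hints (GCC/Clang extension). */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical growable byte buffer used only for illustration. */
struct buf {
    char  *data;
    size_t len;
    size_t cap;
};

/*
 * Slow path: kept out of line with `noinline` so its realloc call and
 * error handling do not share a function body with the fast path.
 * `cold` tells GCC/Clang this is rarely executed, so it may be optimized
 * for size and placed away from hot code.
 */
__attribute__((noinline, cold))
static int buf_push_slow(struct buf *b, char c)
{
    assert(b->len == b->cap);           /* only reached when the buffer is full */
    size_t new_cap = b->cap ? b->cap * 2 : 16;
    char *p = realloc(b->data, new_cap);
    if (p == NULL)
        return -1;                      /* old buffer is still valid */
    b->data = p;
    b->cap = new_cap;
    b->data[b->len++] = c;
    return 0;
}

/*
 * Fast path: a single capacity check and a store. The LIKELY() hint alone
 * would still leave the slow-path code in this function; moving it into
 * buf_push_slow() is what keeps this body small.
 */
static inline int buf_push(struct buf *b, char c)
{
    if (LIKELY(b->len < b->cap)) {
        b->data[b->len++] = c;
        return 0;
    }
    return buf_push_slow(b, c);
}
```

A caller starts from a zero-initialized `struct buf`, calls `buf_push()` per byte, and frees `data` when done; the first push and every push that hits capacity take the out-of-line path, all others stay on the inlined fast path.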