Hacker News

Not really: once a program is compiled with -mretpoline, new hardware won't bring back reliable branch prediction.

I'd hope maybe, just maybe, this would be enough to put a focus on compilers producing code that ends up using processor-optimized paths chosen at runtime, to avoid "overheads ranging from 10% to 50%".

Though, in this case, that would essentially mean making the entire executable region writable for some window of time, which is clearly too dangerous, so I guess the 0.1% speedups from compiling undefined behavior in new and interesting ways will continue taking priority.

I mean, it's a compiler flag, right? Obviously whoever's going to run a program on an unaffected platform will take the effort to recompile everything with the flag removed.

Just the same way every serious application currently provides different executables for running on systems where SSE2, SSE4.1, or AVX2 is present.



Horizontal scaling though. If every individual processor is slower, more are needed.


Not quite - lots of "serious" applications these days are written to target JIT compilers, which would be capable of switching retpoline on and off depending on need.


Funnily enough, I ended up not including a PS starting with "A sufficiently smart JIT, however..." ;)


I'd rather have linkers go down a road similar to the one the Linux kernel took over a decade ago: provide binary patches in a table (essentially alternative machine code) and have the linker patch in the correct alternative depending on the CPU and its bugs. The Linux kernel already contains an "alternatives" section which is exactly this kind of list of patches. It would be trivial to add such a table to the ELF and PE formats and have the runtime linker process it while it's plowing through the code anyway.


Something like this exists with function multi-versioning: https://lwn.net/Articles/691932/

For example, glibc chooses optimised machine code for memcpy depending on the CPU it runs on.


New CPUs could just convert the retpoline back to the original jump in microcode, and enable the now timing-attack safe branch predictor.


But even then a performance hit remains due to the increased code size of the instruction sequence.


There's a lot of space left in code already to insert trampolines later. And at the end of the day, most memory is data, not code.

And eventually, this code will get replaced anyway (just as today there are often multiple code paths in binaries, and a lot of code is compiled for the host anyway).

In any case, the performance impact of a couple extra bytes per indirect call is small compared to disabling branch target prediction.



