Hacker News

Not really: once a program is compiled with -mretpoline, new hardware won't bring back reliable branch prediction.

I'd hope maybe, just maybe, this would be enough to put a focus on compilers producing code that ends up using processor-optimized paths chosen at runtime, to avoid "overheads ranging from 10% to 50%".

Though, in this case, that would essentially mean making the entire executable region writable for some window of time, which is clearly too dangerous, so I guess the 0.1% speedups from compiling undefined behavior in new and interesting ways will continue taking priority.

I mean, it's a compiler flag, right? Obviously whoever's going to run a program on an unaffected platform will take the effort to recompile everything with the flag removed.

Just the same way every serious application currently provides different executables for running on systems where SSE2, SSE4.1, or AVX2 is present.



Horizontal scaling though. If every individual processor is slower, more are needed.


Not quite - lots of "serious" applications these days are written to target JIT compilers, which would be capable of switching retpoline on and off depending on need.


Funnily enough, I ended up not including a PS starting with "A sufficiently smart JIT, however..." ;)


I'd rather have linkers go down a road similar to the one the Linux kernel took over a decade ago: provide binary patches in a table (essentially alternative machine code) and have the linker patch in the correct alternative depending on the CPU and its bugs. The Linux kernel already contains an "alternatives" section which is exactly this kind of list of patches. It would be trivial to add such a table to the ELF and PE formats and have the runtime linker process it while it's plowing through the code anyway.


Something like this exists with function multi-versioning: https://lwn.net/Articles/691932/

For example, glibc chooses optimised machine code for memcpy depending on the CPU it runs on.


New CPUs could just convert the retpoline back to the original jump in microcode, and enable the now timing-attack safe branch predictor.


But even then a performance hit remains due to the increased code size of the instruction sequence.


There's a lot of space left in code already to insert trampolines later. And at the end of the day, most memory is data, not code.

And eventually, this code will get replaced anyway (just as today there are often multiple code paths in binaries, and a lot of code is compiled for the host anyway).

In any case, the performance impact of a couple extra bytes per indirect call is small compared to disabling branch target prediction.



