
I was thinking about code that relies on TSO but doesn't insert synchronisation primitives for the compiler, and it just happens to violate the expected ordering. E.g. maybe the code would break if you increased the optimisation level or switched compiler.


Generally, when you write atomic algorithms you are doing two things: telling the compiler what it is allowed to optimize, and controlling the processor. So for code that relies on TSO (which is pretty close to the acquire/release part of the C++ memory model), you add information to the code that prevents the compiler from doing certain reorderings. Then, when the compiler generates native code, on x86 those atomic operations mostly turn into regular loads and stores, while on Arm they get additional fence or barrier instructions.
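To make that concrete, here is a minimal sketch of the usual message-passing pattern with C++ release/acquire atomics. The names (payload, ready, run_message_passing) are made up for illustration. The release store tells the compiler it may not move the payload write below the flag write, and the acquire load does the same for the reads. On x86-64, compilers typically emit plain mov instructions for both; on AArch64 they typically emit stlr/ldar or explicit barriers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration only.
int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag that publishes the payload

int run_message_passing() {
    // Reset state before the threads start (thread creation orders this).
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;                                  // 1. write the data
        ready.store(true, std::memory_order_release);  // 2. publish it
    });
    std::thread consumer([&seen] {
        // 3. spin until the flag is visible
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // 4. the acquire load makes the payload write visible
    });

    producer.join();
    consumer.join();
    assert(seen == 42);
    return seen;
}
```

Calling run_message_passing() should return 42 on any conforming implementation. What differs between ISAs is only which instructions the compiler has to emit to guarantee it.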


Yes, indeed. But on Arm the compiler only adds fences where they are needed, whereas x86 basically has implicit fences everywhere, which is clearly worse.


It's not actually that clear. Hardware can do some pretty neat tricks to make the fences basically free when there aren't multiple cores writing to the same memory. As a result, most of those explicit fences are just extra front-end pressure.


You can write data-race-free code with just compiler reordering fences and extremely limited sync primitives on x86/x86-64, but you cannot with most other ISAs.
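A rough sketch of what "just compiler reordering fences" could look like, assuming relaxed atomics plus std::atomic_signal_fence, which constrains the compiler but typically emits no instruction. The function names are hypothetical. Note the assumption: the C++ standard does not promise cross-thread ordering for this pattern. It leans on x86 hardware keeping stores in order with other stores and loads in order with other loads. On Arm or POWER you would need std::atomic_thread_fence or release/acquire operations instead.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names for illustration only.
std::atomic<int> slot{0};
std::atomic<bool> published{false};

// Writer side: relies on x86 TSO to keep the two stores in program order.
void publish_x86_only(int value) {
    slot.store(value, std::memory_order_relaxed);
    // Compiler-only barrier: stops the compiler from reordering the stores.
    std::atomic_signal_fence(std::memory_order_release);
    published.store(true, std::memory_order_relaxed);
}

// Reader side: relies on x86 TSO to keep the two loads in program order.
bool try_consume_x86_only(int& out) {
    if (!published.load(std::memory_order_relaxed)) {
        return false;  // nothing published yet
    }
    // Compiler-only barrier: stops the compiler from hoisting the slot load.
    std::atomic_signal_fence(std::memory_order_acquire);
    out = slot.load(std::memory_order_relaxed);
    return true;
}
```

Swapping the two atomic_signal_fence calls for atomic_thread_fence makes the same code portable, at the cost of real barrier instructions on weakly ordered ISAs.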



