I was thinking about code that relies on TSO but doesn't insert synchronisation ...

adgjlsfhk1 · 2025-03-27T00:19:51 1743034791

Generally the way this works is that when you write atomic algorithms, you are doing two things. The first is telling the compiler what it's allowed to optimize, and the second is controlling the processor. What this means is that in code that relies on TSO (which is pretty close to acquire/release ordering in the C++ memory model), you add a bunch of information that prevents the compiler from doing some optimizations, and then when the compiler generates native code, on x86 it will turn into regular loads/stores, but on Arm it will have additional fence instructions.
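A minimal sketch of that "extra information" in C++, using a release store and an acquire load to pass a value between two threads. The names `payload`, `ready`, `producer`, `consumer` and `run_message_passing` are made up for illustration. Typical compilers lower these two operations to plain `mov` instructions on x86-64, and to ordered instructions (such as `stlr`/`ldar`) or explicit `dmb` barriers on Arm, though the exact output depends on compiler and target.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the payload

void producer() {
    payload = 42;  // ordinary store
    // Release: the compiler and CPU may not move the store above past this point.
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // Acquire: reads after the loop may not be moved before the flag load.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // the release/acquire pair makes the write of 42 visible here
}

// Runs one writer and one reader and returns what the reader saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```

Calling `run_message_passing()` returns 42 on any conforming implementation. The source is the same for every target, and only the generated instructions differ.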

IshKebab · 2025-03-27T11:16:26 1743074186

Yes indeed. But it will only add the fences where needed, whereas x86 basically has implicit fences everywhere, which is clearly worse.

adgjlsfhk1 · 2025-03-27T12:17:28 1743077848

It's not actually that clear. Hardware can do some pretty neat tricks to make the fences basically free when there aren't multiple cores writing to the same memory. As a result, most of those explicit fences are just extra front-end pressure.

dundarious · 2025-03-26T22:58:50 1743029930

You can write data-race-free code with just compiler reordering fences and extremely limited sync primitives on x86/x86-64, but you cannot on most other ISAs.
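A rough sketch of what that can look like, assuming `std::atomic_signal_fence` is used as the compiler-only reordering fence (mainstream compilers emit no instruction for it) and relaxed atomics are used so the program has no undefined behaviour. The names `shared_value`, `published`, `publish`, `observe` and `run_compiler_fence_demo` are hypothetical. The C++ standard does not guarantee the cross-thread ordering here. It is the x86 hardware that keeps store-store and load-load order, so treat this as an illustration of the point rather than portable code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
std::atomic<int> shared_value{0};
std::atomic<bool> published{false};

void publish() {
    shared_value.store(42, std::memory_order_relaxed);
    // Compiler-only barrier: no CPU fence is emitted, but the compiler
    // is kept from reordering the two stores around it.
    std::atomic_signal_fence(std::memory_order_release);
    published.store(true, std::memory_order_relaxed);
}

int observe() {
    while (!published.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // Compiler-only barrier on the reader side as well.
    std::atomic_signal_fence(std::memory_order_acquire);
    // x86 hardware ordering makes this read 42; a weaker ISA is
    // allowed to return the stale 0 because no CPU fence was issued.
    return shared_value.load(std::memory_order_relaxed);
}

// Runs one writer and one reader and returns what the reader saw.
int run_compiler_fence_demo() {
    shared_value.store(0, std::memory_order_relaxed);
    published.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread reader([&] { seen = observe(); });
    std::thread writer(publish);
    writer.join();
    reader.join();
    return seen;
}
```

On other ISAs the portable fix is to replace the signal fences with `std::atomic_thread_fence`, or to use release/acquire orderings on the flag itself.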