While the sentiment is correct as to why compilers make alignment assumptions, ...

AshamedCaptain · on Aug 20, 2023

Your message is more misleading than the GP.

Many architectures sold today still treat unaligned accesses as optional (e.g. all ARM pre-v7, which includes the popular Raspberry Pi Zero). Not to mention that even where they are supported, not all instructions support them (which is the case today on all ARM cores and even on x86).

Among the architectures and instructions that do support it, there may be a performance penalty ranging from "somewhat slower" (e.g. Intel still recommends stack alignment, because otherwise many internal store optimizations start giving up) to "ridiculously slower" (e.g. I once had to write a trap handler that software-emulated unaligned accesses on ARM -- on all 32-bit ARMs Linux still does this for all instructions except plain undecorated LDR/STR when the special unaligned ABI is enabled).
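For what it's worth, the usual way to stay out of all of the above in C is to never dereference a misaligned pointer at all and let the compiler choose the access sequence. A minimal sketch, assuming a 32-bit value; the helper names `load_u32_unaligned` and `store_u32_unaligned` are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any address, aligned or not.
 * memcpy has no alignment requirement on its arguments, so the compiler is
 * free to emit one plain load where that is safe and byte accesses elsewhere. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: the matching store. */
static void store_u32_unaligned(void *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

Calling `store_u32_unaligned(buf + 1, x)` on a byte buffer and reading it back with `load_u32_unaligned(buf + 1)` round-trips the value regardless of how `buf` is aligned. It says nothing about atomicity or speed, only that the access is well defined.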

And finally, even if the architecture supports it with decent enough performance, it may do it with relaxed atomicity. E.g. even as of today aarch64 makes zero guarantees regarding atomicity of even atomic instructions on unaligned addresses (yes, really). Put simply, that is because it is a _pain in the ass_ to implement correctly (say the programmer does atomic loads/stores on overlapping addresses with different alignments). This holds whether they cross cache lines or not.
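If you do end up with raw addresses that might be used atomically, about the only defensive thing you can do in portable C is check the alignment first. A rough sketch, assuming C11 and a 32-bit atomic; `is_aligned_for_atomic_u32` is a hypothetical name, not something from a library:

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p meets the alignment the implementation
 * requires for a 32-bit atomic object. This only checks the address; it
 * does not make an access atomic by itself. */
static bool is_aligned_for_atomic_u32(const void *p)
{
    return ((uintptr_t)p % alignof(_Atomic(uint32_t))) == 0;
}
```

An object declared as `_Atomic(uint32_t)` always passes this check; an address one byte into an aligned buffer does not on any target where that alignment is greater than 1.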

i.e. it's as bad as the GP is saying. You can't just put up one example of one processor handling each case correctly to dismiss this claim, because the point is that most processors don't bother, and those that do bother still have severe, crippling limitations that make it infeasible to use in a GP compiler.

And there is still a lot of benefit to packing things up... but it does require way too much care and programmer effort.

torusle · on Aug 20, 2023

> If you're dealing with very simple CPUs like the ARM M0, sure. But even the M3/M4 allows unaligned access.

On ARM M3/M4 you have the same issue with the LDRD and STRD instructions, which do not allow unaligned access. Even the normal loads/stores don't allow unaligned access in all cases. Try this in the peripheral memory region for starters. And things get even more complicated when the memory protection unit shakes things up.

macjohnmcc · on Aug 20, 2023

Yeah even Microsoft's compiler aligns values on appropriate boundaries for performance reasons. DWORDs on DWORD boundaries etc. And if you want to pack the data structure to avoid the gaps in structures there are methods to do so via #pragma options. I think their complaining about what was done for performance reasons shows a great lack of overall understanding. More time researching and less time griping would have served them better.
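To make the padding and the #pragma route concrete, here is a small sketch. The struct names are invented for illustration; `#pragma pack(push, 1)` / `#pragma pack(pop)` is the form accepted by MSVC and also by GCC and Clang:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler pads after `tag` so that `value`
 * lands on its natural boundary. */
struct natural_record {
    uint8_t  tag;
    uint32_t value;
};

/* Packed layout: no padding, so `value` starts right after `tag`
 * and is misaligned. */
#pragma pack(push, 1)
struct packed_record {
    uint8_t  tag;
    uint32_t value;
};
#pragma pack(pop)

/* Hypothetical helpers that report where `value` ends up in each layout. */
static size_t natural_value_offset(void)
{
    return offsetof(struct natural_record, value);
}

static size_t packed_value_offset(void)
{
    return offsetof(struct packed_record, value);
}
```

On a typical target where `uint32_t` is 4-byte aligned, the default struct is 8 bytes with `value` at offset 4, while the packed one is 5 bytes with `value` at offset 1. Every access to the packed `value` is then an unaligned access, which is exactly the cost the default layout is avoiding.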