I think it's pretty clear the GP was asking if the optimizations implemented for x86 that aren't implemented for aarch64 would actually improve performance of generated aarch64 code. It's a question about CPU architecture/microarchitecture. That's a different question from whether those optimizations improve the performance of generated x86 code.
For instance, I imagine the x86_64 register allocator does some variant of graph coloring, with an additional pass that assigns the most heavily used values to the lettered registers (rax, rbx, etc.), since using the numbered registers (r8 through r15) requires a REX prefix byte. In addition, many instructions have more compact encodings when eax/rax is the destination register. At a minimum, excess REX prefixes take up instruction cache space. There's no parallel on aarch64, where every instruction is the same size regardless of which registers it names, so there's no sense in implementing logic to try to make sure the low-numbered aarch64 registers are used more. (Though on 32-bit ARM with Thumb/Thumb2, most 16-bit instruction encodings can only reach a subset of the registers, r0 through r7, so there is a similar optimization for 32-bit ARM targets that support Thumb/Thumb2 when optimizing for space.)
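To make the REX point concrete, here's a rough sketch in Python that hand-encodes the register-to-register form of a 32-bit `add` (opcode `01 /r`). The function name `encode_add_r32` is made up for illustration, and it only covers this one instruction form; it isn't how any real assembler or compiler backend is structured.

```python
def encode_add_r32(dst: int, src: int) -> bytes:
    """Encode `add dst, src` for 32-bit registers (hypothetical helper).

    Registers are numbered 0-15: 0-7 are eax, ecx, edx, ebx, esp, ebp,
    esi, edi; 8-15 are r8d-r15d.
    """
    if not (0 <= dst <= 15 and 0 <= src <= 15):
        raise ValueError("register numbers must be in 0..15")

    # REX is 0100WRXB. W stays 0 for 32-bit operands; R extends the
    # ModRM.reg field (src here) and B extends ModRM.rm (dst here).
    rex = 0x40 | ((src >> 3) << 2) | (dst >> 3)

    # ModRM: mod=11 (register direct), reg=src low 3 bits, rm=dst low 3 bits.
    modrm = 0xC0 | ((src & 7) << 3) | (dst & 7)

    # The prefix is only emitted when one of the extension bits is set.
    prefix = bytes([rex]) if rex != 0x40 else b""
    return prefix + bytes([0x01, modrm])


if __name__ == "__main__":
    print(encode_add_r32(0, 1).hex())  # add eax, ecx
    print(encode_add_r32(8, 1).hex())  # add r8d, ecx
    print(encode_add_r32(0, 9).hex())  # add eax, r9d
```

The same operation costs two bytes with eax/ecx and three as soon as either operand is one of r8d through r15d. On aarch64 there's nothing equivalent to save, since every instruction is four bytes no matter which registers it uses.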
I imagine there are better examples, but my point is that some optimizations are useless on some architectures.
> I think it's pretty clear the GP was asking if the optimizations implemented for x86 that aren't implemented for aarch64 would actually improve performance of generated aarch64 code.
Oh, of course! That must be what was meant. Sorry.
That was (is?) the benchmark for gcc optimizations: an optimization has to pay for itself on the compiler. If the resulting compiler code is faster but the added compilation time is even greater, it's not worth it.