
It actually depends on which flags you pass to clang, and for good reason. A 3-term lea (base + index*scale + displacement) uses "complex decoding" and thus has higher latency (and fewer possible execution ports) on Intel microarchitectures before Ice Lake. If you run clang -O2 -mtune=icelake-client or -mtune=znver3 (or tune for later architectures), it will generate the single lea instruction.
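If you want to check this yourself, here is a minimal sketch. The function name lea3 and the constants are my own, not from the thread; the expression is just one that fits the base + index*scale + displacement addressing form. The exact instructions you get will depend on your clang version and default target.

```c
/* Hypothetical example, not from the original discussion.
   base + index*4 + 8 fits a three-component x86-64 address:
   base register + index register * scale + displacement. */
long lea3(long base, long index) {
    return base + index * 4 + 8;
}
```

Compiling it with something like clang -O2 -S -o - lea3.c, and then again with -mtune=icelake-client added, should let you compare how the address computation is emitted under each tuning model.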

As always with optimization choices, it comes down to cost modelling and trade-offs.



Interesting. I guess GCC’s cost modeling is different, then? Or does it default to a newer machine?



