
It seems that the M4 was overhyped. Almost all of the performance improvements, in Geekbench for example, come from new instructions that most apps won't use, and even if they do, they might end up using the faster GPU/NPU for those tasks instead.

https://twitter.com/toniievych/status/1788596920627118248
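
(For context: using those new instructions means an app has to detect them at runtime and keep a fallback path. Below is a minimal sketch of such a check on macOS in C; sysctlbyname() is a real API, but the exact key name "hw.optional.arm.FEAT_SME" is my assumption, modeled on Apple's other hw.optional.arm.FEAT_* keys, and may differ on a given OS release.)

    /*
     * Hypothetical sketch: check at runtime whether the CPU exposes SME,
     * so an app can choose between an SME kernel and a fallback.
     * The sysctl key name below is an assumption (see note above).
     */
    #include <stdio.h>
    #include <sys/sysctl.h>

    static int has_feature(const char *key) {
        int value = 0;
        size_t size = sizeof(value);
        /* sysctlbyname returns non-zero if the key does not exist. */
        if (sysctlbyname(key, &value, &size, NULL, 0) != 0)
            return 0;
        return value;
    }

    int main(void) {
        if (has_feature("hw.optional.arm.FEAT_SME"))
            printf("SME available: could dispatch to a hand-written SME kernel\n");
        else
            printf("No SME: fall back to NEON/scalar code\n");
        return 0;
    }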



No. Per-clock performance improvements between M3 and M4 range from 0% to 20%, and that is ignoring the two subtests that benefit from SME, so that Twitter post is moot. GB results show high variation, so it is easy enough to cherry-pick pairs of results that show any point you might want; you have to compare result distributions. Some users on the AnandTech forums did exactly that, and the results are very clear.
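
If anyone wants to do that comparison themselves, here is a minimal sketch in C (my own illustration, not the AnandTech posters' method): it reads one Geekbench score per line from stdin and prints a five-number summary. Run it over each chip's set of results and compare medians and spreads rather than single runs.

    /* Summarize a distribution of benchmark scores instead of one run. */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b) {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /* Linearly interpolated percentile of an already sorted array. */
    static double percentile(const double *v, size_t n, double p) {
        double idx = p * (double)(n - 1);
        size_t lo = (size_t)idx;
        size_t hi = (lo + 1 < n) ? lo + 1 : lo;
        double frac = idx - (double)lo;
        return v[lo] + (v[hi] - v[lo]) * frac;
    }

    int main(void) {
        double scores[4096];
        size_t n = 0;
        while (n < 4096 && scanf("%lf", &scores[n]) == 1)
            n++;
        if (n == 0) {
            fprintf(stderr, "no scores read\n");
            return 1;
        }
        qsort(scores, n, sizeof(double), cmp_double);
        printf("n=%zu min=%.0f Q1=%.0f median=%.0f Q3=%.0f max=%.0f\n",
               n, scores[0], percentile(scores, n, 0.25),
               percentile(scores, n, 0.5), percentile(scores, n, 0.75),
               scores[n - 1]);
        return 0;
    }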


Makes one wonder whether the Apple miracle has mostly been, first and foremost, the transition to ARM and access to TSMC's highest-end nodes before anything else even comes into the picture. But I'm glad new competition is coming from Qualcomm's X Elite and Huawei's Kirin and Ascend chips. Hopefully the ARMs race will be more interesting to follow than the x86-64 race between Intel and AMD.


Oryon was designed to compete with the M1; then the clock speeds were ramped up to compete with the M2. The M3 clearly beat it out, and the M4 has only furthered that lead.

Oryon will still probably beat x86 designs massively in performance per watt, which is pretty much the most important metric for most people anyway (as most people use laptops).

EDIT: your username `dragonelite` is quite interesting. You joined in 2019, but the coincidence is fascinating.


> Technically, Intel has its matrix extensions (Intel AMX), but Geekbench does not support it.

Lmao, and people say Geekbench isn't biased towards ARM


Geekbench supports Intel AMX and AVX-512. This is all in the GB documentation.
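
To be concrete about what "supports AMX and AVX-512" means at the code level: the benchmark has to probe CPUID and dispatch to hand-written kernels. A minimal detection sketch in C follows (my illustration, not Geekbench's actual code; the bit positions are the documented CPUID leaf 7, sub-leaf 0 ones: AVX-512F in EBX bit 16, AMX-TILE in EDX bit 24).

    /*
     * Probe CPUID for AVX-512F and AMX-TILE and report them.
     * Note: on Linux, actually executing AMX instructions additionally
     * requires requesting permission from the kernel via arch_prctl;
     * that step is omitted here.
     */
    #include <stdio.h>
    #include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */

    int main(void) {
        unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
        if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
            printf("CPUID leaf 7 not supported\n");
            return 0;
        }
        int avx512f  = (ebx >> 16) & 1;   /* AVX-512 Foundation */
        int amx_tile = (edx >> 24) & 1;   /* AMX tile architecture */
        printf("AVX-512F: %s, AMX-TILE: %s\n",
               avx512f ? "yes" : "no", amx_tile ? "yes" : "no");
        return 0;
    }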


I truly never understood why Apple deprecated Bitcode.

It was a super great idea because it allowed recompilation on the App Store to take advantage of new instructions.


Bitcode did not allow recompilation to take advantage of new instructions. They dropped bitcode because they never actually managed to do anything with it other than the armv7k to arm64_32 recompilation, and that required specifically designing arm64_32 around what was possible with bitcode.

Updating apps to use new vector instructions is far more complicated than upgrading to a new compiler version and having it magically get faster.


SME is very specialized; right now no compiler (that I know of) is really able to take general-purpose code and output optimized SME. So for these instructions at least, bitcode wouldn't be of any benefit.
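
To illustrate, here is the kind of general-purpose code in question (my own example): a plain outer-product matmul. A compiler will at best autovectorize it with NEON/SVE; per the point above, none that I know of will turn it into SME tile code, which is what hand-written kernels (ACLE SME intrinsics or assembly) are for.

    /*
     * C[m x n] += A[m x k] * B[k x n], row-major, written as a sequence
     * of rank-1 (outer-product) updates -- the shape SME's FMOPA-style
     * instructions target. Compilers today will vectorize the inner loop
     * conventionally but will not emit SME streaming-mode/ZA tile code
     * from this source.
     */
    #include <stddef.h>

    void matmul_accumulate(size_t m, size_t n, size_t k,
                           const float *A, const float *B, float *C) {
        for (size_t p = 0; p < k; p++)
            for (size_t i = 0; i < m; i++)
                for (size_t j = 0; j < n; j++)
                    C[i * n + j] += A[i * k + p] * B[p * n + j];
    }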


Autovectorization doesn't work without extreme levels of handholding, so the optimization idea was basically a myth.
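
Two small C illustrations (mine, not from the parent comment) of the kind of handholding involved:

    #include <stddef.h>

    /*
     * 1. Aliasing: without the restrict qualifiers the compiler must
     *    assume dst and src may overlap, so it either emits runtime
     *    overlap checks or keeps scalar code. The restrict keywords are
     *    the promise that lets it vectorize unconditionally.
     */
    void scale_add(float *restrict dst, const float *restrict src,
                   float a, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] += a * src[i];
    }

    /*
     * 2. Reductions: floating-point addition is not associative, so at
     *    -O2 a compiler will not reorder this sum into vector partial
     *    sums unless you allow it to (-ffast-math, or an explicit hint
     *    such as OpenMP's "#pragma omp simd reduction(+:s)" with
     *    -fopenmp-simd).
     */
    float sum(const float *x, size_t n) {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += x[i];
        return s;
    }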


It’s a decent, but not revolutionary, improvement. Yes, most of the gains outside of SME are coming from clock increases, not IPC. I don’t know if I would call it overhyped; more like misunderstood.



