Common knowledge is that going wider quickly hits diminishing returns so general...

ece · on Dec 1, 2020

I'd like to see this explored more. I think AMD/Intel might historically have been looking at C++/compiled workloads more than JS/Java/Python interpreted workloads. JITs could very well take better advantage of wider cores than compiled code does.

Diminishing returns from increasing cache sizes and going wider don't seem to hold back the M1.

gpderetta · on Dec 1, 2020

Maybe. JITs and runtimes in general might add some "overhead" code that does book-keeping and verifies assumptions (so that optimistic optimizations can be rolled back). Maybe wider engines can run this overhead without impacting actual runtime.

It probably also helps hide the cost of type checks and bounds checks.
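
To make the "overhead" concrete, here is a rough Python sketch of what a speculatively optimized loop might look like once the guards are written out. It is only an illustration and not how any real JIT is implemented: `Deopt`, `specialized_sum`, `generic_sum`, and `run` are all made-up names, and the guard/useful counters exist only to show the ratio of book-keeping to real work.

```python
class Deopt(Exception):
    """Raised when a speculative assumption fails (hypothetical name)."""


def specialized_sum(values, indices):
    """Fast path that assumes int elements and in-range indices."""
    total = 0
    guards = 0  # book-keeping: checks that do not contribute to the result
    useful = 0  # the work the program actually asked for
    for i in indices:
        guards += 1
        if not (0 <= i < len(values)):  # bounds check
            raise Deopt("index out of range")
        v = values[i]
        guards += 1
        if type(v) is not int:  # type check
            raise Deopt("unexpected type")
        total += v  # the actual add
        useful += 1
    return total, guards, useful


def generic_sum(values, indices):
    """Slow fallback that makes no assumptions about element types."""
    total = 0
    for i in indices:
        total = total + values[i]
    return total


def run(values, indices):
    """Try the speculative path; roll back to the generic one on a failed guard."""
    try:
        total, guards, useful = specialized_sum(values, indices)
        return total, "fast", guards, useful
    except Deopt:
        return generic_sum(values, indices), "deopt", 0, 0
```

In this sketch two guards run for every useful add. The guards almost never fail and do not feed into the add, so the speculation in this thread is that a wide out-of-order core with good branch prediction could execute them alongside the real work instead of ahead of it.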

ece · on Dec 4, 2020

It might just be that the combination of better branch prediction, TSMC 5nm, 8-channel memory, 16k pages, and 8-wide cores with larger structures (especially cache) is enough to account for the M1's better perf/watt.