Hacker News

Sure, ECC, and in particular registered memory, does increase latency a bit. But servers are designed for throughput and have multiple memory channels to better feed the large number of cores involved, up to 64 cores for the new AMD Epyc chips. The amazing thing is that the Apple M1 can fetch random cachelines almost as fast as a current AMD Epyc.


You're confusing throughput & latency here. More channels increase throughput, but don't improve latency.

The M1's memory bandwidth is ~68GB/s, which is of course only about a third of AMD Epyc's ~200GB/s per socket.

Epyc's latency isn't even competitive with AMD's own consumer parts, so I'm really not sure why you're surprised that Epyc's latency is also worse than the M1's?


I'm not surprised the latency on the M1 is better than Epyc's, but it's nearly half that of any other consumer part, like, say, the AMD Ryzen 5950X. When accessed in a TLB-friendly way (not thrashing the TLB), the M1 manages 30ns, which is excellent.

Even more impressive is that the random-cacheline throughput is also excellent. So if all 8 cores have a cache miss, the M1 memory system is very good at keeping multiple pending requests in flight to achieve surprisingly good throughput. Granted, this isn't pure latency, so I call it throughput. Getting a random cacheline every 12ns is quite good, especially for a cheap, low-power system. Normally getting more than 2 memory channels on a desktop requires something exotic like an AMD Threadripper.



