In the memcopy benchmark, which is designed to stress both memory I/O as well as caches, Intel’s XEON shows the highest raw performance
I am not surprised by that, given that x86 has a single instruction that will copy arbitrary number of bytes in cacheline-sized chunks --- something that ARM does not have.
I am not surprised by that, given that x86 has a single instruction that will copy arbitrary number of bytes in cacheline-sized chunks --- something that ARM does not have.