>Lest you ignore the fact that InfiniBand is pretty much on par with top-of-the-line DDR speeds for the matching generation.
You can't go faster than the speed of light (yet) and traveling a few micrometers will always be much faster than traversing a room (plus routing and switching).
Many HPC tasks nowadays are memory-bound rather than CPU-bound, or, to be more precise, bound by memory latency and throughput. An actual supercomputer would be something like the Cerebras chip; a lot of the performance increase you get comes from having everything on-chip at a given time.
Really? How about: "This pointer is valid, has the same numeric value (address) and points to the same data in all threads".
The point is neither latency nor bandwidth. The point is the programming/memory model. InfiniBand may make multiprocessing across nodes as fast as multiprocessing on a single node, but it's not multithreading.
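To make the memory-model point concrete, here is a minimal sketch of what "same pointer, same data in all threads" looks like. It assumes CPython, where `id()` happens to be the object's memory address; the function name `run_shared_memory_demo` and the thread count are made up for illustration.

```python
import threading

def run_shared_memory_demo(num_threads=4):
    # One object, allocated once, living in the process's single address space.
    shared = [0] * num_threads
    seen_ids = []
    lock = threading.Lock()

    def worker(i):
        # Every thread refers to the very same object. In CPython, id() is
        # its memory address, so all threads record the same number.
        with lock:
            seen_ids.append(id(shared))
        # Each thread writes its own slot in place: no copy, no message.
        shared[i] = i + 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared, seen_ids, id(shared)

data, seen_ids, main_id = run_shared_memory_demo()
print(data)                                # writes from all threads are visible
print(all(i == main_id for i in seen_ids)) # same address seen in every thread
```

With separate processes, on one node or across nodes, each worker gets its own address space, so the data has to be copied or sent explicitly, however fast the interconnect is.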
I feel sorry for you if you believe this. It's not true physically, nor at the level of the cache coherence protocol, nor from the perspective of the operating system.
Cache is not shared between cores.
HPC systems just have more levels of cache.
Lest you ignore the fact that InfiniBand is pretty much on par with top-of-the-line DDR speeds for the matching generation.