This is also why hyperthreading yields a benefit for sparse/graph workloads. I was getting good results on the KNL Xeon Phi with its four hyperthreads.
But you easily hit the 'outstanding loads' limit if your hyperthreads all issue a gather instruction every 4-8 instructions. The architecture needs to be balanced around that.
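
A rough back-of-envelope sketch of that limit, in Python. The numbers are placeholders I made up for illustration, not measured KNL figures: `limit` stands in for however many cache-line misses a core can keep in flight, and the function names are hypothetical. The only hard fact used is that an AVX-512 gather of eight 64-bit elements can touch up to eight distinct cache lines in the worst case.

```python
def lines_in_flight(threads, lines_per_gather, gathers_per_thread=1):
    """Worst-case number of cache-line misses one core has pending at once."""
    return threads * lines_per_gather * gathers_per_thread


def max_threads_within_limit(limit, lines_per_gather, gathers_per_thread=1):
    """How many hyperthreads fit under the per-core outstanding-miss limit."""
    return limit // (lines_per_gather * gathers_per_thread)


# Assumed values: 4 hyperthreads, 8 lines per gather (every element misses),
# one gather pending per thread, and a hypothetical limit of 12 misses per core.
demand = lines_in_flight(threads=4, lines_per_gather=8)
fits = max_threads_within_limit(limit=12, lines_per_gather=8)
print(demand, fits)  # 32 1
```

With those assumed numbers a single thread already comes close to saturating the miss buffers, so extra hyperthreads only help when most gather elements hit in cache or the buffers are sized for it.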