Exactly. The far larger ROB, great branch predictor, large L1 and significantly lower memory latency, combined with a lower core frequency, all contribute to keeping the core fed and preventing it from stalling.
Still, sustaining 8-wide (or even just 4-wide) is very hard. Apparently most code has an average ILP of only about 1.5.
I guess the large width mostly helps the core recover from stalls, by absorbing the spikes that follow.
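
To make the "absorbing spikes" guess concrete, here is a toy sketch, not a model of any real core. It assumes a made-up pattern where nothing is ready for 7 cycles (a stall) and then 12 independent instructions become ready at once, so the average is 1.5 ready instructions per cycle. The function name and the pattern are hypothetical, picked only for illustration.

```python
def cycles_to_drain(ready_per_cycle, width):
    """Count cycles needed to issue all instructions when at most
    `width` instructions can issue per cycle (toy model, no dependencies)."""
    backlog = 0
    cycles = 0
    for arriving in ready_per_cycle:
        backlog += arriving              # instructions that became ready this cycle
        backlog -= min(backlog, width)   # issue as many as the width allows
        cycles += 1
    while backlog > 0:                   # keep issuing until the backlog is gone
        backlog -= min(backlog, width)
        cycles += 1
    return cycles

# Assumed pattern: 7 stalled cycles, then a spike of 12 ready instructions.
# 12 instructions over 8 cycles is an average of 1.5 per cycle.
pattern = [0, 0, 0, 0, 0, 0, 0, 12]

for width in (2, 4, 8):
    print(width, cycles_to_drain(pattern, width))
# 2 13
# 4 10
# 8 9
```

Under these assumptions a 2-wide core would cover the average rate of 1.5 on paper, yet it needs 13 cycles where the 8-wide one needs 9, because the work shows up in a burst rather than evenly. Real code has dependency chains, so the actual gain would be smaller.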