Under your proposed scheme, between the time a branch is issued and the time it resolves, around half of the subsequent instructions executed by your CPU will be retired and the other half will be discarded. So, all else being equal, performance is 50% of what you'd get on non-branching code.
With branch prediction, the speculatively-executed instructions after a single branch are either all retired (the prediction was right) or all discarded (it was wrong). If a branch predictor guesses randomly for each branch, it will achieve 50% accuracy... so, averaged across many branches, 50% of speculatively executed instructions are retired.
So even a branch predictor that guesses at random has equivalent performance to your proposed scheme, and it's quite a bit simpler to implement (which itself translates to increased performance). If your branch predictor is any better than blind guessing, it achieves better performance (by the benchmark of % of correctly-speculated instructions). Given that modern branch predictors achieve 95%+ accuracy on typical workloads, it's clear why processor designers choose branch prediction over executing both pathways.
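To make the bookkeeping concrete, here's a back-of-the-envelope sketch in Python (my own toy model, nothing from TFA): the fraction of speculative work that turns out to be useful is pinned at 50% when you always execute both paths, but equals the predictor's accuracy when you predict.

    # Toy model: what fraction of speculatively executed instructions
    # end up being retired, i.e. were useful work?

    def useful_fraction_both_paths() -> float:
        # One side of every branch is retired, the other discarded.
        return 0.5

    def useful_fraction_predictor(accuracy: float) -> float:
        # All speculative work after a branch is kept when the guess is
        # right and thrown away when it's wrong, so on average a
        # fraction `accuracy` of it is useful.
        return accuracy

    print(f"both paths:       {useful_fraction_both_paths():.0%} useful")
    for p in (0.5, 0.9, 0.95, 0.99):
        print(f"predictor p={p:.2f}: {useful_fraction_predictor(p):.0%} useful")

A random predictor (p = 0.5) lands exactly on the both-paths number, and anything better pulls ahead.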
I'm not sure this looks at the correct resources. If I read TFA correctly, they measure an IPC of 0.85 on the benchmark against a theoretical IPC limit of 6. So even 50% usage would be IPC = 3, a lot better than the benchmark. So execution resources aren't the bottleneck.
Also, this analysis skips over the easily predictable jumps like loops. I remember 90% prediction accuracy being relatively easy to reach in most programs just because of loops. A better way would be to let the predictor respond not only with taken/not-taken, but also with high/low confidence. If confidence is low, the CPU could spend some of that unused IPC headroom executing down both paths (sketched below).
All of this probably won't work in practice, of course.
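To make the confidence idea concrete, here's a rough Python sketch (names and thresholds are made up for illustration): a 2-bit saturating counter per branch, where the saturated states count as high confidence. A core could then fork both paths only on low-confidence branches, using spare issue slots, and trust the prediction everywhere else.

    class ConfidencePredictor:
        def __init__(self):
            self.counters = {}  # branch PC -> 2-bit saturating counter (0..3)

        def predict(self, pc):
            """Return (predicted_taken, high_confidence)."""
            c = self.counters.get(pc, 1)   # start in "weakly not taken"
            return c >= 2, c in (0, 3)     # saturated states = confident

        def update(self, pc, taken):
            c = self.counters.get(pc, 1)
            self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)

    # A loop back-edge saturates to "taken, high confidence" after a couple
    # of iterations, so only the genuinely unpredictable branches would ever
    # trigger the "execute both paths" fallback.
    pred = ConfidencePredictor()
    for i in range(10):
        print(i, pred.predict(0x400))
        pred.update(0x400, taken=(i < 9))

Real predictors track confidence differently, but the loop point stands: the easy branches stop costing anything after a couple of iterations.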
The bigger thing you're missing is that the theoretical IPC of 6 can only happen if you are using a bunch of different execution ports. Most branches will have similar instruction workloads afterwards, so the two sides of the branch will be fighting each other for the same ports.
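To illustrate (with invented numbers, nothing from TFA): if both sides of the branch are, say, ALU-heavy, they split the ALU ports between them instead of soaking up genuinely idle issue width.

    PORTS = {"alu": 4, "load": 2}          # hypothetical port mix

    def per_path_issue(demand, paths):
        """Uops/cycle one path gets when `paths` identical paths share the ports."""
        return sum(min(demand[kind], PORTS[kind] / paths) for kind in demand)

    alu_heavy = {"alu": 4, "load": 1}      # both sides of the branch look like this
    print("one predicted path:", per_path_issue(alu_heavy, paths=1))  # 5.0 uops/cycle
    print("each of two paths: ", per_path_issue(alu_heavy, paths=2))  # 3.0 uops/cycle

The machine is still issuing close to its peak in the second case, but only the ~3 uops/cycle on the correct side are useful work, and the correct path itself now runs slower than it would have behind a single predicted path.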