Under your proposed scheme, between the time a branch is issued and the time it resolves, around half of the subsequent instructions executed by your CPU will be retired and the other half will be discarded. So, all else being equal, performance is 50% of what you'd get on non-branching code.
With branch prediction, the speculatively-executed instructions after a single branch are either all retired (the prediction was right) or all discarded (it was wrong). If a branch predictor guesses randomly for each branch, it will achieve 50% accuracy... so, averaged across many branches, 50% of speculatively executed instructions are retired.
So even a branch predictor that guesses at random has equivalent performance to your proposed scheme, and it's quite a bit simpler to implement (which itself translates to increased performance). If your branch predictor is any better than blind guessing, it achieves better performance (by the benchmark of % of correctly-speculated instructions). Given that modern branch predictors achieve 95%+ accuracy on typical workloads, it's clear why processor designers choose branch prediction over executing both pathways.
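To make the bookkeeping concrete, here's a back-of-the-envelope sketch in Python (my own toy model, nothing from TFA): the fraction of speculative work that turns out to be useful is pinned at 50% when you always execute both paths, but equals the predictor's accuracy when you predict.

    # Toy model: what fraction of speculatively executed instructions
    # end up being retired, i.e. were useful work?

    def useful_fraction_both_paths() -> float:
        # One side of every branch is retired, the other discarded.
        return 0.5

    def useful_fraction_predictor(accuracy: float) -> float:
        # All speculative work after a branch is kept when the guess is
        # right and thrown away when it's wrong, so on average a
        # fraction `accuracy` of it is useful.
        return accuracy

    print(f"both paths:       {useful_fraction_both_paths():.0%} useful")
    for p in (0.5, 0.9, 0.95, 0.99):
        print(f"predictor p={p:.2f}: {useful_fraction_predictor(p):.0%} useful")

A random predictor (p = 0.5) lands exactly on the both-paths number, and anything better pulls ahead.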
I'm not sure this looks at the correct resources. If I read TFA correctly, they measure an IPC of 0.85 on the benchmark against a theoretical IPC limit of 6. So even 50% usage would be IPC = 3, a lot better than the benchmark. So execution resources aren't the bottleneck.
Also, this analysis skips over the easily predictable jumps like loops. I remember 90% prediction accuracy being relatively easy to reach in most programs just because of loops. A better way would be to let the predictor respond not only with taken/not-taken, but also with high/low confidence. If confidence is low, the CPU could spend some of that unused IPC headroom executing down both paths (sketched below).
All of this probably won't work in practice, of course.
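To make the confidence idea concrete, here's a rough Python sketch (names and thresholds are made up for illustration): a 2-bit saturating counter per branch, where the saturated states count as high confidence. A core could then fork both paths only on low-confidence branches, using spare issue slots, and trust the prediction everywhere else.

    class ConfidencePredictor:
        def __init__(self):
            self.counters = {}  # branch PC -> 2-bit saturating counter (0..3)

        def predict(self, pc):
            """Return (predicted_taken, high_confidence)."""
            c = self.counters.get(pc, 1)   # start in "weakly not taken"
            return c >= 2, c in (0, 3)     # saturated states = confident

        def update(self, pc, taken):
            c = self.counters.get(pc, 1)
            self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)

    # A loop back-edge saturates to "taken, high confidence" after a couple
    # of iterations, so only the genuinely unpredictable branches would ever
    # trigger the "execute both paths" fallback.
    pred = ConfidencePredictor()
    for i in range(10):
        print(i, pred.predict(0x400))
        pred.update(0x400, taken=(i < 9))

Real predictors track confidence differently, but the loop point stands: the easy branches stop costing anything after a couple of iterations.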
The bigger thing you're missing is that the theoretical IPC of 6 can only happen if you are using a bunch of different execution ports. Most branches will have similar instruction workloads afterwards, so the two sides of the branch will be fighting each other for the same ports.
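To illustrate (with invented numbers, nothing from TFA): if both sides of the branch are, say, ALU-heavy, they split the ALU ports between them instead of soaking up genuinely idle issue width.

    PORTS = {"alu": 4, "load": 2}          # hypothetical port mix

    def per_path_issue(demand, paths):
        """Uops/cycle one path gets when `paths` identical paths share the ports."""
        return sum(min(demand[kind], PORTS[kind] / paths) for kind in demand)

    alu_heavy = {"alu": 4, "load": 1}      # both sides of the branch look like this
    print("one predicted path:", per_path_issue(alu_heavy, paths=1))  # 5.0 uops/cycle
    print("each of two paths: ", per_path_issue(alu_heavy, paths=2))  # 3.0 uops/cycle

The machine is still issuing close to its peak in the second case, but only the ~3 uops/cycle on the correct side are useful work, and the correct path itself now runs slower than it would have behind a single predicted path.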