You should run your critical loops through perf or a similar tool and collect statistics on branch mispredictions. If you see a lot of mispredictions, consider writing branchless code. If you don't, the branchy code will usually be faster.
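As a minimal sketch, here is one reduction written both ways in C. The function names are hypothetical. The branchless version turns the comparison result into an all-ones or all-zeros mask, so the loop has no data-dependent jump, only a data dependency. One way to get the misprediction counts is something like `perf stat -e branches,branch-misses ./your_program`.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: the CPU has to predict the outcome of the comparison. */
int64_t sum_above_branchy(const int32_t *data, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] >= threshold)
            sum += data[i];
    }
    return sum;
}

/* Branchless version: the comparison (0 or 1) is negated into a mask of
   all zeros or all ones, then ANDed with the value, so nothing is skipped. */
int64_t sum_above_branchless(const int32_t *data, size_t n, int32_t threshold)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(data[i] >= threshold);
        sum += mask & (int64_t)data[i];
    }
    return sum;
}
```

Keep in mind that at higher optimization levels the compiler may rewrite either loop (for example into a conditional move or vectorized code), so check the generated assembly as well as the perf counters before concluding which form you actually got.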

Sometimes you can guess in advance whether the data the loop will see is predictable. The same rule applies: if you expect it to be predictable, don't try to remove branches; if you expect it to be unpredictable, try to remove them.
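One rough way to make that guess is to count how often the branch outcome flips between consecutive elements. The sketch below uses hypothetical helper names and a simple linear congruential generator as a stand-in for real input; it compares random data against the same data after sorting. Sorted data flips at most once, while random data flips about half the time. Treat this only as a heuristic, since real predictors use much longer outcome histories than "same as last time".

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: reproducible pseudo-random values in [0, 256)
   from a simple linear congruential generator. */
static void fill_pseudo_random(int32_t *data, size_t n, uint32_t seed)
{
    for (size_t i = 0; i < n; i++) {
        seed = seed * 1664525u + 1013904223u;
        data[i] = (int32_t)(seed >> 24); /* keep the top 8 bits */
    }
}

/* Comparator for qsort on int32_t values. */
static int cmp_int32(const void *a, const void *b)
{
    int32_t x = *(const int32_t *)a;
    int32_t y = *(const int32_t *)b;
    return (x > y) - (x < y);
}

/* Counts how often (data[i] >= threshold) differs from the outcome for the
   previous element: a predictor that only guessed "same as last time"
   would miss roughly this many times. */
static size_t outcome_flips(const int32_t *data, size_t n, int32_t threshold)
{
    size_t flips = 0;
    for (size_t i = 1; i < n; i++) {
        int prev = data[i - 1] >= threshold;
        int cur = data[i] >= threshold;
        flips += (size_t)(prev != cur);
    }
    return flips;
}
```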

Note that you will need to calibrate your expectations through measurement; the branch predictor may be better or worse than you expect. Modern branch predictors are extremely complex, often taking more die space than everything except cache. Don't assume that just because you can't see a pattern in the data, the branch predictor won't be able to.
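To illustrate how a predictor can find patterns that a naive guess would miss, here is a toy simulation in C. The function names are hypothetical, and real predictors are far more elaborate than either model. The outcome pattern taken, taken, not-taken repeats indefinitely. A single 2-bit saturating counter mispredicts every not-taken branch, while a simplified two-level predictor that picks one of four counters based on the last two outcomes learns the pattern after a single miss.

```c
#include <stddef.h>
#include <stdint.h>

/* Update a 2-bit saturating counter (0..3); values >= 2 mean "predict taken". */
static uint8_t update_counter(uint8_t c, int taken)
{
    if (taken)
        return c < 3 ? (uint8_t)(c + 1) : c;
    return c > 0 ? (uint8_t)(c - 1) : c;
}

/* Toy model: one 2-bit counter for the branch, no history.
   outcomes[i] must be 0 (not taken) or 1 (taken). */
static size_t mispredicts_counter_only(const int *outcomes, size_t n)
{
    uint8_t c = 2; /* start weakly taken */
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        int predicted = c >= 2;
        misses += (size_t)(predicted != outcomes[i]);
        c = update_counter(c, outcomes[i]);
    }
    return misses;
}

/* Toy model: the last two outcomes select one of four 2-bit counters. */
static size_t mispredicts_with_history(const int *outcomes, size_t n)
{
    uint8_t table[4] = {2, 2, 2, 2};
    unsigned history = 0; /* bit 0 = most recent outcome */
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        int predicted = table[history] >= 2;
        misses += (size_t)(predicted != outcomes[i]);
        table[history] = update_counter(table[history], outcomes[i]);
        history = ((history << 1) | (unsigned)outcomes[i]) & 3u;
    }
    return misses;
}
```

For 300 iterations of that pattern, the counter-only model misses 100 times and the history-based model misses once.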