You could also do a predicated conditional move instead. Just do the subtraction every time, and use something like cmov to only do the write if you need to.
I don't know if it would end up being faster, though.
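For anyone who wants to compare the two forms, here is a minimal sketch in C. The original sample isn't quoted in this thread, so the function names and the wrapping-add shape are assumptions; it also assumes x < n, step < n, and that x + step doesn't overflow. The second version does the subtraction unconditionally and selects a result afterwards, which is the pattern compilers tend to lower to a cmov on x86, though nothing guarantees that either version compiles to a branch or to a cmov.

```c
/* Hypothetical helpers for illustration; assumes x < n, step < n,
   and that x + step does not overflow unsigned. */

/* Branchy form: subtract n only when the sum wraps past n. */
static unsigned wrap_branch(unsigned x, unsigned step, unsigned n) {
    x += step;
    if (x >= n)
        x -= n;
    return x;
}

/* Select form: always compute the subtraction, then pick one result.
   A compiler may (but need not) turn the ternary into a cmov. */
static unsigned wrap_select(unsigned x, unsigned step, unsigned n) {
    unsigned sum = x + step;
    unsigned wrapped = sum - n;   /* computed every time */
    return (sum >= n) ? wrapped : sum;
}
```

Compiling with something like -O2 -S and reading the assembly is the only reliable way to see which one you actually got; the optimizer is free to turn either form into the other.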
Most likely (very very likely) the branch would be faster. It will almost always be predicted correctly (except on the rollover) and cmov can be moderately expensive.
The general rule is to only use cmov if your test condition is mostly random.
I don't think this is true. I tried the sample out, and gcc, clang, and Intel's compiler all generate a cmov for the code instead of a branch with -O2. I don't think all these compilers would have used a cmov instead of a branch if the cmov were more expensive than a branch in this case.
It might have something to do with the branch not being part of a loop so the best it can do is assume the branch is random (think something like modding a hash code when it would indeed be random).
As was pointed out in the thread, recent CPUs have reduced the latency of cmov to a cycle. So your result could also depend on your architecture.
It does, but it may run out of non-dependent instructions to execute while waiting for the long pole of the cmov dependency chain to finish.
This used to be a problem on the Pentium 4, where cmov had high latency (4 cycles or more), but today, IIRC, a register-to-register cmov is only one cycle, so it is safe to use whenever a branch could have a non-trivial misprediction rate.