That may be OK in simple cases: where you can easily eyeball it, where aggregated CPU time is the only metric you care about, and where optimizing the one obvious function gives you the biggest win in every mode of the program. None of that necessarily holds in complex scientific codes, for instance, especially parallel ones.
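To make the parallel case concrete, here is a minimal sketch of how aggregated CPU time and elapsed time can point at different functions. The function names, the four ranks, and all timings are made up for illustration, and it assumes the ranks synchronize after each phase, so the slowest rank sets the elapsed time.

```python
# Hypothetical per-rank CPU seconds for two functions on 4 ranks.
# These numbers are invented for illustration, not measured.
timings = {
    "compute_flux": [10.0, 10.0, 10.0, 10.0],   # well balanced
    "apply_boundary": [0.5, 0.5, 0.5, 30.0],    # one rank does nearly all the work
}


def aggregate_cpu(per_rank):
    """Total CPU time summed over all ranks."""
    return sum(per_rank)


def wall_clock(per_rank):
    """Elapsed time, assuming a barrier after the phase: the slowest rank decides."""
    return max(per_rank)


def hotspot(metric):
    """Name of the function that looks most expensive under the given metric."""
    return max(timings, key=lambda name: metric(timings[name]))


if __name__ == "__main__":
    print("hotspot by aggregated CPU time:", hotspot(aggregate_cpu))
    print("hotspot by elapsed time:       ", hotspot(wall_clock))
```

With these made-up numbers, the aggregated view ranks `compute_flux` first (40 s against 31.5 s), while the elapsed-time view ranks `apply_boundary` first (30 s against 10 s), because three ranks sit idle waiting for the fourth.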