Any reasonable compiler will implement x%256 using bitwise arithmetic, but C semantics require that x == d*(x/d) + (x%d), which means that if x is negative (x%256) must be negative, while (x&0xff) will be positive.
This difference means that the AND is still faster than the MOD when used on signed integers, if the compiler is not smart enough to prove that the input will never be negative.
For example, with gcc 4.7.3 the resulting assembly is
This difference means that the AND is still faster than the MOD when used on signed integers, if the compiler is not smart enough to prove that the input will never be negative.
For example, with gcc 4.7.3 the resulting assembly is
I.e., six instructions instead of one. This is still faster than using a divide.