Wouldn't it be possible to optimize floating point multiplication by 2.0f (or an...

astrange · on July 24, 2010

Only on architectures with such instructions. x86 doesn't have anything like that in SSE (x87 does have FSCALE, which scales by a power of two), and I don't know of any others that do.

nitrogen · on July 25, 2010

So would moving the floating point value into an integer register, shifting, incrementing, ORing, etc. ever be faster than a full FP addition? I imagine that modern desktop CPUs have effectively single-cycle multiply, but what about something like ARM with VFP but no NEON?
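For reference, the integer trick being asked about can be sketched in C as below. This is only an illustration, assuming IEEE 754 single precision and a 32-bit float; the name mul2_bits is made up for this example, and whether it ever beats a real FP multiply depends entirely on the target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: doubles x by adding 1 to the biased exponent field.
 * Assumes IEEE 754 binary32. Only valid for normal, finite inputs whose
 * result does not overflow; zero, denormals, infinities and NaNs are NOT
 * handled. */
static float mul2_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float as an integer */
    bits += UINT32_C(1) << 23;        /* exponent occupies bits 23..30 */
    memcpy(&x, &bits, sizeof x);      /* reinterpret back as a float */
    return x;
}
```

Note the edge cases: mul2_bits(0.0f) returns the smallest normal float rather than zero, so a real implementation would need extra checks, which eats into any savings.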

astrange · on July 25, 2010

Only on architectures where FP and integer values are kept in the same registers. This is true in SSE2 and AltiVec, so you can do integer operations there - but since SIMD integer operations are limited too, it's pretty much only useful for flipping the sign bit.
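A rough sketch of the sign-bit case with SSE2 intrinsics, assuming an x86 compiler that provides emmintrin.h; the function name negate4 is made up for this example. The casts are free reinterpretations of the same XMM register, so nothing leaves the SIMD register file.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: negates four packed floats by XORing the sign bit
 * of each 32-bit lane with an integer SIMD instruction. */
static __m128 negate4(__m128 v)
{
    const __m128i sign_mask = _mm_set1_epi32(INT32_MIN); /* 0x80000000 per lane */
    __m128i bits = _mm_castps_si128(v);      /* same register, integer view */
    bits = _mm_xor_si128(bits, sign_mask);   /* flip bit 31 of each lane */
    return _mm_castsi128_ps(bits);           /* same register, float view */
}
```

It would be called as negate4(_mm_loadu_ps(ptr)) on four floats at a time.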

Moving values between different register sets is INCREDIBLY slow since it involves at least two memory operations.

And faking FP operations with specialized integer code on something with soft-float like ARM might be worth it. I've never done it so I can't really say.