So would moving the floating point value into an integer register, shifting, incrementing, ORing, etc. ever be faster than a full FP addition? I imagine that modern desktop CPUs have effectively-single-cycle multiply, but what about something like ARM with VFP but no Neon?
Only in architectures where FP and integer values are kept in the same registers. That's the case with SSE2 and AltiVec, so you can do integer operations on FP data there, but since SIMD integer operations are limited too, it's pretty much only useful for flipping the sign bit.
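To make the sign-bit trick concrete, here is a minimal scalar sketch in portable C. It assumes `float` is an IEEE 754 binary32 value (sign in bit 31), and `flip_sign` is just a name made up for illustration. Whether the compiler keeps the value in one register set or bounces it through memory depends on the target, so treat it as a picture of the bit operation, not as a performance claim.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: negate a float by toggling its sign bit.
 * Assumes float is IEEE 754 binary32, i.e. the sign lives in bit 31. */
static float flip_sign(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float's bits as an integer */
    bits ^= UINT32_C(0x80000000);     /* XOR toggles only the sign bit */
    memcpy(&x, &bits, sizeof x);      /* reinterpret the integer back as a float */
    return x;
}
```

With SSE2 the same XOR can be applied directly to a vector register, which is why the value never has to leave the FP/SIMD register file there.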
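Moving values between different register sets can be INCREDIBLY slow: on architectures with no direct move instruction between the two (classic PowerPC, for example) it takes a store followed by a load, so at least two memory operations.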
Faking FP operations with specialized integer code might be worth it on something that only has soft-float, like an ARM core without an FPU. I've never done it, so I can't really say.
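For what it's worth, this is roughly what such integer code could look like for one simple case, doubling a value by incrementing its exponent field. It is a sketch only: it assumes IEEE 754 binary32, `times_two` is a hypothetical name, and zero, subnormals, infinities, NaNs and values about to overflow are simply handed back to ordinary FP addition (which on a soft-float target would still go through the library).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: double a float by adding 1 to its exponent field.
 * Assumes IEEE 754 binary32: 1 sign bit, 8 exponent bits, 23 mantissa bits. */
static float times_two(float x)
{
    uint32_t bits;
    uint32_t exponent;

    memcpy(&bits, &x, sizeof bits);
    exponent = (bits >> 23) & 0xFFu;

    /* Exponent 0 means zero or subnormal; 0xFE would overflow into the
     * Inf/NaN encoding; 0xFF already is Inf/NaN. Fall back for all of them. */
    if (exponent == 0u || exponent >= 0xFEu)
        return x + x;

    bits += UINT32_C(1) << 23;        /* exponent + 1 is the same as value * 2 */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Whether this beats the soft-float routine depends on how much of the special-case handling you can actually skip for your data.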