PC hardware doesn’t have a dedicated SSE instruction for float negation.
It’s possible to do in 2 instructions: xorps to make a zero, then subps to subtract. Combined, they’re going to take 4-5 cycles of latency (xorps is 1 cycle, subps is 3 cycles on AMD, 4 cycles on Intel).
If you do that a lot, a single xorps with the magic number -0.0f is going to negate these floats 4-5 times faster. People don’t pay me because I’m a normal person, they do that because I write fast code for them :-)
On a serious note, I’d rather have the current +0.0 and -0.0 IEEE values be equal and be the exact zero, and make another pair with the 0xFF exponent (the one that encodes INF and NaN today) to encode inexact zeroes, +0.0f or -0.0f depending on the sign bit.
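Or another option: redefine the smallest positive float to be 2.8E-45, and reuse the current smallest one, which is the denormal 1.4E-45 / 0x00000001 bit pattern, as inexact zeroes. (Strictly speaking that value is FLT_TRUE_MIN in C11, not FLT_MIN. FLT_MIN is the smallest normal float, about 1.18E-38.)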
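For reference, the bit patterns discussed above can be checked with a small sketch like the one below. The helpers float_bits and float_from_bits are hypothetical names. It relies on std::numeric_limits<float>::denorm_min() for the 0x00000001 pattern (about 1.4E-45) and min() for the smallest normal value, which is what FLT_MIN in <cfloat> actually means.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Return the raw IEEE 754 bit pattern of a 32-bit float.
inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // memcpy avoids aliasing problems
    return u;
}

// Build a float from a raw 32-bit pattern.
inline float float_from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

With these, float_bits(-0.0f) is 0x80000000, float_bits(0.0f) is 0, the two zeroes still compare equal with ==, and float_from_bits(0x00000002) is the 2.8E-45 value that the second proposal would make the new smallest float.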