The author admits this is a contrived demo, but I'm still trying to understand its ostensible logic. The FTZ/DAZ problem is on line 20:
    if (x >= nextafterf(0.0, INFINITY)) {
I think the idea here is: test whether x is large enough that the subsequent ceilf() will give you an integer i_x >= 1 (because the actual array index will be PIXBUF_WIDTH - i_x, and the max array index is PIXBUF_WIDTH-1). With FTZ/DAZ, nextafterf(0.0, INFINITY) == 0, which breaks the test.
But in this scenario, wouldn't you just test "if (x > 0)", since that implies that ceilf(x) must be >= 1?
Thank you for creating and sharing this example. I think the fact that merely loading a shared library modifies the FP behavior is the real shocker, more than any particular FP computation that hinges on denormals.
Still, I think one of the great wins with denormals is that they make "x == y iff x-y == 0" true; absent denormals you can have two normalized x and y, x != y, such that y-x evaluates to 0. So for an example, maybe something that involves y > x and yet ceil(y-x) == 0, which is surprising. The only setting I can think of is something like making a histogram (with flipped indices, like your pixel row) of differences of values, so that ceil(y-x) == 0 leads to an invalid histogram index.
Some values are x=1e-32 and y=1.0000011e-32, which parse (via sscanf) to:
    x = 0:00010100:10011111011000100011111 = (-1)^0 * 2^(20-127) * (1+5222687/2^23)
    y = 0:00010100:10011111011000100101110 = (-1)^0 * 2^(20-127) * (1+5222702/2^23)

and then the denormalized difference d = y-x should be:

    d = 0:00000000:11110000000000000000000 = (-1)^0 * 2^(1-127) * (0+7864320/2^23)
Having some specific values sidesteps concerns about nextafterf(). In this case y is many ulps above x, not just 1 (15, going by the mantissas above).
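A sketch of that histogram scenario with these exact values (NBINS and the indexing are made up for illustration): built normally, d stays denormal and the index is in range; built with -ffast-math (whose startup object sets FTZ/DAZ), d flushes to zero, ceilf(d) is 0, and the index lands one past the end.

    #include <math.h>
    #include <stdio.h>

    #define NBINS 16   /* hypothetical histogram size */

    int main(void)
    {
        volatile float x = 1e-32f, y = 1.0000011e-32f;  /* x != y, 15 ulps apart */
        float d = y - x;                /* denormal; flushed to 0.0f under FTZ */
        int i = NBINS - (int)ceilf(d);  /* flipped index; needs ceilf(d) >= 1 */
        printf("d = %g, index = %d (valid range 0..%d)\n", d, i, NBINS - 1);
        return 0;
    }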
If I understand correctly, the nextafterf implementation in crtfastmath.o linked in by the empty file compiled with -ffast-math does not handle infinities, so nextafterf(0.0, INFINITY) returns 0.0 (?) and thus i_x = 0 (whereas it should be 1 at minimum since we access PIXBUF_WIDTH - i_x).
Not quite. nextafterf() is trying to return the smallest float after 0.0 in the direction of infinity (infinity is still a valid value even when the FPU is configured to flush denormal numbers to zero; infinity isn't a denormal). But the smallest float after zero is denormal, so when a shared library built with -ffast-math is loaded (which sets the FPU's FTZ and DAZ bits), denormal numbers get forced to zero: nextafterf returns 0.0, the comparison becomes (x >= 0.0), x=0.0 passes the check, and ceilf returns 0.0 rather than 1.0.
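For reference, the constructor in crtfastmath.o amounts to setting those two MXCSR bits; a minimal sketch of the same thing with the SSE intrinsics (FTZ is bit 15, DAZ is bit 6):

    #include <xmmintrin.h>

    /* Roughly what crtfastmath.o's startup hook does: set the
     * Flush-To-Zero and Denormals-Are-Zero bits in MXCSR. */
    __attribute__((constructor))
    static void set_fast_math(void)
    {
        _mm_setcsr(_mm_getcsr() | 0x8040);  /* 0x8000 = FTZ, 0x0040 = DAZ */
    }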
`nextafterf` does not in fact return 0. FTZ flushes denormals to zero when they are the result of an FP calculation, but `nextafter` is implemented using integer arithmetic, because it is much easier: simply add 1 as an integer, with special handling for Inf/NaN/-0.
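A minimal sketch of that integer trick, ignoring the Inf/NaN/-0 special cases a real implementation needs and only stepping a non-negative finite float toward +infinity:

    #include <stdint.h>
    #include <string.h>

    static float next_up(float x)   /* assumes x >= +0.0f and finite */
    {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        bits += 1;                  /* 0x00000000 -> 0x00000001: smallest denormal */
        memcpy(&x, &bits, sizeof x);
        return x;                   /* no FP arithmetic happens anywhere here */
    }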
It's only later that it gets treated as zero by DAZ: in the FP comparison, and then the call to ceil(), both of which use the FPU.
edit: confirmed, you can see it in action on this gist. When compiled with -ffast-math, the denormal does not compare greater than zero (it is flushed to zero by the comparison), but its bit pattern is the integer value 1, which is nextafter(0).
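Roughly, the experiment looks like this (a sketch along the same lines, not the gist itself); compile once plainly and once with -ffast-math:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        volatile float zero = 0.0f;            /* volatile blocks constant folding */
        float d = nextafterf(zero, INFINITY);
        uint32_t bits;
        memcpy(&bits, &d, sizeof bits);
        printf("bits  = 0x%08x\n", (unsigned)bits);  /* 0x00000001 either way */
        printf("d > 0 = %d\n", d > zero);            /* 1 normally, 0 under FTZ/DAZ */
        return 0;
    }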
I don't think this is correct. The return value of nextafterf is a float, and the ABI requires those to be returned in xmm0. So although nextafterf uses integers internally, the flush to zero happens inside nextafterf when it moves eax into xmm0; even though eax is 1, xmm0 becomes 0.0. You can see that by stepping through it in gdb.
Denormals are not flushed to zero merely by `mov xmm,rax` - they can't be, since xmm is also used for integer arithmetic, and the CPU doesn't "know" whether it's an FP or integer value at the point of the mov.
Your gist demonstrates this: notice in the bottom right that xmm0 contains `v2_int64 = {0x1, 0x0}` so the bottom half containing the result is nonzero. (The top half is zero as it's unused).
The "info" command rounds the value to 0 for compactness, but if you run `p $xmm0.v2_double` you will also see it is nonzero when interpreted as a double (I get 4.94e-324).
Ideally, IEEE 754 would define behavior when subnormals are flushed, and would define nextUp(0) to be the smallest representable value (i.e. min normal for FTZ/DAZ modes).
Unfortunately, the committee has never been willing to define behavior for non-standard arithmetic, despite multiple people (Ian Ollmann has suggested it a few times, and others have as well) requesting it over the years.
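For illustration, a hypothetical mode-aware nextUp(0) along the lines suggested above, keying off the DAZ bit in MXCSR (nothing standard specifies this behavior):

    #include <float.h>
    #include <xmmintrin.h>

    static float next_up_from_zero(void)
    {
        if (_mm_getcsr() & 0x0040)   /* DAZ set: denormals read as zero */
            return FLT_MIN;          /* 2^-126, smallest normal */
        return FLT_TRUE_MIN;         /* 2^-149, smallest denormal (C11) */
    }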
I agree that IEEE 754 should define how FTZ/DAZ should work, given that it's a pretty common extension to floating-point processors, and language committees are unlikely to lift a finger to do anything floating-point related that isn't from IEEE 754.
Would it be fair to call this a bug in nextafterf? If this FPU flag is enabled, it would seem like returning a denormal here is incorrect, as it isn't representable on this CPU in this configuration.
None of the relevant standards say anything about how math functions are supposed to behave in the non-standard DAZ/FTZ modes (because they're non-standard). The second you set the bit, you step off the path of being able to complain about any behavior not documented by the library.
I do find it a bit odd that the floating-point environment control flags are usually pretty well hidden away in most programming languages; I feel those are something that could be more clearly exposed, maybe even as a first-class language feature. But even just having some RAII-style scoped environments could be neat.
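Plain C can at least approximate the scoped version with the standard <fenv.h> interface; a minimal sketch of what an RAII wrapper would automate:

    #include <fenv.h>

    /* Run fn() under the default FP environment (no FTZ/DAZ, default
     * rounding), restoring the caller's environment afterwards. On
     * x86-64 glibc, fenv_t covers the x87 control word and MXCSR. */
    static void with_default_fp_env(void (*fn)(void))
    {
        fenv_t saved;
        fegetenv(&saved);
        fesetenv(FE_DFL_ENV);
        fn();
        fesetenv(&saved);
    }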
IMO if it were properly exposed in C, or even as a GNU extension, then "fast math" would be part of the function type.
1. A non-fast-math function calling a fast-math function (or the other way around) should result in the floating-point environment being changed on entering the function and reset on leaving it (sketched below).
2. A function calling another function of the same kind shouldn't touch the floating-point environment.
3. Function pointers to different kinds of functions shouldn't be convertible to each other (that is, `void ()() __attribute__((fast_math))` and `void ()()` are unrelated types).
As for a compiler flag altering the default "fast_math" or "no_fast_math" attribute, I'm not sure; it's probably a bad idea. Libraries that want to make this configurable can always do their own macro.
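For item 1, the compiler would essentially lower a cross-kind call into something like this (hand-written sketch; the attribute itself doesn't exist, and the 0x8040 MXCSR trick is x86-specific):

    #include <fenv.h>
    #include <xmmintrin.h>

    void render_fast(void) { /* imagine this is a fast-math function */ }

    void caller(void)        /* an ordinary, non-fast-math function */
    {
        /* what the compiler would insert around the cross-kind call: */
        fenv_t saved;
        fegetenv(&saved);                    /* save caller's environment */
        _mm_setcsr(_mm_getcsr() | 0x8040);   /* enter FTZ/DAZ mode */
        render_fast();
        fesetenv(&saved);                    /* restore on return */
    }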
Altering the floating point environment for everybody in a global constructor within the library is just horrible.
You can change floating point modes dynamically, and it's commonly done, and then it doesn't matter if the library was dynamically or statically linked.
Depending on the implementation, some green-thread runtimes will save the MXCSR register during context switches (IIRC ucontext does this on Linux), but some "fast" context-switching libraries intentionally skip it to save cycles, since a full ucontext switch involves a syscall.
The problem is a constructor in the .so being run as it's loaded. Without expensive checks, there is no way to guarantee state stays safe between calls to functions loaded from shared libraries, or at the point they're linked. The point is that state gets mutated simply because we rely on shared libraries and allow them to mutate that state when they're loaded.
This is not a problem with statically linked libraries. And also note, I made a point of talking about shared libraries - not dynamic linking. They're not equivalent. The issue here is that the libraries are shared and do bad things, not that they're loaded dynamically. It would be fine if the library wasn't shared!
Eric Holder ruined the phrase, "Fast and Furious," for me. Now, whenever I see that phrase, I am reminded of gun-running US attorneys held in Contempt of Congress.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522