`(a+b)+c` is two ops in LLVM: addition is a binop, meaning it takes exactly two operands, so computing `(a+b)` and then adding `c` are separate instructions. You can't directly add three or more values.
Right, the question becomes: if you have `(a + b) + c` and one of the `+` operations allows reassociation but the other one doesn't, then what happens? The problem is that associativity is a property that only makes sense for an entire tree of reassociable operations, not for individual ones.
From personal experience, yes: I've seen multiple cases of scientists finding that the ultimate cause of their bugs was some fast-math-related optimization.
The problem isn't necessarily the code they wrote themselves: it is often that they've compiled someone else's code or an open source library with fast-math, which broke some internal piece.
I tried to lay out a reasonable path: incrementally test accuracy and performance, and only enable the optimizations necessary to get the desired performance. Good tests will catch the obvious catastrophic cases, but some failures will inevitably be weird edge cases.
As always, the devil is in the details: you typically can't check exact equality, as e.g. reassociating arithmetic can give slightly different (but not necessarily worse) results. So the challenge is coming up with an appropriate measure for determining whether something is actually wrong.
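As a rough illustration (my own sketch, not from any particular test suite), such a check might use a relative tolerance with an absolute floor near zero; the tolerance values here are placeholders you'd tune per problem:

```c
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: treat two results as "equal enough" if they agree
   to a chosen relative tolerance, with an absolute floor for values near
   zero. Both tolerances are illustrative and should be tuned per problem. */
static bool approx_equal(double a, double b, double rel_tol, double abs_tol)
{
    double diff  = fabs(a - b);
    double scale = fmax(fabs(a), fabs(b));
    return diff <= fmax(rel_tol * scale, abs_tol);
}

int main(void)
{
    /* Reassociation-sized wobble passes; a genuinely wrong result does not. */
    printf("%d\n", approx_equal(0.1 + 0.2, 0.3, 1e-12, 1e-300));  /* 1 */
    printf("%d\n", approx_equal(1.0, 1.0001, 1e-12, 1e-300));     /* 0 */
    return 0;
}
```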
The key thing about floating point is that it maintains relative accuracy: in your case, if f(x) and g(x) are both O(1e200) and each is correct to some small relative tolerance, say 1e-10 (that is, the absolute error of each is about 1e190), then the relative error of f(x)/g(x) stays nicely bounded at about 2e-10.
However if you compute f(x) - g(x), the absolute error is on the order of 2e190: if f(x) - g(x) is itself small, the relative error can be huge (this is known as catastrophic cancellation).
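To make that concrete, here's a small C sketch with made-up numbers of that magnitude: the ratio of two values agreeing to ~10 digits stays near 1, while the difference cancels away all the agreeing digits:

```c
#include <stdio.h>

int main(void)
{
    /* Made-up stand-ins for f(x) and g(x): both O(1e200), agreeing to
       about 10 significant digits, as in the scenario above. */
    double f = 1.234567890123e200;
    double g = 1.234567890000e200;

    /* The ratio is well-conditioned: it lands near 1 + 1e-10, and a ~1e-10
       relative error in f or g would only move it by a similar amount. */
    printf("f/g   = %.17g\n", f / g);

    /* The difference cancels ~10 leading digits: the result is ~1.23e190,
       which is the same size as the absolute error (~1e190) implied by a
       1e-10 relative error, so its digits would not be trustworthy. */
    printf("f - g = %.17g\n", f - g);
    return 0;
}
```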
Both f(x) and g(x) could be calculated to proper machine precision (say, GSL double precision, ~1e-15). Would this imply that beyond the machine precision the values are effectively fixed to zero or infinity respectively, instead of carrying around the extreme orders of magnitude?
I'm not exactly sure what you're asking here, but the point is that "to machine precision" is relative: if f(x) and g(x) are O(1e200), then the absolute error of each is still O(1e185). However f(x)/g(x) will still be very accurate: since the ratio itself is O(1), its absolute error is O(1e-15).
In theory, every function should do that to check things like the rounding mode etc. But that would be pretty slow, especially for low-latency operations (modifying MXCSR will disrupt pipelining, for example).
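For reference, a per-call check along those lines could use C's <fenv.h>; this is just a sketch of the idea, and the cost comes from touching the FP control state on every call:

```c
#include <fenv.h>
#include <stdio.h>

/* Tell the compiler we access the floating-point environment directly. */
#pragma STDC FENV_ACCESS ON

int main(void)
{
    /* Query the current rounding mode (on x86 this reads the MXCSR/x87
       control state). */
    int mode = fegetround();
    printf("round-to-nearest? %s\n", mode == FE_TONEAREST ? "yes" : "no");

    /* Changing the mode writes MXCSR, which disrupts pipelining on x86 --
       this is the cost a per-call check-and-restore would pay. */
    fesetround(FE_TONEAREST);
    return 0;
}
```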
-ffp-contract=fast will enable FMA contraction, i.e. replacing a * b + c with fma(a,b,c). This is generally okay, but there are a few cases where it can cause problems: the canonical example is computing an expression of the form:
a * d - b * c
If a == b and c == d (and all are finite), then this should give 0, and it does under strict IEEE 754 evaluation. But if one of the products gets contracted into an fma, you can get either a positive or a negative value, depending on which one was contracted. Issues like this pop up in complex multiplication, or when applying the quadratic formula.
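Here's a small C sketch (my own, using the library fma() by hand to mimic what the compiler's contraction would emit) that shows both outcomes; compile it without fast-math so the first line is the strict result:

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* a == b and c == d, chosen so the exact product a*d has more
       significant bits than a double can hold. */
    double a = 1.0 + 0x1p-30, b = a;
    double c = 1.0 + 0x1p-29, d = c;

    /* Strict IEEE 754 evaluation: both products round identically, so the
       difference is exactly zero. */
    printf("a*d - b*c       = %g\n", a * d - b * c);

    /* What an -ffp-contract=fast build may emit: one product is kept exact
       inside the fma while the other is rounded first, so the leftover
       rounding error appears -- with a sign that depends on which product
       was contracted. */
    printf("fma(a, d, -b*c) = %g\n", fma(a, d, -(b * c)));
    printf("fma(-b, c, a*d) = %g\n", fma(-b, c, a * d));
    return 0;
}
```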