Try a worse version of this: comparing the results of a floating-point function call where the left operand gets moved into the 80-bit FP unit during computation, and the right operand stays in a register.
Obviously there is a precision difference between the two numbers, enough to make both a < b and b < a return true in a surprising number of cases. I ended up fixing it by storing the result of the function call in a member variable of every struct, pre-computing all of the results, and comparing on that stored value.
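For anyone who wants to see the shape of that fix, here is a minimal sketch in C++. The `Item` struct, the `score` function, and `sort_by_score` are all hypothetical names made up for illustration; the actual function from the post isn't shown, so `score` is just a stand-in. The idea is that each result is computed once and written to a plain `double` member, so the comparator only ever looks at stored values and never re-runs the computation at two different precisions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical record type (an assumption, not from the post).
struct Item {
    double x;
    double y;
    double key;  // cached result of score(x, y), stored as a plain double
};

// Hypothetical stand-in for the floating-point function being compared.
inline double score(double x, double y) {
    return std::sqrt(x * x + y * y);
}

// Compute every key exactly once, then sort on the stored values only.
inline void sort_by_score(std::vector<Item>& items) {
    for (Item& it : items) {
        it.key = score(it.x, it.y);
    }
    std::sort(items.begin(), items.end(),
              [](const Item& a, const Item& b) { return a.key < b.key; });
    // Sanity check: the stored keys are in non-decreasing order.
    assert(std::is_sorted(
        items.begin(), items.end(),
        [](const Item& a, const Item& b) { return a.key < b.key; }));
}
```

Because the comparator reads two values that were both written out as `double`, a < b and b < a can't both come out true for the same pair, which is what `std::sort` needs from its ordering.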
I hit this, too. Regression tests were failing when I changed code that obviously shouldn't change the output of the program at all. This happened on a regular basis when we changed numeric code, because of the normal limitations of floating-point arithmetic; we just made sure the numerical results were accurate and updated the regression tests. (The regression tests were quite handy for finding logic errors; they weren't really used to test numerical accuracy.)
But in this case I was just adding some error checks, which weren't even being triggered. Clearly this shouldn't affect the results of our numerical calculations. Since my code shouldn't affect the calculations, I was convinced that our existing numerical code had a subtle memory or timing bug. (I knew that floating-point code was tricky, but clearly I was doing exactly the same operations on exactly the same values.) I spent days staring at code, and then my boss told me to stop working on it since the results were clearly correct in both versions, even if they weren't identical.
A few weeks later I read about how values change when they're copied out of the x87 stack into registers. And I thought, naw, we couldn't possibly be using x87 arithmetic. But we were. Which was horrifying, since floating-point calculations could be a bottleneck under some workloads. But we had been running that way since before I started working on it, so at least it wasn't my fault. I added a compiler option to request SSE2 floating point instead of x87 floating point. Voila, predictable floating-point results, plus measurably faster performance on a few tests.
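If you want to check which mode your own build is in, one rough way is the standard `FLT_EVAL_METHOD` macro from `<cfloat>`. The sketch below is only an illustration; `describe_eval_method` is a made-up helper name. The post doesn't say which compiler was involved, but assuming GCC on 32-bit x86, the option in question would be something like `-mfpmath=sse -msse2`, and x87 builds there typically report 2 while SSE2 builds report 0.

```cpp
#include <cfloat>
#include <string>

// Hypothetical helper: describes how the compiler evaluates
// floating-point expressions, based on the standard FLT_EVAL_METHOD macro.
inline std::string describe_eval_method() {
    switch (FLT_EVAL_METHOD) {
        case 0:  return "each type is evaluated in its own precision";
        case 1:  return "float and double are evaluated as double";
        case 2:  return "everything is evaluated as long double";
        default: return "indeterminate or implementation-defined";
    }
}
```

Printing the result of `describe_eval_method()` once at startup, or in a regression-test header, would have made the x87 surprise visible a lot sooner.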