Wouldn't it be pretty easy to compare a re-implementation of a special function against a reference Boost implementation? For example, there are only 2^32 floats, and for a given function you can check them all very quickly. It's a little trickier for doubles, but one can imagine running, say, 2^40 tests and getting a reasonable level of confidence.
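
To make that concrete, here is a rough sketch of what the exhaustive float check could look like. The helper names (float_from_bits, ordered, ulp_distance, max_ulp_error) are made up for illustration and are not part of Boost; the candidate and reference are passed in as callables, so the reference could be a Boost.Math function or anything else you trust. It reports the worst disagreement in ULPs, treating matching NaNs as agreement, which is only one of several reasonable conventions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Reinterpret a 32-bit pattern as a float (memcpy avoids aliasing problems).
inline float float_from_bits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Map a float to a signed integer that grows monotonically with its value,
// so the ULP distance between two floats becomes a plain subtraction.
// +0 and -0 both map to 0.
inline std::int64_t ordered(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    const std::int64_t magnitude = static_cast<std::int64_t>(u & 0x7FFFFFFFu);
    return (u & 0x80000000u) ? -magnitude : magnitude;
}

// Distance in ULPs. Assumption: two NaNs count as agreeing, and a NaN
// against a non-NaN counts as the worst possible disagreement.
inline std::uint64_t ulp_distance(float a, float b) {
    const bool a_nan = std::isnan(a), b_nan = std::isnan(b);
    if (a_nan || b_nan) {
        return (a_nan && b_nan) ? 0 : std::numeric_limits<std::uint64_t>::max();
    }
    const std::int64_t d = ordered(a) - ordered(b);
    return static_cast<std::uint64_t>(d < 0 ? -d : d);
}

// Walk the bit patterns first, first + stride, ... up to last (inclusive)
// and return the worst ULP disagreement between candidate and reference.
// The defaults cover all 2^32 patterns; a larger stride gives a quick sample.
template <class F, class G>
std::uint64_t max_ulp_error(F candidate, G reference,
                            std::uint64_t first = 0,
                            std::uint64_t last = 0xFFFFFFFFull,
                            std::uint64_t stride = 1) {
    assert(stride != 0);
    std::uint64_t worst = 0;
    for (std::uint64_t b = first; b <= last; b += stride) {
        const float x = float_from_bits(static_cast<std::uint32_t>(b));
        const std::uint64_t d = ulp_distance(candidate(x), reference(x));
        if (d > worst) worst = d;
    }
    return worst;
}
```

You would call it as something like max_ulp_error(my_impl, reference_impl), where my_impl and reference_impl are placeholders for the two functions under comparison. For doubles the same loop works with 64-bit patterns, except that you would draw the 2^40 or so patterns at random instead of enumerating them all.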