
I did the same thing for sqrt() seven years ago. It benchmarked at about five times faster than math.h/libm, but the trade-off was a smaller maximum input: my version would not produce accurate results for inputs greater than 2^16. It did work very well for generating real-time laser pointing X/Y coordinates for an Archimedean spiral.

https://en.wikipedia.org/wiki/Archimedean_spiral

(This was on an MSP430 platform with an FPU that only did multiplication.)



I'm surprised this was faster than an initial guess (half the exponent and mantissa) followed by Newton's method.


Newton's method involves division, which is a problem on an embedded platform with a limited FPU that can only do multiplication. I tried several classic approaches and benchmarked all of them against the function in libm. I went back and reviewed what I did, and it turns out my memory is not that great. My version (nsqrt) is limited to input values less than 2^14, not 2^16 as I said above. Also, it's only about 50% faster, not 5x as I said above.

Even a 50% increase in speed was well worth my effort though.

* Replacement sqrtf() function. This function runs about 50% faster than
* the one in the TI math library. It provides identical accuracy, but
* it is limited to input values less than 2^14. This shouldn't be a problem
* because the passed argument will always be in the range of
* 0-(max spiral scan period). The spiral scan period is presently about
* 140 seconds.
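
For anyone wondering how the division problem raised above can be avoided: one well-known option is to run Newton's method on 1/sqrt(x) instead of sqrt(x). That update needs only multiplies and subtracts, and the square root falls out as x * (1/sqrt(x)). The sketch below is not the nsqrt described in this thread. The function name sqrt_mul_only is made up for illustration. It assumes a 32-bit IEEE 754 float and uses the widely published 0x5f3759df constant for the initial guess. Whether it beats a vendor libm on a given part would have to be benchmarked.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: sqrt(x) computed without any division.
 * Assumes float is a 32-bit IEEE 754 single-precision value. */
static float sqrt_mul_only(float x)
{
    if (x <= 0.0f)
        return 0.0f;                    /* no negative or zero handling beyond this */

    /* Initial guess for 1/sqrt(x): shifting the bit pattern right by one
     * roughly halves the exponent, and subtracting from the constant
     * negates it. */
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    float y;
    memcpy(&y, &bits, sizeof y);

    /* Newton's method on f(y) = 1/y^2 - x gives
     * y <- y * (1.5 - 0.5 * x * y * y), which has no division. */
    const float half_x = 0.5f * x;
    for (int i = 0; i < 3; ++i)
        y = y * (1.5f - half_x * y * y);

    return x * y;                       /* sqrt(x) = x * (1/sqrt(x)) */
}
```

Three iterations is a conservative choice here. Fewer iterations trade accuracy for speed, which is the same kind of trade discussed above.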



