
Amusingly (to me, at least), there's also an SSE instruction for non-reciprocal square roots, but it's so much slower than the reciprocal square root that calculating sqrt(x) as x * 1/sqrt(x) is faster, assuming you can tolerate the somewhat reduced precision.
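A minimal sketch of the sqrt(x) = x * (1/sqrt(x)) identity, assuming a stand-in `rsqrt` function in place of rsqrtps (here it is exact, whereas the hardware returns only an estimate). The one trap worth showing is x = 0: rsqrtps(0) returns +inf, and 0 * inf is NaN, so zero has to be handled separately.

```python
import math

def rsqrt(x: float) -> float:
    # Hypothetical stand-in for a hardware reciprocal square root estimate
    # such as rsqrtps. This one is exact; the real instruction is not.
    return 1.0 / math.sqrt(x)

def sqrt_via_rsqrt(x: float) -> float:
    # sqrt(x) = x * (1 / sqrt(x)): one multiply instead of a slow sqrt.
    if x == 0.0:
        # With SSE, rsqrt(0) is +inf and 0 * inf is NaN, so special-case zero.
        # (In this Python stand-in, rsqrt(0) would raise ZeroDivisionError.)
        return 0.0
    return x * rsqrt(x)

for v in (0.0, 2.0, 9.0, 1e-6):
    print(v, sqrt_via_rsqrt(v), math.sqrt(v))
```

In SIMD code the zero check is usually done with a compare and a mask rather than a branch, but the idea is the same.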


I wouldn't be surprised if _mm_rsqrt_ps is actually implemented using the same bit level trick.
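For reference, the bit-level trick in question is the well-known 0x5f3759df one. A sketch in Python, emulating float32 bit reinterpretation with `struct`; it shows only the raw initial guess (no Newton step), which is within about 3.5% of the true value. Whether rsqrtps actually works this way internally is the speculation above, not something this code demonstrates.

```python
import struct

def f32_bits(x: float) -> int:
    # Reinterpret a float32 as an unsigned 32-bit integer.
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(i: int) -> float:
    # Reinterpret an unsigned 32-bit integer as a float32.
    return struct.unpack("<f", struct.pack("<I", i & 0xFFFFFFFF))[0]

def rsqrt_bit_trick(x: float) -> float:
    # The integer view of a positive float is roughly a scaled, offset log2(x).
    # Shifting right by one halves it and subtracting from the magic constant
    # negates it, giving roughly 2**(-log2(x) / 2) = 1 / sqrt(x).
    return bits_f32(0x5F3759DF - (f32_bits(x) >> 1))

for v in (0.25, 1.0, 2.0, 100.0):
    print(v, rsqrt_bit_trick(v), 1.0 / v ** 0.5)
```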

Same as Carmack's, we did a single step of Newton's method and it was definitely good enough.
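The single Newton step comes from applying Newton's method to f(y) = 1/y² - x, which gives y' = y * (1.5 - 0.5 * x * y * y). A sketch using the same bit-trick initial guess (float32 reinterpretation emulated with `struct`); one step brings the relative error from about 3.4% down to under 0.2%.

```python
import struct

def rsqrt_one_newton(x: float) -> float:
    # Initial guess via the 0x5f3759df bit trick on the float32 representation.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    y = struct.unpack("<f", struct.pack("<I", 0x5F3759DF - (i >> 1)))[0]
    # One Newton-Raphson step on f(y) = 1/y**2 - x:
    #   y_next = y - f(y)/f'(y) = y * (1.5 - 0.5 * x * y * y)
    return y * (1.5 - 0.5 * x * y * y)

for v in (0.5, 1.0, 2.0, 10.0):
    exact = 1.0 / v ** 0.5
    print(v, rsqrt_one_newton(v), abs(rsqrt_one_newton(v) - exact) / exact)
```

Each Newton step roughly squares the relative error, so a second step would be needed for close to full float32 precision.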


I dunno about Intel and AMD, but ARM and RISC-V use lookup tables for rsqrt. Unlike AMD and Intel, those tables are precisely defined in their respective specs.
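To illustrate the general shape of a table-based estimate, here is a toy version. This is a hypothetical table, not the one from the Arm or RISC-V specs: it indexes by the exponent's parity and the top 6 fraction bits, stores 1/sqrt of each subinterval's midpoint, and fixes up the exponent separately. That gives roughly 8 bits of accuracy.

```python
import math

TABLE_BITS = 6  # hypothetical: number of leading fraction bits used as index

def build_table() -> list:
    # Entries 0..63 cover m in [1, 2) (even exponent); entries 64..127 cover
    # m * 2 in [2, 4) (odd exponent). Each stores 1/sqrt(subinterval midpoint).
    n = 1 << TABLE_BITS
    table = []
    for odd in (0, 1):
        scale = 2.0 if odd else 1.0
        for k in range(n):
            lo = (1.0 + k / n) * scale
            hi = (1.0 + (k + 1) / n) * scale
            table.append(1.0 / math.sqrt((lo + hi) / 2.0))
    return table

TABLE = build_table()

def rsqrt_table(x: float) -> float:
    # Sketch for positive, finite x only.
    m, e = math.frexp(x)          # x = m * 2**e, m in [0.5, 1)
    m, e = m * 2.0, e - 1         # now m in [1, 2)
    odd = e & 1                   # parity of the exponent
    k = int((m - 1.0) * (1 << TABLE_BITS))  # top fraction bits of m
    est = TABLE[(odd << TABLE_BITS) | k]    # ~1/sqrt(m * 2**odd)
    q = (e - odd) // 2            # e = 2q + odd, so 1/sqrt(2**(2q)) = 2**-q
    return math.ldexp(est, -q)

for v in (0.3, 1.0, 2.0, 1000.0):
    print(v, rsqrt_table(v), 1.0 / math.sqrt(v))
```

Real specs pin down every table entry bit-for-bit so that all implementations return identical estimates; this sketch computes its entries from floating-point midpoints instead.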


Intel provides bit-accurate code. In older chips it used a faithful bipartite ROM:

https://web.archive.org/web/20120124193536id_/http://www.acs...


I don't recall the coprocessor having either a reciprocal or a reciprocal square root instruction? I didn't do much Intel work until later in my career, though, so I might be missing something.

Both _mm_rcp_ps (rcpps) and _mm_rsqrt_ps (rsqrtps) are only good for about half the bits (roughly 12 of a float's 24 significand bits).
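Intel's manual lists a maximum relative error of 1.5 × 2^-12 for both instructions, and one Newton step roughly doubles the number of correct bits. A sketch for the reciprocal case, assuming a hypothetical `estimate` that mimics a ~12-bit hardware result by rounding the exact value to 12 significant bits. The Newton step for f(y) = 1/y - x is y' = y * (2 - x * y), and a relative error ε becomes ε².

```python
import math

def estimate(v: float, sig_bits: int = 12) -> float:
    # Hypothetical stand-in for rcpps: round v to sig_bits significant bits,
    # so the relative error is at most 2**-sig_bits.
    m, e = math.frexp(v)                    # v = m * 2**e, m in [0.5, 1)
    scale = 1 << sig_bits
    return math.ldexp(round(m * scale) / scale, e)

def rcp_refined(x: float) -> float:
    y = estimate(1.0 / x)
    # One Newton-Raphson step on f(y) = 1/y - x: y_next = y * (2 - x * y).
    return y * (2.0 - x * y)

def bits_of_precision(approx: float, exact: float) -> float:
    # Correct bits ~= -log2(relative error); exact results count as infinite.
    err = abs(approx - exact) / abs(exact)
    return math.inf if err == 0 else -math.log2(err)

for x in (3.0, 7.0, 10.0):
    print(x, bits_of_precision(estimate(1.0 / x), 1.0 / x),
          bits_of_precision(rcp_refined(x), 1.0 / x))
```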



