Amusingly, (to me at least) there's also an SSE instruction for non-reciprocal square roots but it's so much slower than reciprocal square root that calculating sqrt(x) as x * 1/sqrt(x) is faster assuming you can tolerate the somewhat reduced precision.
I dunno about Intel and AMD, but ARM and RISC-V use lookup tables for rsqrt. Unlike AMD and Intel, those tables are precisely defined in their respective specs.
I don't recall the coprocessor having either reciprocal or reciprocal square root? I didn't do much Intel until later in my career though, so I might be missing something though.
Both _mm_rcp_ps (rcpps) and _mm_rsqrt_ps (rsqrtps) are only good for about half the bits.