
FSIN only works on x87 registers, which you will rarely use on AMD64 systems; you really want to use at least scalar SSE2 today (since that is where your inputs arrive under typical AMD64 calling conventions anyway). Moving data from SSE registers to the FP stack just to calculate FSIN and then moving it back to SSE will probably kill your performance even if your FSIN implementation is good. If you're vectorizing your computation over 4 doubles or 8 single-precision floats in an AVX register, it gets even worse for FSIN.


Moving between x87 and xmm registers is actually fairly cheap (it's through memory, so it's not free, but it's also not _that_ bad). FSIN itself is catastrophically slow.


Fair enough, and I imagine there may even be some forwarding going on? There often is when a load follows a store, if I remember correctly. (Of course this will be microarchitecture-dependent.)
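
For anyone who wants to look at the round trip discussed above, here is a minimal sketch, assuming GCC or Clang on x86 or x86-64. The helper name `fsin_x87` is made up for illustration. The `"t"` constraint asks the compiler to place the operand on top of the x87 stack, so on AMD64 the compiler itself has to emit the store from the SSE register to memory, the x87 load, and the reverse path for the result.

```c
#include <assert.h>

/* Hypothetical helper: sine computed with the x87 FSIN instruction.
   Assumes GCC or Clang inline-asm syntax and an x86 or x86-64 target. */
static double fsin_x87(double x)
{
    /* FSIN only accepts operands with |x| < 2^63; outside that range it
       leaves the operand unchanged, so this sketch simply rejects them. */
    assert(x > -0x1p63 && x < 0x1p63);

    double r;
    /* "=t" is st(0), the top of the x87 stack; "0" ties the input to it.
       On x86-64 the argument arrives in an SSE register, so the compiler
       spills it to memory, loads it onto the x87 stack, runs FSIN, then
       stores the result to memory and reloads it into an SSE register. */
    __asm__ ("fsin" : "=t"(r) : "0"(x));
    return r;
}
```

Compiling this with `-O2 -S` and reading the assembly should show the memory traffic on either side of `fsin`. Timing it against your libm `sin` would be the way to check the performance claims on a given microarchitecture.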



