Moving between x87 and xmm registers is actually fairly cheap (it's through memory, so it's not free, but it's also not _that_ bad). FSIN itself is catastrophically slow.
Fair enough, and I imagine there may even be some forwarding going on? There often is when a load follows a store, if I remember correctly. (Of course this will be microarchitecture-dependent.)