In reference to the sibling post that gives more details about pypy, I'd like to...

MBCook · on Aug 13, 2014

The other big problem with MMX was that it was integer only. While that might have been OK for some applications, 3D games and other software that could really use the boost needed floating point. Not only could they not benefit from MMX (since it was integer only), it actually interfered with them (since, as you said, it reused the x87 floating-point registers).

AMD's 3DNow had single-precision floating-point support, so it was actually somewhat useful. SSE followed 3DNow and added its own single-precision support (as well as fixing the register problem by using separate registers). SSE2 added double-precision support.

sounds · on Aug 14, 2014

Right, thanks for those additional details.

Today, no one would use MMX instructions (since SSE is vastly superior). I expect Intel will continue to add TSX capabilities, which will eventually produce some nice results for parallel code.