> For example, PyPy isn't planning on doing a TSX port even with their enthusias...

reitzensteinm · on Aug 12, 2014

Here are a few links:

http://pypy.org/tmdonate.html (Search for "haswell")

http://grokbase.com/t/python/pypy-dev/13bvt3kg70/pluggable-h...

It seems to boil down to:

* The cache size (which determines the amount of memory you can write to in a transaction before having to commit back) is insufficient, causing excessive transaction aborts.

* There is no mechanism to bypass the HTM, writing to memory within a transaction that is not rolled back. This exacerbates the small cache size, since all memory writes have a cost, not just the ones you want rolled back in the case of a transaction abort.

Interestingly, this does not bode well for HTM on a platform with many smaller cores, say a hypothetical 64 core ARM. Each core will have a tiny amount of L1 cache, severely limiting transaction size.

And many smaller cores is exactly where you'd want the benefits of HTM, since the overhead of synchronization is higher in proportion to the work each core can do.

sounds · on Aug 12, 2014

In reference to the sibling post that gives more details about pypy, I'd like to call to your mind the history of the vector extensions for x86.

First revision: MMX. It reused the same registers as the older x87 floating point coprocessor (even though the x87 transistors lived on the same die). As a result, legacy x87 code and MMX code had to transition using an expensive EMMS instruction.

Second revision: (well, ignoring some small changes to MMX) ... SSE. Finally got its own registers, but lacked a lot of real-world capability.

Third revision: SSE2, finally got to a level of parity with competing vector extensions (see, for example, PowerPC's Altivec).

And so forth.

I guess the take-home lesson for me is that these new TSX instructions are indeed fascinating to play around with, but I wouldn't expect it to blow the doors off. Intel will incrementally refine it.

(The incremental approach also gives Intel a chance to study how it's being used and keeps AMD playing catch-up.)

MBCook · on Aug 13, 2014

The other big problem with MMX was that it was integer only. While that might have been ok for some application 3D games and other software that could really use the boost needed floating point and not only couldn't benefit (since it was integer only) it actually interfered (since, as you said, it reused the registers).

AMD's 3DNow had single precision floating point support, so it was actually somewhat useful. SSE followed 3DNow and added single precision support (as well as fixing the register stuff). SSE2 added double precision support.

sounds · on Aug 14, 2014

Right, thanks for those additional details.

Today, no one would use MMX instructions (since SSE is vastly superior). I expect Intel will continue to add TSX capabilities which will eventually produce some nice results for parallel code.