
One could argue that one reason SIMD instructions, and indeed GPUs, are popular is that they amortise the (transistor and power) cost of instruction decoding over more compute units; in the case of GPUs, over many more.
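
To put a rough number on that amortisation, here is a toy model. It assumes exactly one decoded instruction per vector operation and ignores loads, stores, and loop overhead; the function name and the lane counts are made up for illustration and do not describe any particular CPU or GPU.

```python
def decoded_instructions(n_elements: int, lanes: int) -> int:
    """Count the add instructions that must be decoded to process
    n_elements when each instruction covers `lanes` elements.

    Toy model (assumption): one decode per instruction, nothing else counted.
    lanes=1 corresponds to plain scalar code.
    """
    if n_elements < 0 or lanes < 1:
        raise ValueError("n_elements must be >= 0 and lanes must be >= 1")
    # Ceiling division: a partially filled final vector still costs one decode.
    return (n_elements + lanes - 1) // lanes


if __name__ == "__main__":
    n = 1024
    for lanes in (1, 4, 8, 16):
        count = decoded_instructions(n, lanes)
        print(f"{lanes:>2} lanes: {count} instructions decoded")
```

In this model the decode count falls in proportion to the lane count, which is the amortisation argument in its simplest form. Real hardware adds many effects the model leaves out.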

There are also other considerations, like rolling back state in OOO machines, or precise exceptions. All this becomes more complex with an x86-style instruction set.



> There are also other considerations, like rolling back state in OOO machines, or precise exceptions. All this becomes more complex with an x86-style instruction set.

It's not really more complex, because those are backend concerns and work on the uop level, after the instructions have already been decoded into uops.


I was under the impression that SIMD exists because clock speeds stopped scaling. Instruction-level parallelism is hard, so there's a lot to gain from just making instructions wider.


SIMD is orthogonal to clock speed.

There is indeed a lot to gain from having a single instruction trigger more complex behaviour, for example better instruction density and less instruction decoding, but all of this is independent of clock frequency.

I think, but am not sure, that the Thinking Machines CM-2 from 1988 was a 4096- or 8192-wide SIMD machine. Surely, at the time, clock speeds were low.



