One could argue that one of the reasons why SIMD instructions, and indeed GPUs, ...

userbinator · on May 14, 2023

There are also other considerations, like rolling back state in OOO machines, or precise exceptions. All this becomes more complex with an x86-style instruction set.

It's not really more complex, because those are backend concerns and work on the uop level, after the instructions have already been decoded into uops.

ben-schaaf · on May 15, 2023

I was under the impression that SIMD exists because clock speeds stopped scaling. Instruction-level parallelism is hard to extract, so there's a lot to gain from just making instructions wider.

mafribe · on May 15, 2023

SIMD is orthogonal to clock speed.

There is indeed a lot to gain from having a single instruction trigger more complex behaviour, for example better instruction density and less instruction decoding, but all of this is independent of clock frequency.
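To make the decoding point concrete, here is a toy sketch. It is not a model of any real ISA: `simd_add` and the `lanes` parameter are hypothetical names, and each loop iteration simply stands in for one fetched and decoded add instruction. Nothing in it depends on clock frequency.

```python
def simd_add(a, b, lanes):
    """Add two equal-length lists in chunks of `lanes` elements.

    Each chunk stands in for one vector instruction (an assumption of this
    toy model). Returns the result and the number of instructions "issued".
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    result = []
    issued = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        # One instruction handles every lane of this chunk.
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        issued += 1
    return result, issued


a = list(range(16))
b = [10] * 16

scalar_result, scalar_count = simd_add(a, b, lanes=1)  # one element per instruction
vector_result, vector_count = simd_add(a, b, lanes=4)  # four elements per instruction

print(scalar_count, vector_count)
```

Both calls compute the same sums; only the number of instructions that have to be fetched and decoded changes with the width.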

I think, but am not sure, that the Thinking Machines CM-2 from 1988 was a 4096 or 8192 wide SIMD machine. Surely, at the time, clock speeds were low.