You are correct. Just a nit: it's usually referred to as "instructions per cycle...

hermitdev · on April 30, 2020

Operations per cycle also matters. Circa the P4, an instruction took around 40-50 cycles to complete. Yeah, it was clocked close to 4 GHz, though.

The Core architecture brought it down to around 8 cycles to complete an op. Clock speeds dropped, but more shit got done per clock.
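A rough way to see why a lower clock can still win: throughput is roughly IPC times clock frequency. This is a minimal sketch, and the IPC and GHz figures below are hypothetical numbers picked only to illustrate the trade-off, not measured P4 or Core results.

```python
def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Throughput = average instructions per clock x clock frequency (GHz)."""
    return ipc * clock_ghz * 1e9

# Hypothetical figures for illustration only, not measurements of real chips.
high_clock_low_ipc = instructions_per_second(ipc=0.8, clock_ghz=3.8)
low_clock_high_ipc = instructions_per_second(ipc=2.0, clock_ghz=2.4)

print(f"{high_clock_low_ipc:.2e} instr/s")  # 3.04e+09 instr/s
print(f"{low_clock_high_ipc:.2e} instr/s")  # 4.80e+09 instr/s
```

With these made-up numbers, the 2.4 GHz design gets more done because its IPC gain outweighs the clock drop.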

CyberDildonics · on April 30, 2020

That is still IPC, which refers to instructions per clock as an average throughput, not instruction latency.
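To illustrate the throughput vs. latency distinction, here is a minimal sketch of an idealized pipeline (assumptions: no stalls, no cache misses, a fixed latency per instruction, and a hypothetical issue width). Independent instructions overlap, so IPC stays near the issue width even when each one takes many cycles, while a chain of dependent instructions pays the full latency every time.

```python
def pipelined_cycles(n_instructions: int, latency: int, issue_per_cycle: int = 1) -> int:
    """Cycles to finish n *independent* instructions on an idealized pipeline."""
    # Cycles spent issuing everything (ceiling division), then the last
    # issued instruction needs latency - 1 more cycles to drain.
    issue_cycles = -(-n_instructions // issue_per_cycle)
    return issue_cycles + latency - 1


def dependent_chain_cycles(n_instructions: int, latency: int) -> int:
    """Cycles when each instruction must wait for the previous result."""
    return n_instructions * latency


N = 1000
for latency in (1, 20):
    indep = pipelined_cycles(N, latency)
    chain = dependent_chain_cycles(N, latency)
    # IPC = instructions completed / cycles taken
    print(f"latency={latency}: independent IPC={N / indep:.2f}, "
          f"dependent IPC={N / chain:.2f}")
```

With a 20-cycle latency, the independent case still reaches an IPC of about 0.98, while the dependent chain drops to 0.05. Latency only dominates when instructions depend on each other.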

> Circa P4, an instruction took around 40-50 cycles to complete

Different instructions can have wildly different latencies. Even then, an instruction taking 50 cycles sounds like double-precision division or an 80-bit floating-point operation. Most operations on the P4 had a latency of 1-7 cycles, but the P4's high clock speeds made memory latency and branch mispredictions a bigger issue.
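Why high clocks make memory latency hurt more: a DRAM access takes roughly fixed wall-clock time, so a faster core burns more cycles waiting on it. This sketch assumes a hypothetical, constant 80 ns DRAM latency purely for illustration; real latencies vary by system.

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a fixed wall-clock latency into core clock cycles."""
    # At f GHz the core ticks f times per nanosecond.
    return latency_ns * clock_ghz

# Hypothetical DRAM latency, held constant while the core clock changes.
DRAM_LATENCY_NS = 80.0

for clock_ghz in (1.0, 2.0, 3.8):
    cycles = latency_in_cycles(DRAM_LATENCY_NS, clock_ghz)
    print(f"{clock_ghz} GHz: a cache miss to DRAM stalls ~{cycles:.0f} cycles")
```

The same miss costs about 80 cycles at 1 GHz but about 304 at 3.8 GHz, which is why caches and prefetching matter more as clocks rise.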

Lower instruction latencies might have been part of the overall pipeline shortening that made the Core architecture fast, but this is an oversimplification, and the numbers here don't apply to the vast majority of common instructions. Caches, deep out-of-order buffers, prefetching, and branch prediction all play a part.

exikyut · on May 5, 2020

Late reply, but TIL and thanks.

All this time I thought everyone was going on about interprocess communication improvements :)