
I'm not familiar with modern CPUs, but I remember that most instructions take more than one cycle to execute, without counting memory or cache delays. So, expecting to see 1:1 IPC is ... a fantasy?



Yes, that's true: instructions generally take more than one cycle to execute, although the most common ones are fast. But processors these days can execute more than one instruction at a time (this has nothing to do with multiple cores; it happens on a single core) and aren't bound to the order in which the instructions appear in the code. The processor makes sure that instructions that depend on each other are executed in the right order; otherwise it can opportunistically execute whatever it has room for. Modern processors even speculatively execute code from branches they don't yet know will be taken, which is what caused the whole Meltdown/Spectre security headache.
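To make the "more than one instruction at a time, in dependency order" point concrete, here is a toy model, not how any real core is built. The `simulate` function, the 3-cycle latency, and the issue width of 4 are all assumptions picked for illustration. Each cycle it issues up to `width` instructions whose inputs are already available, regardless of program order.

```python
def simulate(instrs, width):
    """Toy out-of-order issue model (illustrative only).

    instrs: list of (latency, deps), where deps holds the indices of
            earlier instructions whose results this one needs.
    width:  maximum number of instructions issued per cycle.
    Returns the cycle at which the last result becomes available.
    """
    done_at = [None] * len(instrs)  # cycle at which each result is ready
    pending = list(range(len(instrs)))
    cycle = 0
    while pending:
        issued = 0
        still_pending = []
        for i in pending:
            latency, deps = instrs[i]
            ready = all(done_at[d] is not None and done_at[d] <= cycle
                        for d in deps)
            if ready and issued < width:
                done_at[i] = cycle + latency
                issued += 1
            else:
                still_pending.append(i)
        pending = still_pending
        cycle += 1
    return max(done_at)


# Eight instructions, each with an assumed 3-cycle latency.
independent = [(3, []) for _ in range(8)]
chain = [(3, [] if i == 0 else [i - 1]) for i in range(8)]

for name, prog in (("independent", independent), ("chain", chain)):
    cycles = simulate(prog, width=4)
    print(name, "cycles:", cycles, "IPC:", len(prog) / cycles)
```

In this model the independent instructions finish in 4 cycles (IPC 2.0) while the dependent chain needs 24 cycles (IPC about 0.33), even though every instruction has the same latency.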


Many processing units can offset this via superscalar methods, that is, tricks to have more than one instruction in flight at the same time: pipelining, speculative execution, SMT, etc.

The article does not go into great depth about it, but it does say that the IPC figure of 1 is based more on gut feel than anything else. I assume the idea is that the superscalar bits (IPC greater than 1) help compensate for the slow bits (IPC less than 1), averaging out at an IPC of around 1 when your code is good.
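As a rough sketch of how fast and slow stretches blend, overall IPC is total instructions divided by total cycles. The `blended_ipc` helper and the phase numbers below are made up for illustration; they are not from the article.

```python
def blended_ipc(phases):
    """phases: list of (instruction_count, ipc) pairs for each stretch of code.
    Overall IPC is total instructions over total cycles."""
    total_instructions = sum(n for n, _ in phases)
    total_cycles = sum(n / ipc for n, ipc in phases)
    return total_instructions / total_cycles


# Hypothetical workload: numbers chosen only to show the arithmetic.
phases = [
    (3000, 3.0),   # compute-heavy stretch: 3000 instructions in 1000 cycles
    (1000, 0.25),  # stretch stalled on memory: 1000 instructions in 4000 cycles
]
print(blended_ipc(phases))  # 4000 instructions / 5000 cycles = 0.8
```

Note that the slow stretch weighs more than its instruction count suggests, because it accounts for most of the cycles.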


They do take more than one cycle (how many depends on how you count), but they are fully pipelined, so they can start executing an (independent) instruction before a previous one has finished.

They are also superscalar so they have multiple (pipelined) units that can start executing instructions at the same time.


Yes, but many instructions are also in flight at the same time, so just because the total pipeline is many cycles long doesn’t mean you can’t have a high throughput.


That's why the number is 1 and not 4. The processor is 4 wide so if it was doing the absolute theoretical maximum you'd get an IPC of 4. Brendan's "rule of thumb" of 1 is taking the multi-cycle thing into account.


Yes, most instructions are multicycle, but multiple instructions can execute in parallel in the CPU pipeline.

Moreover, while latency is multiple cycles, most ALUs in a CPU can start a new operation every cycle, i.e. throughput is 1 operation per cycle.
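A back-of-the-envelope way to see the latency vs. throughput difference for a single fully pipelined unit is sketched below. The function names and the 4-cycle latency are assumptions for illustration, not figures for any particular CPU.

```python
def cycles_independent(n, latency):
    # A new operation starts every cycle; the last one starts at
    # cycle n - 1 and its result is ready `latency` cycles later.
    return (n - 1) + latency


def cycles_dependent(n, latency):
    # Each operation must wait for the previous result, so the
    # full latency is paid every time.
    return n * latency


n, latency = 100, 4  # assumed values
print(cycles_independent(n, latency))  # 103 cycles, close to 1 op per cycle
print(cycles_dependent(n, latency))    # 400 cycles, 0.25 ops per cycle
```

So the same unit with the same latency delivers close to 1 operation per cycle on independent work, and only 1/latency on a dependent chain.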





