I find their explanation of the pipelining issue slightly confusing, probably because they tried to simplify it to the extreme:
>One could object to this and note that due to shorter clock ticks, the small steps will be executed faster, so the average speed will be greater. However, the following diagram shows that this is not the case.
Said diagram shows that the two-clock-tick step locks the pipeline, that is, you can't execute the first clock tick of the next instruction if you're still running the 2nd part of the previous one. When would this be the case? Isn't the entire point of pipelining to divide a function into smaller steps that can be run in parallel? If you can split "step 3" across two clock cycles, couldn't you effectively subdivide it into two steps that could run in parallel?
I suppose that eventually you run into the issue that adding more pipeline stages increases the logic size, which in turn makes it run slower, or something like that. I wish the document were a little more specific; after all, it doesn't hesitate to throw in the physical formulas for power dissipation in the 2nd part, so it's clearly not afraid to dig into technical details.
> If you can split "step 3" across two clock cycles, couldn't you effectively subdivide it into two steps that could run in parallel?
There is a difference between splitting "step 3" across two clock cycles and splitting "step 3" into two separate steps. The underlying assumption here is that "step 3" is indivisible. E.g. say "step 3" is a memory access with a latency of 500 picoseconds; it's not like you can just split it into two steps and make it load faster.
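The effect on throughput can be sketched with a toy model (a hypothetical `pipeline_ticks` helper; it assumes a simple in-order pipeline with no buffering between stages, so an instruction can only enter a stage once the previous instruction has left it):

```python
def pipeline_ticks(stage_latencies, n_instructions):
    """Ticks to retire n_instructions through a simple in-order pipeline.

    A stage that takes k ticks blocks the instruction behind it, so in
    steady state one instruction completes per max(stage_latencies) ticks.
    """
    fill = sum(stage_latencies)        # first instruction traverses every stage
    bottleneck = max(stage_latencies)  # later ones are spaced by the slowest stage
    return fill + (n_instructions - 1) * bottleneck

# Four 1-tick stages: one instruction retires per tick in steady state.
print(pipeline_ticks([1, 1, 1, 1], 100))  # 4 + 99*1 = 103
# Same pipeline, but "step 3" takes 2 ticks and stalls everything behind it.
print(pipeline_ticks([1, 1, 2, 1], 100))  # 5 + 99*2 = 203
```

Under this (admittedly simplified) model, making the clock twice as fast doesn't help if one indivisible stage still spans two of the shorter ticks: the slowest stage sets the throughput, not the tick length.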