That is still IPC: instructions per clock, which measures average throughput, not the latency of any individual instruction.
> Circa P4, an instruction took around 40-50 cycles to complete
Different instructions can have wildly different latencies. Even so, an instruction taking 50 cycles sounds like a double-precision division or an 80-bit floating-point operation. Most operations on the P4 had a latency of 1 to 7 cycles, but the P4's high clocks made memory latency and branch mispredictions a bigger issue.
Some reduction in instruction latency might have been part of the overall pipeline shortening that made the Core architecture fast, but that is an oversimplification, and the numbers quoted don't apply to the vast majority of common instructions. Caches, deep out-of-order buffers, prefetching, and branch prediction all play a part.
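
To make the throughput vs. latency distinction concrete, here is a rough toy model in Python. It is only a sketch: the machine is idealized (no cache misses, no mispredictions), and the issue width of 3 and the function names are made-up assumptions for illustration, not figures for the P4 or any real core.

```python
import math

def cycles_independent(n, latency, width):
    """Cycles to finish n independent instructions on an idealized pipeline
    that starts `width` instructions per cycle, each taking `latency` cycles."""
    issue_cycles = math.ceil(n / width)  # cycles needed just to start everything
    return issue_cycles + latency - 1    # the last group still needs `latency` cycles

def cycles_dependent(n, latency):
    """Cycles for a chain where every instruction waits on the previous result."""
    return n * latency

def ipc(n, cycles):
    """Average instructions per clock over the whole run."""
    return n / cycles

n = 1_000_000
for latency in (1, 7, 50):  # hypothetical per-instruction latencies in cycles
    ind = ipc(n, cycles_independent(n, latency, width=3))
    dep = ipc(n, cycles_dependent(n, latency))
    print(f"latency={latency:2d}  independent IPC={ind:.3f}  dependent IPC={dep:.3f}")
```

With independent instructions the average IPC stays close to the issue width no matter what the latency is, because new instructions keep entering the pipeline while older ones finish. Latency only dominates when each instruction depends on the one before it, and then IPC drops to roughly 1/latency.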