
Also, the P4 came with the NetBurst architecture.

"The Willamette and Northwood cores contain a 20-stage instruction pipeline. This is a significant increase in the number of stages compared to the Pentium III, which had only 10 stages in its pipeline. The Prescott core increased the length of the pipeline to 31 stages."

https://en.wikipedia.org/wiki/NetBurst

And many of those tricks actually work for long pipelines.




Many of the tricks do not work the same way anymore because the decoder now breaks instructions down into micro-ops. If you hand-write a RISC-style sequence, you may end up with worse code than what Intel or AMD microcoded for the equivalent complex instruction. The CPU can also optimize the CISC form when it sees it. And the lower cache pressure from denser code can still be valuable.

Speculation and branch prediction have also improved vastly since then.

Compilers themselves have gotten much better since then too, so you can sometimes get away with just intrinsics.
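To make "just intrinsics" concrete, here is a minimal sketch in C rather than hand-written assembly. The function name `add_u32` is made up for illustration, and the choice of SSE2 is an assumption (it is baseline on x86-64); on other targets the guard falls back to the plain loop. The compiler handles register allocation and scheduling around the intrinsics.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n elements. */
void add_u32(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Four 32-bit lanes per 128-bit register; unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop on targets without SSE2. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

A modern compiler will often auto-vectorize the scalar loop at higher optimization levels anyway, so it is worth checking the generated code before reaching for intrinsics at all.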



