Many of those tricks no longer work the same way, because the decoder now breaks instructions down into micro-ops. Hand-written RISC-style code can end up worse than what Intel or AMD microcoded for the equivalent complex instruction. The CPU can also optimize on its own when it sees the CISC instruction. And the lower cache pressure from denser code can still be valuable.
Speculation and branch prediction have gotten vastly faster since then.
Compilers themselves have gotten much better since then as well, so you can sometimes get away with just intrinsics.
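To make "just intrinsics" concrete, here is a minimal sketch in C, assuming an x86 target with SSE2 and a GCC/Clang-style `__SSE2__` macro. The function name `add_floats` is made up for illustration; the intrinsics themselves (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`) are the standard SSE ones. The compiler handles register allocation and scheduling, which is the part you used to do by hand in assembly.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE/SSE2 intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Process four floats per iteration; unaligned loads/stores are used
       so the caller does not need to align the arrays. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop on targets without SSE2. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

For a loop this simple a modern compiler will often auto-vectorize the plain scalar version anyway, so intrinsics mostly pay off where the compiler cannot see the pattern on its own.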
"The Willamette and Northwood cores contain a 20-stage instruction pipeline. This is a significant increase in the number of stages compared to the Pentium III, which had only 10 stages in its pipeline. The Prescott core increased the length of the pipeline to 31 stages."
https://en.wikipedia.org/wiki/NetBurst
And many of those tricks actually do work for long pipelines.
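The thread does not say which tricks are meant, so as an assumed example take branch avoidance: a mispredicted branch flushes the pipeline, and the deeper the pipeline, the more work is thrown away. A minimal sketch in C (the function names are made up for illustration):

```c
#include <stdint.h>

/* Straightforward version: contains a conditional branch, unless the
   compiler chooses to turn it into a conditional move. */
int32_t max_branchy(int32_t a, int32_t b)
{
    if (a > b)
        return a;
    return b;
}

/* Branch-free version: build an all-ones or all-zeros mask from the
   comparison result and use it to select one of the two values. */
int32_t max_branchless(int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(a > b);  /* -1 (all ones) if a > b, else 0 */
    return (a & mask) | (b & ~mask);
}
```

Whether the second form is actually faster depends on the CPU and on how predictable the branch is. Current compilers frequently emit a conditional move for the first form, in which case the two compile to much the same thing.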