Anecdotally, the DEC Alpha was the first chip to move from "clock is provided at the edge of the chip" to "clock is provided in the middle of the chip", due to a desire to minimise the skew from one edge of the CPU to the other.
How true is it? I haven't the faintest idea, but maybe someone here was on the Alpha design team and can say yes or no.
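If the story is right, the back-of-envelope geometry is simple enough: skew tracks how far the clock has to travel across the die, so driving from the centre roughly halves the worst-case distance versus driving from one edge. A toy sketch (made-up die size and delay numbers, nothing to do with the actual Alpha):

    # Toy geometric argument only: skew scales with the spread of
    # clock-source-to-flop distances, so a centre drive point roughly
    # halves it relative to an edge drive point.
    die_width = 20.0       # hypothetical die width, mm
    delay_per_mm = 10.0    # hypothetical wire delay, ps/mm

    def worst_case_skew(source_x, flop_xs):
        """Max difference in clock arrival time across the listed flops."""
        arrivals = [abs(x - source_x) * delay_per_mm for x in flop_xs]
        return max(arrivals) - min(arrivals)

    flops = [0.0, 5.0, 10.0, 15.0, 20.0]          # flops spread edge to edge
    print(worst_case_skew(0.0, flops))            # edge drive:   200.0 ps
    print(worst_case_skew(die_width / 2, flops))  # centre drive: 100.0 ps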
I think it's more that there's a lot more detailed modeling involved during place & route, since we can no longer get away with simplifications.
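One toy example of what I mean by "more detailed modeling": even for a single wire, the naive lumped-RC estimate is off by roughly 2x compared with a distributed (Elmore) model, and real signoff modeling goes well beyond Elmore. Made-up numbers, Python just for illustration:

    def lumped_delay(R, C):
        """Single-RC approximation: all resistance drives all capacitance."""
        return R * C

    def elmore_delay(R, C, segments=1000):
        """Elmore delay of the same wire modelled as `segments` RC sections."""
        r, c = R / segments, C / segments
        # each capacitor sees only the resistance of the segments upstream of it
        return sum((i * r) * c for i in range(1, segments + 1))

    R, C = 500.0, 200e-15        # hypothetical wire: 500 ohm, 200 fF
    print(lumped_delay(R, C))    # 1.0e-10 s  (100 ps)
    print(elmore_delay(R, C))    # ~5.0e-11 s (~50 ps) -- about half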
There's definitely a lot more playing with clock domains too, though: run this bit at half clock, etc. But you still wouldn't want entirely async logic within an execution unit, because bringing signals back in sync costs a couple of clock cycles on its own.
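The "couple of clock cycles" is basically the cost of a standard two-flop synchronizer in the receiving domain. A toy Python model of that latency (ignoring metastability margin, which can add another cycle):

    import math

    def cdc_latency(t_change, num_flops=2):
        """Destination-clock cycles between an async input changing at time
        t_change (measured in destination-clock periods) and the synchronized
        copy being usable by downstream logic."""
        # first destination edge strictly after the change; a change landing
        # exactly on an edge is assumed to miss it
        first_sample_edge = math.ceil(t_change + 1e-9)
        # the value then shifts through the remaining flops, one per edge
        valid_edge = first_sample_edge + (num_flops - 1)
        return valid_edge - t_change

    print(cdc_latency(4.01))  # change just after an edge: ~1.99 cycles (near worst case)
    print(cdc_latency(4.99))  # change just before an edge: ~1.01 cycles (near best case)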
I do wonder how the tooling used in leading-edge VLSI design rates on software quality. If the open FPGA tooling community has taught us anything, it's that vendor tooling is hideously inefficient and just plain bad software. If VLSI design tooling is on the same level (and it wouldn't be surprising if it is; some FPGA tooling is made by the same companies and shares a similar codebase), then they might be leaving performance, design time, or both on the table simply because the damn P&R and timing analysis take much longer to run than they would if the software weren't terrible. Of course I'm sure they just throw more machines at the problem for high-stakes chip design, but...
> If the open FPGA tooling community has taught us anything, it's that vendor tooling is hideously inefficient and just plain bad software
Just plain bad software and hilariously overengineered, sure, but "inefficient" is a big word to throw out. I don't think any non-vendor tool has yet proven able to play in the same league. It's like comparing tinycc with gcc.