VLIW is pretty successful for loopy DSP code and now AI/ML. Itanium, however, tried to run general-purpose code, and as you say, it constantly ran into weird edge cases. It seemed as if VLIW had succumbed to the Peter Principle: it rose to its level of incompetence. But as long as you use it appropriately, VLIW is a best practice, and LLVM supports it.
BTW, CS252 at Berkeley and Onur Mutlu's ETH lectures give a conventionally disparaging view of VLIW without pointing out its successes.
Adding on: VLIW is only successful in AI/ML because GPUs can't do branching well, let alone branch prediction. I'd guess the same story applies to DSPs. If someone figures out how to fit a branch predictor into those pipelines, I'm guessing the VLIW nature of those platforms will disappear overnight.
The defining characteristic of VLIW is having a brilliant compiler statically schedule instructions for dumb, parallel hardware, rather than depending on power- and transistor-hungry dynamic branch prediction and out-of-order execution.
In a perfect VLIW world you wouldn't spend any transistors or power on branch prediction or out-of-order scheduling logic. Indeed, the original VLIW paper [1] spends the vast majority of its paragraphs on solving the (hard) compiler instruction-scheduling problem with trace scheduling, which is still used today. The VLIW hardware itself is dead simple.
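To make the static-scheduling idea concrete, here's a rough sketch (my own illustration, not from the paper): a loop written so the independent operations are visible at compile time, which is exactly what a VLIW compiler wants so it can pack them into wide instruction words. Names and the unroll factor are just for illustration.

    /* Unrolled by 4: the four multiply-adds per iteration have no data
       dependence on each other, so a trace-scheduling VLIW compiler can
       issue them in the same (or adjacent) instruction bundles, with no
       branch prediction or out-of-order hardware involved. */
    #include <stddef.h>

    void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
    {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            y[i]     += a * x[i];
            y[i + 1] += a * x[i + 1];
            y[i + 2] += a * x[i + 2];
            y[i + 3] += a * x[i + 3];
        }
        for (; i < n; i++)      /* scalar cleanup for the remainder */
            y[i] += a * x[i];
    }

In practice the compiler does this unrolling and bundling itself; the point is that all the parallelism is found statically, before the code ever hits the hardware.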
So if VLIW fits the problem, it has fantastic performance characteristics. If it doesn't fit, and far and away most problems don't, then VLIW is terrible. VLIW is very brittle.
I should add a caveat about the Mill CPU, which is a VLIW, but I see I've written too much already.