VLIW is pretty successful for loopy DSP code and now AI/ML. Itanium, however, tried to run general-purpose code, and as you say, it constantly ran into weird edge cases. It seemed as if VLIW had succumbed to the Peter Principle: it rose to its level of incompetence. But as long as you use it appropriately, VLIW is a best practice, and LLVM supports it.
BTW, CS252 at Berkeley and Onur Mutlu's ETH lectures give a conventionally disparaging view of VLIW without pointing out its successes.
Adding on: VLIW is only successful in AI/ML because GPUs can't do branching well, let alone branch prediction. I'd guess the same story applies to DSPs. If someone figures out how to fit a branch predictor into those pipelines, I'm guessing the VLIW nature of those platforms will disappear overnight.
The defining characteristic of VLIW is having a brilliant compiler statically schedule instructions for dumb, parallel hardware, rather than depending on power- and transistor-hungry dynamic branch prediction and out-of-order execution.
In a perfect VLIW world you wouldn't spend any transistors or power on branch prediction or out-of-order scheduling logic. Indeed, the original VLIW paper [1] spends the vast majority of its paragraphs on solving the (hard) compiler instruction-scheduling problem with trace scheduling, which is still used today. The VLIW hardware itself is dead simple.
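To make the static-scheduling idea concrete, here's a rough sketch (my own illustration, not from the paper): a loop written so the independent operations are visible at compile time, which is exactly what a VLIW compiler wants so it can pack them into wide instruction words. Names and the unroll factor are just for illustration.

    /* Unrolled by 4: the four multiply-adds per iteration have no data
       dependence on each other, so a trace-scheduling VLIW compiler can
       issue them in the same (or adjacent) instruction bundles, with no
       branch prediction or out-of-order hardware involved. */
    #include <stddef.h>

    void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
    {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            y[i]     += a * x[i];
            y[i + 1] += a * x[i + 1];
            y[i + 2] += a * x[i + 2];
            y[i + 3] += a * x[i + 3];
        }
        for (; i < n; i++)      /* scalar cleanup for the remainder */
            y[i] += a * x[i];
    }

In practice the compiler does this unrolling and bundling itself; the point is that all the parallelism is found statically, before the code ever hits the hardware.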
So if VLIW fits the problem, it has fantastic performance characteristics. If it doesn't fit, and far and away most problems don't, then VLIW is terrible. VLIW is very brittle.
I should add a caveat about the Mill CPU, which is a VLIW, but I see I've written too much already.