I wonder if one could generate multiple implementations of a code block and select at runtime whichever is best given the current state of the CPU. Obviously this would require some architectural changes, like a jump that selects among multiple target addresses or whatever, but this fundamentally seems like a problem that is only solvable with information known at runtime.
...though, come to think of it, this would be pretty easy with a tracing JIT.
That is an entirely different functionality from what I am suggesting. The select would choose its target based on some internal CPU state other than a register.
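For what it's worth, here is a rough software-only sketch of the "several implementations, pick one at runtime" idea. It is not the hardware select jump discussed in this thread and not a real tracing JIT: it just times a few equivalent variants during the first calls and then sticks with the fastest. Everything in it (`AdaptiveDispatch`, `sum_loop`, `sum_builtin`, `trial_calls`) is a made-up name for illustration, and wall-clock timing is only a crude stand-in for actual CPU state.

```python
import time
from typing import Callable, List, Optional, Sequence


def sum_loop(values: Sequence[int]) -> int:
    """Variant 1 (hypothetical): explicit Python loop."""
    total = 0
    for v in values:
        total += v
    return total


def sum_builtin(values: Sequence[int]) -> int:
    """Variant 2 (hypothetical): the built-in sum()."""
    return sum(values)


class AdaptiveDispatch:
    """Hypothetical helper that picks the fastest of several equivalent variants.

    Assumption: measured call time is used as a proxy for "the current state
    of the CPU". Real microarchitectural state (caches, branch predictors)
    is not visible from here.
    """

    def __init__(self, variants: Sequence[Callable], trial_calls: int = 3) -> None:
        if not variants:
            raise ValueError("need at least one variant")
        self.variants: List[Callable] = list(variants)
        self.trial_calls = trial_calls            # timed calls per variant
        self.timings = [0.0] * len(self.variants)  # accumulated seconds
        self.calls = 0
        self.chosen: Optional[Callable] = None

    def __call__(self, *args):
        # After the trial phase, always jump straight to the selected variant.
        if self.chosen is not None:
            return self.chosen(*args)

        # Trial phase: rotate through the variants and time each call.
        index = self.calls % len(self.variants)
        start = time.perf_counter()
        result = self.variants[index](*args)
        self.timings[index] += time.perf_counter() - start
        self.calls += 1

        # Once every variant has had its trial calls, commit to the fastest.
        if self.calls >= self.trial_calls * len(self.variants):
            best = min(range(len(self.variants)), key=self.timings.__getitem__)
            self.chosen = self.variants[best]
        return result


if __name__ == "__main__":
    fast_sum = AdaptiveDispatch([sum_loop, sum_builtin], trial_calls=3)
    data = list(range(100_000))
    for _ in range(10):
        fast_sum(data)
    print("selected variant:", fast_sum.chosen.__name__)
```

Which variant wins depends on the machine and the input, which is sort of the point: the choice is made from information that only exists at runtime. A real version would presumably also need to re-run the trial when conditions change, which this sketch does not do.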