The "CISC just became RISC on the inside" thing is greatly overstated.
CISC always decoded to simple, more or less single-cycle ops internally; that's how microcode works. The RISC shtick was to get rid of that decoding into simple ops in the first place. Originally that didn't make sense, because those ops' fetch bandwidth would be competing with data bandwidth. But notice how RISC popped up at the same time as ubiquitous instruction caches? They solved the same problem in a more general way: the I$ means that your instruction fetch isn't competing with data on hot paths. You can also see this in how all of the early CISC archs would have single-instruction versions of memset/memcpy/etc. The goal here is to get the cycle-by-cycle instructions out of the main bus data path by sticking them in microcode.
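To make that memset/memcpy point concrete, here's a rough C sketch (not any real machine's microcode) of the loop that a single CISC block-move instruction hides inside its microcode ROM; on a RISC machine the same loop is ordinary instructions whose fetches the I$ keeps off the data bus.

    /* Rough sketch only: the loop a CISC block-move instruction (think VAX
     * MOVC3 or x86 REP MOVSB) runs out of its microcode ROM.  Because the
     * per-cycle control comes from the ROM, none of these "instructions"
     * ever touch the main memory bus.  On a RISC machine the same loop is
     * ordinary code, and the I$ keeps its fetches off the data path once
     * the loop is hot. */
    #include <stddef.h>
    #include <stdint.h>

    void block_copy(uint8_t *dst, const uint8_t *src, size_t len)
    {
        while (len--) {        /* each pass: load, store, two increments, */
            *dst++ = *src++;   /* decrement-and-branch -- all simple,     */
        }                      /* roughly single-cycle ops                */
    }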
> CISC always decoded to simple, more or less single-cycle ops internally; that's how microcode works. The RISC shtick was to get rid of that decoding into simple ops in the first place
Having written microcode myself as a wee lad, I would call that a gross oversimplification. And the micro-machines inside a modern CPU are themselves fiendishly complex; my point is that microcode is no longer completely hand-crafted, which is the only level at which the “RISC survived” argument might (IMHO) hold.
For the reasons above I also don’t agree with your final point.
I have to say I am not familiar with the microarchitecture of any of the early large-scale single-chip CISC CPUs (my microcode forays were for much larger machines), so we may be speaking to some degree at cross purposes. But again I think you mischaracterize the 801 and its descendants.
I've written microcode too, and there are two types of microcode: vertical and horizontal.
Horizontal is your wide microcode, like what I programmed on the KB11A CPU inside a PDP-11/45. It was somewhere around a hundred or so bits wide, and you could pretty clearly see "ok, these five bits just latch into this mux over here, these over here", etc. in the micro-architecture. I've seen between 96-bit and 256-bit wide single instructions here.
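For a flavor of what that looks like, here's a hypothetical C bit-field layout for one slice of a horizontal microword; the field names and widths are invented for illustration and don't match the real KB11A word.

    /* Hypothetical slice of a horizontal microword: each field drives a
     * specific mux select or latch enable directly, with essentially no
     * decoding in between.  Widths and names are invented; the real
     * KB11A word is wider and laid out differently. */
    struct horizontal_uword {
        unsigned alu_op     : 4;  /* straight into the ALU function select */
        unsigned a_bus_sel  : 3;  /* which register drives the A-bus mux   */
        unsigned b_bus_sel  : 3;  /* which register drives the B-bus mux   */
        unsigned dest_latch : 4;  /* register write-enable selects         */
        unsigned mem_start  : 1;  /* kick off a bus cycle this tick        */
        unsigned cond_sel   : 3;  /* branch-condition mux                  */
        unsigned next_addr  : 8;  /* next microword address, no decode     */
        unsigned spare      : 6;
    };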
Vertical microcode is what you see in the designs the 801 was trying to get away from, the ones that need a full CISC decoder: much smaller fixed-length instructions that represent higher-level ops, and it's mainly that extra decode layer that RISC was trying to get rid of.
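And a correspondingly toy C sketch of vertical microcode: narrow opcode-plus-operand words that still need a decode step before they can drive the datapath, which is exactly the layer the 801-style machines wanted the ISA itself to replace. Again, names and field widths are made up.

    /* Toy vertical micro-op: narrow, encoded, and needing a decode step
     * before anything in the datapath moves.  Names and field widths are
     * invented for illustration. */
    enum uop_class { UOP_LOAD, UOP_STORE, UOP_ALU, UOP_BRANCH };

    typedef struct {
        unsigned op  : 4;  /* operation class                      */
        unsigned src : 4;  /* source register / condition select   */
        unsigned dst : 4;  /* destination register                 */
        unsigned imm : 4;  /* small immediate or next-address hint */
    } vertical_uop;

    void dispatch(vertical_uop u)
    {
        switch ((enum uop_class)u.op) {  /* the decode RISC wanted to skip */
        case UOP_LOAD:   /* drive the address mux, start a read  */ break;
        case UOP_STORE:  /* drive the address mux, start a write */ break;
        case UOP_ALU:    /* select the ALU function, latch dst   */ break;
        case UOP_BRANCH: /* test the condition, pick next uop    */ break;
        }
    }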
The non-ascetic CISC machines would normally have at least two microcode ROMs: one in the decoder, and at least one in the backend, maybe more depending on how they separated out their execution units. (There's a rough sketch of this two-level setup after the two lists below.)
So for instance 68K had:
* Decoder microcode of 544 17-bit instructions
* Execution unit "nanocode" of 336 68-bit instructions
An ARM1 had:
* No decoder microcode (but 32-bit wide, fixed-width, aligned ISA instructions with an I$)
* Execution unit microcode of 42 36-bit instructions
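Here's the promised minimal C sketch of the 68K-style two-level arrangement: a narrow microword that mostly sequences and points at a much wider nanoword holding the raw control signals. The counts mirror the figures above, but the field layouts are invented, not the real 68K ROM format.

    /* Two-level 68K-style control store: a 17-bit microword sequences
     * and selects a 68-bit nanoword, which carries the actual control
     * signals.  Counts match the list above; field layouts are invented. */
    #include <stdint.h>

    #define MICRO_WORDS 544            /* 17-bit microinstructions */
    #define NANO_WORDS  336            /* 68-bit nanoinstructions  */

    struct micro_word {
        unsigned nano_index : 9;       /* which nanoword fires this cycle */
        unsigned next       : 8;       /* next micro address / branch sel */
    };                                 /* 17 bits in use                  */

    struct nano_word {
        uint64_t control_lo;           /* raw datapath control lines...   */
        uint8_t  control_hi;           /* ...68 bits in total             */
    };

    static const struct micro_word micro_rom[MICRO_WORDS];  /* contents omitted */
    static const struct nano_word  nano_rom[NANO_WORDS];

    /* One control-store cycle: fetch the microword, fan out its nanoword. */
    const struct nano_word *ustep(uint16_t upc)
    {
        return &nano_rom[micro_rom[upc % MICRO_WORDS].nano_index];
    }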