I can't really comment on the tradeoffs between specific ISAs, since I've mainly worked on micro-arch research (which is ISA-agnostic for most of the pipeline).
As for the questions on research into the decode complexity vs. instruction density tradeoff - I'm not aware of any recent work, but you've got me excited to go dig up some papers now. I suspect any work done would be fairly old, back in the days when ISA research was active. Similar to compiler front-end work (think lex, yacc, grammars, etc.), ISA research is not an active area currently. But maybe it's time to revisit it?
Also, I'm not sure Huffman encoding is applicable to a fixed-size ISA. Wouldn't it only be applicable to a variable-size ISA, where you devote shorter encodings to more frequent instructions?
"Fixed instruction block" was referring to the Huffman encoding: something like 8 or 16 KB per instruction block (perhaps set by a flag?). Compilers would have to optimize to stay within the block, but they already optimize to fit in the L1 cache anyway.
Since we're going all-in on code density, let's go with a semi-accumulator 16-bit ISA: 8 bits for the opcode and 8 bits for registers, with 32 registers total. We'll split the register byte into 5 bits and 3 bits. The 5-bit field can address all 32 registers; the remaining 3 bits specify the 8 registers that can be the write target, which is workable since quite a few registers are either read-only (zero register, flag register) or only written occasionally (stack pointer, instruction counter). There will be slightly more moves, but that just means moves compress better, and it should push certain register patterns to be used more frequently, which is also better for compression.
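To make the layout concrete, here's a minimal sketch in C of how such a 16-bit word could be packed and unpacked; the field names and the opcode value are just my own placeholders, not anything fixed above.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 16-bit layout (names/opcodes are placeholders):
 *   bits 15..8  opcode  (8 bits, 256 instructions)
 *   bits  7..3  src     (5 bits, any of the 32 registers)
 *   bits  2..0  dst     (3 bits, one of the 8 writable registers)
 */
static inline uint16_t encode(uint8_t opcode, uint8_t src, uint8_t dst) {
    return (uint16_t)((uint16_t)opcode << 8 | (src & 0x1F) << 3 | (dst & 0x07));
}

static inline void decode(uint16_t insn, uint8_t *opcode, uint8_t *src, uint8_t *dst) {
    *opcode = (uint8_t)(insn >> 8);
    *src    = (insn >> 3) & 0x1F;
    *dst    = insn & 0x07;
}

int main(void) {
    uint16_t insn = encode(0x21 /* made-up ADD */, 17, 3);  /* e.g. r3 <- r3 + r17 */
    uint8_t op, s, d;
    decode(insn, &op, &s, &d);
    printf("opcode=0x%02x src=r%u dst=r%u\n", op, s, d);
    return 0;
}
```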
We can take advantage of having two separate domains (one for each byte) to create two separate Huffman trees. In the worst case it seems like we increase our code size, but in the more typical case, where we use a few instructions a lot and a few registers a lot, the output size should be smaller. While the worst-case lookup would be 8 deep, the more domain-specific lookups would probably keep the depth lower in practice. In addition, two trees mean we can decode both halves of each instruction in parallel.
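As a rough illustration of the two-tree idea, here's a sketch that histograms the opcode byte and the register byte separately and builds an ordinary Huffman tree for each; the structures and the simple O(n^2) build are purely illustrative.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

typedef struct Node {
    uint32_t freq;
    int16_t  sym;          /* 0..255 for leaves, -1 for internal nodes */
    struct Node *left, *right;
} Node;

static Node *new_node(uint32_t freq, int16_t sym, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    n->freq = freq; n->sym = sym; n->left = l; n->right = r;
    return n;
}

/* Classic Huffman build: repeatedly merge the two lowest-frequency nodes. */
static Node *build_huffman(const uint32_t freq[256]) {
    Node *pool[256]; int count = 0;
    for (int s = 0; s < 256; s++)
        if (freq[s]) pool[count++] = new_node(freq[s], (int16_t)s, NULL, NULL);
    while (count > 1) {
        int a = 0, b = 1;                      /* indices of the two minima */
        if (pool[b]->freq < pool[a]->freq) { int t = a; a = b; b = t; }
        for (int i = 2; i < count; i++) {
            if (pool[i]->freq < pool[a]->freq)      { b = a; a = i; }
            else if (pool[i]->freq < pool[b]->freq) { b = i; }
        }
        Node *merged = new_node(pool[a]->freq + pool[b]->freq, -1, pool[a], pool[b]);
        pool[a] = merged;                      /* replace one minimum, drop the other */
        pool[b] = pool[--count];
    }
    return count ? pool[0] : NULL;
}

/* Split a stream of 16-bit instructions into the two domains, one tree each. */
void build_both_trees(const uint16_t *code, size_t n, Node **op_tree, Node **reg_tree) {
    uint32_t op_freq[256] = {0}, reg_freq[256] = {0};
    for (size_t i = 0; i < n; i++) {
        op_freq[code[i] >> 8]++;      /* opcode byte */
        reg_freq[code[i] & 0xFF]++;   /* register byte */
    }
    *op_tree  = build_huffman(op_freq);
    *reg_tree = build_huffman(reg_freq);
}
```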
As a final optimization tradeoff, I believe you could do a modified Huffman that always encodes a fixed set of lengths (e.g., 2, 4, 6, or 8 bits), which would halve the theoretical decode time at the expense of an extra bit on some encodings. It would be +25% for a 3-bit encoding, but only 16% for a 5-bit encoding (perhaps with steps 2, 3, 4, 6, 8). For even wider decode, we could trade off a little more by forcing the compiler to ensure that each Huffman stream breaks evenly every N bytes, so we can set up multiple decoders in advance. This would probably add quite a bit to compile time, but would be a huge performance and scaling advantage.
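One way to sketch that "fixed set of lengths" variant: take the optimal Huffman code lengths, round each one up to the next allowed step (even lengths here), and then assign canonical codes. Rounding lengths up keeps the Kraft inequality satisfied, so a valid prefix code still exists; the decoder then only has to handle half as many distinct lengths. The names and the even-length step are assumptions on my part, not anything from above.

```c
#include <stdint.h>

#define MAX_SYMS 256
#define MAX_LEN  32   /* sketch assumes quantized lengths stay <= 32 bits */

/* Round a Huffman code length up to the next even value: 1->2, 3->4, 5->6, ... */
static unsigned quantize_len(unsigned len) {
    return (len + 1u) & ~1u;
}

/* Assign canonical codes for the given lengths (DEFLATE-style recurrence).
 * codes[s] is read MSB-first with lengths[s] bits; returns the max length used. */
static unsigned assign_canonical(const unsigned lengths[MAX_SYMS],
                                 uint32_t codes[MAX_SYMS]) {
    unsigned count[MAX_LEN + 1] = {0}, max_len = 0;
    for (int s = 0; s < MAX_SYMS; s++) {
        count[lengths[s]]++;
        if (lengths[s] > max_len) max_len = lengths[s];
    }
    uint32_t next[MAX_LEN + 1] = {0}, code = 0;
    count[0] = 0;                              /* unused symbols get no code */
    for (unsigned len = 1; len <= max_len; len++) {
        code = (code + count[len - 1]) << 1;   /* first code of this length */
        next[len] = code;
    }
    for (int s = 0; s < MAX_SYMS; s++)
        if (lengths[s]) codes[s] = next[lengths[s]]++;
    return max_len;
}

/* Quantize the optimal lengths, then build the canonical code table. */
unsigned build_quantized_code(unsigned lengths[MAX_SYMS], uint32_t codes[MAX_SYMS]) {
    for (int s = 0; s < MAX_SYMS; s++)
        if (lengths[s]) lengths[s] = quantize_len(lengths[s]);
    return assign_canonical(lengths, codes);
}
```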
Immediates are where things get a little strange. The biggest problem is that immediate values are basically random, so inlining them messes up the Huffman encoding, while keeping them out of the instruction stream messes with data fetching instead. The best compromise seems to be replacing the 5-bit register field with either 5 bits of immediate data or 6 bits (one implied) of jump immediate.
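Here's one guess at what those immediate forms could look like, with the 5-bit field reused as either a literal or a branch offset whose low bit is implied by 2-byte alignment; the layout and opcode values are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Guessed immediate forms (not a fixed spec):
 *   ALU-immediate:  opcode(8) | imm5(5) | dst(3)
 *   short jump:     opcode(8) | off5(5) | dst/cond(3)
 * For the jump, the stored 5 bits are shifted left by one, since 16-bit
 * instructions are 2-byte aligned and the low offset bit is implied zero,
 * giving the "6 bits (one implied)" of reach mentioned above. */

static inline uint16_t encode_imm_form(uint8_t opcode, uint8_t field5, uint8_t dst) {
    return (uint16_t)((uint16_t)opcode << 8 | (field5 & 0x1F) << 3 | (dst & 0x07));
}

/* Sign-extend the 5-bit field and restore the implied low bit. */
static inline int decode_jump_offset(uint16_t insn) {
    int off5 = (insn >> 3) & 0x1F;
    if (off5 & 0x10) off5 -= 32;   /* sign-extend 5 bits to int */
    return off5 << 1;              /* offsets are even: implied low bit */
}

int main(void) {
    uint16_t j = encode_imm_form(0x40 /* made-up branch opcode */, 0x1E, 0);
    printf("branch offset = %d bytes\n", decode_jump_offset(j));  /* prints -4 */
    return 0;
}
```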
Never gave it too much thought before now, but it's an interesting exercise.