
I had VROOM! in mind (https://github.com/MoonbaseOtago/vroom) because I remembered it aims for 4 IPC avg with a width of 8. Though looking again, that's 8 compressed 16-bit instructions or 4 uncompressed 32-bit instructions.

So you could argue a real mix of instructions is not going to be all 16 bit but some 16 and some 32, so the 8 is rarely achieved in practice, and also the block diagram only shows 4 decode blocks. But it can in fact peak at 8 instructions decoded per clock, so I'll call that 8 wide decode.

(You could even argue it's especially impressive, since RISC-V technically qualifies as a variable-length encoding like x86; it's just that only the 16-bit and 32-bit instruction encodings are really in use at the moment.)



As you point out, current VROOM! takes 4-8 instructions into the decoder per clock, depending on the mix of incoming instructions: each decode block can decode one 32-bit instruction or two 16-bit ones.
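To make the 4-8 range concrete, here is a rough sketch (not VROOM!'s actual logic) that counts how many instructions fit in one fetch bundle. It assumes a 16-byte bundle, which is what four 32-bit decode slots imply, and uses the standard RISC-V length rule: a first 16-bit parcel whose low two bits are `11` starts a 32-bit instruction, anything else is a 16-bit compressed instruction. The function names are made up for illustration, and longer encodings are ignored.

```python
def instruction_length(first_halfword: int) -> int:
    """Return the instruction length in bytes, given its first 16-bit parcel.

    Low bits 0b11 mark a 32-bit instruction; anything else is a 16-bit
    compressed instruction. Encodings longer than 32 bits are ignored
    here (an assumption of this sketch).
    """
    return 4 if (first_halfword & 0b11) == 0b11 else 2


def count_instructions(halfwords: list[int]) -> int:
    """Count the whole instructions in a fetch bundle of 16-bit parcels."""
    count = 0
    i = 0
    while i < len(halfwords):
        step = instruction_length(halfwords[i]) // 2  # parcels consumed
        if i + step > len(halfwords):
            break  # instruction straddles the end of the bundle
        count += 1
        i += step
    return count


# 0x0001 is c.nop (16-bit); 0x0013, 0x0000 are the two parcels of a 32-bit nop.
all_compressed = [0x0001] * 8
all_full = [0x0013, 0x0000] * 4
mixed = [0x0001, 0x0001, 0x0013, 0x0000, 0x0001, 0x0001, 0x0013, 0x0000]

print(count_instructions(all_compressed))  # 8
print(count_instructions(all_full))        # 4
print(count_instructions(mixed))           # 6
```

This only counts instructions in 16 bytes; it says nothing about how a real decoder assigns them to decode blocks or handles a 32-bit instruction that is not 4-byte aligned.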

A VERY general rule of thumb is that every 5th instruction is a branch, so decoding more than 8 may be pointless.

However, VROOM! can replay 8 instructions per clock out of the trace cache (they're already decoded and get pushed into the pipe after the decoders).
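A back-of-the-envelope model of the every-5th-instruction rule of thumb, as a sketch only. It assumes each instruction is independently a branch with probability 1/5 and, pessimistically, that every branch is taken and ends the decode group, so only instructions up to and including the first branch are useful. Neither assumption comes from VROOM! itself, and `expected_useful` is a made-up name.

```python
def expected_useful(width: int, branch_fraction: float = 0.2) -> float:
    """Expected useful instructions per decode group of the given width.

    The group is useful up to and including its first branch, capped at
    `width`. With p = branch_fraction this is
        sum_{k=0}^{width-1} (1 - p)^k = (1 - (1 - p)^width) / p.
    """
    keep = 1.0 - branch_fraction
    return (1.0 - keep ** width) / branch_fraction


for width in (4, 8, 16):
    print(width, round(expected_useful(width), 2))
```

Under these assumptions the expected count is about 2.95 at width 4, 4.16 at width 8, and 4.86 at width 16, and it can never exceed 5. Real code has not-taken branches and longer basic blocks, so treat this as an illustration of diminishing returns rather than a performance estimate.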


Hey! I looked at your blog and loved your bug analysis. It's extremely rare to see this stuff publicly, but I'm excited for more as the FOSS CPU scene speeds up. Hopefully the wide decode will be more useful when the trace cache lands.

Is VROOM primarily targeting ASICs or FPGAs? I know BOOM is primarily meant for ASICs, but I feel like there is still a lot of room to build a good OoO FPGA core, particularly since us mere mortals can't get a spot on a wafer big enough to support even the smallest configs of BOOM (or presumably VROOM): the efabless Sky130 shuttles, while "affordable", are hardly big enough to fit most OoO cores. I've been thinking about taking the ideas from this paper [1] and adding them to BOOM to try and make it more FPGA friendly, since that seems like the most realistic way to get a very high performance custom core at this point.

[1] https://ieeexplore.ieee.org/document/8977924/


I've previously been an ASIC designer, so that's what I've been aiming it at. FPGAs are just a great way to find bugs (you'll notice I've spent no time on performance), though now the design is too big for even that.

I've been assuming that anyone building one would be building hand-built data paths/register files/caches/TLBs/etc., largely because I've come from that world.



