I wonder, which instruction fusions are implemented, as it is key to high-perfor...

jabbany · on March 26, 2023

Frankly, these days machine translation is more than enough to reproduce most technical documents at an acceptably high level of accuracy.

Fwiw, this is the relevant link: https://xiangshan-doc.readthedocs.io/zh_CN/latest/frontend/d...

blacklion · on March 27, 2023

Hmm, only strange combinations of arithmetic operations, no conditionals...

samvher · on March 26, 2023

Why are they key to performance with RISC-V? And why are they the reason not to have conditional moves? Would love to know more!

blacklion · on March 27, 2023

Branches are very expensive in modern high-performance CPUs, even with good branch predictors (and a good branch predictor is expensive too, in transistor budget and power consumption). Many short branches can be eliminated by using conditional instructions; cmov is one of them (the most often used). RISC-V doesn't contain such instructions, and the authors of the ISA say they are not needed: leaving them out keeps small (not high-performance) implementations cheap (they don't need to implement complex conditional instructions), while big high-performance implementations can get the same effect via instruction fusion (recognizing common patterns of short branches and replacing them with "internal" conditional moves and the like).
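To make the fusion idea concrete, here is a toy Python model of a decoder pass that spots a branch skipping exactly one `mv` and replaces the pair with a single internal micro-op. This is only a sketch: the `Insn` type, the `fuse_short_branches` helper, and the `cmov.nez` micro-op name are all made up for illustration, and the model assumes every instruction is 4 bytes long. It does not describe what any particular core actually does.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    op: str      # mnemonic, e.g. "beqz", "mv", or the made-up internal "cmov.nez"
    args: tuple  # operands; for a branch the last operand is a byte offset


def fuse_short_branches(insns):
    """Fuse `beqz rc, +8 ; mv rd, rs` into one hypothetical internal micro-op.

    Assumes 4-byte instructions, so an offset of +8 skips exactly the one
    instruction that follows the branch.
    """
    out = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (cur.op == "beqz" and cur.args[1] == 8
                and nxt is not None and nxt.op == "mv"):
            rc = cur.args[0]
            rd, rs = nxt.args
            # Meaning: rd = (rc != 0) ? rs : rd, with no branch left to predict.
            out.append(Insn("cmov.nez", (rd, rc, rs)))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


# a0 = max(a0, a1), written with a short forward branch.
program = [
    Insn("slt", ("t0", "a0", "a1")),  # t0 = (a0 < a1)
    Insn("beqz", ("t0", 8)),          # if t0 == 0, skip the mv
    Insn("mv", ("a0", "a1")),         # a0 = a1
    Insn("ret", ()),
]
fused = fuse_short_branches(program)
```

After the pass, `fused` holds three operations instead of four, and the branch has become a data dependency on `t0`.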

panick21_ · on March 27, 2023

The B extension does have 'cmov', see:

https://raw.githubusercontent.com/riscv/riscv-bitmanip/maste...

If you do instruction fusion, you might be able to take the typical instruction pairs that would create a short branch and simply convert them into a cmov micro-instruction. That way it does not make the branch predictor much more complex.

brucehoult · on March 27, 2023

That's a draft, containing instructions under consideration for inclusion.

The ternary instructions didn't pass the test for utility vs cost and were not ratified.

For example `cmix` only replaces three other instructions, two of which can be done in parallel. Emulating `cmov` is slightly more expensive: generate an all-1s or all-0s mask from your condition (e.g. `slt mask,a,b; neg mask,mask`) and then do a `cmix`.
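As a rough illustration of that sequence, here is a Python model of the register arithmetic. It assumes 64-bit registers and that the three instructions `cmix` replaces are `and`, `andn` and `or`; the function names and argument order are chosen for this sketch and are not the draft's encoding.

```python
XLEN = 64
MASK = (1 << XLEN) - 1  # all-ones register value


def to_signed(v):
    """Interpret an XLEN-bit register value as a signed integer."""
    return v - (1 << XLEN) if v >> (XLEN - 1) else v


def slt(a, b):
    """Set-less-than: 1 if a < b as signed values, else 0."""
    return int(to_signed(a) < to_signed(b))


def neg(v):
    """Two's-complement negate, wrapped to XLEN bits."""
    return (-v) & MASK


def cmix(sel, x, y):
    """Take bits of x where sel is 1 and bits of y where sel is 0."""
    t1 = x & sel          # and
    t2 = y & ~sel & MASK  # andn; independent of t1, so both can issue together
    return t1 | t2        # or


def cmov_lt(a, b, x, y):
    """Return x if a < b (signed) else y, using only mask arithmetic."""
    mask = neg(slt(a, b))  # 1 -> all ones, 0 -> all zeros
    return cmix(mask, x, y)
```

For instance, `cmov_lt(1, 2, 10, 20)` selects `10` and `cmov_lt(2, 1, 10, 20)` selects `20`, in both cases without a data-dependent branch in the modelled instruction sequence.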

Applications that want `cmov` for constant execution time for security reasons can cope with it using a couple of extra instructions. Applications that want `cmov` for average performance because the condition is unpredictable are mostly mistaken with modern branch prediction :-)

There is a fairly strong possibility that a pair of instructions might be added to put `cmov` emulation on the same footing as `cmix`:

    Rd = Rs2 ? Rs1 : 0
    Rd = Rs2 ? 0 : Rs1

These instructions only need two register inputs.
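A sketch of how such a pair would give you `cmov`: zero out one operand with each instruction, then `or` the results together. The function names below are invented for this Python model and are not proposed mnemonics.

```python
def zero_if_false(rs1, rs2):
    """Models Rd = Rs2 ? Rs1 : 0."""
    return rs1 if rs2 != 0 else 0


def zero_if_true(rs1, rs2):
    """Models Rd = Rs2 ? 0 : Rs1."""
    return 0 if rs2 != 0 else rs1


def cmov(cond, x, y):
    """Return x if cond is nonzero, else y: two conditional zeros plus an or."""
    t1 = zero_if_false(x, cond)  # x when cond != 0, otherwise 0
    t2 = zero_if_true(y, cond)   # y when cond == 0, otherwise 0
    return t1 | t2               # exactly one of t1/t2 can be nonzero
```

That is three instructions with the first two independent of each other, the same shape as the `and`/`andn`/`or` expansion of `cmix`, and no separate step is needed to widen the condition into a mask.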

panick21_ · on March 28, 2023

You are right, it was removed and is no longer in the newer spec:

https://wiki.riscv.org/display/HOME/Recently+Ratified+Extens...

snvzz · on March 27, 2023

>Would love to know more!

The specs themselves come with a lot of rationale for decisions taken.

But, as a starting point (from zero), I recommend the "RISC-V Reader" book, by the main authors of the ISA, as it makes a great introduction to the design of RISC-V.