Ariane RISC-V CPU (github.com/pulp-platform)
171 points by nickik on Feb 17, 2018 | 43 comments


This effort is part of the PULP Platform [1]. They have a number of cool chips and projects.

It's a pretty large effort and they are releasing more and more of their work as open source. Here [2] you can see what they have already released and their plans going forward.

Projects like the lowRISC open SoC [3] use some of the smaller PULP cores as minion cores for things like I/O offloading, security and so on. They can also reuse a lot of the work done for PULP.

[1] https://www.pulp-platform.org/

[2] https://www.pulp-platform.org/release-plan/

[3] http://www.lowrisc.org/


The lack of floating point and atomics is an "interesting" choice for a Linux-capable chip. It might run Fedora[1], but would likely require kernel support for emulating floats (as the distro is compiled assuming hard floats) and of course without atomics would only be single core. The README says they plan to add atomics.

At least it supports compressed instructions. There are some actual chips about to be released without the C (compressed) extension, but Fedora and Debian are compiling everything assuming the C extension.

In Fedora news, just today we've got a bootable (but very minimal) RISC-V disk image[2] that you can run in qemu.

[1] https://fedoraproject.org/wiki/Architectures/RISC-V

[2] https://fedorapeople.org/groups/risc-v/disk-images/


Lack of floating point is not so out there - I've worked on several embedded Linux systems that shipped without hardware floating point support, and they didn't suffer much for it. Userspace code got to use floating point with software emulation, and kernel code avoids floating point (especially in contexts where interrupts are blocked) for precisely this use case.

On atomics, see this section of the readme:

> While developing Ariane it has become evident that, in order to support Linux, the atomic extension is going to be mandatory. While the core is currently booting Linux by emulating Atomics in BBL (in a single core environment this is trivially met by disabling interrupts) this is not the behavior which is intended. For that reason we are going to fully support all atomic extensions in the very near future.


The SiFive chips (the microcontroller grade FE310 and closer to SoC FU540) both implement the C extension. The FE310 is RV32IMAC and the FU540 is RV64GC. Of course, you can't really hook external ram up to the FE310, and it doesn't have a TLB anyway, so no practical Linux there.


The SiFive FU540 is what Fedora is targeting initially. They have actual ASICs: https://rwmj.wordpress.com/2018/02/03/sifive-unleashed-board...


Any idea which FMC standard is used on the board?

Haven't found any docs/info about it so far. :(

Looking at the photos in the announcement, your blog, etc., it doesn't look like the HPC or the VITA 57 pin-out versions, so I'm guessing it's an LPC version?


Like rwmj says, ask somebody at SiFive -- but if I had to guess, I'd say it only breaks out some low speed IO/GPIO lines and a few chiplink lanes. That is to say if you want anything high speed, you're going to have to go through an FPGA board.


Hi Justin, long time no see. I don't know the answer. But please ask either DJ Delorie or better still the SiFive team. It's definitely only a very narrow selection of boards, and very likely $random_board you buy will not work.


Thanks Rich. Good point, will do. :)


Following up on this. I emailed the info@sifive.com address, but haven't received any kind of response.

In the meantime, this forum answer to a similar question identifies the FMC port type:

https://forums.sifive.com/t/questions-about-hifive-unleashed...

> HiFive Unleashed is an FMC mezzanine card with a male High Pin Count FMC connector


So when will Fedora be doing RISC-V builds on real RISC-V hardware? That will be an interesting milestone.


In about 6-8 weeks from now, when we receive the first batch of hardware.


This is not the first time the PULP project has released a chip without an FPU. In the other case they later added a version with one.

They seem to want to support atomics but aren't quite done.


Debian already supports ARM with and without hard float.

Supporting a soft-float RISC-V is no different.


That's a proof that something is possible. Actually doing it is a huge amount of work. ARM and RISC-V both have different ABIs for hard/soft float and once you support multiple ABIs you end up compiling everything twice and multiarching everything. Also that's ARMv7 (32 bit). This is a 64 bit chip and I'm not aware of any ARM 64 bit chip that lacks an FPU.

In any case I was thinking about kernel or M-layer emulation of floats which is different, requires a bunch of code to be written once, but doesn't require software gymnastics from the distro.


Not to disagree with you (you highlighted the intended use for Linux), but it touches on something I have wondered about: how far could you get in creating a general-purpose computing platform with no floating-point support at all, including no soft float? If you could avoid float, you could save on implementation costs and remove a source of bugs downstream. Are there traditions that take this path?


Quite far. Most cheap microcontrollers have no FPU and may not even have integer divide; PICs advertise whether or not they have a multiplier as a product feature.

Fixed-point is often a reasonable substitute for floating point, if you have enough bits; Doom uses it for its 2.5D rendering.

However, all your software has to be defined with this in mind. Otherwise you have to have soft-float in order to run it at all, which is horribly slow.


Some applications use different/narrower definitions of float. For example GPUs have 16-bit floats, and TPUs (i.e. AI co-processors) also use narrower floats which are specialized for neural networks.

But for a general purpose application/server chip firstly the FPU doesn't really take up much space and secondly if you ship without an FPU then you can never enter certain important markets (eg. HPC).


That is a very interesting question.

Sometimes a solution to a problem using IEEE-754-arithmetic can be rewritten to use integer arithmetic.

Now, what are the problems where IEEE-754-arithmetic is used now, and in what cases it cannot be rewritten?

I wouldn't be surprised if music decoding could be done without IEEE-754-arithmetic. Maybe, a specialized codec would have to be used, but I still consider that doable.


Music decoding is certainly possible without floating point arithmetic. The popular "MAD" MPEG Audio Decoder uses only integer arithmetic:

https://www.underbit.com/products/mad/


Thanks for pointing this example out. Now I'm also confident that graphics are doable in integer arithmetic.

Anybody have any ideas where IEEE-754 is mandatory to achieve acceptable performance in individual computing workloads?


Floating-point is (almost) never necessary to achieve acceptable performance in one specific compute workload. The value of IEEE 754 floating-point is that it makes it much, much easier to write correct numerically stable programs, because the arithmetic has a scale-invariant rounding error model (assuming no over-/underflow), and the results in the case of over/underflow are fully defined, so no unpleasant surprises like UB for integers.

For (almost) any one fixed algorithm with specified input scaling and known error tolerance, you can choose an appropriate fixed-point format and come within a reasonable constant of floating-point performance on generic hardware, and exceed it on special-purpose hardware. The real value of floating-point is in writing libraries that deliver good performance and acceptable numeric results without any a priori knowledge of input data scaling. This is critical to allowing people to quickly re-use and repurpose all the existing numerical computing software for new problems, and allows engineering tools like Matlab to be useful to people who are not numerics experts.


I do not think IEEE-754 is mandatory for any workload; it is just that fixed-point arithmetic, especially if mapped on top of a number of common integer types, tends to be more complex to implement than simply using float.

With fixed point you have to be much more aware of e.g. the min/max range of values and the precision needed, potentially changing them between different computation steps.

If you have any complex math algorithm you should also look into these aspects with float, for error calculation etc. So maybe using fixed point would not be so bad there, but for a lot of "just get it done" tasks people want to just use a float and not bother about it.

If you had a way to quickly translate math-heavy programs using floats plus small amounts of metadata into programs using fixed-point arithmetic, including error analysis/correction[*], you would probably get rid of >90% of float requirements, though you would "still" have all that legacy software ;=)

[*] If I remember correctly someone did that for Neural Networks using tensorflow, but I don't remember the details


Note that you can also code the floating point instructions in microcode (a kind of firmware of the CPU).


Not sure why downvoted, but this basically allows you to redefine (within limits) the behavior of opcodes after you have shipped the CPU. I thought that was relevant.

https://en.m.wikipedia.org/wiki/Microcode


And RISC-V has a "machine mode" which the OS doesn't know about, which is effectively a place to do microcode.


From the Readme:

"While developing Ariane it has become evident that, in order to support Linux, the atomic extension is going to be mandatory. While the core is currently booting Linux by emulating Atomics in BBL (in a single core environment this is trivially met by disabling interrupts) this is not the behavior which is intended. For that reason we are going to fully support all atomic extensions in the very near future."

Is the atomic extension mentioned here something like the LOCK prefix in x86, i.e., does it just lock the bus for the instructions that follow? I'm guessing this is non-trivial to implement?


Load-reserve, store-conditional, and atomic memory operations (e.g., AMO-ADD).


Thanks, I had to look up the AMOADD instruction. For anyone else who's interested, this is nice RISC-V reference card:

https://www.cl.cam.ac.uk/teaching/1617/ECAD+Arch/files/docs/...


Hey guys. What is your favourite resource for learning how to implement a complete RISC-V processor? I have taken a course in computer architecture so I have an idea about the theory, but I really want to try to actually implement and simulate one from scratch, and maybe even synthesize it if the FPGA board is less than $200.


I learned from MIT's 6.004. They provide infrastructure for you to build a simple core from gates for a RISC ISA in their own test environment. Their materials should be online (or available via OCW).(E.g., https://www.youtube.com/watch?v=CvfifZsmpQ4).

Once you understand how to build a core, frankly the more daunting part is how to interface with the core. How do you load a program into its memory? How do you see what it is doing? How do you know if it finished running or if it crashed?

You may find luck following along with FPGA tutorials (maybe take a look at the Pynq board?) that teach you how to implement a design, put it on the board, and talk to it. Once you understand how to use your FPGA and how to interface with your designs, changing the design to a simple RISC-V core will be more tractable.

You could try and check out picorv32 or riscv-sodor to see two examples of "simple" RISC-V cores that you can build, simulate on your computer, and watch them execute programs you wrote. But man the "magic" behind the test harnesses can be super opaque.


All RISC-V CPUs seem to be in-order right now, but I assume their ISA supports speculative execution for out-of-order CPUs. So are they going to redesign that part of the ISA in a future update, before companies start coming out with out-of-order RISC-V CPUs with speculative execution? I assume some of them were already mid-design of their out-of-order chips, so this could delay their launches quite a bit. But it would be for the best long-term.


Seems very unlikely that ISAs will have to change; the problem needs to be fixed in the microarchitecture.

It's unlikely that there are many mass-production RISC-V CPUs currently in design. A couple are known, but they are more advanced research projects.

The BOOM core will be used by Esperanto and I'm sure they will evaluate what to do about the security issues. The guy who made BOOM has written about these problems; check Twitter.


Spectre can be completely mitigated, it's just another headache to deal with. =(

Essentially, you just have to buffer up more stuff and clear the misspeculated entries so nothing is left lying around that could leak information. Comes with a cost, but chip companies don't have to give up on OOO/speculation entirely.


Sounds tricky. For example if your CPU has hyperthreading, cache misses in speculative execution could leak to the other thread running on the same core, because they share computational resources and the utilization of those is affected by cache misses in speculative execution.


You tag things and prevent bypassing uncommitted data from one thread to another.

For example, it's a mistake (that's been made by other ISAs!) to allow two hardware threads to bypass store data out of the uncommitted StoreQueue from one thread to another. Your memory consistency model violates single-copy atomicity and now the programmer's life is much harder.


I'm talking about the shared computation resources (e.g. ALU), not about shared caches. Cache hits and cache misses affect how much and when one thread utilizes the ALU, which means the other thread sharing it won't be able to use the same resources for computation. Since speculative execution also uses these shared resources, it leaks information about its cache misses to the other thread, even if a rolled-back speculative execution doesn't modify any caches.


Not sure if they've done a production run, but BOOM[1] has existed for a while. I'm not sure if it supports speculative execution yet.

[1] https://github.com/ucb-bar/riscv-boom


It's been taped out as a research chip, but no production runs.

Naturally, like any OOO-issue processor, it supports speculative issuance of load operations to the memory system (or it wouldn't be worth the effort of OOO-issue).


What needs to be redesigned? All OOO cores should behave as if they are executing in order when viewed by the programmer.


I assume the parent is referring to Spectre.


RISC-V has a HINT instruction that's a NOP on the cores that don't support it, but can be a microarchitectual hint on the cores that do.

One of the encodings for the HINT will almost certainly be a speculative execution barrier.


Does this also allow for compiling into something QEMU can use? Or does QEMU not need anything like this to emulate RISC-V?



