Really awesome project. I want to get into FPGAs, but honestly it's hard to even grasp where to start, and the whole field feels very intimidating.
My eventual goal would be to create an acceleration card for LLMs (completely arbitrary), so a lot of the same bits and pieces as in this project, probably except for the memory-offloading part needed to load bigger models.
Reframe it in your mind. "Getting into FPGAs" needs to be broken down. There are so many subsets of skills within the field that you need to set your expectations accordingly. No one expects a software engineer to jump in by building a full computer from first principles, writing an instruction set architecture, understanding machine code, converting that to assembly, and then developing a programming language just so they can write a bit of Python to build an application. You start from the top and work your way down the stack.
If you abstract away the complexities and focus on building a system using some pre-built IP, FPGA design is pretty easy. I always point people to something like MATLAB, so they can create some initial applications using HDL Coder on a dev kit with a reference design. Otherwise, there's the massive overhead of learning digital computing architecture, Verilog, timing, transceivers/IO, pin planning, Quartus/Vivado, simulation/verification, embedded systems, etc.
In short, start with some system-level design. Take some plug-and-play IP, learn how to hook it together at the top level, and insert that module into a prebuilt reference design. Eventually, peel back the layers to reveal the complexity underneath.
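To make that concrete, here's a minimal sketch (module and port names invented for illustration) of what "system-level" design looks like in Verilog: the top module does almost nothing itself, it just instantiates blocks and wires them together:

    // Top level: pure plumbing between two pre-built blocks.
    module blink_top (
        input  wire clk,     // board oscillator, say 12 MHz
        input  wire button,  // push-button, active high
        output wire led
    );
        wire tick;

        // Block 1: divides the clock down to a ~1 Hz tick.
        tick_gen #(.DIV(24'd12_000_000)) u_tick (
            .clk  (clk),
            .tick (tick)
        );

        // Block 2: toggles the LED on each tick while the button is held.
        led_toggle u_led (
            .clk    (clk),
            .enable (button),
            .tick   (tick),
            .led    (led)
        );
    endmodule

    module tick_gen #(parameter DIV = 24'd12_000_000) (
        input  wire clk,
        output reg  tick
    );
        reg [23:0] cnt = 24'd0;
        initial tick = 1'b0;
        always @(posedge clk)
            if (cnt == DIV - 1) begin
                cnt  <= 24'd0;
                tick <= 1'b1;
            end else begin
                cnt  <= cnt + 24'd1;
                tick <= 1'b0;
            end
    endmodule

    module led_toggle (
        input  wire clk,
        input  wire enable,
        input  wire tick,
        output reg  led
    );
        initial led = 1'b0;
        always @(posedge clk)
            if (enable && tick)
                led <= ~led;
    endmodule

In a real vendor flow the two inner blocks would be IP you configured in a GUI rather than modules you wrote yourself, but the top level looks exactly like this.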
Edit: And if you want to get into CPU design and can get a grip on "Advanced Computer Architecture: Parallelism, Scalability, Programmability" by Kai Hwang, then I'd recommend that too. It's super old and some things are probably done differently in newer CPUs, but it's exceptionally good for learning the fundamentals. Very well written. But I think it's hard to find a good (physical) copy.
1. Clone this educational repo https://github.com/yuri-panchul/basics-graphics-music - a set of simple labs for those learning Verilog from scratch. It's written by Yuri Panchul, who worked at Imagination developing GPUs, by the way. :)
2. Obtain one of the dozens of supported FPGA boards and some accessories (keys, LEDs, etc.).
3. Install Yosys and friends.
4. Perform as many labs from the repo as you can, starting from lab01 - DeMorgan (see the sketch below).
You can work through the labs while reading Harris & Harris. Once you're done with the labs and the book, it's time to start your own project. :)
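For a taste of what lab01 is about: the whole exercise boils down to checking De Morgan's law in hardware (port names here are mine, not necessarily the repo's exact ones):

    // De Morgan: ~(a & b) is the same signal as ~a | ~b.
    // Drive a and b from two switches; the two LEDs should always agree.
    module demorgan (
        input  wire a,
        input  wire b,
        output wire nand_ab,   // ~(a & b)
        output wire or_nanb    // ~a | ~b
    );
        assign nand_ab = ~(a & b);
        assign or_nanb = ~a | ~b;
    endmodule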
PS: They have a weekly meetup at Hacker Dojo; you can participate over Zoom if you are not in the Valley.
If you want to accelerate LLMs, you will need to know the architecture first. Start from that. The hardware is actually both the easy part (design) and the hard part (manufacturing).
Depends heavily on what system it is supposed to provide acceleration for.
If it is an MCU based on a simple ARM Cortex-M0, M0+, M3, or a RISC-V RV32I core, then you could use an iCE40 or similar FPGA to provide a big speedup just by using the DSPs and the large SPRAM.
Basically, you add the custom compute operations and memory space that don't exist in the MCU: operations that would otherwise take many instructions to do in software. Also, offloading to the FPGA AI 'co-processor' frees up the MCU to do other things.
The kernel operations in the Tiny GPU project are actually really good examples of things you could efficiently implement in an iCE40UP FPGA device, resulting in substantial acceleration. And using EBRs (block RAM) and/or the SPRAM for block queues would make a nice interface to the MCU.
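As a hedged sketch of the kind of kernel worth offloading (my own example, not taken from the Tiny GPU code): an 8-bit multiply-accumulate, the inner loop of any matrix or convolution kernel. On an iCE40UP, Yosys can map the multiply onto the SB_MAC16 DSP blocks, and even one of these doing a MAC per clock beats a Cortex-M0 running the same loop in software:

    module mac8 (
        input  wire               clk,
        input  wire               clear,  // pulse to start a new dot product
        input  wire               valid,  // a/b pair is valid this cycle
        input  wire signed [7:0]  a,      // activation from the queue
        input  wire signed [7:0]  b,      // weight from the queue
        output reg  signed [31:0] acc     // running sum
    );
        always @(posedge clk)
            if (clear)
                acc <= 32'sd0;
            else if (valid)
                acc <= acc + a * b;       // one MAC per clock
    endmodule

Feeding a and b from the EBR/SPRAM block queues mentioned above closes the loop with the MCU.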
One could also implement a RISC-V core in the FPGA, giving you a single chip with a low-latency interface to the AI accelerator. You could even implement the AI accelerator as a set of custom instructions. There are so many possible solutions!
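To make the custom-instruction route concrete: PicoRV32 exposes PCPI (Pico Co-Processor Interface) for exactly this. The interface signals below are PicoRV32's real ones; the instruction encoding is my own invention on the RISC-V custom-0 opcode (7'b0001011), so treat this as a sketch:

    module pcpi_mac (
        input  wire        clk,
        input  wire        resetn,
        input  wire        pcpi_valid,
        input  wire [31:0] pcpi_insn,
        input  wire [31:0] pcpi_rs1,
        input  wire [31:0] pcpi_rs2,
        output reg         pcpi_wr,
        output reg  [31:0] pcpi_rd,
        output wire        pcpi_wait,
        output reg         pcpi_ready
    );
        // Accept a hypothetical "mac rd, rs1, rs2" on custom-0, funct7 == 0.
        wire insn_mac = pcpi_valid
                     && pcpi_insn[6:0]   == 7'b0001011
                     && pcpi_insn[31:25] == 7'b0000000;

        reg [31:0] acc;
        assign pcpi_wait = 1'b0;          // single cycle, never stalls

        always @(posedge clk) begin
            pcpi_wr    <= 1'b0;
            pcpi_ready <= 1'b0;
            if (!resetn) begin
                acc <= 32'd0;
            end else if (insn_mac && !pcpi_ready) begin
                acc        <= acc + pcpi_rs1 * pcpi_rs2;
                pcpi_rd    <= acc + pcpi_rs1 * pcpi_rs2;  // return updated sum
                pcpi_wr    <= 1'b1;
                pcpi_ready <= 1'b1;       // one-cycle completion pulse
            end
        end
    endmodule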
An iCE40UP5K FPGA will set you back 9 EUR in single quantity.
This concept of course scales up to the performance and cost levels you're talking about, with many possible steps in between.
Something that seems to be little appreciated is that the transformer architecture needs to become more compute-bound. Inventing a machine-learning architecture that is FLOPs-heavy instead of bandwidth-heavy would be a good start.
It could be as simple as using a CNN instead of the V matrix. Yes, this makes the architecture less efficient, but it also makes it easier for an accelerator to speed up, since CNNs tend to be compute-bound.
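To put rough numbers on "bandwidth-heavy vs FLOPs-heavy" (my own back-of-envelope, batch-1 decoding, fp16 weights, ignoring activation traffic):

    matvec with a d x d weight matrix:
      FLOPs ~ 2*d^2
      bytes ~ 2*d^2        (every weight read once, reused never)
      => ~1 FLOP/byte: hopelessly memory bound

    conv with a k-tap kernel over n positions:
      FLOPs ~ 2*k*n per filter
      bytes ~ 2*k          (the same k weights reused at all n positions)
      => ~n FLOPs/byte: compute bound for any reasonable n

That weight reuse is the whole trade being proposed here.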