I wonder what the advantages would be of using an FPGA to test a CPU design, compared to relying on a (presumably more accurate) computer-based simulation. (I understand the reasons one might want to implement a CPU in an FPGA.)
This idea is more than 30 years old. It has been done, and once upon a time companies were built around it.
First off, mapping an entire CPU to an FPGA cluster is a design challenge in itself. Assuming you can build an FPGA cluster large enough to hold your CPU, and reliable enough to get work done on it, you have the problem of partitioning your design across the FPGAs. Second problem: observability. In a simulator, you can probe anywhere trivially; with an FPGA cluster, you must route the probed signal to something you can observe. (I am not even going to talk about getting stimulus in and results out, since you have that problem either way, FPGA or simulator; it is just different mechanics.)
The big problem is that an FPGA models each signal with two states: 1 and 0. A logic simulator can use more states, in particular U or "unknown". All latches should come up U, and getting out of reset (a non-trivial problem) is, to grossly oversimplify, "chasing the U's away". An FPGA model could, in theory, model signals with more than two states, but the model size grows quickly.
Source: Once upon a time I was pre-silicon validation manager for a CPU you have heard of, and maybe used. Once upon a time I was architect of a hardware-implemented logic simulator that used 192 states (not 2) to model the various vagaries of wired-net resolution. Once upon a time I watched several cube-neighbors wrestle with the FPGA model of another CPU you have heard of, and maybe used.
Note: What would 3-state truth tables look like, with states 0, 1, and U? 0 and 1 is 0. 0 and U is 0. 1 and U is U -- etc. You can work out the rest with that hint, I think.
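To make that concrete, here is a minimal sketch in Python of what those 3-state tables might look like, plus a latch that has its U "chased away" by reset. The names (and3, or3, not3) are just illustrative, not from any particular simulator, and a real simulator tracks far more states and wired-net resolution rules than this.

    U = "U"  # the "unknown" state

    def and3(a, b):
        # 0 dominates AND: 0 and anything is 0; any remaining U makes the result U.
        if a == 0 or b == 0:
            return 0
        if a == U or b == U:
            return U
        return 1

    def or3(a, b):
        # 1 dominates OR: 1 or anything is 1; any remaining U makes the result U.
        if a == 1 or b == 1:
            return 1
        if a == U or b == U:
            return U
        return 0

    def not3(a):
        # NOT of unknown is unknown; otherwise invert the bit.
        return U if a == U else 1 - a

    # Print the full truth tables, including the cases given above.
    for a in (0, 1, U):
        for b in (0, 1, U):
            print(f"{a} and {b} = {and3(a, b)}    {a} or {b} = {or3(a, b)}")

    # A latch powers up unknown; a synchronous reset drives the U out.
    state = U
    for reset in (0, 0, 1, 0):
        state = 0 if reset == 1 else state  # hold value unless reset is asserted
        print("state:", state)              # prints U, U, 0, 0

In a two-state simulator that latch would silently come up as 0 or 1, so a missing or flaky reset can go unnoticed until bring-up; with U's, the unknown value propagates and the bug shows itself.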
Edit to add: Why are U's important? They uncover a large class of reset bugs and bus-clash bugs. I once worked on a mainframe CPU where we simulated the design using a two-state simulator. Most of the bugs in bring-up were getting out of reset. Once we could do load-add-store-jump, the rest just mostly worked. Reset bugs suck.
Indeed they do. And even if you have working chips you get the next stage: board-level reset bugs. An MC68K board I helped develop didn't want to boot; some nasty side effect of a reset line that didn't stay at the same level long enough stopped the CPU from resetting reliably when everything else did just fine. That took a while to debug.
Because it's substantially faster. Simulating a large CPU design in software is slow and doesn't parallelise well, so your tests take a lot longer (and they aren't fast even with FPGA acceleration: runtimes can be days or weeks if you're exercising a large fraction of the design for even a tiny amount of simulated time).
SW-based simulation is mostly about functional correctness and robustness of an implementation. Even with cycle-accurate simulation, there is a lot of data pertaining to timing and performance constraints that you can't just extrapolate from simulation results. And that's where emulating CPU/GPU/ASIC designs generally helps the most.