T1: A RISC-V Vector processor implementation (github.com/chipsalliance)
117 points by namanyayg 37 days ago | 19 comments



It's a legacy processor, but the UltraSPARC T1 is the first thing I thought of when I saw the title...

https://en.m.wikipedia.org/wiki/UltraSPARC_T1

This legacy CPU is actually open source, along with its successor, the T2.

https://www.oracle.com/servers/technologies/opensparc.html


I wonder if we would have seen Sun continue to release the RTL for future SPARC cores had they not been acquired by Oracle.


Maybe, but it's not like the T1 or T2 were successful designs. The idea of using 4x (or even 8x) SMT to hide L2 (and to an extent L3) cache latency from an in-order architecture just didn't work out. Not many users have only parallelizable integer workloads. I suspect Sun desperately tried to do what they could on their dwindling budget within the constraints of 64-bit SPARC. Its massive architectural register file and fixed-size register windows had to be a pain to implement in an out-of-order uarch. I would love to gain more insight into how Fujitsu pulled off this minor miracle as the swan song of SPARC.


Here is a list of open source RVV implementations: https://github.com/stars/camel-cdr/lists/rvv-implementations

They vary in maturity and in target performance.


TIL about this GitHub "list" feature. Neat.


> TIL this github "list" feature

Same.


Might be interesting to see this combined with vex. Also nice to see the recognition that memory bandwidth is an important consideration.


Some of the old Cray machines only cached instructions and scalar data. Instead of a vector cache they used vector scratchpad registers and plenty of interleaved memory channels to keep the vector ALUs fed. That's one part of the design space you can't reach with RISC-V without yet another vendor extension.
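(A rough sketch of the interleaving idea, my own illustration rather than the actual Cray scheme: with N independent banks, consecutive word addresses land in different banks, so a unit-stride vector access cycles through all banks and can sustain one word per cycle once the banks are busy in parallel.)

    object Interleave extends App {
      // Low "address mod banks" bits pick the bank; the rest pick the
      // row within it. Bank count of 8 is an arbitrary assumption.
      val banks = 8
      def bankOf(wordAddr: Long): Long = wordAddr % banks // which bank serves this word
      def rowOf(wordAddr: Long): Long  = wordAddr / banks // row within that bank
      // Unit stride: addresses 0..7 hit banks 0..7, then address 8 wraps to bank 0
      (0L to 8L).foreach(a => println(s"addr $a -> bank ${bankOf(a)}, row ${rowOf(a)}"))
    }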


Am I understanding the README correctly in that:

You can execute some nix commands to fire up an emulator for this CPU design? That's pretty cool. I wonder how hard it'd be to reduce it to a docker command?

Also, I'd never heard of Chisel, but it looks amazing: software defines hardware via a Scala-embedded DSL that compiles to Verilog.

https://github.com/chipsalliance/chisel
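For a taste of what that looks like, here's a minimal sketch assuming a recent chisel3/CIRCT toolchain (the Mux2 module is a toy example of mine, not anything from T1):

    import chisel3._
    import circt.stage.ChiselStage

    // A 2:1 byte multiplexer described as an ordinary Scala class
    class Mux2 extends Module {
      val io = IO(new Bundle {
        val sel = Input(Bool())
        val a   = Input(UInt(8.W))
        val b   = Input(UInt(8.W))
        val out = Output(UInt(8.W))
      })
      io.out := Mux(io.sel, io.a, io.b)
    }

    // Elaborate the Scala description and print the generated SystemVerilog
    object Emit extends App {
      println(ChiselStage.emitSystemVerilog(new Mux2))
    }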


Chisel is just a Scala library to generate SV. I haven't actually used it but I've used similar systems and a really big problem with them is debugging. The generated SV tends to be unreadable and you will spend a lot of time debugging it.

Chisel has a similar competitor called SpinalHDL that is apparently a bit better.

https://spinalhdl.github.io/SpinalDoc-RTD/master/index.html

IMO using general purpose languages as SV generators is not the right approach. The most interesting HDL I've seen is Filament. They're trying to do for hardware what Rust has done for software. (It's kind of insane that nobody has done that yet, given how much effort we put into verifying shitty SV.) Haven't tried it yet though.

https://filamenthdl.com/


The real reason to use those languages is that they often lend themselves to using the meta-language for other purposes, like writing flexible, open-ended test suites, or using some forms of code generation/metaprogramming to generate parts of the design from other things. That's very useful and one of the attractive properties of systems like Amaranth or Clash, and one of the downsides (IMO) of approaches like Filament or Bluespec. That said, the most important bits about Filament and Bluespec are their high-level concepts (like guarded actions and timeline types), which could be adopted into other RTL languages as well.
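(A small sketch of the kind of metaprogramming meant here, assuming chisel3; the DelayLine module and its parameters are made up for illustration. The fold runs at elaboration time, so it's plain Scala generating hardware, not hardware itself.)

    import chisel3._

    // An N-stage register delay line, generated by ordinary Scala code
    class DelayLine(width: Int, stages: Int) extends Module {
      val io = IO(new Bundle {
        val in  = Input(UInt(width.W))
        val out = Output(UInt(width.W))
      })
      // foldLeft threads the signal through `stages` registers at elaboration time
      io.out := (0 until stages).foldLeft(io.in)((sig, _) => RegNext(sig))
    }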

At the end of the day though, sometimes you just have to debug a netlist, and that will probably remain true of Filament too. (Any language with higher-order applicative concepts will eventually run into some issues with wire names, etc.; that's just unavoidable.) I think the SystemVerilog or whatever is just a red herring at that point; the tools for netlist debugging all feel like the equivalent of having to debug compiler assembly output with no debug symbols. Making sure you need to reach for the debugger much less often is a good first step, but I'm not sure how to improve this part.


Yeah I agree that debugging generated SV is like debugging assembly and you rarely have to do that when writing normal programs. I think the difference is tooling. I can fire up an IDE debugger and have full access to all the relevant information and controls without seeing any assembly.

I don't know of any compile-to-SV tools that have a debugger anywhere near as capable as that. They definitely should! But they don't right now, so we're stuck at debugging RTL.


It's not that hard to debug: your signal names and register names all carry through. Sure, lots of temp wires get generated, but that's never where your bug is.


Maybe; I haven't used it. But with the compile-to-SV language I do use, you're right that it generates a lot of temporary wires and the bugs are never there, but they make it extremely tedious to trace drivers from the point of failure back to the cause.


They provide a docker image. Run it with:

    docker run --name t1 -it -v $PWD:/workspace --rm ghcr.io/chipsalliance/t1-blastoise:latest /bin/bash

Then execute a program:

    ip-emulator --no-logging -C yourProgram


Would love to have seen some benchmarks.


It's sadly not integrated into a scalar core yet; AFAIK it currently uses spike to execute the scalar instructions.

You can try it if you want to:

They offer a pre-built docker environment, so you can play around with the RTL simulation via:

    docker run --name t1 -it -v $PWD:/workspace --rm ghcr.io/chipsalliance/t1-blastoise:latest /bin/bash

This drops you into a shell, and you can start simulating a 512-bit vector length processor with:

    ip-emulator --no-logging -C yourProgram

See the tests/ directory for example code. At least in theory, but there might still be a few bugs.


Previously (name-wise): https://en.wikipedia.org/wiki/UltraSPARC_T1

tl;dr - a 32-thread SPARC CPU from 20 years ago, with subsequent generations getting to 256 threads per chip


Very cool



