At least for ^ I was always told that the ADM3a terminal keyboard had a HOME key with ~ and ^ on it, which is why ~ means "home directory", and ^ means "home of the line".
I really enjoyed your faces example! I’ll have to look at the rest of the video soon, but it’s so cool what you can do with a little imagination and ingenuity!
Do you have any texts/websites/papers that would allow one (me) to learn about "deeply rudimentary Lisp" and how to create one? I am especially interested in learning why 4 general-purpose registers are important and other lower-level details like that.
Sure! One fantastic starting point is Lisp in Small Pieces, which shows you how to build multiple different Lisp interpreters, and then several increasingly fancy Lisp compilers.
The trick with a macro-assembler that uses Lisp macros to generate assembly was basically folklore when I learned it, and I haven't seen it fully fleshed out anywhere in the literature. For a tiny chip, you'd run this as a cross compiler from a bigger machine. But you basically have Lisp macros that expand to other Lisp macros that eventually expand to assembly representated as s-expressions.
As for why basic Lisps are register-hungry, you usually reserve an "environment pointer register", which points to closure data or scope data associated with the currently running function. And then you might also want a "symbol table base" register, which points to interned symbols. The first symbol value (located directly where the symbol register points) should be 'nil', which is both "false" and the "empty list". This allows you to check Boolean expressions and check for the empty list with a single register-to-register comparison, and it makes checks against other built-in symbols much cheaper. So now you've sacrificed 2 registers to the Lisp gods. If you have 8 registers, this is fine. If you have 4 registers, it's going to hurt but you can do it. If you have something like the 65C02, which has an 8-bit accumulator and two sort-of-flexible index registers, you're going to have to get ridiculously clever.
Of course, working at this level is a bit like using #[no_std] in Rust. You won't have garbage collection yet, and you may not even have a memory allocator until you write one. There are a bunch of Lisp bootstrapping dialects out there with names like "pre-Scheme" if you want to get a feel for this.
Forth is a stack machine, so you basically just need a stack pointer, and a couple of registers that can be used to implement basic operations.
Anyway, Lisp in Small Pieces is fantastic, and it contains a ton of the old tricks and tradeoffs.
I heartily endorse Lisp In Small Pieces. It's sitting beside me right now.
I recently wrote an assembler in scheme; I'm in the process of adding macros. You need very few primitives to implement what amounts to a lisp compiler. A larger issue is bootstrapping garbage collection from manual memory allocation—while there are a few tricks to do this simply, if inefficiently, high-performance garbage collection needs to be tightly integrated with the compiler in order to implement a) scanning for roots, b) rewriting pointers when you move allocations around, and c) likely pointer tagging. None of this is easy to design around or to bolt on to a macro-oriented compiler-assembler.
And of course, writing the really fancy bits of lisp—escaping closures, continuations, conditions—take a lot more effort and thought and care.
Curiously, garbage collection is far from necessary to achieve bootstrapping. It's just rather odd to use a lisp with manual memory allocation. I've found stealing from Rust's memory model has been very fruitful in this regard (not the borrow checker). RAII is also very easy to implement with careful attention to how scoping and moving is implemented.
Though no worse than 16 or 32 bit x86 (without FPU), and probably better because the lower 8 registers are general-purpose.
Also you can get something useful from the "spare" five registers r8-r12 as they support MOV, ADD and CMP with any other register, plus BX. Sadly you're on your own with PUSH/POP except for PUSH LR / POP PC.
Thumb-1 (or ARMv6-M) is fairly similar to RISC-V C extension. It's overall a bit more powerful because it has more opcodes available and because RVC dedicates some opcodes to floating point. RVC only lets you do MV and ADD on all 32 (or 16 in RV32) registers, not CMP (not that RISC-V has CMP anyway). Plus, RVC lets you load/store any register into the stack frame. Thumb-1 r8-r14 need to be copied to/from r0-r7 to load or store them.
But on the other hand, RVC is never present without the full-size 4 byte instructions, even on the $0.10 CH32V003, making that a bit more pleasant than the similar price Cortex M0 Puya PY32F002.
My initial experience with Thumb-1 was like stepping on a series of rakes. Can't use ADD? Why not? Oh, it turns out you have to use ADDS. Wait, why am I getting an error when I try to use ADDS? Turns out that inside an ITTE (etc.) block, you can't use ADDS; you have to use ADD. And the various other irregular restrictions on what you can express are similarly unpredictable. Maybe my gripe isn't really with Thumb-1 but with GAS, but even when you learn the restrictions, it still takes extra mental effort to program under them. I did have some similar experiences with 8086 code (it took me a certain amount of trial and error to learn which registers I could use as base registers and index registers, as I recall) but never 80386 code, where all of its registers are just as general-purpose as on Thumb-1, unless you're looking for sizecoding hacks to get your demo down under 64 bytes or whatever.
I agree that RVC is similar in theory, but being able to mix 4-byte instructions into your RVC code largely eliminates the stepping-on-rakes problem, even on Graham Smecher's redoubtable Minimax which Jecel Assumpção mentioned. I still prefer ARM assembly over RISC-V, but both definitely have their merits.
Oh, you're right, of course. I misremembered that rake. I stepped on some others I can't remember now, though.
I wouldn't be surprised to see commercial implementations of Minimax. It seems like it would have a much better cost/benefit ratio than SeRV for some applications.
This is an experimental rather than practical design that only directly implements the compressed instructions in hardware and then implements the normal RV32I instructions in "microcode" written using the compressed instructions.
The LUT counts do look competitive, until you realise that this doesn't include the cost of the microcode.
Probably fine on FPGA where there's lots of almost free BRAM, but on an ASIC where you'd need to use SRAM or mask ROM, or if you used LUTRAM, it would look very different.
Plus, the speed penalty for the microcoded instructions is huge. perhaps not as huge as SeRV :-)
That sounds reasonable, yeah. Presumably you'd write your inner loops purely in RVC instructions; in the situations where you'd use SeRV, you wouldn't be using it for your computational bottlenecks, which you'd build special-purpose hardware for, but just to sort of orchestrate a sequence of steps. But Minimax seems like it could really reduce the amount of stuff you had to design special-purpose hardware for.
reply