Wow. And I thought modern NetBSD on a 15 MHz m68030 with a 16 bit memory bus and 10 megabytes of RAM is slow. This is crazy!
It illustrates a point I've explained to many people over the years: once computers started coming with persistent storage, open address spaces and MMUs towards the late '80s and early '90s, we basically arrived at modern computing. An Amiga 3000 or i80486 computer can run the same things as a modern computer. Sure, we have ways to run things orders of magnitude faster, and sure, we now have things that didn't exist then (like GPUs that can run code), but there's no functional difference between those machines and new ones.
I love that Dmitry shows how loosely "functional" can be defined :)
I don’t know if this was a thing in America, but in USSR in the 70s and 80s it was very popular to play chess-by-correspondence. You would literally snail-mail letters back-and-forth with your move. Games would last months or years. It added an extra challenge to chess because by the time you got a response, you might have forgotten what strategy you had had.
This project is basically Linux-by-correspondence. The challenge is here too. By the time the command produces an output, you might have forgotten why you ran it.
There was something called Agora (https://en.wikipedia.org/wiki/Agora_(web_browser)) which was sort of an email/http proxy. You could browse the web via email by sending "GET", "POST" etc. commands. It was my first exposure to the web.
There was also a system where a person could email a request for a webpage to a certain email address and the web page would be returned by email. It existed in the late 1980s, I think, or maybe the early 1990s?
M-xing https sites in Emacs, altering the format of the text queried, is a means of inferential consumption of news. Parallel strains of information and thought, inhabiting the same site, may be recollected if the medium is altered.
I installed Windows 95 on an Amiga 3000 with a 25 MHz m68030 via floppy to see if DMF formatted disks would work and to play around. By the time it finished, I had forgotten what I wanted to try out.
Reorganizing the combined correspondence in an Excel spreadsheet through an arbitrary numbering system years later, and experiencing the Spinozan "pang of bad conscience."
Did you write the PIC12F1840 HEX file in 4004 machine code for your PIC-based one-chip sound player?
15 MHz m68030 with a 16 bit memory bus and 10 megabytes of RAM -- A Mac LC II, by any chance? :)
> towards the late '80s and early '90s
By the late 1960s, really. It would probably be possible to port Linux to the IBM Model 67 [1]; it might even be easy since GCC can already target the instruction set. The MMU is sufficient. Maybe a tight squeeze with the max of 2 MB of fast core. It would be in a similar ballpark to that 68030 machine, a bit slower.
Full virtualization, and hardware enforced memory and IO boundaries, were invented early on, too. It took a while for such features to trickle down to mini- and then micro- computers. And then much longer for popular software to take advantage.
I have fond memories of the System/360 M67 (and its successors, starting with a System/370 M168) I used from 1970 to the early 1980s. It ran the Michigan Terminal System, and we had all the modern conveniences (terminal operation in the IBM world was klunky, but PDP-10s at the same time did that right). And of course Unix dates from exactly that period.
The fact that Linux runs well on a modern zSeries demonstrates that, even with all the historical baggage this 60+-year-old architecture has, it carries with it many of the architectural innovations that current OSes and languages need.
Oh man, I had so much fun on MTS once that ridiculous fake PC-ish command line was removed. Wrote all kinds of wacky stuff using the command line macros.
Wonderful look into our history, and you're undoubtedly correct about being able to target that system.
My example was about hardware that was affordable to mere mortals (although it's getting to be more expensive to buy that same hardware now than it was when it was new), but the idea is the same :)
That's basically the concept of Turing Completeness. Any Turing complete system can run anything. It may be very slow, but it will run. ChatGPT could run on a 4004, all you need is time.
I've always interpreted the definition of storage as arbitrarily large, not specifically infinite. The universe, after all, is finite. The "well, acshually" arguments aren't interesting, because they're 100% abstract.
It is defined as arbitrarily large but not infinite. That's not because of physical concerns, but because some of the theorems don't work if the memory is actually infinite.
You're comparing an a priori concept with an a posteriori one. It's like claiming the number five doesn't "acshually" exist. Like yea, it's a concept; concepts don't exist.
A universe isn't a Turing machine because it can't run all the programs that can run on a Turing machine. This isn't exactly controversial.
What's the difference between arbitrarily large and infinite? Would you say the number of possible Turing-computable functions is merely arbitrarily large and not actually infinite?
When you're talking about something like neural networks on a 4004, the "well ackshually" argument does become very much relevant. The limitations of that kind of platform are hard enough that they do not approximate a Turing machine with respect to modern software.
LLaMA takes a lot more MIPS and a lot more RAM than Linux. Linux is more complicated, but computers were running Linux 30 years ago. In this case, quantity has a quality all of its own.
Ignoring things like AVX512 you're looking at about 100x more compute to do something serious with LLaMA.
However! If you just want to demo it working, then you could generate 4 tokens using TinyLLaMA 1.1B which takes 25,164,386,466 cycles. That's about the same cost as booting Linux. So you could do TinyLLaMA if you can do Linux.
Note also that the 4004 lacks a floating-point unit of any kind - not just a vector unit. I think people make 8-bit integer quantizations of LLMs, though, which would be the fastest versions to run on a 4004.
A lot of quants just upcast to floats. Some of them work on integer multiplication using pmaddubsw. But oof, it looks like the i4004 doesn't even have that.
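For the curious, here's a minimal scalar sketch (my own illustration, not code from the article) of the inner loop that 8-bit quantization and pmaddubsw are about; pmaddubsw does the unsigned-by-signed byte multiplies and pairwise adds in one instruction, whereas a 4004 would have to synthesize even the multiply from 4-bit adds and shifts:

```c
#include <stdint.h>
#include <stddef.h>

/* Dot product of an unsigned-8-bit activation vector with a signed-8-bit
   weight vector, accumulated in 32 bits. Quantized LLM inference is
   mostly this loop, which is what pmaddubsw and friends accelerate. */
static int32_t dot_q8(const uint8_t *act, const int8_t *wgt, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)act[i] * (int32_t)wgt[i];
    return acc;
}
```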
What? A Turing machine can literally be built. The only problem is supplying the infinite tape, or splicing on more when it runs to the end of a finite tape.
Machines that address storage using fixed width pointers (and have no other kind of storage) cannot be Turing machines.
You don't want your telephone exchange to halt. That's why we have the engineering marvel that is Erlang. "Engineering" being the key word. The Halting Problem isn't a real problem. Just pull the power cord.
You want every call handled in finite time. It's not that there is an unending computation in a telephone exchange, but a series of finite tasks. You can call it corecursion, coinduction or something else co-.
A computer with external connectivity to a tape machine capable of reading and writing symbols and moving along the tape would qualify as a Turing machine, provided someone feeds it enough tape for any problem that's thrown at it.
It was funny when Sun proudly and unilaterally proclaimed that Sun put the "dot" into "dot com", leaving it wide open for Microsoft to slyly counter that oh yeah, well Microsoft put the "COM" into "dot com" -- i.e. ActiveX, IE, MSJVM, IIS, OLE, Visual Basic, Excel, Word, etc!
And then IBM mocked "When they put the dot into dot-com, they forgot how they were going to connect the dots," after sassily rolling out Eclipse just to cast a dark shadow on Java. Badoom psssh!
Dmitry talks about compiling the kernel in years. While I haven't done that, I have built NetBSD-vax and NetBSD-mac68k natively on a VAXstation 4000/60 and on a Mac Quadra 610. It's on the scale of multiple months instead of years, but it's enough to give me a feel for it.
At Hackaday Supercon in 2022, the badge for attendees (https://hackaday.com/2022/10/12/the-2022-supercon-badge-is-a...) implemented a fictional 4-bit CPU along with a control panel for directly entering instructions and running and stepping through code. I had a huge amount of fun implementing a space-shooter video game on it, as the panel included a bit-by-bit view of one of its pages of memory. Comparing its Voya4 architecture with the 4004 was fascinating. Some similar tradeoffs, but the Voya4 has the benefit of 50 years of CPU instruction set exposure.
Alas, dmitrygr's method wouldn't work on the badge, as all the memory is internal to the PIC24 that implements the CPU emulator.
Wow, this was not a cheap project! Thanks, eBay collectors.
Also probably the only time I'd have gone for an LCD over a VFD. If you're running a multi-year long compile, it'll probably be burned in to hell by the end.
It's not original research. A Master's Thesis at most, or only a Bachelor's Degree Final Project. Regardless, doing work of this caliber for a blog (and geek fame) is amazing.
Sorry, bad reading comprehension on my part! I see honorary PhDs often used to express political support, instead of honest admiration of achievements.
Hm, perhaps you're right. I just meant that this is worth appraisal since it is so academic-level detailed, and I can imagine a few universities where this could be a PhD thesis. I guess some parts of the whole write-up could also rate as original research.
> Still loads the kernel faster than a virtual ISO on a server's shitty IPMI over the internet
This gave me flashbacks of trying to boot Dell M1000e blade servers from an NFS-hosted ISO running on a Raspberry Pi, which was painfully slow to boot and run.
This was a very interesting read. I have read a bit about the 4004 before so I knew it was strange. But the level of obscurity is mind-blowing. Now I just got the urge to see how well I would be able to make a CPU with the same transistor count. It's not that much fewer than a 6502. 8 bit would make it so much easier to program.
For the video, i wanted a laptop with a real serial port (no usb). This one fit the bill and was $20 on eBay. Windows 2000 is the prettiest windows IMHO, so that’s what I installed for the demo video.
This is so awesome. I hope I can expand my knowledge enough to understand most of this project; right now it's way past my limited CS proficiency.
Though my highlight (which I could completely comprehend) is "Section 14.b & 14.c - Getting the data..." All it took was 400K files (~275 photos/day after 4 years). We have so much raw processing power, storage & network, yet the most-used (probably) media-sync apps crashed or synced slowly, AirDrop failed, and there was no 'Select-All' UI feature. Crazy times we live/will live in... :)
That's kind of insulting honestly. Getting Linux to run on an i4004 is bona fide engineering. More real than engineering that we're paid to do most times. Looking at the list of Ig Nobel Prize winners it sounds like The Onion but not funny.
Many things which receive an Ig Nobel (but not all) are bona fide <thing> that just happen to be funny or have a particularly strong aspect (not necessarily in their entirety) of triviality. If I did a project like this that got enough attention and generated enough amusement to deserve an Ig Nobel, I'd be honored rather than offended to be on the same list as projects that weren't all genuinely representative works of the field.
Running a modern kernel on a 4004 just doesn't sound too far from running it on an incandescent light bulb or on a potato. No doubt sorcerer level engineering.
Running Linux on a computer? I could understand your comment if the display was something very non-conventional like a series of lightbulbs instead of digital output.
It's a cool project don't get me wrong, but Nobel Peace Prize? What is this thread? I would consider something like a strandbeest to be more award-worthy - but it too is just a neat invention, not contributing to world peace or anything near it.
There's the Turing award, which is an equivalent prize for computing. Could add an acknowledgement for strange and unusual applications of computer science.
It's not really the addressing modes, but the instruction format. Immediate values are not stored contiguously in certain RISC-V instructions.
On all MIPS instructions, the bits for an immediate add, load constant, branch, etc. are always stored in order.
On RISC-V, the bits are (sometimes) jumbled around. For example, on an unconditional branch, the bits for the destination offset are stored in the order of bit 19, bits 9-0, bit 10, bits 18-11. In hardware, reordering that is free: you just run your wires the right way to decode it. In software, you have to do a ton of bit manipulation to fix it up.
The reason RISC-V does that is to simplify the hardware design.
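To make the difference concrete, here's a rough sketch (my own, not code from the article; the helper names are made up for illustration) of decoding a RISC-V JAL jump offset in software versus a MIPS J-type target:

```c
#include <stdint.h>

/* RISC-V JAL stores its offset scattered as imm[20|10:1|11|19:12] in
   instruction bits 31..12, so a software decoder has to reshuffle it. */
static int32_t riscv_jal_offset(uint32_t insn)
{
    uint32_t imm = ((insn >> 31) & 0x001) << 20 |   /* imm[20]    */
                   ((insn >> 21) & 0x3FF) << 1  |   /* imm[10:1]  */
                   ((insn >> 20) & 0x001) << 11 |   /* imm[11]    */
                   ((insn >> 12) & 0x0FF) << 12;    /* imm[19:12] */
    return (int32_t)(imm << 11) >> 11;              /* sign-extend 21 bits */
}

/* MIPS J/JAL: the 26-bit target index is one contiguous field
   (combined with the upper bits of the current PC region). */
static uint32_t mips_j_target(uint32_t insn)
{
    return (insn & 0x03FFFFFFu) << 2;
}
```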
Well, lack of REG+REG and REG+SHIFTED_REG addressing modes handicaps it significantly. And no, it will not get magically fused by magic fusing fairies in your CPU.
Not sure about GCC, but on clang it is trivial to enable it. And if you really want to (assuming you have the hardware) you could compile exactly the same code with and without it and compare how much exactly it is handicapped if those instructions are not there.
Plus, on RISC-V multiplication/division (about which you've complained) is optional, and there is also a variant of RISC-V with only 16 registers instead of 32 (also very simple to enable on recent versions of clang, although Linux would probably need some modifications to be able to run on that).
So I'm not entirely convinced that RISC-V would be worse here.
You’re replying to the author of the post explaining why he would have to do more in software to emulate RISC-V than MIPS and that would be more effort and run slower, and you’re telling him “that’s extremely subjective”?
I wonder if there is a simple chip, similar to the 4004, 8080, or Z80, that can run at modern high frequencies — 4–5 GHz, or even higher due to its simplicity? Not much of a practical use, but it could be fun for emulation projects. 100x slower with emulation - still fast enough for retro platforms.
Jack Ganssle’s The Embedded Muse covers tools and techniques in embedded development. In one article, he reported that 4-bit MCU’s were still competitive in some sectors:
I suspected we'd see either 4- or 8-bit return for neural networks. IBM did try the Kilocore processor, which was a massively parallel, 8-bit system. Multicore 4- and 8-bitters are also the kind of thing people could build on cheaper nodes, like Skywater's 130nm.
Very impressive work, but most of the work was only necessary because the Intel 4004 was not really the first microprocessor; that was just BS propaganda used by Intel to push back the date of the launch of the first microprocessor by one year, to 1971.
The first true (civilian) microprocessor was Intel 8008, in 1972.
Intel 8008 was a monolithic implementation, i.e. in a single PMOS integrated circuit, of the processor of Datapoint 2200, therefore it deserves the name "microprocessor".
The processor of Datapoint 2200 had an ugly architecture, but there is no doubt that it was a general-purpose CPU and traces of its ISA remain present in the latest Intel and AMD CPUs.
On the other hand, the set of chips that included Intel 4004 was not intended for the implementation of a general-purpose computer, but it was intended just for the implementation of a classic desktop calculator, not even a programmable desktop calculator.
This is the reason for the many quirks of Intel 4004, e.g. the lack of instructions for the logic operations, and many others that have increased the amount of work required for implementing a MIPS emulator suitable for running Linux.
Even though the Intel 4004 was intended for a restricted application, after Intel offered to sell it to anyone, many succeeded in using it in various creative ways to implement microcontrollers for the automation of diverse industrial processes, saving some money or some space over a TTL implementation.
In the early days of the electronics industry it was very normal to find ways to use integrated circuits for purposes very different from those for which the circuits had been designed. Such applications do not make the Intel 4004 a true microcontroller or microprocessor. Very soon many other companies, and later also Intel, began to produce true microcontrollers designed for this purpose, either 4-bit or 8-bit MCUs, and then the Intel 4004 was no longer used for new designs.
I'm glad to see the Datapoint 2200 is getting attention, but by reasonable definitions of "microprocessor", the Intel 4004 was first, the Texas Instruments TMX 1795 was second, and the Intel 8008 was third. It seems like you're ruling out the 4004 on the basis of "intent" since it was designed for a calculator. But my view is that the 4004 is a programmable, general-purpose CPU-on-a-chip, so it's a microprocessor. Much as I'd like to rule out the 4004 as the first microprocessor, I don't see any justifiable grounds to do this.
Intel's real innovation—the thing that made the microprocessor important—was creating the microprocessor as a product category. Selling a low-cost general-purpose processor chip to anyone who wanted it is what created the modern computer industry. By this perspective, too, the 4004 was the first microprocessor, creating the category.
Your argument is that because the 4004 was built to power a calculator that disqualifies it as a microprocessor? Independent of the actual nature of the 4004 itself and its potential applications beyond its first intended use? Can’t see how that makes sense at all.
Your statement about Intel 'pushing back' the date to 1971 also makes little sense given Intel advertised [1] the 4004 as a CPU in Electronic News in Nov 1971.
Because of its purpose, the Intel 4004 did not have many features that had been recognized as necessary ever since the first automatic computers, for example logic operations, the lack of which was mentioned in the parent article.
Therefore I do not believe that it is possible to consider Intel 4004 as a general-purpose processor. It had only the features strictly necessary for the implementation of the Busicom calculator.
The idea to sell 4004 for other uses has appeared long after the design was finished, when Busicom did not want to pay for the chipset as much as Intel desired, so Intel decided to try to sell the chipset to other customers too, and then they thought to advertise it as a "CPU".
Moreover, it is debatable whether Intel 4004 can be considered as a monolithic processor, because 4004 was not really usable without the rest of the chipset, which provided some of the functions that are normally considered to belong into a processor.
The Intel 4004 4-bit "CPU" implemented fewer functions than the 4-bit TTL ALU 74181, which was general-purpose and which was the main alternative at that time for implementing a CPU. But it had the advantage of including many registers, because MOS registers were much cheaper in die area than TTL registers, and these included registers were the reason why a CPU implemented with the Intel 4004 chipset had a lower integrated-circuit count than the equivalent implementation with MSI TTL ICs.
Intel's advertisement of the 4004 as a CPU was an advertisement of the same kind as Tesla having "Full Self-Driving".
> Because of its purpose, the Intel 4004 did not have many features that had been recognized as necessary ever since the first automatic computers, for example logic operations, the lack of which was mentioned in the parent article.
So you're defining a set of instructions without which a device can't be considered a general-purpose computer, even if the missing instructions can be recreated in software with instructions that do exist.
Sorry, I disagree with this completely, as does every definition I've ever seen.
No, logic operations aren't "recognized as necessary". For instance, the IBM 1401—the most popular computer of the early 1960s—did not have logic operations. (This was very annoying when I implemented Bitcoin mining on it.)
The reason that the 4004 is considered a CPU and the 74181 is not a CPU is that the 4004 contains the control logic, while the 74181 is only the ALU.
Of course, "microprocessor" and "CPU" are social constructs, not objective definitions. (For instance, bit-slice processors like the Am2901 were considered microprocessors in Russia.) So you can craft your own definition if you want to declare a particular processor first. cough MP944 cough
No kidding about unusual uses of ICs. Not related to microprocessors, but I have an old analog triple conversion HF receiver (Eddystone EC958/3 for what it's worth) that uses a TTL IC in an analog circuit! I'd have to look at the schematic again, I think it's a multi-stage counter, but basically what it uses it for is to generate a comb shaped spectrum, one "spike" of which can then be picked up by an analog circuit and locked to, to generate precisely spaced tuning steps for the high stability tuning.
I'd figure the earliest thing anybody has run Linux on before this would be a 386. Although I suppose with this MIPS emulator ported to some other proto-processors it could go older, but just getting the hardware to do that would be a challenge.
IMO, just because a microprocessor has a quirky instruction set that doesn't include some standard instructions, or was made for a specific purpose in mind, doesn't make it not a microprocessor.
I didn't know the guy but he clearly knows what he's doing, it's unbelievably entertaining to read the details of achieving an impossible task with the most underpowered tool possible.
I mean, it's fun and interesting bullshit that cheats a lot. I'm sure that you could emulate a MIPS using a one-bit processor like the MC14500[0] with enough supporting hardware, real or virtual. Looking forward to it, Dmitry.
At some point you will just need to offload the actual "processing" part to some nice old chap named Dave who has himself an abacus, and every now and then you send him a letter and he moves some stones and sends a letter back with the result.
I think you have invented the most cursed form of virtualized cloud computing where the memory is connected by a very slow network.
Bit of course you can solve this by sending letters speculatively, and having Dave keep copies of his letters back to you so he reference them for recently set values, that way you can at least keep Dave busy while the letters are in flight.
The funny thing is that the 4004 was used in emulation from pretty much its beginning. It was used to make a calculator, the Busicom 141-PF. But the calculator couldn't be programmed in 4004 code directly. The 4004 ran a virtual machine, and the calculator was written in the virtual instruction set for this machine.
A chip like the 4004 could be used in some simple embedded applications using its instruction set directly. But that calculator was the original application it was developed for. Already that was too complicated, not unlike, say, booting Linux.
I think the implication is that there is probably a much more reasonable way to do what I was trying to accomplish, and I didn’t do that. That is almost certainly true. This was my first time trying to film such an extreme time lapse so it is quite possible that I wasn’t aware of some tools designed for it
In the TXR Lisp virtual machine there is a t0 register, which is always nil, and that symbol can be used as an alias for it in the assembly. This was consciously inspired by the MIPS $zero.
A write to t0 is not simply ignored. It will generate an exception. Another MIPS-like thing is that t1 is an assembler temporary.
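For comparison, a rough sketch (my own illustration, not from TXR or the article) of how MIPS's $zero is conventionally handled in a software emulator: reads of register 0 always yield 0, and writes to it are silently dropped rather than trapping.

```c
#include <stdint.h>

static uint32_t regs[32];  /* MIPS general-purpose registers */

static uint32_t reg_read(unsigned r)
{
    return (r == 0) ? 0 : regs[r];   /* $zero always reads as 0 */
}

static void reg_write(unsigned r, uint32_t v)
{
    if (r != 0)                      /* writes to $zero are ignored */
        regs[r] = v;
}
```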
The CADR had a convention for the A and M memory (essentially what we would call registers today) for the Lisp Machine microcode where the first 32 locations had special meaning. Location 2 was for constant 0, and 3 for constant -1.
Very impressive project. Recording it with a phone seems complicated though. Why not a Raspberry Pi with camera module? Or a decent webcam and any computer?
not that I found. And it makes sense since without MHz-fast 4002 and 4289, it is useless. Plus, swinging a bus from -15V to 0V at MHz speeds will take quite some drive current.
Windows ran on a similar MIPS machine (Microsoft Jazz). The issue is emulating SCSI. I think I'd need a lot more ROM space to do that. SCSI is messy and hard.
The alternative is to find the Windows MIPS DDK and build a paravirtualized disk driver for it like I did for Linux. That would make it more doable.
Ha. Jazz is a dim, distant memory, but methinks its concept is ~20 years later than the 4004. Just thought about emulating/writing SCSI on the 4004; might be easier to climb Everest.
Perhaps we need a competition, assembly programmers only need apply. :-)
> But for the one I'll have hanging in my office, I have loftier goals. With swap enabled, the kernel sources can actually be built right on-device. It will take some number of years. The partition where the kernel lives is /dev/pvd2 and is mounted under /boot. The device can build its own kernel from source, copy it to /boot/vmlinux, and reboot into it. If power is interrupted, thanks to ext4, it will reboot, recover the filesystem damage from the journal, and restart the compilation process. That is my plan, at least.
One, it reminds me of that "world's longest song" or somesuch thing, where they play a note every 10 years.
The other is just a picture of someone asleep at their desk, a pile of calendars with days checked off tossed to the side, random unwashed mugs and such, all dimly lit by a desk lamp, as `$ make linux` finally returns to a new, unassuming `$` prompt. Like Neo in the Matrix.
Yes. I have an emulator of this board (it is in the downloads too) which is much faster than the real thing. It shows how much realtime is needed to get to the current state. Doing a build in it will answer the question unequivocally.