The first million-transistor chip: The engineers’ story (ieee.org)
93 points by rbanffy on July 4, 2022 | 43 comments


In the early '90s I worked on a project that used the i860, two of them in fact. It was a terminal to display 3D graphics via the PEX protocol (https://en.wikipedia.org/wiki/PHIGS).

One of the things I recall about the i860 is that the interrupt handler was a disaster. It was possible for it to take an interrupt in a state where one couldn't just reload the state and resume. Essentially the interrupt handler had to inspect the state for the problematic condition, then simulate the instruction stream until it got back to a state that could be restored to the CPU.

I was a hardware designer on the project so I didn't actually touch that code myself, but one of the software guys passed that on. I apologize to the i860 team if the story is overblown. Double-checking this lore, Wikipedia says a context switch took a minimum of 62 cycles and a maximum of 2,000 cycles.

https://en.wikipedia.org/wiki/Intel_i860


It's about right. I was involved in a project to get rid of an i860 design (aerospace) and replace it with a PPC design. No one had anything nice to say about 7 years of keeping an i860 design alive.


Oh, that sounds like a hardware bug to me. Was that the intended behavior?


The i860 introduced many new ideas, some that worked well and were copied, and others that turned out to be unfortunate. Using wide datapaths in a single-instruction/multiple-data (SIMD) scheme for graphics got copied by SPARC and then x86 and others, for example.

A selling point was the floating-point performance of a Cray 1, far above what other microprocessors could do. Pipelining was a key to this and, taking the RISC philosophy seriously, the pipeline was exposed to the software. The compiler had to schedule the instructions so that they didn't try to use a result from a previous instruction until the clock cycle when it finally came out of the pipeline.

The problem with an interrupt is that the pipeline is full of unfinished calculations. If you know you are going to return to the same task, you just freeze the pipeline, avoid using floating point inside the interrupt handler, and restart everything when you return.

If you are going to switch to a different task instead (Unix), then this is a really bad design. The kernel has to manually flush the floating-point pipeline to memory, then manually rebuild the pipeline for the new task, and then return to it. The sequence to flush and restore the pipeline is really tricky and slow.
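
To make "exposed to the software" concrete, here is a rough, purely illustrative C simulation of saving and restoring a software-visible pipeline across a context switch. Everything here is invented for the sake of the example (the stage count, the identity "operation", the names); the real i860 sequence used pipelined FP instructions and was considerably messier:

    /* Toy model of an exposed pipeline: a result emerges STAGES issues
     * after its operand goes in, so the OS must drain in-flight values
     * on a task switch and re-inject them later. NOT real i860 code. */
    #include <stdio.h>

    #define STAGES 3                /* assumed depth, for illustration */

    static double pipe[STAGES];     /* in-flight values, newest first  */

    /* Issue one pipelined op: push an input, return what falls out of
     * the last stage (issued STAGES calls ago). */
    double pf_issue(double in)
    {
        double out = pipe[STAGES - 1];
        for (int i = STAGES - 1; i > 0; i--)
            pipe[i] = pipe[i - 1];
        pipe[0] = in;
        return out;
    }

    /* Switch out: drain by issuing dummy ops and saving what emerges,
     * i.e. the slow, tricky sequence described above. */
    void pipeline_save(double saved[STAGES])
    {
        for (int i = 0; i < STAGES; i++)
            saved[i] = pf_issue(0.0);
    }

    /* Switch back in: re-issue the saved values, oldest first, so the
     * pipeline looks exactly as the task left it. */
    void pipeline_restore(const double saved[STAGES])
    {
        for (int i = 0; i < STAGES; i++)
            pf_issue(saved[i]);
    }

    int main(void)
    {
        pf_issue(1.0); pf_issue(2.0); pf_issue(3.0);  /* task A's work */

        double ctx[STAGES];
        pipeline_save(ctx);     /* switch away from task A */
        pipeline_restore(ctx);  /* switch back */

        /* prints 1 2 3: the in-flight values survived the switch */
        printf("%g %g %g\n", pf_issue(0), pf_issue(0), pf_issue(0));
        return 0;
    }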

Is it intended? In a way, yes. High performance was the highest priority for this project. If you are switching your processor among a bunch of tasks, then you really don't care about performance that much, do you? Running Unix on a supercomputer should be possible, but it doesn't have to be good, as long as, when you focus on a single program, it goes amazingly fast.


Thank you for the explanation Jecel, that makes it look as though it was always destined to be in the 'coprocessor' domain.

How is the Smalltalk-to-hardware project coming along?


I would say it was intended as the main processor for supercomputers. I did a design like that (though it ended up actually built with old 68020s we already had in stock to save money) and there were many others. Intel itself had some interesting machines.

But Intel realized that asking customers to jump from x86 to an incompatible architecture (either the i860 or i960) was too risky - they could just as well jump to a competitor's architecture instead. So they repositioned their RISCs as coprocessors. It is amazing that they forgot this not too many years later with the Itanium.

My Smalltalk processor project is going very well, though obviously this is not the place for details.


This story omits any mention of the i860's predecessor, the i960:

https://en.wikipedia.org/wiki/Intel_i960

The i960 was a commercial success; the i860 was not.

I used an i960 in my first startup back in the early 90s:

https://flownet.com/gat/fnlj.html

It was an absolute joy to work with, one of the most beautiful processor architectures I've ever seen.

The i960 didn't quite have a million transistors, but it came close, with the high-powered CF version having 900,000.

https://micro.magnet.fsu.edu/optics/olympusmicd/galleries/ch...

To put these numbers in perspective, an M1 Pro has 33 billion (with a b) transistors, the equivalent of 33,000 i960s.


That's a very impressive piece of work. I worked on a (much!) simpler SDLC card and wrote the firmware for it (someone else did the hardware); it was already quite a chunk of work to get that up and running and stable enough to process production data. Eventually only a few tens of these were built, but they worked until the whole system was decommissioned on account of obsolescence (more than a decade later).

What you've built there would run circles around what I was involved in. Does the hardware still exist?


Thanks. Yes, I was pretty proud of what we built. We had a price/performance advantage of two orders of magnitude over the state of the art for about a year, but before we could get any significant market share, Gigabit Ethernet came along (and the price of Fast Ethernet dropped) and wiped us out. But it was fun while it lasted.

The hardware kind of exists. I have two prototype boards in my closet but they haven't been powered on in 30 years. I also don't know where the code for the device drivers is any more, though I have a box full of old hard drives that probably has it somewhere. Maybe some digital archeologist will dig it up some day.

To give proper credit where it is due, the idea and hardware design for Flownet were the work of Mike Ciholas, who went on to found a very successful hardware design company [1]. I wrote the device drivers and did the marketing, which is probably one of the reasons we failed. Turns out I'm terrible at marketing.

[1] https://www.ciholas.com/


Well, there are probably a couple of old timers on HN who really appreciate the kind of skill that it took, even though it doesn't show in your bank balance.

It also makes me very grateful for the magic that goes on behind the scenes whenever I plug in a high speed USB device and it 'just works', the kind of wizardry involved for this sort of thing is highly underappreciated.


Very true. Ironically, my career has come full-circle and I am now working for Intel doing chip design (actually working on developing tools that do chip design). The process of producing modern state-of-the-art chips is truly mind-blowing.


I watched that 'indistinguishable from magic' video and that is indeed the only appropriate way to describe it. If our base tech never improved from this point forward, I'd say that's a job well done.

But I also have a soft spot for the GA144, which represents the other extreme: it's what one man can do versus what a whole team of talented engineers can do.


Judging by your alias: Are you the guy with Lisp EDA tools?


It's the guy I work for, not me, but yes.


Intel is actually using Lisp in its tools? Can you tell more about your job?


> Intel is actually using Lisp in its tools?

Yes.

> Can you tell more about your job?

This is my boss giving a talk about the tool:

https://www.youtube.com/watch?v=oGQd-suLvzQ

It was developed at a startup called Barefoot Networks, which was acquired by Intel a few years ago.

What else do you want to know?


Is your work related in any way to Symbolics NS, beyond using the same language? Maybe you're reusing some public knowledge from the papers that were published about it?

Is your team hiring? I'm working at Intel, though far from hardware design, on the MPI library team. In my spare time I have learned Common Lisp and some basics of hardware design, so I would be happy to make my hobbies relevant to my job and work with you.


Ping me on Teams.


The preamble in that FlowNet document is a wonderful description of busy wires and switched networks.


Thanks. CSMA/CD is a thing of the past. All networks are switched nowadays. FlowNet was among the first LAN designs to be exclusively switched.


As far as I know the i960 was the follow-on to the iAPX432 (via BiiN, a joint venture with Siemens) and not at all related to the i860.

This left Intel with three options for the future in the early 1990s: x86, i860, and i960. They decided to bet on the x86, moving the i860 to graphics cards and the i960 to smart I/O cards.


It is probably not right to say that BiiN was in any way a follow-on to the iAPX432.

It might have inherited a few ideas from the iAPX432, and maybe also some designers, but otherwise it was a very different architecture, whose development was obviously triggered by all the publicity about the advantages of RISC. At the time, most companies involved in computers had concurrent RISC development programs, e.g. IBM, ARM, HP, AMD, Motorola, Fairchild, DEC.

I have not yet seen any document explaining why Siemens joined Intel in the BiiN project, what Siemens expected from it, or how the Intel and Siemens contributions were split.

In any case, in 1988 Siemens chose to exit the BiiN project, leaving Intel as its sole owner.

After renaming BiiN to 80960 (Intel had an 8096 series of 16-bit microcontrollers, which the 80960 was supposed to replace), Intel introduced the first two products based on it before the end of 1988, with additional variants following in 1989.

The 80960 included many innovations: it was the first RISC ISA designed by Intel, and the 1988 products were the first monolithic CPUs with an atomic fetch-and-add instruction (invented in 1980/1981 for the NYU Ultracomputer project). The atomic fetch-and-add instruction was later included in the Intel 80486 instruction set (with the mnemonic XADD).
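
For readers unfamiliar with the primitive: fetch-and-add atomically adds to a memory word and returns the word's previous value, which is enough to build wait-free counters and ticket locks. A minimal C11 sketch of the semantics (my own demo, not 80960 code; on x86, compilers typically emit XADD for this):

    #include <stdatomic.h>
    #include <stdio.h>

    /* Atomically add `delta` to *p and return the PREVIOUS value. */
    long fetch_and_add(atomic_long *p, long delta)
    {
        return atomic_fetch_add(p, delta);
    }

    int main(void)
    {
        atomic_long tickets = 0;

        /* Each caller gets a unique ticket number without any lock,
         * even if many threads do this concurrently. */
        long mine = fetch_and_add(&tickets, 1);  /* returns 0 */
        long next = fetch_and_add(&tickets, 1);  /* returns 1 */

        printf("%ld %ld\n", mine, next);
        return 0;
    }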

One of the 80960 variants introduced in 1989 (the 80960CA) was the first monolithic superscalar CPU, one year before IBM POWER. The first superscalar design had been the IBM ACS research project (1966), but the word "superscalar" was coined only in 1987, by the team designing IBM POWER. In this case Intel was very quick to incorporate the results of published research into its designs, even quicker than those who published them (though obviously IBM POWER was a far more ambitious project, with CPUs for scientific workstations having much higher performance than the 80960CA).

After learning how to implement them in the 80960, Intel included the more important innovations in its mainstream CPUs: the 80486, then the Pentium (the first mainstream superscalar).


While the 432 was a memory-to-memory CISC and the 960 a classic RISC, I do think they have a lot in common technically. A key difference is that the 432 used positive/negative offsets to separate raw data from capabilities, while the 960 had an optional 33rd bit to do the same thing, which makes it far simpler to mix data and capabilities on the stack.

The "operating in hardware" is microcode in the 432 but RISC-friendly in the 960, but it is still there.

I am talking about the original 960MX here. Most of these features were dropped from the later 960 models, if I understand correctly. Those indeed do not have much in common with the 432.


I agree that the mechanism of implementing memory protection by capabilities was inherited by BiiN from the iAPX432, but as you said, after Intel renamed BiiN to 80960 and changed its intended market from general-purpose CPUs to 32-bit microcontrollers, competing there with the older 16-bit MCUs and with 32-bit MCUs like the AMD 29000, such high-level features were dropped.

Nowadays there are attempts to resurrect memory tagging for memory protection, e.g. the Cambridge CHERI research project, which ARM has implemented in its Morello demonstrator board.

Like the original BiiN/80960, CHERI uses a 1-bit memory tag for each 128 bits of memory to differentiate raw data from capabilities.
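
A toy C model of what that tag bit buys you (the layout and names are mine; real CHERI capabilities also carry bounds and permissions, and the tag lives out of band in the hardware, not in an addressable field):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* One 128-bit memory granule plus its tag: the tag says whether
     * the granule holds a valid capability. */
    typedef struct {
        uint64_t lo, hi;   /* 128 bits of payload             */
        bool     tag;      /* 1 bit per 128: capability-valid */
    } granule;

    /* Ordinary data stores clear the tag, so forged pointer bits can
     * never be dereferenced. */
    void store_data(granule *g, uint64_t lo, uint64_t hi)
    {
        g->lo = lo; g->hi = hi; g->tag = false;
    }

    /* Only a privileged/derived path may set the tag. */
    void store_capability(granule *g, uint64_t base, uint64_t meta)
    {
        g->lo = base; g->hi = meta; g->tag = true;
    }

    /* Using a granule as a capability traps unless the tag is set. */
    uint64_t load_capability(const granule *g)
    {
        if (!g->tag) {
            fprintf(stderr, "capability fault: tag clear\n");
            exit(1);
        }
        return g->lo;
    }

    int main(void)
    {
        granule g;
        store_capability(&g, 0x1000, 0);
        printf("ok: %#llx\n", (unsigned long long)load_capability(&g));

        store_data(&g, 0x1000, 0);  /* same bits, but tag now clear  */
        load_capability(&g);        /* faults: data can't be a pointer */
        return 0;
    }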


I was also a big fan of the i960, and did many designs with the i960CF and i960RP. Here is one of the i960CF-based MPEG-2 encoder PCI cards from when I worked at Optivision in the '90s:

https://www.ebay.com/itm/184183208245


This was the processor in the Intel iPSC/860 and Paragon, early products aimed at massively parallel processing.

These provided software seeds for the later Virtual Interface Architecture and hence iWARP, InfiniBand, and others.

The i860's interrupt overhead and non-determinism may have been the critical items that forced commercial productization of direct user-mode network fabrics.

I remember conversations at the time about the total software overhead of TCP/IP, and OS people forced to develop for i860s talking about the system-level pain of getting in and out of the TCP fast path.


I worked with Les Kohn at C-Cube Microsystems for a very brief time before he left the company in 2000. While he was there, he had completely redesigned the C-Cube MPEG-2 encoder architecture. The previous architecture required a giant board (12U VME) with 12 processors to encode an SD (720x480) image. His design only required one chip.

The design was based on a microSPARC core with many special processing units (motion estimation, DCT/IDCT processing, etc.) glued on.


This is the chip Windows NT (the main ancestor of today's Windows) initially targeted; the name NT was derived from the chip's codename N10 (or N-Ten).


Wait, wasn't Windows NT ("WNT") a pun on Vax MicroSystems ("VMS") à la "Windows is Better than Vax", since WNT is VMS plus 1 at each letter?

That's the story I always heard.
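
(The letter arithmetic does check out: V+1 = W, M+1 = N, S+1 = T. A throwaway C one-liner to verify, just for fun:)

    #include <stdio.h>

    int main(void)
    {
        /* Shift each letter of "VMS" up by one: prints WNT. */
        for (const char *p = "VMS"; *p; p++)
            putchar(*p + 1);
        putchar('\n');
        return 0;
    }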


Both are true. It's a multi-level pun.


Virtual Memory System


Probably should have (1989) in the title; virtually all of the text was printed then.


I liked the story, had not heard of this chip before. Wikipedia gives the rest of the story: https://en.wikipedia.org/wiki/Intel_i860


I have a question (I know nothing about the domain):

In the article they say that the die should be at most 450 mils long.

This is 11.43 mm (in SI units), so an area of 130.6449 mm2.

For one million transistors, this gives an area of 0.0001306449 mm2 per transistor.

So if each transistor were a square, it would have an edge of 0.01143 mm, or 11.43 μm.

Wikipedia says that the size of an i860 transistor at the time was from 1 to 0.8 μm.

Indeed the connections between transistors take much more area than the transistors themselves, and power connections are much larger, but the discrepancy between 1 μm and 11.43 μm still seems large. Please enlighten me!
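
(The arithmetic above is easy to check mechanically; a few lines of C reproducing the numbers:)

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        /* 450 mils -> mm (1 mil = 0.0254 mm), then area per transistor. */
        double side_mm  = 450 * 0.0254;          /* 11.43 mm             */
        double area_mm2 = side_mm * side_mm;     /* 130.6449 mm^2        */
        double per_t    = area_mm2 / 1e6;        /* mm^2 per transistor  */
        double edge_um  = sqrt(per_t) * 1000.0;  /* edge in um if square */

        printf("die: %.2f mm/side, %.4f mm^2\n", side_mm, area_mm2);
        printf("per transistor: %g mm^2 -> %.2f um square\n", per_t, edge_um);
        return 0;
    }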


"1 to 0.8μm" was not the size of a transistor, but the width of a line drawn on the die, presumably the half-pitch of the polysilicon patterns that made the gate of the MOS transistors.

To draw a MOS transistor, you have to draw a source, a drain and a gate, each having a width of "1 to 0.8μm", and you must have spaces between them, about as large as their width.

In most transistors, you cannot use a source, a drain and a gate of minimum width, because you need contact windows, i.e. holes through the insulating layers, to connect the inputs, outputs and power supplies of a gate. Where a contact window is placed, which is a square with a "1 to 0.8μm" side, the source/drain/gate traces must be wider, to ensure that the hole remains under or over them even when successive masks are slightly misaligned.

If you draw a MOS transistor of minimum size, taking into account all the design rules, its area might be 10 to 30 times larger than the area of a square of minimum size. A large part of the transistors in a CPU are larger than the minimum size, to ensure high enough speed. In old CMOS processes, like that used for the 80860, the PMOS transistors were at least twice as wide as the NMOS transistors, to decrease the PMOS "on" resistance to values similar to those of the NMOS transistors.

Moreover, in a CPU, and especially in old CPUs like the 80860, which used far fewer metal layers than modern chips, large areas might contain no transistors at all, because they were covered by internal buses, i.e. used for routing many metal interconnections.

So there are a lot of reasons why the average die area per transistor, especially in such old CPUs, must be much larger than the area of a square of minimum size.
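
To make that concrete, here is a rough back-of-the-envelope in C using the factors from the explanation above; the specific multipliers are my own guesses within the stated ranges, not measurements:

    #include <stdio.h>

    int main(void)
    {
        /* Start from a 1 um x 1 um minimum square and apply the rough
         * overheads described above (all factors approximate guesses). */
        double min_square_um2 = 1.0;   /* 1 um half-pitch, squared        */
        double design_rules   = 20.0;  /* "10 to 30x" for rules/contacts  */
        double upsizing       = 2.5;   /* wider PMOS, speed-critical FETs */
        double routing        = 2.5;   /* bus-only regions, no devices    */

        double est = min_square_um2 * design_rules * upsizing * routing;
        printf("estimated area/transistor: ~%.0f um^2\n", est);  /* ~125 */
        printf("vs. computed above: 11.43 um square = ~131 um^2\n");
        return 0;
    }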

Therefore the values you computed seem very reasonable.


Many thanks for the detailed explanation!


I think a few X terminal manufacturers targeted this chip. NCD maybe? Or Labtam/Tektronix?


So whatever happened to this chip? It seems like a million-transistor 64-bit CPU would have been revolutionary in the mid-'80s, but I've never even heard of it.



Was there a link between Intel and Bell Labs?


The transistor was invented at Bell Labs by William Shockley, John Bardeen and Walter Brattain in 1947.

William Shockley founded Shockley Semiconductor Laboratory. Eight researchers working there, known as the "traitorous eight", left to form Fairchild Semiconductor. Engineers at Fairchild were responsible for many key innovations in semiconductor manufacturing processes.

Gordon Moore, head of R&D at Fairchild, left to form Intel in 1968.

... So you could say that Bell Labs indirectly birthed Intel :)

https://en.wikipedia.org/wiki/History_of_the_transistor

https://en.wikipedia.org/wiki/William_Shockley#Career

https://en.wikipedia.org/wiki/Fairchild_Semiconductor


Thanks for the reminder. For fun, with this reasoning we could also say Napoleon III indirectly birthed Bell Labs, since he funded Bell's Volta lab?


The first million-transistor chip was such a big thing, but the first trillion-transistor one didn't even get a press release.



