> It may be thought of as what happens when a whole computer starts, since the CPU is the center of the computer and the place where the action begins.
I thought that too. Last year I spent a while getting as low-level as I could and trying to understand how to write a boot loader, a kernel, learn about clocks and pins and interrupts, etc. I thought, "I know, I'll get a Raspberry Pi! That way even if I brick it I didn't waste too much money".
Turns out the Raspberry Pi (and I'm guessing many other systems) is pretty confusing to understand at boot time. For one, it's the GPU that actually does the initial boot process, and much of that is hard to find good info on. (https://raspberrypi.stackexchange.com/questions/14862/why-do...)
I spent many many hours reading various specs & docs and watching tons of low-level YouTube videos. Compared to software development higher up the stack (my usual area), I found the material surprisingly sparse and poor for the most part. (Maybe that's reflective of the size of the audience and the value in producing it).
Yeah, modern hardware is crazy hard to understand, and a lot of it is proprietary/trademarked stuff like you saw on the RPi. Arduino is a good start, but it's just so low level that it doesn't get you much closer to understanding how a modern system works.
One slightly weird but fascinating path I have been playing with is writing programs for old game consoles, particularly the Nintendo DS. You can get full and comprehensive hardware documentation for these consoles and there are easy ways to run your own code on them now as well as libraries/tooling around it. But they run no OS, your program runs directly on the hardware, so you get a good feel for low level programming while not being down to the level of atmel chips.
It can be a little hard to work out how to get started, but it's really as simple as setting up `libnds` from devkitPro and then either hacking a DSi to run the TWiLight Menu++ firmware, or buying a cheap flashcart from eBay to run your own programs. Read the example programs from the devkitPro GitHub and some posts on the hardware and you'll get the hang of it.
Never done anything with them, but I think the Game Boy family is a good choice. Unlike the NES or SNES, which rely heavily on assembly, you can use C for the GBA. There are also great communities devoted to them.
Definitely. I guess one can climb the ladder by first programming the NES, then the SNES, and just go up from there, maybe even learning some Japanese to immerse oneself in the life of a Japanese game developer in the 80s/90s. Could be a long but interesting side project.
It's probably more confusing since the DS had several iterations: the first one in 2004, then the Lite in 2006, the DSi in 2008, and the DSi XL in 2009. And then the 3DS having mostly the same form factor probably made the design feel new for a lot longer.
Considering the Switch is at the end of its lifespan, I’d say that’s pretty old. It’s like saying, 7 years into the PS3’s lifespan, that the PS1 isn’t old. It’s old.
I dunno about the DS, which doesn't really qualify as that old to me; it's got a pretty decent-sized ROM for when you don't have a cartridge in, and I'd guess the SDK gives you something approaching an OS, maybe even with a threading library.
But on real old hardware, you're not going to run threads or a scheduler, you're going to run one iteration of your game loop, then wait for a sign that it's time to do your graphics work (after VBlank, during the screen scan on Atari 2600, mostly during VBlank on more generous platforms). If the platform doesn't have many interrupts, it probably has a fixed address for the interrupt vectors and your rom would cover that address/those addresses; and there's probably a fixed address that execution starts at too or it's one of the elements of the interrupt vector table.
Context switching isn't really that hard anyway --- call into (or get interrupted into) a routine that saves registers to the stack, then saves the stack pointer for the current task, restores the pointer for another task, pops the registers from the stack and returns.
There's not usually any sort of memory protection between tasks in a game, but it's assumed you know what you're doing.
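To make that save-registers/swap-stack-pointers idea concrete, here's a minimal cooperative sketch using the POSIX `ucontext` API on a Linux box (not console code; the task function, stack size, and messages are just illustrations). A real game or kernel would do the same thing in a few lines of assembly instead.

```c
/* Minimal cooperative "context switch" sketch using POSIX ucontext. */
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;

static void task(void)
{
    puts("task: running, now yielding back");
    swapcontext(&task_ctx, &main_ctx);   /* save our registers, restore main's */
    puts("task: resumed, finishing");
}                                         /* falling off the end resumes uc_link */

int main(void)
{
    char *stack = malloc(64 * 1024);     /* a private stack for the task */

    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = stack;
    task_ctx.uc_stack.ss_size = 64 * 1024;
    task_ctx.uc_link = &main_ctx;        /* where to go when task() returns */
    makecontext(&task_ctx, task, 0);

    puts("main: switching to task");
    swapcontext(&main_ctx, &task_ctx);   /* save main's registers, jump into task */
    puts("main: task yielded, switching back");
    swapcontext(&main_ctx, &task_ctx);
    puts("main: task finished");
    free(stack);
    return 0;
}
```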
One thing I've always wondered about is what if you get interrupted again while you are halfway done saving registers to the stack? Like I heard something about disabling interrupts during sensitive operations like that, but wouldn't that then risk missing an event entirely instead?
There's a flag set in hardware when /INT is asserted. At this point you can smack /INT around all you want, you're not getting any more interrupts firing.
Once you're done you clear the flag, either explicitly in the CPU's "condition code register" in some chips or by using a specific "Return from Interrupt" opcode that works like a "normal" subroutine return but clears the flag.
We use the analogy of a doorbell to describe interrupts a lot. In truth it's like if your doorbell could only be rung once, until you open and shut the door.
One approach is to disable interrupts during these sensitive tasks, yes.
Once you enable interrupts again, the interrupt controller will trigger an interrupt if one occurred while it was disabled, so it works out.
All systems work differently though: some can't queue interrupts, meaning if 2 or more interrupts occurred while disabled, you will lose some of those interrupts. And even if interrupts can be queued, the queue has a finite length.
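As a concrete example of that first approach, here's a tiny sketch using avr-libc (`cli()`/`sei()` are real avr-libc macros; the shared tick counter is made up for illustration, and real code would save and restore the interrupt flag rather than blindly re-enable it):

```c
#include <avr/interrupt.h>   /* cli(), sei() */
#include <stdint.h>

volatile uint32_t tick_count;    /* incremented from a timer ISR */

uint32_t read_ticks(void)
{
    cli();                       /* no ISR can run while we copy the 32-bit value */
    uint32_t copy = tick_count;  /* a multi-byte read could otherwise be torn by an IRQ */
    sei();                       /* any interrupt whose flag was set while we were
                                    disabled fires now, so nothing is lost as long as
                                    only one of each kind occurred */
    return copy;
}
```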
Another approach is that the interrupt handler will save the current state of the world and restore it again afterwards - meaning if you're halfway through saving registers to the stack, you just continue as if nothing happened, and save the rest of the registers. Note that you can't be interrupted in the middle of a single instruction.
Of course, that just pushes the problem one level deeper: what if an interrupt handler gets interrupted by another interrupt while it's saving the state of the world? On some systems this can nest (up to a point where things just break badly); on others, or depending on the interrupt type, you need to disable all or certain other interrupts again.
This works out once you get all your code/ducks in a row, which takes a lot of hard work, sweat, reading dense low-level technical documentation, and sometimes trial and error, as devices or documentation may be lying, buggy, or simply absent.
Most of us sit on top of an OS kernel where all of this has been thought about and handled over many years or decades, or we can use an existing kernel or library even when working on more bare-bones, simpler embedded systems, and we should thank the people who make all of this work.
Well said. One thing to add: it's not uncommon for the CPU to disable further interrupts as part of the automatic handling, or it may be configurable. It's often not something the interrupt service routine needs to do explicitly, although sometimes it is; there are so many options.
If you're on a system where interrupts are disabled automatically, and you actually do want re-entrancy sometimes, you can usually make that happen too, but you might wait until you get to a safer place (maybe switched to a kernel stack or ???; again, so many options).
If you are interrupted by another, higher-priority handler, that handler ALSO saves and restores registers, so whatever was interrupted doesn't even know. The registers after the interruption have the same values as before it, so whatever was being saved keeps being saved properly. Typically the same interrupt is not called again while it is already being serviced. Interrupts are handled via flags, so when several flags are raised at once, interrupt handlers of the same priority run one after another, and each of them should clear its own flag (sometimes the processor does it automatically), often right at the start of the routine. If another interrupt sets the flag again before the routine finishes, the routine will run again after finishing.
> In a typical Windows 10 installation with many background processes and services, the CPU context switching rate can vary greatly depending on the specific system's hardware configuration, running processes, and workload. However, on a typical system, the context switching rate can be anywhere from a few hundred to a few thousand times per second.
Would you agree with this statement from ChatGPT? Is the Windows kernel handling thousands of context switches and time-slicing processes the way you described? (pushad/pushfd + popad/popfd?)
Yeah, more or less; mostly more. pusha/pushad is one of those instructions that sounded good, but isn't used much (it became invalid in amd64), windows will push the registers one at a time, and maybe FPU, MMX, SSE, etc registers; of course, that's a lot of extra pushing, so there's strategies to avoid it if the thread doesn't use them. If you switch to a different task, you're going to need to load its page tables, and these days you've gotta flush a bunch of caches to avoid Spectre (although you shouldn't avoid the Spectre game from the 90s, that was nifty).
If you're good at Windows, you can probably get a count of context switches per second on your system, with your load. Context switches generally include interrupts as well as calls into the kernel from userspace. A server workload is going to go up to hundreds of thousands, maybe millions per second, again depending on your load.
Many processors do have this and expose it to programmers, it’s called “register windows” (though usually used for procedure calls). Or you can have banked registers, which often serve different privilege levels. Once you start looking at the microarchitectural level, you’ll find that modern processors have large register files and rename them into the architecturally visible ones.
Registers are expensive, yeah, but pushing them onto the stack isn't the most expensive part of a context switch on a modern cpu anyway. Switching the page table, and blowing away the TLB is. Pushing all the registers is some nice sequential memory activity and the stack area is frequently accessed and unlikely to have contention, so it's easy to cache (other than one or the other of push or pop has to go backwards, so you better have predictive access in both directions)
Probably FXSAVE or XSAVE for the mathy registers, yes. But that doesn't cover the general purpose registers, and (F)XSAVE can be skipped if the process in question doesn't use fancy math (easy to detect, disable it, when the process uses it, the kernel will catch the fault, then enable it and set a flag on the process so it saves and restores that state as well)
Please stop using ChatGPT to write your comments. Nobody here is here to have a conversation with ChatGPT, and anyone who wants to talk with ChatGPT instead of actual human beings can do that privately without polluting the conversations of real human beings.
I think it would only be problematic if he was pretending that what ChatGPT said is what he said. Instead he asked a question to verify if what he found is true (same way you would do with other online sources).
No threads. Well, there are two CPUs, so you could count that as two threads. They run completely independently, running two different binaries with different instruction sets (ARM9 and ARM7TDMI).
I don’t know what you mean exactly by time scheduling. There are several timers that you can configure and you can set them to raise an interrupt when they finish. They can restart automatically.
There is an interrupt vector for both software and hardware interrupts. Software ones are raised by the swi assembly instruction. Hardware ones are raised by the aforementioned timers but also the display (vertical and horizontal blank), the sound system, wifi, etc. They can be enabled/disabled by setting a specific bit in a specific memory location (IE interrupt enable). Your interrupt handler is supposed to restore registers and clear the interrupt flag at the end.
Context switching is done manually.
I think libnds has an implementation of software threads. I don’t know how they work.
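For what it's worth, here's a libnds-flavored sketch of registering a VBlank interrupt handler, based on my memory of the devkitPro examples (`irqSet`, `irqEnable`, and `swiWaitForVBlank` are libnds calls; as I understand it, libnds's dispatcher does the register saving and flag acknowledgment described above, so the C callback can stay simple):

```c
#include <nds.h>

static volatile unsigned int frame_count = 0;

static void vblank_handler(void)
{
    frame_count++;                        /* runs once per frame, ~60 Hz */
}

int main(void)
{
    irqSet(IRQ_VBLANK, vblank_handler);   /* install the handler */
    irqEnable(IRQ_VBLANK);                /* set the IE bit for VBlank */

    while (1)
        swiWaitForVBlank();               /* main loop sleeps between frames */
    return 0;
}
```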
I'm still digging into the details, so I can't answer perfectly. But as far as I can tell, there is no threading or processes; it's more like a microcontroller. I don't think there are any interrupts for things like inputs: you have a main loop and you are expected to poll the inputs frequently. I suspect there are timers which can interrupt, like you'd get on a microcontroller, though I haven't tried this yet.
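A minimal polling-style main loop in the spirit of the devkitPro examples looks roughly like this (function names are from libnds as I remember them: `swiWaitForVBlank`, `scanKeys`, `keysDown`, `keysHeld`; check the current examples if they have moved):

```c
#include <nds.h>
#include <stdio.h>

int main(void)
{
    consoleDemoInit();              /* text console on the sub screen */

    while (1) {
        swiWaitForVBlank();         /* sleep until the next vertical blank */
        scanKeys();                 /* latch the current button state */

        if (keysDown() & KEY_A)     /* newly pressed this frame */
            iprintf("A pressed\n");
        if (keysHeld() & KEY_START) /* held: quit the loop */
            break;
    }
    return 0;
}
```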
The console sold 154 million units. And since it’s so old, they are all finding their way to eBay. It’s honestly more interesting to play with, and easier to buy, than a Raspberry Pi.
As early as the mid-70s, minicomputers started including microcontrollers to manage the boot environment. I had a Sun machine once that included what they called Lights Out Management: a little microcontroller that had control over the system power and so on, always available via its dedicated serial port, even when the machine was shut off. Everything is like that now. A smartphone will have multiple processors: some doing IO interfacing, one managing the battery, and the radio hardware will have its own general-purpose processor.
Most of these processors are fixed-in-ROM sorts of machines, so the story for booting them individually is pretty simple. Much like a late-20th-century PC, when switched on (either by the power supply or by another processor), they start running their BIOS-like code from a hardwired start address. Some need to have more software transferred to them after that.
Modern machines are really networks of computers in themselves. Networking and bringing all these parts together to support the main processors, at the low level, is not only poorly documented or completely undocumented, it's probably impossible for one person to fit it all in their head these days.
True. It's like what's happened with electronics, radio, video in that way. A solid grasp of the (always-present) basic components, and basic tech and design philosophy, goes a long way. It's like learning a language.
So easily-understandable articles like this are essential for beginners, along with a short list of masterful books ( e.g. those by Forrest Mims, for electronics) and playing with physical components! The rest is the endless variations, but they're all speaking that language, 'cuz the laws don't change.
- the CPU core in the chipset that runs the ME/PSP
- if the TPM is not an fTPM, it has a CPU I'm sure.
- if your NIC has offload engines, it has a CPU or two.
- each storage device has a CPU.
- each Wi-Fi device has a CPU.
- thunderbolt controller takes firmware, it has a CPU. I'd bet USB3 and 4 do too.
- any USB device has a CPU on the other end accepting and interpreting commands.
- same with any SCSI device.
- monitors have a CPU or two, one for the OSD settings and another to drive the display I'm sure.
- I think any nVidia or AMD graphics card has a CPU in there (in addition to the GPU).
- The following portable media has microcontroller firmware: SD cards, Memory Stick. (The now defunct SmartMedia and Olympus XD were raw NAND).
- your optical mouse has a CPU as well to process optical data.
- obviously printers have CPUs and firmware, probably separate ones for the web UI and the part that drives the actual print mechanism, and I'd bet a separate one for scanning and image processing.
- any keyboard has a microcontroller and firmware (there are open source keyboard firmwares)
- SIM cards for cellular connections have their own CPU and firmware.
> Memory controllers contain the logic necessary to read and write to DRAM, and to "refresh" the DRAM. Without constant refreshes, DRAM will lose the data written to it as the capacitors leak their charge within a fraction of a second (not more than 64 milliseconds according to JEDEC standards).
Do they use CPUs now?
I did hear that IBM was developing serial RAM (not NVRAM) with an onboard controller on the memory modules - with the need for firmware. Beyond that I didn't think memory controllers ran instructions from ROM or other instruction storage like a CPU.
As far as flash memory, they definitely use a CPU and that's what I meant by "storage" in my list. :)
If you want something you can almost fully understand, I'd go another step or two lower, to a microcontroller like an AVR or one of the old Motorola chips (68HC11 or something). These chips were actually designed by hand and expected to be programmed by hand, and their documentation reflects this. They also have much less hidden microarchitectural state than a modern CPU.
Once you're familiar with that, move to a more modern microcontroller like an Arm Cortex-M0, and after that maybe something with off-chip memory, an MMU, etc.
Those 8-bit controllers with (exclusively) on-chip SRAM and Flash are indeed quite simple - they can be summed up as "Jump to 0x0000", but really are surprisingly complex once you get into bootloaders and interrupt service routines and so on.
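For contrast with the AVR's "jump at 0x0000", here's a hedged sketch of what a bare-metal Cortex-M0-class startup can look like: the core fetches its initial stack pointer and reset vector from address 0. The section and symbol names (`.isr_vector`, `_estack`) come from a typical hand-written linker script and are assumptions, not any vendor's official names.

```c
/* Minimal bare-metal startup sketch for a Cortex-M0-class part,
   built with arm-none-eabi-gcc and a matching linker script. */
extern unsigned long _estack;          /* top of RAM, defined by the linker script */

void Reset_Handler(void);
int main(void);

__attribute__((section(".isr_vector"), used))
void (* const vector_table[])(void) = {
    (void (*)(void))&_estack,          /* word 0: initial stack pointer          */
    Reset_Handler,                     /* word 1: reset vector (first code run)  */
    /* NMI, HardFault, ... handlers would follow here */
};

void Reset_Handler(void)
{
    /* real startup code copies .data from flash and zeroes .bss here */
    main();
    for (;;) ;                         /* main() should never return */
}

int main(void)
{
    for (;;) ;                         /* blink an LED, poll a peripheral, etc. */
}
```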
The proprietary blobs and GPU on the Raspberry Pi make it basically impossible to have a full understanding of what's happening. Instead, I'd recommend learning with the BeagleBone Black, which has a TI ARM Cortex-A8 with an MMU and lots of open documentation.
Look at the IBM PC/XT/AT if you want something more documented. IBM supplied schematics and even BIOS listings for those machines, the newest being a 286 (AT). Thanks to doing some BIOS RE work many years ago, the address F000:E05B still remains in my mind...
The PMC also needs to be online before the main cores. And on many machines there's still an external EC on the board responsible for sequencing power states external to the chip. The x86 application cores actually start up quite late in the process, long after the SPI flash has been read out, memory and cache controllers initialized, etc...
I think OpenPower has a fully open bootloader. But it's not exactly cheap to try.
Those systems start up in "cache contained" mode, where the boot ROM is copied to CPU cache and there's no main memory yet. The code in the ROM has to initialize main memory before it can use it.
This is very common in embedded systems, and honestly I'm not surprised to see this, even with projects like Raspberry Pi... although, I do think it's a big shortcoming for a project like that.
In a "real project" you would be relying on your suppliers and other teams / colleagues for many of the "hard questions". Chances are you'd have a contact with the silica manufacturer, just as a major example.
There would be a boot team, or sometimes just one boot developer. If anything goes wrong, or you want to change anything, you would be pretty helpless without them. You can go to the various specs and code, but this can be quite in depth if you're starting from scratch.
Change project or micro? Chances are that all goes out the window, and you have to start from scratch.
Chiming in with everyone else - RPi is way too complex.
But another suggestion if you want a modern alternative to understand: RISC-V has an open boot ecosystem. You can just try it in QEMU and maybe buy a board if you get more advanced?
> Turns out the Raspberry Pi (and I'm guessing many other systems) is pretty confusing to understand at boot time.
That is because of how the ARM ecosystem functions. There is no standard way of integrating an ARM CPU into a product, as Arm, the company, just sells base IP and not a complete CPU. Every ARM licensee, from Qualcomm to Apple to Nvidia, is free to design their own extensions and integrations into their SoC. There is no standard for this. That makes it hard to write the kind of generic tutorial you see in the x86 world.
Maybe a good place to start is an old video game console. Those have a lot of community info because of interest in retro gaming/emulation, plus they're simpler than modern hardware.
Might want to look at MicroPython. The ports have various init functions for various chips /system. It might not be a full OS, but it is pretty low level code you can compare to various other systems.
You could start with a simple microcontroller like an AVR. Pick up an Arduino Uno or a Mega and an ICSP programmer, then write your own bootloader for it.
the "whole computer" is also thought to start on the memory, because you can have memory on a computer and a manual operator of it, but you can't have a computer with an operator of it and no memory.
Ben Eater's fantastic video series on building a breadboard 6502 based computer and an 8-bit breadboard computer from scratch might be appreciated in this thread.
Highly recommend Ben Eater's videos, which in this embedded developer's eyes are the clearest, best explanation of the fundamentals of a CPU I've ever seen.
> The Z80's system, although simpler, creates a "hole" in the memory, because the bottom of the memory space is used by ROM and therefore you cannot use the beginning of the memory space for normal RAM work.
Gameboy actually does a funny thing where the boot ROM gets mapped at the bottom of the address space, and then it writes to a MMIO address to unmap the ROM overlay and restore the first 256 bytes of the cartridge there instead. It’s quite amusing!
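Here's a rough sketch of how an emulator might model that DMG boot ROM overlay: the 256-byte boot ROM shadows the bottom of the address space until the program writes a non-zero value to $FF50, which permanently unmaps it. The arrays, function names, and the simplified cartridge mapping are placeholders for illustration, not a complete memory map.

```c
#include <stdint.h>
#include <stdio.h>

static uint8_t boot_rom[0x100];        /* 256-byte boot ROM */
static uint8_t cart_rom[0x8000];       /* cartridge ROM area */
static int boot_rom_enabled = 1;

static uint8_t read_byte(uint16_t addr)
{
    if (boot_rom_enabled && addr < 0x100)
        return boot_rom[addr];         /* boot ROM overlay shadows the cartridge */
    return addr < 0x8000 ? cart_rom[addr] : 0xFF;   /* (real mapping is more involved) */
}

static void write_byte(uint16_t addr, uint8_t val)
{
    if (addr == 0xFF50 && val != 0)
        boot_rom_enabled = 0;          /* one-way switch: cartridge bytes now appear at $0000 */
}

int main(void)
{
    boot_rom[0] = 0x31;
    cart_rom[0] = 0x00;
    printf("%02X\n", read_byte(0x0000));   /* 31: boot ROM visible */
    write_byte(0xFF50, 1);
    printf("%02X\n", read_byte(0x0000));   /* 00: cartridge visible */
    return 0;
}
```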
> On some computer platforms, the instruction pointer is called the "program counter", inexplicably abbreviated "PG"
> Regardless of where the CPU begins getting its instructions, the beginning point should always be somewhere in a ROM chip. The computer needs startup instructions to perform basic hardware checking and preparation, and these are contained in a ROM chip on the motherboard called the BIOS. This is where any computer begins executing its code when it is turned on.
I can't see any date on this, but it's a bit antiquated. For security and reliability, modern CPUs have an on-chip ROM, which is executed first. That on-chip ROM will tend to do basic things like checking clocks, power, memory, etc. Once that's complete, it will then securely load firmware from the motherboard flash. Even modern cheapo microcontrollers ship with on-chip ROM these days.
"It starts at 0 and executes instructions" is a funny, but mostly true way to express this. Some people are shocked that no magic happens before the instructions start.
Sort of. It's actually fairly common on larger cores (and particular, larger SoCs) for there to exist magic that happens before architectural reset vectors.
> By pre-boot code, I’m not talking about the little ROM blob that gets run after reset to set up your peripherals so you can pull your bootloader from SD card or SSD. That part was a no-brainer to share. I’m talking about the code that gets run before the architecturally guaranteed “reset vector”. A number of software developers (and alarmingly, some security experts) believe that the life of a CPU begins at the reset vector. In fact, there’s often a significant body of code that gets executed on a CPU to set things up to meet the architectural guarantees of a hard reset – bringing all the registers to their reset state, tuning clock generators, gating peripherals, and so forth. Critically, chip makers heavily rely upon this pre-boot code to also patch all kinds of embarrassing silicon bugs, and to enforce binning rules.
Something fun mentioned in a comment on that article: the majority of this "pre-boot" code is actually FOSS on POWER9. There is a set of "auxiliary processors" called "PPE"s, among which there is one, the "SBE" or "self boot engine", a very small and simple PowerPC core that IPLs the big POWER9 cores [0]. These big processors with tons of cache and interconnects need a lot of help to get to executing PC 0x00.
I suspect that almost all the big "application processors" from Intel and AMD, and the exotic ARM/SPARC server chips, have equivalent embedded ICs to jump-start the "big cores".
In many modern CPUs, it's an auxiliary processor that "starts at 0" (within its dedicated ROM) and then eventually turns on the main CPUs. In a modern Intel core, I think that CPU is actually the one in the secure enclave, which also happens to do things like DRAM training...
In a microcontroller, the clock generators and the peripherals are often set up by the main core just after boot, and are under user control - the chip's reset network (literally just a wire) handles bringing things into a known state before boot.
Yeah, I was actually thinking the basic behavior that OP mentioned could be quite rare these days. Last project I worked on there was a whole startup procedure. You could even use a special bootloader that they kept in ROM, if you wanted... It did a number of other things that were generally not visible to the developer, but probably required some CPU interaction.
This is pretty old. There's quite a lot that happens before instructions start on modern CPUs.
Just one small example: CPUs have many small SRAM arrays for micro-architecture features like branch predictors. Some of these need to be initialized after reset by a state machine that takes many cycles.
I have even heard of a large chip that pushes initialization vectors through the scan chains so that all flops begin in an initial state, without requiring a reset network.
It's easy to overlook for those who program mostly on higher level systems. There was a recent HN thread that I think pointed to this article on what happens before main() in C programs:
On microcontrollers, there are often some preliminaries that are programmed into nonvolatile settings of the hardware, such as the type of clock oscillator. On a microprocessor like the Z80, your circuit was supposed to ensure that things like the clock oscillator and power supplies were stable before releasing the RESET pin.
If older CPUs have no magic, what is the 6502 doing here (https://youtu.be/yl8vPW5hydQ?t=706) that causes it to put eb60 → ffff → eb60 → 01f7 → 01f6 → 01f5 on the address bus, before it actually reads from memory?
https://www.pagetable.com/?p=410 may go some way toward explaining this. The behavior is slightly different than in the video. I believe this is because the video is using a 65C02 instead of the original NMOS 6502 and the implementation is slightly different.
Brilliant; makes perfect sense of the "magic." It'd be interesting to see a compare-and-contrast of the "dump" of a decode ROM of a 65C02 vs the 6502's decode PLA, to see exactly what the 65C02 is doing with those few added cycles in what is presumably its generic BRK implementation.
The following are the memory ranges you get with a 2-to-4 converter on an 8-bit address bus:
00: 00h to 3Fh
01: 40h to 7Fh
10: 80h to BFh
11: C0h to FFh
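A quick sketch of that decode: the top two bits of the 8-bit address drive the decoder inputs, selecting one of four 64-byte regions (the variable names here are just for illustration).

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    for (int addr = 0; addr < 256; addr += 64) {
        uint8_t select = (uint8_t)addr >> 6;    /* decoder inputs: 00, 01, 10, 11 */
        printf("select %d%d -> %02Xh..%02Xh\n",
               (select >> 1) & 1, select & 1, addr, addr + 63);
    }
    return 0;
}
```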
Mine's still around, altho it looks like I turned it from a The Simpsons fan page into one sentence: https://bartzone.tripod.com/ . And with all scripts allowed, it's >1,000 requests and >10MB of data.
These ad networks on top of ad networks, on the oldest IE-compliant code with its `document.write`, fighting it out for eyeballs since 1999. As Lycos' motto says: battling it out to complete obsolescence and never giving an inch. Go iframe go!
Each type of memory has a somewhat different design, but generally, the address lines will control a pair of demultiplexers, one which activates a single row in a rectangular grid of memory cells, and one which reads a single column. (Old DRAM chips had separate RAS and CAS — row address select and column address select — phases.) Each memory cell in that row puts its value onto its column line, and the column portion of the address determines which column line is routed to the output pin and fed back to the data bus. Multiply everything by 16 for a 16-bit memory bus.
The memory cell itself might be a handful of transistors forming a bistable flipflop (for SRAM), or it might be a capacitor and transistor (for DRAM), or a floating gate and transistor (for EPROM), or just a wire and transistor (for mask-ROM).
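As a rough mental model of that row/column split, here's a tiny sketch for a hypothetical 256x256-bit cell array addressed by a 16-bit address (array size and names are made up; real chips do this with demultiplexers and sense amplifiers, not array indexing):

```c
#include <stdint.h>
#include <stdio.h>

static uint8_t cells[256][256];        /* the "grid of memory cells" */

static uint8_t read_bit(uint16_t addr)
{
    uint8_t row = addr >> 8;           /* row demultiplexer input  */
    uint8_t col = addr & 0xFF;         /* column select input      */
    return cells[row][col];            /* whole row activates; one column is routed out */
}

int main(void)
{
    cells[0x12][0x34] = 1;
    printf("%u\n", read_bit(0x1234));  /* prints 1 */
    return 0;
}
```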
For (much) more detail I can recommend Drepper’s “What Every Programmer Should Know About Memory” paper. Don’t be put off by its publication date, it is still very very relevant.
There are several types of ROM available. The type you would find in most 8-bit computers from Atari, Apple, Commodore, Radio Shack, etc. would be chips pre-programmed (aka a Mask Read-Only Memory) at a factory with the contents, such that each bit is set high or low and cannot be changed at all.
Then there's PROM (Programmable Read-Only Memory), which is "empty" (either all 0s or all 1s, I don't know the full details) but can be programmed once and that's it. Then there's EPROM (Erasable Programmable Read-Only Memory), which can be erased (by exposing the actual chip to UV light), then reprogrammed. Then there is EEPROM (Electrically Erasable Programmable Read-Only Memory), which can be electrically erased and reprogrammed. For each type of ROM, once programmed, it's good for years, if not forever.
The programming of a PROM, EPROM or EEPROM usually consists of applying a higher than normal voltage to the chip on certain pins and is usually done in a separate device. How these chips works internally (how the gates are arranged, erased, programmed, etc.) is not something I know (being into software). This is just stuff I picked up over the years.
My understanding is that there is a thin trace (a fuse) at each cell. In order to program it, the address lines are manipulated to select the word, and a programming voltage is applied (for some reason I remember 18V, but it really could be anything) that draws a large current across the fuse and blows it.
On a die, you wouldn't go through that trouble; you would just directly synthesize the zeros and ones.
But we don't really use ROMs that much anymore; what we use instead is usually persistent and programmable.
Internally? Imagine a grid of wires where each address is one vertical wire and each bit of the output is one horizontal wire. The output wires have resistors pulling them up to 5V. To select an entry in the ROM you pull its corresponding vertical wire to ground.
Now here's the clever bit - there are diodes between the vertical and horizontal wires. When you pull a vertical wire to ground it pulls all the horizontal wires connected through diodes to ground, putting a 0 on those pins.
In a real "mask programmmed" ROM the grid is more square, so there are a lot of horizontal lines grouped in 8 bit bytes.
In an EPROM the diode is replaced by a little MOSFET. By applying a high voltage to its gate it'll stay charged basically forever, switching on and forming a "0" in the output.
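A toy software model of the diode-matrix idea described above, using the same convention (a diode at a given address/output crossing pulls that output line to ground, so it reads as 0; lines with no diode stay pulled up to 1). The stored pattern is made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* 1 bit in the mask = "diode present" = output pulled low = stored 0 */
static const uint8_t diode_mask[4] = { 0x00, 0xFF, 0x0F, 0xAA };

static uint8_t rom_read(uint8_t addr)
{
    return (uint8_t)~diode_mask[addr];   /* pulled-up lines read 1 unless grounded */
}

int main(void)
{
    for (uint8_t a = 0; a < 4; a++)
        printf("addr %u -> %02X\n", a, rom_read(a));
    return 0;
}
```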
A peer mentioned different types of memory and flip flops. I’d search for JK flip flop or D flip flop/latch for conceptual pointers. Those can be used to build up register files. The logic gates used to build flip flops/latches use things like NMOS/CMOS/PMOS - different kinds of metal oxide semiconductor logic based on substrate. The differing behavior affects how you arrange circuits (voltage source, ground, etc) to provide the desired logic from inputs (keywords: pull down/pull up network). Once you have logic and memory, you can build control units and things like arithmetic logic (look up von Neumann/Harvard architecture).
I’m not an expert in modern hardware, but these are the basic principles I recollect. Happy to be corrected if this answer is dated :)
--The instruction set is the lowest level of user-visible software. The instruction set is implemented (in principle) by microcode.
--Microcode (if present) is a bunch of hardwired logic signals that cause data to move between subunits of the CPU in response to instructions. Register transfer specifications govern the design at this level.
--A CPU is a collection of subunits including registers, arithmetic units, and logic gates.
--Registers are collections of flip-flops.
--Arithmetic units are made of logic gates.
--Flip-flops are made from logic gates in feedback configurations to make them stateful. State machine theory governs the design of these circuits.
--Logic gates are stateless and made of transistors. Boolean algebra governs the design at this level.
--Transistors are made of NMOS or PMOS or bipolar silicon junctions. Device physics and electrical properties govern this level.
This abstraction hierarchy is useful when learning, but in the real world the abstraction layers are much blurrier: Everything's really just transistors, stateless stuff often has invisible state, electrical considerations matter at every level, etc.
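As a tiny illustration of the "arithmetic units are made of logic gates" layer, here's a 1-bit full adder expressed purely as Boolean operations (a toy model, not how any particular CPU implements it):

```c
#include <stdio.h>

/* 1-bit full adder built only from Boolean operations. */
static void full_adder(int a, int b, int cin, int *sum, int *cout)
{
    *sum  = a ^ b ^ cin;                    /* two XOR gates            */
    *cout = (a & b) | (cin & (a ^ b));      /* AND and OR gates         */
}

int main(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            for (int c = 0; c <= 1; c++) {
                int s, co;
                full_adder(a, b, c, &s, &co);
                printf("%d+%d+%d = carry %d, sum %d\n", a, b, c, co, s);
            }
    return 0;
}
```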
For RAM, but I was asking about ROM. Since [type]ROM keeps its state for the long term (it looks like some manufacturers claim a 200-year shelf life under optimal conditions), I'd imagine it wouldn't use capacitors holding an electrical charge that require any form of refresh like RAM.
The starting procedure on the MOS 6502 is not as simple as the article makes it sound. It spends some 7 clock cycles doing internal initialization and only then fetches the address of the first jump from the reset vector (0xFFFC/0xFFFD). I find Ben Eater's series on building a simple 8-bit breadboard computer using the 6502 CPU very educational and entertaining.
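That final step is easy to show in code: the CPU reads the two reset-vector bytes little-endian and jumps there. The memory array and the chosen start address below are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    static uint8_t memory[0x10000];            /* toy 64 KiB address space */
    memory[0xFFFC] = 0x00;                     /* low byte of the reset vector  */
    memory[0xFFFD] = 0xE0;                     /* high byte -> start at 0xE000  */

    uint16_t pc = memory[0xFFFC] | (memory[0xFFFD] << 8);
    printf("reset: PC = $%04X\n", pc);         /* prints $E000 */
    return 0;
}
```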
One route to understanding how CPUs work is to explore the computers that have been made in cellular automata. Golly (https://golly.sourceforge.net/) has several, including one by John von Neumann, one by Edgar Codd and another by John Devore. The advantage of course is that the physics is trivial and you can see everything that happens and step backwards and forwards.
> This is because when the power supply is first powering up, even if it only takes a second or two, the CPU has already received "dirty" power, because the power supply was building up a steady stream of electricity. Digital logic chips like CPUs require precise voltages, and they get confused if they receive something outside their intended voltage range.
This is only partially true. When digital chips boot up, gate outputs are in an indeterminate state. The reset sets them to known initial (and valid) values/bits.
All so called digital devices are really analog devices corralled into operating using binary logic. All it takes is a volt or two applied or removed unexpectedly on those discrete components and you have a bit-flip and possible ensuing mayhem.
Could anybody clarify for me the purpose of the NOP opcode that the article refers to? I would think that something like a "do nothing" instruction would want to be optimized away as much as possible, but maybe there's some hidden facet of the instruction protocol I'm not familiar with that necessitates it?
I don't know how much it's really used these days, but it came up when I've done asm work before. If you have tight timing requirements, say you're bit-banging an IO pin for some arcane serial protocol (a la the WS281x LED series) that doesn't use a hardware-supported bus like SPI (where you could just DMA what you want to send to the controller), then you have to implement the protocol manually in code, switching the IO pin on and off according to the protocol's timing requirements and the data you want to send.
For the purposes of this, the nop instruction is useful as a way to delay the processor one instruction at a time, and given you know the clock speed (and therefore instructions per second) you can:
* Set the IO high
* Use nops (or other instructions) to waste time
* Set the IO low
This is very useful in situations where the timing needs to be more precise than interrupts (which have somewhat unpredictable latency) can give you. Obviously this means an entire CPU is held up running this code.
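A hedged sketch of the idea on an AVR-style microcontroller (PORTB/PB0/DDRB come from avr-libc; the cycle counts and "protocol" here are made up, not the real WS281x timing, and a real driver would also account for the instruction overhead around the NOPs):

```c
#include <avr/io.h>

#define NOP() __asm__ __volatile__("nop")   /* burn exactly one CPU cycle */

static void send_pulse(void)
{
    PORTB |= (1 << PB0);     /* drive the pin high          */
    NOP(); NOP(); NOP();     /* hold it for ~3 CPU cycles   */
    PORTB &= ~(1 << PB0);    /* drive it low again          */
}

int main(void)
{
    DDRB |= (1 << DDB0);     /* make PB0 an output */
    for (;;)
        send_pulse();
}
```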
Another timing usage is for generating a signal to display a picture on a CRT using a microcontroller. Plenty of others as well.
Personally, though, I would never bother. It's always better to get a dedicated chip/controller for something like driving those stupid LEDs (or just get an SPI LED) or to generate a TV signal.
Reading up on it further, apparently it is also used as a way to reserve space in code memory, I imagine for self-modifying code (ie fill with nops which can be replaced with actual instructions by your code, depending on code execution). But I've never actually done this myself.
It does, in the sense that executing any (regular, macro) code involves executing microcode. But no, there's not a separate phase during which only microcode is executed, if that's what you mean.