This is actually pretty common. There was a time when not every CPU had an FPU, and floating-point operations were emulated in software when required. Or doing 64-bit integer operations on a 32-bit CPU. Or doing arbitrary-precision arithmetic on today's CPUs. You do it just like in school, piece by piece, but instead of single digits you use the largest registers available. If you really like it slow, you could of course also just use plain strings to represent numbers and do all the math on a char-by-char, digit-by-digit basis.
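To make the "piece by piece, like in school" idea concrete, here is a minimal C sketch of a multi-word add, assuming little-endian 32-bit limbs (the function name and layout are just illustrative, not from any particular library): each limb plays the role of a single digit, and the carry propagates from one limb to the next.

#include <stdint.h>
#include <stddef.h>

/* Sketch only: add two little-endian numbers made of 32-bit "limbs",
   the same way you add columns of digits in school. Each limb is one
   "digit", and the carry moves on to the next limb. Returns the final
   carry (1 if the result needs one more limb). */
static uint32_t bignum_add(uint32_t *res, const uint32_t *a,
                           const uint32_t *b, size_t limbs)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < limbs; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i] + carry; /* widen so the carry is visible */
        res[i] = (uint32_t)sum;          /* low 32 bits: this limb of the result */
        carry  = (uint32_t)(sum >> 32);  /* anything above bit 31 carries over */
    }
    return carry;
}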
There used to be bit-slice processors that chained together narrow (e.g. 4-bit) chips to build a wider processor. This is pretty much a software emulation of that technique, but without the parallelism.
Commonly, to perform something like an arbitrary-width addition, these CPUs have a carry flag that is set if an operation overflows, so that the carry can be added into the next addition. This way you can simply chain simple 8-bit additions to get the desired operation width. For example, a 16-bit addition (0x5555 + 0x1234) on the 6502:
clc        ; clear carry before the first (low-byte) add
lda a      ; low byte of a
adc b      ; add low byte of b; carry is set on overflow
sta res    ; low byte of result
lda a+1    ; high byte of a
adc b+1    ; add high byte of b plus the carry from the low bytes
sta res+1  ; high byte of result

a:
.byte $55, $55 ; $5555, stored little-endian
b:
.byte $34, $12 ; $1234, stored little-endian
res:
.byte $00, $00 ; reserved for result (will hold $6789)
The simple answer would be: by using memory. More bits means that the CPU can operate on larger amounts of data at once.
For example, a shift would need to be done as multiple 8-bit operations, but it's entirely doable (I hope the CPU he used has shifts through the carry flag).
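As a rough sketch in C (the function name is made up, and a little-endian byte array is assumed): the hardware carry flag becomes an explicit variable, and the bit that falls out of one byte is fed into the top of the next lower byte, just like chaining rotate-through-carry instructions.

#include <stdint.h>
#include <stddef.h>

/* Sketch only: shift a little-endian multi-byte value right by one bit.
   The "carry" variable does the job of the hardware carry flag: the bit
   shifted out of one byte is brought in at the top of the next lower byte. */
static void shift_right_1(uint8_t *v, size_t bytes)
{
    uint8_t carry = 0;
    for (size_t i = bytes; i-- > 0; ) {   /* walk from the most significant byte down */
        uint8_t out = v[i] & 1;                                 /* bit shifted out of this byte */
        v[i] = (uint8_t)((v[i] >> 1) | (uint8_t)(carry << 7));  /* previous carry comes in on top */
        carry = out;
    }
}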
Adding to some of these other comments: when IBM did their groundbreaking 32-bit System/360 in the 1960s, they started with a nice silicon-transistor-based logic family, but it pretty much ran at only one speed. So to deliver a family of computers with the same ISA, they microcoded the slower ones.