Bandwidth of dual-channel DDR4-3600: ~48 GB/s in practice (57.6 GB/s theoretical peak)
Bandwidth of PCIe 4.0 x16: ~26 GB/s in practice (~32 GB/s theoretical)
Bandwidth of 3090 GDDR6X memory: 936 GB/s
Since neural network evaluation is usually bandwidth-limited, pushing the weights over PCIe from CPU to GPU can actually be slower than doing the evaluation on the CPU alone for typical neural networks.
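Quick back-of-the-envelope with the numbers above (the 2 GB weight size is made up for illustration; it assumes a single evaluation needs one full pass over the weights, and it ignores compute and latency):

    # Time to move weights over PCIe vs. time to stream them once from RAM.
    model_bytes = 2e9     # hypothetical 2 GB of weights
    ram_bw      = 48e9    # dual-channel DDR4-3600, practical B/s
    pcie_bw     = 26e9    # PCIe 4.0 x16, practical B/s
    gpu_bw      = 936e9   # RTX 3090 GDDR6X, B/s

    cpu_eval  = model_bytes / ram_bw                          # one pass from RAM
    gpu_total = model_bytes / pcie_bw + model_bytes / gpu_bw  # upload, then one pass

    print(f"CPU eval (bandwidth-bound): {cpu_eval * 1e3:.0f} ms")   # ~42 ms
    print(f"PCIe upload + GPU eval:     {gpu_total * 1e3:.0f} ms")  # ~79 ms

The upload alone costs more than the whole CPU-side pass, which is the point: the comparison only flips once the weights can stay resident in VRAM across many evaluations.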
I once tried to start Firefox (back in the 2.5-3.0 days >:D) on a Celeron with 64MB RAM.
It worked perfectly fine, except that the HDD LED was on solid the whole time, a single window took literally over half an hour to open, and loading a webpage took about 1-2 minutes.
You kind of can - projects like deepspeed (https://www.deepspeed.ai/) enable running a model that is larger than VRAM through various tricks, like moving weights from regular system RAM into VRAM between layers. It can come with a performance hit, though, depending on the model.
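A minimal sketch of what that looks like with DeepSpeed's ZeRO stage-3 parameter offload (the toy model and batch size are placeholders, and in practice you'd launch this with the deepspeed runner):

    import torch
    import deepspeed

    model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

    ds_config = {
        "train_micro_batch_size_per_gpu": 1,  # placeholder value
        "zero_optimization": {
            "stage": 3,                  # partition params/grads/optimizer state
            "offload_param": {           # keep weights in system RAM and
                "device": "cpu",         # stream them into VRAM as each
                "pin_memory": True,      # layer needs them
            },
        },
    }

    engine, _, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )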
For training you can often divide the batch size by n and accumulate gradients, only applying the optimizer step after every n micro-batches - mathematically equivalent as long as the loss is a mean (scale each micro-batch loss by 1/n) and nothing in the model depends on the batch size, like batch norm. At a cost of speed, though.
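A minimal PyTorch sketch of that accumulation trick (the tiny model and random data are stand-ins for whatever you're actually training):

    import torch

    model = torch.nn.Linear(1024, 10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    # Fake micro-batches of 8 samples; effective batch = 8 * accum_steps
    loader = [(torch.randn(8, 1024), torch.randint(0, 10, (8,))) for _ in range(16)]
    accum_steps = 4  # n

    for step, (x, y) in enumerate(loader):
        loss = loss_fn(model(x), y) / accum_steps  # scale so summed grads match
        loss.backward()                            # one big averaged batch
        if (step + 1) % accum_steps == 0:
            optimizer.step()       # apply the accumulated gradient
            optimizer.zero_grad()  # reset for the next n micro-batches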