Bandwidth of dual-channel DDR4-3600: ~48 GB/s in practice (57.6 GB/s theoretical peak)
Bandwidth of PCIe 4.0 x16: ~26 GB/s in practice (~32 GB/s theoretical)
Bandwidth of 3090 GDDR6X memory: 936 GB/s
Since neural network evaluation is usually bandwidth-limited, pushing the weights over PCIe from CPU to GPU can actually be slower than doing the evaluation on the CPU alone for typical neural networks.
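Quick back-of-the-envelope with the numbers above (the 2 GB weight size is made up for illustration; it assumes a single evaluation needs one full pass over the weights, and it ignores compute and latency):

    # Time to move weights over PCIe vs. time to stream them once from RAM.
    model_bytes = 2e9     # hypothetical 2 GB of weights
    ram_bw      = 48e9    # dual-channel DDR4-3600, practical B/s
    pcie_bw     = 26e9    # PCIe 4.0 x16, practical B/s
    gpu_bw      = 936e9   # RTX 3090 GDDR6X, B/s

    cpu_eval  = model_bytes / ram_bw                          # one pass from RAM
    gpu_total = model_bytes / pcie_bw + model_bytes / gpu_bw  # upload, then one pass

    print(f"CPU eval (bandwidth-bound): {cpu_eval * 1e3:.0f} ms")   # ~42 ms
    print(f"PCIe upload + GPU eval:     {gpu_total * 1e3:.0f} ms")  # ~79 ms

The upload alone costs more than the whole CPU-side pass, which is the point: the comparison only flips once the weights can stay resident in VRAM across many evaluations.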
I once tried to start Firefox (back in the 2.5-3.0 days >:D) on a Celeron with 64MB RAM.
It worked perfectly fine, except that the HDD LED was on solid the whole time, a single window took literally over half an hour to open, and loading a webpage took about 1-2 minutes.
You kind of can - projects like deepspeed (https://www.deepspeed.ai/) enable running a model that is larger than VRAM through various tricks, like moving weights from regular system RAM into VRAM between layers. It can come with a performance hit, though, depending on the model.
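A minimal sketch of what that looks like with DeepSpeed's ZeRO stage-3 parameter offload (the toy model and batch size are placeholders, and in practice you'd launch this with the deepspeed runner):

    import torch
    import deepspeed

    model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

    ds_config = {
        "train_micro_batch_size_per_gpu": 1,  # placeholder value
        "zero_optimization": {
            "stage": 3,                  # partition params/grads/optimizer state
            "offload_param": {           # keep weights in system RAM and
                "device": "cpu",         # stream them into VRAM as each
                "pin_memory": True,      # layer needs them
            },
        },
    }

    engine, _, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )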
For training you can often divide the batch size by n and accumulate gradients, only applying the optimizer step after every n micro-batches - mathematically equivalent as long as the loss is a mean (scale each micro-batch loss by 1/n) and nothing in the model depends on the batch size, like batch norm. At a cost of speed, though.
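A minimal PyTorch sketch of that accumulation trick (the tiny model and random data are stand-ins for whatever you're actually training):

    import torch

    model = torch.nn.Linear(1024, 10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    # Fake micro-batches of 8 samples; effective batch = 8 * accum_steps
    loader = [(torch.randn(8, 1024), torch.randint(0, 10, (8,))) for _ in range(16)]
    accum_steps = 4  # n

    for step, (x, y) in enumerate(loader):
        loss = loss_fn(model(x), y) / accum_steps  # scale so summed grads match
        loss.backward()                            # one big averaged batch
        if (step + 1) % accum_steps == 0:
            optimizer.step()       # apply the accumulated gradient
            optimizer.zero_grad()  # reset for the next n micro-batches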