Hacker News

Impossible? It’s just a bunch of math, you don’t need to keep the entire network in memory the whole time.


Well, any scheme that dynamically loads and unloads weights aggressively enough to fit on a 48GB GPU is so slow that training is basically impractical. Your 70B model would be obsolete by the time the finetuning finished.

Some inference frameworks came up with schemes for exactly this, and they were horrifically slow.
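A toy sketch of the kind of weight streaming being described (plain NumPy, with hypothetical per-layer .npy files standing in for weights parked on disk or in host RAM; real systems stream CUDA tensors over PCIe, which is the bottleneck):

```python
import os
import tempfile
import numpy as np

dim, n_layers = 64, 4
rng = np.random.default_rng(0)

# Pretend each layer's weights live off-device (here: .npy files on disk).
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(n_layers):
    w = rng.standard_normal((dim, dim)).astype(np.float32) / np.sqrt(dim)
    path = os.path.join(tmpdir, f"layer{i}.npy")
    np.save(path, w)
    paths.append(path)

def streamed_forward(x):
    """Forward pass with only one layer's weights resident at a time."""
    for path in paths:
        w = np.load(path, mmap_mode="r")  # "load": map this layer in
        x = np.maximum(x @ w, 0.0)        # compute through the layer
        del w                             # "unload" before the next layer
    return x

y = streamed_forward(np.ones((1, dim), dtype=np.float32))
```

For training the picture is even worse than for inference: the backward pass needs each layer's weights a second time, so every optimizer step streams the whole model over the slow link at least twice.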



