Hacker News
bitL on May 14, 2023 | on: Run Llama 13B with a 6GB graphics card
One can keep all tensors in RAM and push to GPU VRAM only what is needed at the moment, basically limited by PCIe speed. Or use some intelligent strategy with read-ahead from SSD if one's RAM is limited. There are even GPUs with their own SSDs.
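A rough back-of-envelope sketch of why PCIe bandwidth becomes the ceiling in this scheme. All figures here are illustrative assumptions (4-bit quantization, PCIe 4.0 x16), not numbers from the comment:

```python
# Upper bound on token rate when weights live in host RAM and are
# streamed over PCIe each step. All figures are assumptions.

model_params = 13e9      # Llama 13B, from the thread title
bytes_per_param = 0.5    # assumed 4-bit quantization
pcie_bw = 32e9           # assumed PCIe 4.0 x16, ~32 GB/s

weights_bytes = model_params * bytes_per_param  # ~6.5 GB of weights
transfer_s = weights_bytes / pcie_bw            # time to stream them all once
tokens_per_s = 1 / transfer_s                   # ceiling if every token touches all weights

print(f"{weights_bytes / 1e9:.1f} GB, {transfer_s:.3f} s/token, "
      f"~{tokens_per_s:.1f} tok/s ceiling")
```

Under these assumptions the transfer alone caps generation at roughly 5 tokens/s, which is why overlapping transfers with compute (read-ahead) matters.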