
10 tokens/second is roughly what you get running llama-30b entirely on the GPU. A 65B model will be slower than that, since there's more compute involved per generated token.
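A rough way to see why the bigger model is slower: single-batch decoding is largely memory-bandwidth bound, since every generated token streams all of the weights through the GPU once. The sketch below estimates tokens/sec from parameter count and bandwidth; the 150 GB/s effective-bandwidth figure and 4-bit quantization are assumptions picked for illustration (they happen to land near the ~10 tok/s quoted above), not measurements.

```python
# Back-of-envelope estimate: decode speed ~ effective bandwidth
# divided by the bytes of weights read per token.

def est_tokens_per_sec(n_params_billion: float,
                       bytes_per_param: float,
                       eff_bandwidth_gb_s: float) -> float:
    weight_gb = n_params_billion * bytes_per_param  # weights streamed per token
    return eff_bandwidth_gb_s / weight_gb

# 30B model, 4-bit quantized (0.5 bytes/param), assumed 150 GB/s effective:
print(est_tokens_per_sec(30, 0.5, 150))  # 10.0 tok/s
# 65B model, same quantization and bandwidth:
print(est_tokens_per_sec(65, 0.5, 150))  # ~4.6 tok/s, slower as expected
```

Under this model the 65B network is slower by roughly the ratio of parameter counts (65/30 ≈ 2.2x), independent of raw FLOPs.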

