
I would be surprised if you can't. The smallest weight file is apparently 14 GB.


https://github.com/facebookresearch/llama/blob/main/FAQ.md#3

Looks like it needs 14 GB for the weights; it isn't clear what the minimum size for the decoding cache is, but it defaults to settings sized for 30 GB GPUs.


In int8, the 7B model needs only 9 GB of VRAM and 13B needs only 20 GB on a single GPU. https://github.com/oobabooga/text-generation-webui/issues/14...
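The numbers in this thread line up with simple napkin math: weights take roughly (parameter count) × (bytes per parameter), and the observed totals add a few GB of runtime overhead (activations, KV cache) on top. A rough sketch of that estimate (the function name and the overhead interpretation are mine, not from the linked issue):

```python
def weight_vram_gb(n_params: float, bytes_per_param: int) -> float:
    """Back-of-the-envelope VRAM for the weights alone, in GB."""
    return n_params * bytes_per_param / 1e9

# fp16 (2 bytes/param): 7B -> ~14 GB, matching the smallest weight file
print(weight_vram_gb(7e9, 2))   # 14.0

# int8 (1 byte/param): 7B -> ~7 GB of weights; the ~9 GB figure
# reported above presumably includes cache/activation overhead
print(weight_vram_gb(7e9, 1))   # 7.0
print(weight_vram_gb(13e9, 1))  # 13.0
```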



