
Once you start working with local LLMs you quickly run into CUDA out-of-memory errors. Managing the input context size in your prompts is really critical. It also keeps cost down. A rough sketch of what I mean is below.
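A minimal sketch of trimming a prompt to a token budget before sending it to a local model, assuming a Hugging Face tokenizer; the model name and the context/reply limits are illustrative assumptions, not specific to any setup mentioned here:

    # Truncate a prompt so it fits a local model's context window.
    # Model name and limits below are assumptions for illustration.
    from transformers import AutoTokenizer

    MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # any local model (assumed)
    MAX_CONTEXT_TOKENS = 4096   # model's context window (assumed)
    MAX_NEW_TOKENS = 512        # room reserved for the generated reply

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

    def truncate_prompt(prompt: str) -> str:
        """Keep only as many prompt tokens as the context budget allows."""
        budget = MAX_CONTEXT_TOKENS - MAX_NEW_TOKENS
        ids = tokenizer.encode(prompt, add_special_tokens=False)
        if len(ids) <= budget:
            return prompt
        # Keep the most recent tokens; older context is dropped first.
        return tokenizer.decode(ids[-budget:])

    long_prompt = open("long_context.txt").read()
    print(truncate_prompt(long_prompt)[:200])

Keeping the prompt inside a fixed token budget like this is what avoids the OOM (and, with a hosted API, the per-token bill).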


If you're working with local LLMs, why do you care about cost?


You can use a lower-end GPU (like an RTX 3060), which also uses less energy. But you're right, you won't be paying model API costs when running it locally.



