Once you start working with local LLMs, you quickly run into CUDA out-of-memory errors. Managing the input context size of your prompts is really critical, and it also keeps costs down.
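For example, here's a minimal sketch of capping the prompt to a fixed token budget before generation. I'm assuming a Hugging Face tokenizer here; the model name and token budget are just placeholders to adjust for your own setup and VRAM:

```python
# Minimal sketch: trim the prompt to a token budget before sending it to a
# local model, so the context doesn't blow past what your GPU can hold.
from transformers import AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder, swap for your model
MAX_PROMPT_TOKENS = 2048  # placeholder budget, leave headroom for generation

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def trim_prompt(prompt: str, max_tokens: int = MAX_PROMPT_TOKENS) -> str:
    """Truncate the prompt to max_tokens tokens, keeping the most recent text."""
    ids = tokenizer.encode(prompt, add_special_tokens=False)
    if len(ids) <= max_tokens:
        return prompt
    # Keep the tail of the context (most recent turns / documents) rather than the head.
    return tokenizer.decode(ids[-max_tokens:])

long_context = "..."  # your accumulated conversation or retrieved documents
safe_prompt = trim_prompt(long_context)
```

Keeping only the tail is just one policy; you could also summarize older turns or drop low-priority documents first, but the point is to enforce some hard cap before generation.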
You can use a lower-end GPU (like the RTX 3060), which also uses less energy.
But you're right, you won't incur any model API costs when running it locally.