Once you start working with local LLMs, you quickly run into CUDA out-of-memory errors. Managing the input context size of your prompts is really critical, and it also keeps costs down.
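For example, here's a minimal sketch of capping the prompt to a fixed token budget before generation. I'm assuming a Hugging Face tokenizer here; the model name and token budget are just placeholders to adjust for your own setup and VRAM:

```python
# Minimal sketch: trim the prompt to a token budget before sending it to a
# local model, so the context doesn't blow past what your GPU can hold.
from transformers import AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder, swap for your model
MAX_PROMPT_TOKENS = 2048  # placeholder budget, leave headroom for generation

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def trim_prompt(prompt: str, max_tokens: int = MAX_PROMPT_TOKENS) -> str:
    """Truncate the prompt to max_tokens tokens, keeping the most recent text."""
    ids = tokenizer.encode(prompt, add_special_tokens=False)
    if len(ids) <= max_tokens:
        return prompt
    # Keep the tail of the context (most recent turns / documents) rather than the head.
    return tokenizer.decode(ids[-max_tokens:])

long_context = "..."  # your accumulated conversation or retrieved documents
safe_prompt = trim_prompt(long_context)
```

Keeping only the tail is just one policy; you could also summarize older turns or drop low-priority documents first, but the point is to enforce some hard cap before generation.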
You can use a lower-end GPU (like the RTX 3060), which also uses less energy.
But you're right, you won't incur any model API costs when running it locally.