So context size actually helps with this, relative to how LLMs are actually deployed as applications. For example, if you look at how the “continue” option in the DeepSeek web app works for code gen, what they’re likely doing is reinserting the prior messages (in some form) into a new request to prompt further completion. The more context a model has and can manage successfully, the better it will likely be at generating longer code blocks.
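A rough sketch of what that “continue” pattern might look like, assuming an OpenAI-compatible chat completions endpoint (the model name, base URL, and the exact messages the DeepSeek app reinserts are guesses on my part, not anything documented):

```python
from openai import OpenAI

# Assumption: an OpenAI-compatible endpoint; point base_url/model at whatever you use.
client = OpenAI()

history = [{"role": "user", "content": "Write a CSV parser in Python."}]
resp = client.chat.completions.create(model="deepseek-chat", messages=history)
partial = resp.choices[0].message.content  # may be cut off at the output limit

# "Continue": feed the truncated output back in as context and ask for more.
history += [
    {"role": "assistant", "content": partial},
    {"role": "user", "content": "Continue exactly where you left off."},
]
resp = client.chat.completions.create(model="deepseek-chat", messages=history)
full_code = partial + resp.choices[0].message.content
```

The whole prior exchange has to fit in the context window on that second call, which is why a bigger (well-managed) context directly translates into longer code you can stitch together this way.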
Isn't the input/output length distinction arbitrary? Under the hood, the output becomes the input for the next token at each step. OpenAI may charge you more $$ by forcing you to add the output to the input and call the API again, but running locally you don't have that issue.
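Right, that's just the decode loop. A minimal sketch with Hugging Face transformers (greedy decoding, no KV cache, gpt2 purely as a stand-in model) showing that each generated token is simply appended to the input for the next forward pass:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("def fib(n):", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits              # forward pass over the whole sequence so far
        next_id = logits[0, -1].argmax()        # greedy pick of the next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # "output" becomes part of the input
print(tok.decode(ids[0]))
```

The only distinction the API pricing encodes is which tokens were supplied versus sampled; the model itself sees one growing sequence either way.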