I guess this would be the context window size in the case of LLMs.
Edit: On second thought, maybe above a certain minimum context window size it is possible to craft the instructions so that, at any point in the process, the LLM works at a suitable level of abstraction, more like humans do.
Maybe the issue is that for us the "context window" we feed ourselves is actually a compressed and abstracted version - we do not re-feed ourselves the whole conversation, just a "notion" and the key points we have stored. LLMs have static memory, so I guess there is no other way than to single-pass the whole thing.
For human-like learning it would need to update its state (learn) on the fly as it does inference.
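In LLM terms, the "compressed notion" part could look like a rolling summary that gets folded forward each turn instead of re-feeding the raw transcript. A toy sketch, where llm(prompt) is a hypothetical stand-in for whatever model call you use and the prompts are made up for illustration:

    # Toy sketch: carry a short running summary forward instead of the transcript.
    def llm(prompt: str) -> str:
        raise NotImplementedError  # hypothetical placeholder for a model call

    def chat(turns: list[str]) -> list[str]:
        summary, replies = "", []
        for user_msg in turns:
            reply = llm(
                f"Summary of the conversation so far: {summary}\n"
                f"User says: {user_msg}\nRespond to the user."
            )
            replies.append(reply)
            # Compress: fold the latest exchange back into the running summary
            # rather than appending the raw text.
            summary = llm(
                f"Old summary: {summary}\n"
                f"New exchange: user: {user_msg} / assistant: {reply}\n"
                "Write an updated short summary keeping only the key points."
            )
        return replies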
Half-baked idea: what if you have a tree of nodes? Each node stores a description of (a part of) a system and an LLM-generated list of its parts, each a small step towards concreteness. The process loops through each part in each node recursively, making a new node per part, until the LLM writes actual compilable code.
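Roughly something like this, where llm(prompt) is again a hypothetical stand-in for a model call, and the prompts, node layout, and "is it code yet" check are invented for illustration:

    from dataclasses import dataclass, field

    def llm(prompt: str) -> str:
        raise NotImplementedError  # hypothetical placeholder for a model call

    def looks_like_code(text: str) -> bool:
        # Crude stand-in: real use would try to compile/parse the output.
        return text.lstrip().startswith(("def ", "class ", "import "))

    @dataclass
    class Node:
        description: str
        children: list["Node"] = field(default_factory=list)
        code: str | None = None

    def expand(node: Node, depth: int = 0, max_depth: int = 6) -> None:
        # Recursively refine a description one small step towards concreteness.
        if depth >= max_depth:
            return
        answer = llm(
            "Either write compilable code for this component, or list its "
            f"sub-parts one small step more concretely:\n{node.description}"
        )
        if looks_like_code(answer):
            node.code = answer          # leaf: the LLM produced actual code
            return
        for part in answer.splitlines():  # one sub-part per line, by convention
            child = Node(description=part)
            node.children.append(child)
            expand(child, depth + 1, max_depth)

    root = Node(description="A CLI tool that syncs two folders")  # example system
    # expand(root)  # walks the tree until the leaves hold compilable code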
See https://github.com/mit-han-lab/streaming-llm and others. There's good reason to believe that attention networks learn how to update their own weights based on their input (I forget the paper). The attention mechanism can act like a delta that updates the weights as the data propagates through the layers. The issue is getting the token embeddings to be more than just the 50k or so that we use for the English language, so you can explore the full space, which is what the attention sink mechanism is trying to do.
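For concreteness, one formalization of that "attention as a weight delta" idea is the fast-weights view of (unnormalized) linear attention, where each token's value/key outer product acts as a rank-1 update to a weight matrix. A minimal numpy sketch, with arbitrary dimensions and a toy feature map; this is not claiming to be what streaming-llm itself does:

    import numpy as np

    rng = np.random.default_rng(0)
    d, T = 4, 6                       # head dimension, sequence length
    phi = lambda x: np.maximum(x, 0)  # toy positive feature map

    Q = rng.normal(size=(T, d))
    K = rng.normal(size=(T, d))
    V = rng.normal(size=(T, d))

    # View 1: causal linear attention computed the usual way.
    attn_out = np.zeros((T, d))
    for t in range(T):
        scores = phi(K[: t + 1]) @ phi(Q[t])   # similarity to past keys
        attn_out[t] = scores @ V[: t + 1]      # weighted sum of past values

    # View 2: the same computation as a "fast weight" matrix W that receives
    # a rank-1 delta (outer product of value and key) per input token.
    W = np.zeros((d, d))
    fw_out = np.zeros((T, d))
    for t in range(T):
        W += np.outer(V[t], phi(K[t]))         # delta update from the input
        fw_out[t] = W @ phi(Q[t])              # apply the updated weights

    assert np.allclose(attn_out, fw_out)       # both views agree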