I learned this very explicitly recently. I've had some success with project and branch prompts - feeding a bunch of context into the beginning of each dialog.
In one dialog, some 30k tokens later, Claude requested the contents of package.json... which was in the context window already - the whole file!
The strange thing was that after I pointed this out, without re-inserting the file, Claude successfully read it from the context and filled the gap in what it was trying to do.
It's as if a synopsis of what exists in-context delivered with each message would help. But that feels weird!
Most chat is just one long-running prompt. LLMs have zero actual memory; you just keep feeding them the history.
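To make that concrete, a chat loop is roughly this (a minimal sketch assuming the anthropic Python SDK and an API key in the environment; the model name is just a placeholder):

```python
# Minimal sketch: the "chat" is just a list we keep appending to and re-sending.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
history = []  # this list is the model's only "memory"

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        messages=history,  # every prior turn is re-sent over the wire each time
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
```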
Maybe I misunderstood what you're saying, but what you're describing is some kind of second model that condenses the history and feeds the condensed version back in; this has been done.
Really, what you probably need is another model managing the heap and the stack of the history and bringing forward the current context.
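Very roughly something like this, building on the sketch above (the model name and the turn threshold are placeholders, and it glosses over token budgeting):

```python
# Rough sketch of that managing model: a cheaper model condenses the older turns
# (the "heap") and only a synopsis plus the latest turns (the "stack") goes forward.
# Assumes the user/assistant pairing produced by the chat() sketch above.
import anthropic

client = anthropic.Anthropic()
RECENT_TURNS = 6  # placeholder; a real version would budget by tokens, not turns

def compact(history: list[dict]) -> list[dict]:
    if len(history) <= RECENT_TURNS:
        return history
    older, recent = history[:-RECENT_TURNS], list(history[-RECENT_TURNS:])
    summary = client.messages.create(
        model="claude-3-5-haiku-latest",  # placeholder model name
        max_tokens=512,
        messages=older + [{
            "role": "user",
            "content": "Summarize this conversation so far. List every file and "
                       "fact already provided so it is not requested again.",
        }],
    ).content[0].text
    # Fold the synopsis into the oldest surviving user turn so the model always
    # sees a reminder of what is already in context (e.g. that package.json).
    recent[0] = {"role": "user",
                 "content": f"(Context so far: {summary})\n\n{recent[0]['content']}"}
    return recent
```

Then each turn you'd send `compact(history)` instead of the raw history.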