Hacker News

Oh, that's not a problem. Just cache the retrieval lookups too.


it's pointers all the way down


Just add one more level of indirection, I always say.


But seriously: the solution is often to cache or shard down to a halfway point (an LLM's model weights, for instance) and store that, which gives you a good approximation of the real problem space. That's essentially what many AI algorithms do, MCTS and LLMs included.
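The "cache a halfway point" idea is just memoization of an expensive intermediate step so later queries reuse it instead of recomputing. A minimal sketch (the function names and the squaring step are placeholders, not anything from a real system):

```python
from functools import lru_cache

# Hypothetical expensive intermediate computation. This stands in for the
# "halfway point" (e.g. precomputed weights or a search-tree node value).
@lru_cache(maxsize=None)
def intermediate(x: int) -> int:
    # Pretend this is costly; caching it means we pay the price once.
    return x * x  # placeholder for the real work

def answer(x: int) -> int:
    # The final lookup is cheap because it builds on the cached halfway point.
    return intermediate(x) + 1

answer(3)  # computes and caches intermediate(3)
answer(3)  # reuses the cached value; no recomputation
```

The trade-off is the usual one for caching: you spend memory to approximate (or exactly reuse) results that would be too slow to recompute on every query.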



