>What I mean is that when it generates a response, the computation happens on a snapshot: from input to output, it maps one set of tokens to another set of tokens. The model doesn't operate on a context larger than the window.
The larger context lives in the model's weights; the context-window-sized data is just the input, which gets multiplied by those weights to produce the output.
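A minimal sketch of that separation, with hypothetical toy sizes (`context_window`, `d_model`, `vocab_size` are made up, and a single matrix stands in for the full stack of attention layers): the weights `W` are fixed at inference time, and only a window-sized slice of the input ever participates in the multiply.

```python
import numpy as np

context_window = 8    # max tokens the model sees per forward pass (toy value)
d_model = 16          # embedding size (toy value)
vocab_size = 100      # toy vocabulary

rng = np.random.default_rng(0)
# "Larger context": knowledge baked into the weights during training,
# constant at inference time.
W = rng.normal(size=(d_model, vocab_size))

def forward(token_embeddings: np.ndarray) -> np.ndarray:
    """Map a window of input embeddings to output logits."""
    # Anything older than the window is simply not part of the input.
    x = token_embeddings[-context_window:]
    return x @ W      # (window, d_model) @ (d_model, vocab) -> logits

logits = forward(rng.normal(size=(20, d_model)))  # 20 tokens in, last 8 used
print(logits.shape)  # (8, 100)
```

Real transformers apply attention and many layers rather than one multiply, but the point stands: the window bounds the input, while everything "remembered" beyond it is frozen in `W`.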