Sorry if I misread your comment, but you seem to be saying that LLMs such as ChatGPT (which uses GPT-3+) are bag-of-words models? They are sequence models.
I edited my response... I hope it helps... My understanding is that the output gives probabilities for all the words, then one is chosen with some randomness thrown in (via the temperature parameter), then fed back in... which to me seemed to equate to bag of words. Perhaps I misunderstood the term.
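The loop I have in mind is something like this, in rough Python (fake_model, the token ids, and the function names are all made-up stand-ins for illustration, not a real API):

    import numpy as np

    def sample_next_token(logits, temperature=1.0):
        # Lower temperature -> sharper distribution (more deterministic);
        # higher temperature -> flatter distribution (more random).
        scaled = logits / temperature
        # Softmax (shifted by the max for numerical stability).
        probs = np.exp(scaled - np.max(scaled))
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))

    VOCAB_SIZE = 10

    def fake_model(context):
        # Stand-in for the real network: returns arbitrary logits
        # over the vocabulary. A real model would condition on context.
        return np.random.randn(VOCAB_SIZE)

    # Autoregressive loop: the sampled token is appended to the
    # context and fed back in for the next prediction.
    context = [0, 3, 7]  # toy token ids
    for _ in range(5):
        logits = fake_model(context)
        context.append(sample_next_token(logits, temperature=0.8))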
Bag-of-words models use a context that is a "bag" (i.e. an unordered map from elements to their counts) of words/tokens. GPTs use a context that is a sequence (i.e. an ordered list) of words/tokens.
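A quick way to see the difference (just an illustrative Python snippet, nothing GPT-specific):

    from collections import Counter

    a = "the dog bit the man".split()
    b = "the man bit the dog".split()

    # Bag-of-words: order is discarded, only counts remain,
    # so both sentences collapse to the same bag...
    assert Counter(a) == Counter(b)

    # ...but as sequences (what a GPT conditions on) they differ.
    assert a != b

The word order is exactly the information a bag-of-words context throws away, and it's what lets a sequence model distinguish those two sentences.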