Sorry if I misread your comment, but you seem to be saying that LLMs such as ChatGPT (which uses GPT-3+) are bag-of-words models? They are sequence models.
I edited my response... I hope it helps... My understanding is that the output gives probabilities for all the words, then one is chosen with some randomness thrown in (via the temperature parameter), then fed back in... which to me seemed to equate to bag of words. Perhaps I misunderstood the term.
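The loop I have in mind is something like this, in rough Python (fake_model, the token ids, and the function names are all made-up stand-ins for illustration, not a real API):

    import numpy as np

    def sample_next_token(logits, temperature=1.0):
        # Lower temperature -> sharper distribution (more deterministic);
        # higher temperature -> flatter distribution (more random).
        scaled = logits / temperature
        # Softmax (shifted by the max for numerical stability).
        probs = np.exp(scaled - np.max(scaled))
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))

    VOCAB_SIZE = 10

    def fake_model(context):
        # Stand-in for the real network: returns arbitrary logits
        # over the vocabulary. A real model would condition on context.
        return np.random.randn(VOCAB_SIZE)

    # Autoregressive loop: the sampled token is appended to the
    # context and fed back in for the next prediction.
    context = [0, 3, 7]  # toy token ids
    for _ in range(5):
        logits = fake_model(context)
        context.append(sample_next_token(logits, temperature=0.8))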
Bag-of-words models use a context that is a "bag" (i.e. an unordered map from elements to their counts) of words/tokens. GPTs use a context that is a sequence (i.e. an ordered list) of words/tokens.
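A quick way to see the difference (just an illustrative Python snippet, nothing GPT-specific):

    from collections import Counter

    a = "the dog bit the man".split()
    b = "the man bit the dog".split()

    # Bag-of-words: order is discarded, only counts remain,
    # so both sentences collapse to the same bag...
    assert Counter(a) == Counter(b)

    # ...but as sequences (what a GPT conditions on) they differ.
    assert a != b

The word order is exactly the information a bag-of-words context throws away, and it's what lets a sequence model distinguish those two sentences.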