
Another element that's missing is control of output sampling. LLMs don't actually produce text; they produce a probability distribution over words, essentially a huge table of every word (token) they know and the probability of it being next. You run the model again and again to get each next token. You don't have to pick the most probable token; doing that is called greedy decoding. You can randomise a bit and pick one of the less likely tokens when they have similar probabilities, which sometimes makes the output "more creative". There are also more advanced ways of steering the model, such as maintaining a list of candidate sentences and switching from one to another when it looks better (beam search), or running a smaller model on the output so far to judge whether the answer is becoming inappropriate.
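To make the difference concrete, here is a minimal sketch of greedy decoding versus temperature/top-k sampling over a single decoding step. The vocabulary and logit values are made up purely for illustration; a real model would emit scores for tens of thousands of tokens at each step.

```python
import math
import random

# Hypothetical raw scores (logits) for a tiny four-token vocabulary.
logits = {"cat": 2.0, "dog": 1.8, "car": 0.5, "the": -1.0}

def softmax(scores):
    """Turn raw scores into the probability map the comment describes."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def greedy(scores):
    """Greedy decoding: always take the single most probable token."""
    return max(scores, key=scores.get)

def sample_top_k(scores, k=2, temperature=1.0, rng=random):
    """Top-k sampling: keep only the k most likely tokens, renormalise,
    then draw one at random. Higher temperature flattens the distribution
    (more random); lower temperature sharpens it (closer to greedy)."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    probs = softmax({t: scores[t] / temperature for t in top})
    return rng.choices(list(probs), weights=list(probs.values()))[0]

print(greedy(logits))             # always "cat", the highest-scoring token
print(sample_top_k(logits, k=2))  # "cat" or "dog", chosen at random
```

In a real decoding loop you would append the chosen token to the prompt and run the model again for the next step; the choice of `k` and `temperature` is exactly the kind of knob most users never see.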

Output decoding is a powerful way to control the answers, and most users aren't even aware it exists. It's one of the reasons comparing "naked" open-source models to ChatGPT is unfair: ChatGPT has all these extras on top.



