
igorkraw 80 days ago | on: Reasoning LLMs are wandering solution explorers

I'd encourage everyone to learn about Metropolis-Hastings Markov chain Monte Carlo and then squint at LLMs: think about what token-by-token generation of the long rollouts maps to in that framework, and consider that you can think of the stop token as a learned stopping criterion that accepts (a substring of) the output.
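
For anyone who hasn't met the algorithm, here is a minimal sketch of random-walk Metropolis-Hastings in plain Python. It only shows the propose/accept loop that the analogy leans on. The function names, the Gaussian proposal, and the standard-normal target are illustrative assumptions, and nothing here models an LLM or a stop token.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis-Hastings over a 1-D unnormalized log density.

    All names and defaults are illustrative choices for this sketch.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        # Propose a move. The Gaussian step is symmetric, so the proposal
        # densities cancel in the acceptance ratio.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
        # On rejection the chain stays put and the current state is recorded again.
        samples.append(x)
    return samples, accepted / n_steps


def log_standard_normal(x):
    """Unnormalized log density of N(0, 1), used as a toy target."""
    return -0.5 * x * x


samples, acceptance_rate = metropolis_hastings(
    log_standard_normal, x0=0.0, n_steps=20000
)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

The part worth squinting at is the accept/reject line: a candidate is generated first, and a separate criterion then decides whether it is kept.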

You seem to have squinted already. What are the results?