It's gradient descent. Why are we surprised when the answers get better the more we do it? Sometimes you're stuck in a local maximum or minimum, and you hallucinate.
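(Toy illustration of the local-minimum part, with a made-up 1-D function - nothing to do with how an actual LLM is trained:)

    # Plain gradient descent on a non-convex toy function; depending on the
    # starting point it settles into the shallower of the two basins.
    def f(x):  return x**4 - 3*x**2 + x
    def df(x): return 4*x**3 - 6*x + 1     # derivative of f

    for x0 in (-2.0, 2.0):                 # two different starting points
        x = x0
        for _ in range(1000):
            x -= 0.01 * df(x)              # step downhill along the gradient
        print(f"start {x0:+.1f} -> x = {x:.3f}, f(x) = {f(x):.3f}")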

Am I oversimplifying it? Is everybody else over-mystifying it?



Gradient descent is how the model weights are adjusted during training. No gradient descent, nor anything even remotely similar to it, happens during inference.
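A minimal sketch of the distinction, using a toy linear model rather than anything LLM-shaped (the data and numbers are made up purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

    # Training: gradient descent repeatedly nudges the weights.
    w, lr = np.zeros(3), 0.1
    for _ in range(200):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad

    # Inference: a forward pass with frozen weights - no gradients, no updates.
    print(np.array([1.0, 0.0, -1.0]) @ w)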


Fair, thanks for pointing that out.

If you allow me to view the weights of a model as the axioms in an axiomatic system, my (admittedly limited) understanding of modern "AI" inference is that it adds no net new information/knowledge, just more specific expressions of the underlying structure (as defined by the model weights).

So while that does undercut my original flippancy of it being "nothing but gradient descent", I don't think it runs counter to my original point that nothing particularly "uncanny" is happening here, no?


To some extent, I think the axiom comparison is enlightening. In principle, the axioms determine the entire space of mathematical truths in that system. In practice, however, not all truths in a mathematical system are equally easy to discover or verify. Knowing the axioms of arithmetic doesn't mean there's nothing to be gained from actually computing what 271819 × 637281 is.
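(To make that concrete, a deliberately naive schoolbook multiplication in Python - the rules fully determine the answer, but producing it still takes work:)

    def schoolbook_multiply(a: int, b: int) -> int:
        total = 0
        for place, digit in enumerate(reversed(str(b))):
            total += a * int(digit) * 10 ** place   # one partial product per digit
        return total

    print(schoolbook_multiply(271819, 637281))   # 173225084139
    print(271819 * 637281)                       # same value, as a sanity check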

The claims about LLM reasoning are precisely related to this point. Do LLMs follow some internal deductive process when they generate output that resembles such a logical process to humans? Or are they just producing text that looks like reasoning, much like an absurdist play might do, and then simply picking a conclusion that resembles other problems they've seen in the past?

I don't think any arguments about the base nature of the model are particularly helpful here. In principle, deductive reasoning can be expressed as a mathematical function, and for any function there is a neural net that can approximate it with arbitrary precision. So it's not impossible that the model actually does this, but it's also not a given - the first-principles approach is just not helpful. We need more applied study of how the model actually works to probe this deeper.
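(For what it's worth, the "in principle" half is easy to show for toy cases, e.g. a hand-wired neuron computing a one-step boolean deduction. Whether a trained LLM internally organizes anything like this is exactly the empirical question:)

    import numpy as np

    # A single neuron, wired by hand rather than learned, that implements logical AND.
    # This only shows the function is representable, not that an LLM computes it this way.
    def and_gate(p: bool, q: bool) -> bool:
        x = np.array([p, q], dtype=float)
        return bool(np.array([1.0, 1.0]) @ x - 1.5 > 0)   # fires only when both inputs are true

    for p in (False, True):
        for q in (False, True):
            print(p, q, "->", and_gate(p, q))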



