It's gradient descent. Why are we surprised when the answers get better the more we do it? Sometimes you're stuck in a local maximum or minimum, and you hallucinate.
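(Toy illustration of the local-minimum part, with a made-up 1-D function - nothing to do with how an actual LLM is trained:)

    # Plain gradient descent on a non-convex toy function; depending on the
    # starting point it settles into the shallower of the two basins.
    def f(x):  return x**4 - 3*x**2 + x
    def df(x): return 4*x**3 - 6*x + 1     # derivative of f

    for x0 in (-2.0, 2.0):                 # two different starting points
        x = x0
        for _ in range(1000):
            x -= 0.01 * df(x)              # step downhill along the gradient
        print(f"start {x0:+.1f} -> x = {x:.3f}, f(x) = {f(x):.3f}")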

Am I oversimplifying it? Is everybody else over-mystifying it?



Gradient descent is how the model weights are adjusted during training. No gradient descent, nor anything even remotely similar to it, happens during inference.
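A minimal sketch of the distinction, using a toy linear model rather than anything LLM-shaped (the data and numbers are made up purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

    # Training: gradient descent repeatedly nudges the weights.
    w, lr = np.zeros(3), 0.1
    for _ in range(200):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w -= lr * grad

    # Inference: a forward pass with frozen weights - no gradients, no updates.
    print(np.array([1.0, 0.0, -1.0]) @ w)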


Fair, thanks for pointing that out.

If you allow me to view the weights of a model as the axioms in an axiomatic system, my (admittedly limited) understanding of modern "AI" inference is that it adds no net new information/knowledge, just more specific expressions of the underlying structure (as defined by the model weights).

So while that does undercut my original flippancy of it being "nothing but gradient descent", I don't think it runs counter to my original point that nothing particularly "uncanny" is happening here, no?


To some extent, I think the axiom comparison is enlightening. In principle, the axioms determine the entire space of mathematical truths in that system. In practice, however, not all truths in a mathematical system are equally easy to discover or verify. Knowing the axioms of arithmetic doesn't mean there's nothing to be gained from actually computing what 271819 × 637281 is.
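(To make that concrete, a deliberately naive schoolbook multiplication in Python - the rules fully determine the answer, but producing it still takes work:)

    def schoolbook_multiply(a: int, b: int) -> int:
        total = 0
        for place, digit in enumerate(reversed(str(b))):
            total += a * int(digit) * 10 ** place   # one partial product per digit
        return total

    print(schoolbook_multiply(271819, 637281))   # 173225084139
    print(271819 * 637281)                       # same value, as a sanity check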

The claims about LLM reasoning are precisely related to this point. Do LLMs follow some internal deductive process when they generate output that resembles such a logical process to humans? Or are they just producing text that looks like reasoning, much like an absurdist play might do, and then simply picking a conclusion that resembles other problems they've seen in the past?

I don't think any arguments about the base nature of the model are particularly helpful here. In principle, deductive reasoning can be expressed as a mathematical function, and for any function there is a neural net that can approximate it with arbitrary precision. So it's not impossible that the model actually does this, but it's also not a given - the first-principles approach is just not helpful. We need more applied study of how the model actually works to probe this deeper.
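(For what it's worth, the "in principle" half is easy to show for toy cases, e.g. a hand-wired neuron computing a one-step boolean deduction. Whether a trained LLM internally organizes anything like this is exactly the empirical question:)

    import numpy as np

    # A single neuron, wired by hand rather than learned, that implements logical AND.
    # This only shows the function is representable, not that an LLM computes it this way.
    def and_gate(p: bool, q: bool) -> bool:
        x = np.array([p, q], dtype=float)
        return bool(np.array([1.0, 1.0]) @ x - 1.5 > 0)   # fires only when both inputs are true

    for p in (False, True):
        for q in (False, True):
            print(p, q, "->", and_gate(p, q))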



