
I feel like the fact that ML has no good explanation why it works this well gives a lot of people room to invent their head-canon, usually from their field of expertise. I've seen this from exceptionally intelligent individuals too. If you only have a hammer...


I think it would be more unusual, and concerning, if an intelligent individual didn't attempt to apply their expertise for a head-canon of something unknown.

Coming up with an idea of how something works, by applying your expertise, is the foundation of intelligence and learning, and it has been behind every single advancement in human understanding.

People thinking is always a good thing. Thinking about the unknown is better. Thinking with others is best, and sharing those thoughts isn't somehow bad, even if they're not complete.


When you say ML, I assume you really mean LLMs?

Even with LLMs, there's no real mystery about why they work so well: they produce human-like input continuations (aka "answers") because they are trained to predict continuations of human-generated training data. Maybe we should be a bit surprised that the continuation signal is there in the first place, but given that it evidently is, it's no mystery that LLMs are able to use it. That's just testimony to the power of the Transformer as a predictive architecture, and of course to gradient descent as a cold, unthinking way of finding an error minimum.
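To make "trained to predict continuations via gradient descent" concrete, here is a minimal sketch (not an LLM, just the same training objective in miniature): a bigram character model whose logits are fit by plain gradient descent on cross-entropy. All names and the toy corpus are made up for illustration.

```python
import numpy as np

# Toy training data: 'a' is always followed by 'b' and vice versa,
# so a good next-character predictor is easy to check by hand.
text = "abababababababab"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
V = len(chars)

# One logit matrix: row = current char, column = candidate next char.
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, size=(V, V))

xs = np.array([stoi[c] for c in text[:-1]])  # current chars
ys = np.array([stoi[c] for c in text[1:]])   # next chars (targets)

def loss_and_grad(W):
    logits = W[xs]                               # (N, V) logits per position
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # softmax over next-char
    loss = -np.log(probs[np.arange(len(ys)), ys]).mean()  # cross-entropy
    dlogits = probs
    dlogits[np.arange(len(ys)), ys] -= 1         # softmax + CE gradient
    dlogits /= len(ys)
    dW = np.zeros_like(W)
    np.add.at(dW, xs, dlogits)                   # scatter grads to rows
    return loss, dW

losses = []
for _ in range(200):
    loss, dW = loss_and_grad(W)
    losses.append(loss)
    W -= 5.0 * dW                                # plain gradient descent step
```

After training, the row for 'a' puts almost all its probability on 'b': the "continuation signal" in the data was found by nothing smarter than repeatedly stepping downhill on the prediction error.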

Perhaps you meant how LLMs work, rather than why they work, but I'm not sure there's any real mystery there either. The transformer itself is all about key-based attention, and we now know that training a transformer consistently causes it to leverage attention to learn "induction heads" (built from pairs of adjacent attention heads), which are the main data-finding/copying primitive it uses to operate.
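The "key-based attention" primitive mentioned above can be sketched in a few lines. This is the standard scaled dot-product attention operation, with a hand-built toy input (the matrices here are contrived for the example, not a trained head): each query softly selects the value rows whose keys match it, which is the lookup-and-copy mechanism induction heads are built on.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query row softly selects
    rows of V whose keys best match it."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # query/key match scores
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # weights, rows sum to 1
    return w @ V                                  # weighted mix of values

# Contrived example: one query that strongly matches the second key,
# so the output should (almost exactly) copy the second value row.
K = np.eye(3)                                     # three distinct keys
V = np.array([[1., 0.], [0., 1.], [1., 1.]])      # three value rows
Q = np.array([[0., 10., 0.]])                     # matches key index 1
out = attention(Q, K, V)                          # out is close to [0., 1.]
```

The softmax makes the selection soft rather than a hard lookup, which is what lets gradient descent train the matching behavior in the first place.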

Of course, knowing how an LLM works in broad strokes isn't the same as knowing specifically how it works in any given case (how it transforms a specific input, layer by layer, to create the given output), but that seems a bit like saying that because I can't describe precisely why you had pancakes for breakfast, we don't know how the brain works.



