
I mean, faces are faces, right? If the training data set is large and representative, I don't see why any two (representative) halves of the data would lead to significantly different models.
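
A minimal sketch of that split-halves check, under assumptions not in the thread (scikit-learn's digits dataset and a logistic regression stand in for faces and a face model): train one model per half, then measure how often the two agree on a shared test set.

    # Train two models on disjoint halves of the same data and compare them.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)

    # Hold out a common test set, then split the remainder into two halves.
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    X_a, X_b, y_a, y_b = train_test_split(
        X_pool, y_pool, test_size=0.5, random_state=1)

    model_a = LogisticRegression(max_iter=5000).fit(X_a, y_a)
    model_b = LogisticRegression(max_iter=5000).fit(X_b, y_b)

    # If both halves are representative, accuracy and agreement should be high.
    pred_a, pred_b = model_a.predict(X_test), model_b.predict(X_test)
    print(f"A accuracy:    {model_a.score(X_test, y_test):.3f}")
    print(f"B accuracy:    {model_b.score(X_test, y_test):.3f}")
    print(f"A/B agreement: {np.mean(pred_a == pred_b):.3f}")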


I think that's the point; language is language.

If there's some fundamental limit on what kind of intelligence the current breed of LLMs can extract from language, then at some point it doesn't matter how good or expansive the training set is. Maybe we're finally starting to hit that architectural limit.


But information is not information. They may be able to talk in the same style, but not about the same things.



