
I now understand what you were trying to say. I still think it is wrong to frame it as "memorization vs. generalization". In my opinion, the real question is "why does it happen to generalize while it is in the process of memorizing?" As alluded to in another comment, my belief is that the high dimensionality of neural networks, combined with certain optimization schemes, favors networks whose outputs aren't overly sensitive to changes in the data. That is basically the definition of generalization. It also explains why double descent occurs: first the network memorizes as best it can, since that is "easy", and then the optimization scheme starts to push it towards parameters that generalize better.
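
As a rough illustration of that dynamic (my own toy sketch, not anything from the thread; the task, model size, and every hyperparameter are guesses), here is a small PyTorch experiment: an overparameterized MLP memorizes a modular-addition training split almost immediately, while heavy weight decay keeps nudging the parameters toward a less data-sensitive solution, so test accuracy tends to climb much later.

    # Toy sketch: "memorize first, generalize later" on a + b mod P.
    # All hyperparameters are illustrative assumptions, not tuned values.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    P = 23  # modulus for the toy task

    # Build all (a, b) pairs as one-hot inputs; the label is (a + b) mod P.
    pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
    x = torch.cat([nn.functional.one_hot(pairs[:, 0], P),
                   nn.functional.one_hot(pairs[:, 1], P)], dim=1).float()
    y = (pairs[:, 0] + pairs[:, 1]) % P

    # Split the pairs so the held-out half can only be gotten right
    # by actually learning the structure, not by lookup.
    perm = torch.randperm(len(x))
    n_train = len(x) // 2
    train, test = perm[:n_train], perm[n_train:]

    model = nn.Sequential(nn.Linear(2 * P, 256), nn.ReLU(), nn.Linear(256, P))
    # Heavy weight decay plays the role of the "optimization scheme" that
    # pushes toward parameters whose outputs are less sensitive to the data.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(20001):
        opt.zero_grad()
        loss = loss_fn(model(x[train]), y[train])
        loss.backward()
        opt.step()
        if step % 2000 == 0:
            with torch.no_grad():
                train_acc = (model(x[train]).argmax(1) == y[train]).float().mean().item()
                test_acc = (model(x[test]).argmax(1) == y[test]).float().mean().item()
            # Typically train accuracy saturates early (memorization) while
            # test accuracy improves only much later, if at all.
            print(f"step {step:6d}  train {train_acc:.2f}  test {test_acc:.2f}")

Whether the delayed jump in test accuracy shows up cleanly depends on the split size and the amount of weight decay, but the qualitative pattern is the point: fitting the training set is fast, and the regularizer does its work slowly afterwards.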

