
This video by Computerphile is a great overview of transformers and how we got to this point [0]. In short, the networks we used before, recurrent neural networks (RNNs), "forget" prior information, so they struggle with long sequences. The transformer architecture does not forget (or at least not as easily).

[0] https://www.youtube.com/watch?v=rURRYI66E54
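The difference described above can be sketched in a few lines: in self-attention, every position computes weights over all positions in the sequence, so the first token is as directly reachable from the last as from its neighbor, with no recency decay. A minimal numpy sketch of scaled dot-product attention (illustrative names, not any particular library's API):

```python
import numpy as np

def self_attention(q, k, v):
    # Scores compare every query position with every key position,
    # so information from any earlier token is directly reachable.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Softmax over positions turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy sequence: 5 tokens, each a 4-dimensional embedding.
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 4))
out, w = self_attention(x, x, x)
```

Each row of `w` is a full distribution over all 5 positions, whereas an RNN would have to carry the same information through a fixed-size hidden state, step by step.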



