This video by Computerphile is a great overview of transformers and how we got to this point [0]. Basically, the networks we used before, recurrent neural networks, "forgot" prior information as sequences got longer, so they weren't good at long inputs. The transformer architecture, however, doesn't forget (or at least not as easily).
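
If it helps, here's a rough sketch of the difference (my own illustration, not from the video; names and sizes are made up):

    # RNN vs. self-attention, in miniature. Dimensions are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 5, 4                      # sequence length, embedding size
    x = rng.normal(size=(T, d))      # token embeddings

    # RNN: the entire history gets squeezed through one fixed-size
    # vector h, so information about early tokens can fade step by step.
    W, U = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    h = np.zeros(d)
    for t in range(T):
        h = np.tanh(W @ x[t] + U @ h)   # h is the only memory of x[0..t]

    # Self-attention: every position looks at every position directly
    # via softmax(Q K^T / sqrt(d)) V, so nothing has to survive a chain
    # of compressions to be used later.
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    out = weights @ V                # each row mixes info from all tokens

The point is just that the RNN's memory of token 0 has to survive T updates of h, while attention gives token T a direct path back to token 0.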
[0] https://www.youtube.com/watch?v=rURRYI66E54