
GPT-3's transformer only recurs a fixed, finite number of times (its layer depth). Attention does a lot compared to a bog-standard RNN, and if the numbers were tokenized it would probably be enough for most reasonable computations, but eventually you would definitely hit a cap. That's probably a good thing, of course. The network and its training loop are Turing complete together, but it would suck if the network itself could fail to terminate.
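
To make the distinction concrete, here's a minimal toy sketch (nothing to do with GPT-3's actual code; layer count and the `halt` predicate are made up for illustration): a transformer-style model runs a fixed number of layer passes and always returns, while a model allowed to loop until some halting condition is reached can run forever.

    def transformer_forward(tokens, n_layers=96):
        """Bounded: exactly n_layers passes, then stop -- always terminates."""
        state = list(tokens)
        for _ in range(n_layers):           # fixed, finite recurrence depth
            state = [x + 1 for x in state]  # stand-in for an attention + MLP block
        return state

    def unbounded_forward(tokens, halt):
        """Unbounded: steps until `halt` says stop -- may never return."""
        state = list(tokens)
        while not halt(state):              # Turing-machine-style loop
            state = [x + 1 for x in state]
        return state

    print(transformer_forward([1, 2, 3]))                      # always finishes
    print(unbounded_forward([1, 2, 3], lambda s: s[0] > 10))   # finishes only if halt is reachable

The cap the comment mentions is the first case: whatever the input, compute per forward pass is bounded by depth times context length, so the network by itself can't loop forever.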

