
In an RNN, hidden states must be computed sequentially; with the attention mechanism in transformers, we break free of that sequential requirement. Transformers are therefore far more amenable to parallelism and make much better use of GPUs, both along the context axis and across the batch.
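
As a rough illustration (a minimal NumPy sketch with made-up weight names, not any particular library's API): the RNN needs an explicit loop over time because each state feeds the next, while self-attention computes all positions in a single batched matrix product.

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 8, 16                      # sequence length, hidden size
    x = rng.standard_normal((T, d))   # toy input sequence

    # RNN: each hidden state depends on the previous one, so the
    # time loop cannot be parallelized across positions.
    W_h = rng.standard_normal((d, d)) * 0.1
    W_x = rng.standard_normal((d, d)) * 0.1
    h = np.zeros(d)
    rnn_states = []
    for t in range(T):                # inherently sequential
        h = np.tanh(h @ W_h + x[t] @ W_x)
        rnn_states.append(h)

    # Self-attention: every position attends to every other position
    # in one matrix product, so all T positions are computed at once.
    W_q = rng.standard_normal((d, d)) * 0.1
    W_k = rng.standard_normal((d, d)) * 0.1
    W_v = rng.standard_normal((d, d)) * 0.1
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d)     # (T, T): all pairs in parallel
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    attn_out = weights @ V            # (T, d), no loop over time

The (T, T) score matrix is exactly the kind of large dense matmul GPUs are built for, which is where the parallelism win comes from.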

