
I would say you need a fair bit of data too...



Well, yes. But I think lucidrains was referring to:

https://arxiv.org/abs/1706.03762


And Transformers.


The "Attention Is All You Need" paper is where Transformers were introduced:

We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
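
For anyone curious, here's a minimal NumPy sketch of the scaled dot-product attention the paper is built on (the shapes and variable names here are illustrative, not taken from the paper):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)
        d_k = Q.shape[-1]
        scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
        return weights @ V                              # weighted sum of the values

    # Toy example: 3 tokens with d_k = 4
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
    print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)

The full architecture stacks multi-head versions of this plus feed-forward layers and positional encodings, but this weighted-sum-of-values step is the core idea that replaces recurrence and convolutions.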


Yup, and the state-of-the-art BERT and GPT-2 are both based on Transformers.



