Hacker News new | past | comments | ask | show | jobs | submit login

I'm suspecting it could be similar to the learned vs. predefined positional embeddings in GPTs. That is, the learned version is a "warped and distorted" version of the exact predefined pattern, and yet somehow it performs a bit better, and no one knows exactly why.



Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: