One could argue TF-IDF is a special case of an attention layer... but not quadratic in inference/training, and kind of just a quotient. Yeah, maybe we should go back.
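For anyone who wants the "just a quotient" part spelled out, here is a minimal sketch of one common TF-IDF variant (term count over document length, times the log of corpus size over document frequency). The function name `tf_idf` and the toy corpus are made up for illustration, and real libraries usually add smoothing and normalization on top of this. The weights come from a couple of linear passes over the corpus rather than from comparing every token against every other token.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Unsmoothed variant, assumed here for illustration:
    tf = count / doc_length, idf = log(n_docs / doc_frequency).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
weights = tf_idf(docs)
```

In this toy corpus "the" appears in every document, so its weight is zero everywhere, while a term that shows up in only one document ("mat") gets the largest weight in that document. That is the "attention-like" part: a per-token importance score, except fixed by corpus statistics instead of learned.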

