You can't just wave your hand and tell someone that words are broken up into sub-word tokens that are then transformed into a numerical representation to feed to a transformer and expect people to understand what is happening. How is anyone supposed to understand what a transformer does without understanding what the actual inputs are (e.g. word embeddings)? Plus, those embeddings are directly related to the self-attention scores calculated in the transformer, so understanding what an embedding is remains extremely relevant.
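
To make that pipeline concrete, here is a minimal numpy sketch. The vocabulary, sub-word split, and random weights are all made up for illustration; a real model uses a learned tokenizer (e.g. BPE) and learned embedding/projection matrices, but the data flow from token ids to embeddings to attention scores is the same.

```python
import numpy as np

# 1. Sub-word tokenization: a word like "unhappiness" might split into
#    pieces such as ["un", "happi", "ness"], each mapped to an integer id.
vocab = {"un": 0, "happi": 1, "ness": 2}            # hypothetical vocabulary
token_ids = [vocab[t] for t in ["un", "happi", "ness"]]

# 2. Embedding lookup: each id indexes a row of a learned matrix,
#    turning discrete tokens into dense vectors the transformer consumes.
d_model = 4
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))  # stand-in for learned weights
x = embedding_table[token_ids]                      # shape: (seq_len, d_model)

# 3. Self-attention scores come directly from those embeddings:
#    project them to queries and keys, then take scaled dot products.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
q, k = x @ W_q, x @ W_k
scores = q @ k.T / np.sqrt(d_model)                 # (seq_len, seq_len)
scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
print(weights)                                      # each row sums to 1
```

The point the sketch makes: the attention weights are a function of the embeddings, so hand-waving past the embedding step leaves the rest of the transformer unexplained.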

