
This is correct. Caching only saves you from recomputing self-attention over the system prompt tokens themselves; it doesn't save the attention computed by subsequent tokens, which are still free to attend to the prompt.
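To make that concrete, here's a minimal pure-Python sketch. It assumes a single attention head in a single layer, with random stand-in weights and token vectors. The function and variable names (`causal_attention`, `prompt_cache`, etc.) are made up for illustration and are not any library's API. It runs causal attention once from scratch and once with the prompt's keys/values cached, then counts the q·k scores each approach computes.

```python
import math
import random


def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    # Numerically stable softmax.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over all given keys/values.
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def causal_attention(xs, Wq, Wk, Wv, cache=None):
    """Single-head causal self-attention over token vectors xs (hypothetical helper).

    cache: optional (keys, values) already computed for earlier tokens.
    Returns (outputs for xs, updated (keys, values), number of q.k scores computed).
    """
    keys, values = ([], []) if cache is None else (list(cache[0]), list(cache[1]))
    outputs, n_scores = [], 0
    for x in xs:
        # K/V for this token are computed once and appended to the cache.
        keys.append(matvec(Wk, x))
        values.append(matvec(Wv, x))
        q = matvec(Wq, x)
        # The new token still attends over every cached key, prompt included.
        outputs.append(attend(q, keys, values))
        n_scores += len(keys)
    return outputs, (keys, values), n_scores


random.seed(0)
d = 4


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


Wq, Wk, Wv = (rand_matrix(d, d) for _ in range(3))
system_prompt = rand_matrix(6, d)  # 6 stand-in "system prompt" token vectors
user_turn = rand_matrix(3, d)      # 3 new tokens after the prompt

# No cache: run attention over prompt + new tokens from scratch.
full_out, _, full_scores = causal_attention(system_prompt + user_turn, Wq, Wk, Wv)

# With cache: process the prompt once, then reuse its keys/values.
_, prompt_cache, _ = causal_attention(system_prompt, Wq, Wk, Wv)
new_out, _, new_scores = causal_attention(user_turn, Wq, Wk, Wv, cache=prompt_cache)

print(full_scores, new_scores)  # 45 24
```

The outputs for the 3 new tokens come out identical either way. The cache only removes the prompt's own 21 scores (1 + 2 + ... + 6) and its K/V projections. Each new token still scores against all 6 prompt keys, so that cost keeps growing with prompt length.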

