
If the read speed is 1 GB/s, it takes 20 s to stream a 20 GB model for each token. That's 3 tokens a minute.
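The arithmetic above can be checked in a couple of lines, assuming (as the comment does) that generating each token requires streaming all of the model's weights once at the given read bandwidth; the 20 GB and 1 GB/s figures are taken from the comment:

```python
# Back-of-envelope estimate for bandwidth-bound token generation.
model_size_gb = 20   # model size in GB (from the comment)
read_gb_per_s = 1    # read bandwidth in GB/s (from the comment)

# Each token requires one full pass over the weights.
seconds_per_token = model_size_gb / read_gb_per_s
tokens_per_minute = 60 / seconds_per_token

print(seconds_per_token, tokens_per_minute)  # 20.0 3.0
```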


Yeah, I'm not sure what kind of math I was subscribed to yesterday, thanks.


:-) I keep saying that we don't have to stop AI from hallucinating; we only need to bring the rate below the human level.



