
If the read speed is 1 GB/s, it takes 20 s to stream a 20 GB model for each token. That's 3 tokens a minute.
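The arithmetic above can be checked in a couple of lines, assuming (as the comment does) that generating each token requires streaming all of the model's weights once at the given read bandwidth; the 20 GB and 1 GB/s figures are taken from the comment:

```python
# Back-of-envelope estimate for bandwidth-bound token generation.
model_size_gb = 20   # model size in GB (from the comment)
read_gb_per_s = 1    # read bandwidth in GB/s (from the comment)

# Each token requires one full pass over the weights.
seconds_per_token = model_size_gb / read_gb_per_s
tokens_per_minute = 60 / seconds_per_token

print(seconds_per_token, tokens_per_minute)  # 20.0 3.0
```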


Yeah, I'm not sure what kind of math I was subscribed to yesterday, thanks.


:-) I keep saying that we don't have to stop AI from hallucinating; we only need to bring the rate below the human level.



