
load model, compute a 1k token response (i.e., do a thousand forward passes in sequence, one per token), load a different model, compute a response,

I would expect the model loading to take basically zero percent of the time in the above workflow.
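
A rough back-of-envelope way to check that expectation, as a sketch only. It assumes single-stream decoding is memory-bandwidth-bound, so each forward pass reads every weight once from device memory, and that loading reads every weight once over a slower link. The function name and all numbers below (14 GB of weights, 25 GB/s load link, 2000 GB/s device memory) are illustrative assumptions, not measurements.

```python
def load_fraction(model_gb: float, load_gbps: float, mem_gbps: float, n_tokens: int) -> float:
    """Fraction of total wall time spent loading the model (hypothetical helper).

    Assumptions: loading streams the weights once at load_gbps, and each
    generated token costs one full read of the weights at mem_gbps.
    """
    load_s = model_gb / load_gbps              # one pass over the weights, slow link
    gen_s = n_tokens * model_gb / mem_gbps     # one pass per token, fast memory
    return load_s / (load_s + gen_s)


if __name__ == "__main__":
    # Assumed figures for illustration only.
    frac = load_fraction(model_gb=14, load_gbps=25, mem_gbps=2000, n_tokens=1000)
    print(f"loading share of total time: {frac:.1%}")
```

Under these assumptions the model size cancels out, and the share depends only on the bandwidth ratio and the number of tokens: a slower load path (say, reading from disk) or a shorter response pushes the share up, while a longer response pushes it down.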


