Please correct me if I'm wrong: are you running a batch size of 1 across 500 GPUs? If so, why are the responses almost instant? Also, when can we expect a bring-your-own-fine-tuned-model option? Thanks!
We are not using 500 GPUs; we are using a large system built from many of our own custom ASICs. This lets us run at batch size 1 with no reduction in overall throughput. (We do use pipelining, though, so many users share the same system at once.)
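A toy timing model can illustrate why pipelining preserves throughput at batch size 1. The stage count and timings below are made-up numbers for illustration only, not a description of the actual hardware: once the pipeline fills, each stage works on a different user's request, so a request completes roughly every stage-time even though each request is processed alone.

```python
# Toy model of pipeline parallelism at batch size 1.
# All numbers are hypothetical, chosen only to show the effect.
STAGES = 4        # number of pipeline stages (assumed)
STAGE_TIME = 1.0  # time per stage, arbitrary units
REQUESTS = 8      # independent user requests

# No pipelining: each request runs through all stages before the
# next one starts, so total time scales with STAGES * REQUESTS.
sequential_total = REQUESTS * STAGES * STAGE_TIME

# Pipelining: after a fill time of STAGES stage-times, one request
# finishes per stage-time, so many users share the system at once.
pipelined_total = (STAGES + REQUESTS - 1) * STAGE_TIME

print(sequential_total)  # 32.0
print(pipelined_total)   # 11.0
```

Per-request latency is unchanged (each request still spends `STAGES * STAGE_TIME` in the pipe), but aggregate throughput approaches one request per stage-time, which is how batch size 1 can coexist with high overall throughput.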