
This is roughly a 47B model. (Mixtral 8x7b — only the feed-forward experts are replicated, the attention layers are shared, so the total is well under 8×7B.)


Oh, sorry, I assumed the 8 was for quantization. 8x7b is a new syntax for me.

Still, the NVIDIA chart shows Llama v2 70B at 750 tok/s, no?


I guess that's total throughput, rather than per user? You can increase total throughput by scaling horizontally. You can't increase throughput per user that way.
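To make the distinction concrete, here's a rough back-of-the-envelope sketch. The batch sizes and per-step latencies are made-up illustrative numbers, not taken from the NVIDIA chart; the point is just that batching raises aggregate tokens/s while each individual user still sees roughly one token per decode step:

    # Rough illustration of total vs. per-user throughput under batching.
    # All numbers are invented for illustration, not from the NVIDIA chart.

    def throughput(batch_size: int, step_latency_s: float) -> tuple[float, float]:
        """One decode step emits one token per sequence in the batch."""
        total_tok_s = batch_size / step_latency_s   # tokens/s across all users
        per_user_tok_s = 1.0 / step_latency_s       # tokens/s seen by one user
        return total_tok_s, per_user_tok_s

    for batch in (1, 8, 64):
        # Assume step latency grows only slightly with batch size
        # (decode is typically memory-bandwidth bound).
        latency = 0.02 + 0.0005 * batch
        total, per_user = throughput(batch, latency)
        print(f"batch={batch:3d}: total {total:7.1f} tok/s, per-user {per_user:5.1f} tok/s")

With these numbers, total throughput climbs from ~49 to ~1230 tok/s as the batch grows, while per-user throughput actually dips slightly — which is why a big aggregate tok/s figure on a chart says little about single-user latency.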



