Hacker News
snowfield on Feb 19, 2024 | on: Groq runs Mixtral 8x7B-32k with 500 T/s
This is a 50B model. (Mixtral 8x7b)
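The "8x7B" name is about expert count, not quantization, and the total is not simply 8 × 7B. A small arithmetic sketch (the published figures below are from Mistral AI's Mixtral announcement; the comment's "50B" is a round-number approximation of them):

```python
# In a Mixture-of-Experts transformer only the feed-forward (expert)
# weights are replicated 8x; attention and embedding weights are shared,
# so the real total is well below the naive 8 * 7B.
naive_total = 8 * 7e9            # 56B if every weight were duplicated

# Published Mixtral 8x7B figures: ~46.7B total parameters,
# ~12.9B active per token (2 of the 8 experts are routed per token).
total_params = 46.7e9
active_per_token = 12.9e9

print(f"naive 8x7B: {naive_total / 1e9:.0f}B")
print(f"actual total: {total_params / 1e9:.1f}B")
print(f"active per token: {active_per_token / 1e9:.1f}B")
```

The ~12.9B active-per-token figure is why an MoE model of this size can decode faster than a dense model with the same total parameter count.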
SeanAnderson on Feb 19, 2024
Oh, sorry, I assumed the 8 was for quantization. 8x7b is a new syntax for me.
Still, the NVIDIA chart shows Llama v2 70B at 750 tok/s, no?
tome on Feb 19, 2024
I guess that's total throughput, rather than per user? You can increase total throughput by scaling horizontally. You can't increase throughput per user that way.
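The distinction above can be sketched numerically. All figures here are illustrative, not NVIDIA's or Groq's actual serving configuration: the point is only that aggregate throughput scales with replicas and batch size while each user still sees the single-stream rate.

```python
def throughput(replicas: int, streams_per_replica: int,
               tok_s_per_stream: float) -> tuple[float, float]:
    """Return (total cluster throughput, per-user throughput) in tok/s."""
    # Total throughput grows with horizontal scaling and batching...
    total = replicas * streams_per_replica * tok_s_per_stream
    # ...but any single user's stream still decodes at the per-stream rate.
    per_user = tok_s_per_stream
    return total, per_user

# One replica batching 25 concurrent streams at 30 tok/s each yields a
# 750 tok/s aggregate (the kind of number a vendor chart might report),
# while each individual user still sees 30 tok/s.
print(throughput(1, 25, 30.0))   # (750.0, 30.0)

# Doubling replicas doubles the aggregate; per-user rate is unchanged.
print(throughput(2, 25, 30.0))   # (1500.0, 30.0)
```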