karmasimida on April 25, 2024 | on: Snowflake Arctic Instruct (128x3B MoE), largest op...
That is my reading too: if you treat latency as the most important inference metric, then you need all the models in memory all the time.
What is your 70B configuration? Did you try TP=8 for the 70B model for a fair comparison?
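As a rough illustration of the memory-residency point, here is a back-of-the-envelope sketch (Python) of per-GPU weight memory for a 128x3B MoE with all experts resident versus a dense 70B model, both sharded TP=8. The parameter counts, bf16 precision, and TP degree are illustrative assumptions, not the exact configurations of either model.

    # Back-of-the-envelope weight memory per GPU (weights only; ignores
    # KV cache and activations). All sizes below are assumptions.
    BYTES_PER_PARAM = 2  # assuming bf16

    def weight_gib_per_gpu(total_params: float, tp_degree: int) -> float:
        """Per-GPU weight memory in GiB when sharded across tp_degree GPUs."""
        return total_params * BYTES_PER_PARAM / tp_degree / 2**30

    # Assumed MoE: 128 experts x ~3B params each. Every expert must stay
    # resident because any token may be routed to any expert, even though
    # only a few experts are active per token.
    moe_params = 128 * 3e9
    print(f"128x3B MoE, TP=8: {weight_gib_per_gpu(moe_params, 8):.0f} GiB/GPU")

    # Assumed dense baseline: 70B params, tensor parallel across 8 GPUs.
    dense_params = 70e9
    print(f"Dense 70B, TP=8:  {weight_gib_per_gpu(dense_params, 8):.0f} GiB/GPU")

Under these assumed sizes, keeping every expert resident takes roughly 5-6x the per-GPU weight memory of the dense 70B at the same TP degree, which is the trade-off the comment is pointing at.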