Hacker News
boroboro4
3 months ago
on:
Kimi K2 is a state-of-the-art mixture-of-experts (...
Check out the DeepSeek-V3 paper. They changed the way they train the experts (they went from an auxiliary loss to a different kind of expert-separation training). It did improve the experts' domain specialization, and they have neat graphics on it in the paper.
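For anyone who wants a feel for what "going away from aux loss" can look like, here is a rough toy sketch of bias-based, auxiliary-loss-free load balancing as I understand the idea from the DeepSeek-V3 paper: each expert gets a bias that is added to its routing score for top-k selection only, and the bias is nudged down when the expert is overloaded and up when it is underloaded. This is not the paper's code. The function names, the random scores, the `skew` toward low-index experts, and the values of `gamma`, `tokens`, and `steps` are all assumptions made up for illustration.

```python
import random


def route(scores, bias, k):
    """Return the indices of the top-k experts ranked by score + bias.

    The bias is used for selection only; in a real MoE layer the gating
    weights (not computed in this sketch) would come from the raw scores.
    """
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i],
                    reverse=True)
    return ranked[:k]


def update_bias(bias, load, gamma):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - gamma)
        elif n < mean_load:
            new_bias.append(b + gamma)
        else:
            new_bias.append(b)
    return new_bias


def simulate(num_experts=8, k=2, tokens=512, steps=100, gamma=0.01, seed=0):
    """Run a toy routing loop and record the per-expert load at each step."""
    rng = random.Random(seed)
    # Hypothetical skew: lower-index experts get higher raw scores on average,
    # so without any balancing they would receive most of the tokens.
    skew = [0.5 * (num_experts - i) / num_experts for i in range(num_experts)]
    bias = [0.0] * num_experts
    history = []
    for _ in range(steps):
        load = [0] * num_experts
        for _ in range(tokens):
            scores = [rng.random() + skew[i] for i in range(num_experts)]
            for expert in route(scores, bias, k):
                load[expert] += 1
        history.append(load)
        bias = update_bias(bias, load, gamma)
    return bias, history


bias, history = simulate()
first_spread = max(history[0]) - min(history[0])
last_spread = max(history[-1]) - min(history[-1])
print("load spread at first step:", first_spread)
print("load spread at last step:", last_spread)
```

The point of the toy is only that the load evens out without adding a balancing term to the training loss, so the balancing pressure does not fight the main objective. It says nothing by itself about the domain specialization results, which are in the paper.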