leetharris · 7 months ago | on: How has DeepSeek improved the Transformer architecture?
They likely continue to train dense models because they are far easier to fine-tune, and fine-tuning is a huge use case for the Llama models.
whimsicalism · 7 months ago
It probably also has to do with their internal infra. If it were just about dense models being easier for the OSS community to use and build on, you'd expect them to train MoEs and then distill to dense.
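
(A minimal sketch of what "train an MoE, then distill to dense" could look like in PyTorch; `teacher_moe`, `student_dense`, and `batch` are hypothetical stand-ins for any two models mapping token ids to vocabulary logits, not DeepSeek's or Meta's actual pipeline.)

    import torch
    import torch.nn.functional as F

    def distill_step(teacher_moe, student_dense, batch, optimizer, T=2.0):
        # One step of knowledge distillation: push the dense student's
        # output distribution toward the MoE teacher's softened outputs.
        with torch.no_grad():
            teacher_logits = teacher_moe(batch)      # [batch, seq, vocab]
        student_logits = student_dense(batch)

        # KL(student || teacher) on temperature-softened distributions;
        # the T*T factor restores the gradient magnitude (Hinton et al.).
        loss = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

In practice the KL term is usually mixed with the ordinary next-token cross-entropy loss on the hard labels rather than used alone.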