spyckie2 7 days ago | on: An analysis of DeepSeek's R1-Zero and R1

Yes, but I meant it slightly differently than the distills. The idea is to create the next-gen SOTA non-reasoning model with synthetic reasoning training data.

entropicdrifter 6 days ago

So you mean something like, "what if the baseline, off-the-cuff response for the next-gen models was tuned based on the results of the reasoning model excluding the reasoning itself?"

spyckie2 6 days ago

Exactly, though it may need the reasoning later to form the proper foundational logic in the weights.
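
To make the idea concrete, here is a rough sketch of how such training pairs might be built. It assumes the reasoning model wraps its chain of thought in `<think>...</think>` tags, as R1-style outputs do. The helper name `to_sft_example`, the `keep_reasoning` flag, and the sample text are all made up for illustration and are not taken from any real pipeline.

```python
import re

# Assumption: the reasoning trace sits inside <think>...</think>.
# Other models may use a different format, so adjust the pattern as needed.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def to_sft_example(prompt: str, completion: str, keep_reasoning: bool = False) -> dict:
    """Turn one reasoning-model sample into a prompt/response training pair.

    keep_reasoning=False drops the trace (the "off-the-cuff" target);
    keep_reasoning=True leaves it in, for the case where the weights
    turn out to need the reasoning as well.
    """
    response = completion if keep_reasoning else THINK_RE.sub("", completion)
    return {"prompt": prompt, "response": response.strip()}

# Hypothetical sample output from a reasoning model.
sample = "<think>2 + 2 is 4, then times 3 is 12.</think>\nThe answer is 12."
print(to_sft_example("What is (2 + 2) * 3?", sample))
```

With the flag off, the pair keeps only the final answer; with it on, the trace stays in the target, which would be the variant to try if answer-only tuning does not transfer the underlying logic.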
