Some info that may be missing: - v2/v3 (not r1) seem to be cloned from o1/4o out... | Hacker News


PontifexCipher 11 days ago | on: OpenAI says it has evidence DeepSeek used its mode...

Some info that may be missing:

- v2/v3 (not r1) seem to be cloned from o1/4o output, and perform worse (this cost the oft-repeated 5ish mm USD)
- r1 is specifically a reasoning step (using RL) _on top of_ v2/v3 and performs similarly to o1 (the cost of this is _not reported anywhere_)
- In the o1 blog post, they specifically say they use RL to add reasoning to LLMs: https://openai.com/index/learning-to-reason-with-llms/

sudosysgen 11 days ago

The R1-Zero paper shows how many training steps the RL took, and it's not many. The cost of the RL is likely a small fraction of the cost of the foundational model.
