- v2/v3 (not r1) seem to be cloned from o1/4o output, and perform worse; that is what the oft-repeated ~$5M USD figure covers
- r1 is specifically a reasoning step (trained with RL) _on top of_ v2/v3 and performs similarly to o1; the cost of that step is _not reported anywhere_ (see the sketch below for what such an RL step looks like)
- In the o1 blog post, OpenAI specifically say they use RL to add reasoning to LLMs: https://openai.com/index/learning-to-reason-with-llms/
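To make "a reasoning step using RL on top of a base model" concrete, here is a minimal toy sketch of the general idea (REINFORCE with a verifiable reward), not DeepSeek's or OpenAI's actual recipe: sample a completion, score only whether the final answer is correct, and push up the log-probability of trajectories that got it right. The tiny policy network and the `reward` check are placeholders standing in for a pretrained LLM and a real verifier.

```python
# Toy REINFORCE loop: reinforce sampled trajectories whose final answer passes a check.
# The "policy" here is a stand-in for a pretrained base model; the reward is a placeholder
# for a real verifier (math answer check, unit tests, etc.).

import torch

vocab_size, hidden = 100, 32
policy = torch.nn.Sequential(torch.nn.Embedding(vocab_size, hidden),
                             torch.nn.Linear(hidden, vocab_size))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def sample_trajectory(prompt_ids, max_new=16):
    """Sample tokens one at a time (conditioning only on the last token, for brevity),
    keeping per-token log-probs so the policy gradient can flow back."""
    ids, logps = list(prompt_ids), []
    for _ in range(max_new):
        logits = policy(torch.tensor(ids[-1:]))[-1]
        dist = torch.distributions.Categorical(logits=logits)
        tok = dist.sample()
        logps.append(dist.log_prob(tok))
        ids.append(tok.item())
    return ids, torch.stack(logps)

def reward(ids):
    """Placeholder verifier: 1.0 if the 'final answer' is correct, else 0.0."""
    return 1.0 if ids[-1] % 2 == 0 else 0.0

for step in range(100):
    ids, logps = sample_trajectory([1, 2, 3])
    r = reward(ids)
    loss = -(r * logps.sum())   # REINFORCE: raise log-prob of the whole trajectory if rewarded
    opt.zero_grad(); loss.backward(); opt.step()
```

The point of the sketch is only that this post-training step is a separate training run with its own (unreported) compute cost, independent of whatever the base v2/v3 pretraining cost.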