The README says "RL" without specifying what kind of RL is used. Researchers: I know you are busy, and I know good writing takes time, but please don't skip this kind of detail.
The technical report does go into a lot of depth about how they use RL, including the modified GRPO objective they train with. As for the README, I imagine most people active in the field understand the implications of "RL" for a reasoning model.
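For anyone curious, the standard (unmodified) GRPO objective from the DeepSeekMath paper looks roughly like this; whatever modification the report describes would be a variant of it:

    \mathcal{J}_{\text{GRPO}}(\theta)
      = \mathbb{E}\!\left[
          \frac{1}{G}\sum_{i=1}^{G} \frac{1}{|o_i|} \sum_{t=1}^{|o_i|}
          \Big( \min\!\big( \rho_{i,t}\,\hat{A}_{i},\;
            \operatorname{clip}(\rho_{i,t},\,1-\epsilon,\,1+\epsilon)\,\hat{A}_{i} \big)
            - \beta\, \mathbb{D}_{\text{KL}}\big[\pi_\theta \,\Vert\, \pi_{\text{ref}}\big] \Big)
        \right]

    where
      \rho_{i,t} = \frac{\pi_\theta(o_{i,t} \mid q,\, o_{i,<t})}
                        {\pi_{\theta_{\text{old}}}(o_{i,t} \mid q,\, o_{i,<t})},
      \qquad
      \hat{A}_{i} = \frac{r_i - \operatorname{mean}(\{r_1,\dots,r_G\})}
                         {\operatorname{std}(\{r_1,\dots,r_G\})}

Here q is the prompt, {o_1, ..., o_G} is a group of G sampled responses, and r_i their rewards. The group-relative advantage replaces PPO's learned value function, which is GRPO's main departure from PPO.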
I assume they mean "Reinforcement Learning". It's been a decade since I studied AI in university, but isn't it perfectly valid to just say "RL"? What kind of specificity are you looking for: whether they used Q-learning or some other algorithm?
I wouldn’t phrase it as a matter of “validity”. I would phrase it as a question of transparency.
Putting a model out in public without clearly explaining how it works doesn’t meet my bar for a proper scientific exchange of knowledge. Perhaps they are being intentionally vague for competitive reasons.
RL is a generic umbrella term that gets mixed and matched with various other methods; in the context of LLMs, it's often some variation of RLHF.
But the authors don't even say "RLHF", much less explain their methodology. This isn't just a matter of academic interest; it has real implications for evaluating and using this work.
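To make that concrete, here's a toy Python sketch (my illustration, not anything from their report) of why "RL" alone underdetermines the method: the same group-relative update machinery can sit on top of a learned preference reward (classic RLHF) or a rule-based verifier, and the two choices train very different behaviors.

    from statistics import mean, stdev

    def reward_from_verifier(answer: str, gold: str) -> float:
        # Rule-based reward: 1 if the final answer matches, else 0.
        return 1.0 if answer.strip() == gold.strip() else 0.0

    def reward_from_preference_model(answer: str) -> float:
        # Stand-in for a learned RLHF reward model
        # (a trivial length heuristic, purely for illustration).
        return min(len(answer) / 100.0, 1.0)

    def group_advantages(rewards: list[float]) -> list[float]:
        # Normalize rewards within a sampled group, GRPO-style.
        mu = mean(rewards)
        sigma = stdev(rewards) or 1.0  # guard against zero variance
        return [(r - mu) / sigma for r in rewards]

    group = ["42", "forty-two", "42.0", "I think it is 42"]
    print(group_advantages([reward_from_verifier(a, "42") for a in group]))
    print(group_advantages([reward_from_preference_model(a) for a in group]))

The downstream policy update is identical in both cases; the reward source is the detail that actually tells you what the model was optimized for.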
I’m often concerned by the writing quality of ML/AI papers, but this strikes me as particularly disappointing.
It is increasingly important to have confidence that the creators of AI systems are thoughtful and thorough. I want to see their reasoning. I want to understand the trade-offs they make and why.
If you put it like that, I absolutely agree with you, except that I suppose I don't really consider this an exchange of knowledge so much as the release of an open-source project: all they need to publish is instructions on how to use it. I don't think they're really interested in anyone improving the model on their own or reproducing the work. It would be amazing if they did, though!