
In the case of ARC they are referring to verifiable math and reasoning problems. They still used SFT and model-based rewards for other domains.




