It seems to work and to be very scalable. "Reasoning" helps to counter biases: answers become longer, i.e. the system uses more tokens (and therefore more time) to answer a question, and longer answers likely allow better differentiation between answers in the "answer space".

https://newsletter.languagemodels.co/i/155812052/large-scale...
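A crude way to sanity-check that "answer space" intuition (my own sketch, not from the article): represent each answer as a bag-of-words vector and compare the mean pairwise cosine similarity of short answers vs. long, reasoning-style answers to the same question. All sample answers and helper names below are made up for illustration.

    import math
    from collections import Counter
    from itertools import combinations

    def cosine(a, b):
        # Cosine similarity between two bag-of-words Counters.
        dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
        norm = math.sqrt(sum(v * v for v in a.values()))
        norm *= math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def mean_pairwise_similarity(answers):
        # Lower mean similarity = answers are more spread out
        # in this (very crude) answer space.
        vecs = [Counter(ans.lower().split()) for ans in answers]
        pairs = list(combinations(vecs, 2))
        return sum(cosine(a, b) for a, b in pairs) / len(pairs)

    # Hypothetical short vs. long answers to "What is the capital of France?"
    short = ["Paris.", "It is Paris.", "The answer is Paris."]
    verbose = [
        "The question asks for the capital of France; the seat of "
        "government is Paris, so the answer is Paris.",
        "First, recall that a capital is the city housing the national "
        "government. For France that city is Paris. Answer: Paris.",
        "Several French cities are famous, but only Paris hosts the "
        "government and parliament, so the capital is Paris.",
    ]

    print("short answers:", round(mean_pairwise_similarity(short), 3))
    print("long answers: ", round(mean_pairwise_similarity(verbose), 3))

A real probe would use sentence embeddings rather than bag-of-words; the point is just that "differentiation in answer space" is a measurable quantity, not only a metaphor.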

Also from the posted article:

"""

The R1-Zero training process is capable of creating its own internal domain-specific language (“DSL”) in token space via RL optimization.

This makes intuitive sense, as language itself is effectively a reasoning DSL.

"""
