
One interesting takeaway for me, a non-practitioner, was that the models appear to be fairly decent at judging their own output.

They used best-of-32, with the same model judging a "tournament" to find the best answer. Seems like something that could be bolted on reasonably easily, e.g. in, say, WebUI.
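To make that concrete, here is a rough sketch of how I imagine such a bolt-on could look. It is not their actual method: I'm assuming a single-elimination bracket with pairwise comparisons, and `generate` and `judge` are hypothetical callables that would both wrap the same model. The toy stand-ins at the bottom exist only so the snippet runs without a model.

```python
import random
from typing import Callable, List

# Hypothetical signatures: in a real setup both would call the same model.
Generate = Callable[[str], str]
Judge = Callable[[str, str, str], bool]  # True if the first answer is better


def best_of_n_tournament(prompt: str, generate: Generate, judge: Judge, n: int = 32) -> str:
    """Sample n answers, then let the judge pick a winner via pairwise knockout.

    Assumption: single elimination. The bracket format is my guess, not
    something taken from the original work.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    while len(candidates) > 1:
        next_round: List[str] = []
        # Compare neighbours pairwise; the judged winner advances.
        for i in range(0, len(candidates) - 1, 2):
            a, b = candidates[i], candidates[i + 1]
            next_round.append(a if judge(prompt, a, b) else b)
        # With an odd count, the last candidate advances without a match.
        if len(candidates) % 2 == 1:
            next_round.append(candidates[-1])
        candidates = next_round
    return candidates[0]


# Toy stand-ins so the sketch runs without any model.
def toy_generate(prompt: str) -> str:
    return str(random.randint(0, 999))


def toy_judge(prompt: str, a: str, b: str) -> bool:
    return int(a) >= int(b)


if __name__ == "__main__":
    print(best_of_n_tournament("pick a big number", toy_generate, toy_judge))
```

With n = 32 this costs 32 generations plus 31 judge calls, so the overhead of the judging step is about the same as the sampling itself.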

edit: forgot to add that I'm curious if this translates to smaller models as well, or if it requires these huge models.


