
I'm not asking for actual examples, but what kind of thing is in your internal reasoning benchmark?


Things like “summarize this text in exactly 14 words”, programming questions, unstructured-to-structured data transformations, and so on…


Do you let it use CoT? I think that first one is pretty hard if you have to produce it directly one token at a time, but I guess that's kind of the point.
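At least the length constraint in a task like the 14-word summary is mechanically checkable. Here is a minimal sketch of how such an item might be graded. It assumes that words are whitespace-separated tokens; the thread doesn't say how the benchmark actually scores it, and `count_words` and `passes_exact_length` are hypothetical names:

```python
def count_words(text: str) -> int:
    # Assumption: any run of non-whitespace characters counts as one word,
    # so "words," and "less." each count once. A real grader might treat
    # hyphenated words or numbers differently.
    return len(text.split())


def passes_exact_length(summary: str, target: int = 14) -> bool:
    # The item passes only if the word count matches the target exactly.
    return count_words(summary) == target


if __name__ == "__main__":
    candidate = "The model summarizes the article in exactly fourteen words, no more and no less."
    print(count_words(candidate), passes_exact_length(candidate))
```

This only checks the length, not whether the summary is any good. It also shows why a model emitting tokens directly has a hard time: it has to track a running word count while it writes, because nothing can be revised after the fact.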



