
Each question falls into a different category (e.g. math, coding, story writing). Models are typically better at some categories and worse at others. Saying "56% of people preferred responses from o3-mini" makes me wonder whether that 56% comes mostly from certain categories, rather than the model being preferred 56% of the time uniformly across all of them.
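
To make the concern concrete, here's a quick sketch with made-up numbers (the categories, vote counts, and the `preference_rates` helper are all hypothetical, not taken from any actual evaluation). It shows how an overall 56% preference can coexist with very different per-category rates:

```python
# Hypothetical pairwise-preference counts, invented purely for illustration:
# category -> (votes preferring the model, total votes in that category)
votes = {
    "math": (80, 100),
    "coding": (60, 100),
    "story writing": (28, 100),
}

def preference_rates(votes):
    """Return (overall preference rate, per-category preference rates)."""
    per_category = {cat: wins / total for cat, (wins, total) in votes.items()}
    total_wins = sum(wins for wins, _ in votes.values())
    total_votes = sum(total for _, total in votes.values())
    return total_wins / total_votes, per_category

overall, per_category = preference_rates(votes)
print(f"overall: {overall:.0%}")
for cat, rate in per_category.items():
    print(f"{cat}: {rate:.0%}")
```

With these invented counts the aggregate is 168/300 = 56%, yet the model is preferred 80% of the time on math and only 28% of the time on story writing. A per-category breakdown is what would tell you which situation you're actually in.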

