
Typically in these tests you have three options: "A is better", "B is better", or "they're equal/can't decide". So if 56% prefer O3 Mini, it's likely that well under half prefer O1.

Also, the way I understand it, they're comparing a mini model with a large one.
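To make the three-option arithmetic concrete, here is a rough sketch. The function name `preference_breakdown` and the 20% tie rate are made up for illustration; the only number taken from the thread is the 56% figure, and the actual tie rate is not known.

```python
def preference_breakdown(prefer_a: float, tie: float) -> dict[str, float]:
    """Split a three-way preference test into its parts.

    prefer_a: fraction of raters who picked model A (0.56 in the thread).
    tie:      fraction who picked "equal / can't decide" (an ASSUMED value;
              the real tie rate is not given anywhere in the discussion).
    """
    eps = 1e-12  # tolerance for floating-point rounding
    if not 0.0 <= prefer_a <= 1.0:
        raise ValueError("prefer_a must be between 0 and 1")
    if tie < 0.0 or tie > 1.0 - prefer_a + eps:
        raise ValueError("tie must fit in the share left over after prefer_a")

    # Whatever is left after A's share and the ties went to model B.
    prefer_b = max(0.0, 1.0 - prefer_a - tie)
    decided = prefer_a + prefer_b
    return {
        "prefer_a": prefer_a,
        "prefer_b": prefer_b,
        "tie": tie,
        # A's share among raters who actually picked a side.
        "a_share_of_decided": prefer_a / decided if decided > 0 else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical scenario: 56% prefer A, 20% of raters call it a tie.
    print(preference_breakdown(0.56, 0.20))
```

Under that assumed tie rate, only about 24% would have preferred the other model, and the head-to-head share among people who picked a side would be noticeably higher than 56%.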


If you use ChatGPT, it sometimes gives you two versions of its response, and you have to choose one or the other if you want to continue prompting. Sure, not picking a response might be a third category. But if that's how they were approaching the analysis, they could have put out a more favorable-looking stat.


> If you use ChatGPT, it sometimes gives you two versions

Does no one else hate it when this happens (especially when on a handheld device)?



