Hacker News

This seems like a case of regression to the mean. When one model is performing below its average on a task, changing anything (like switching to another model) is likely to improve the result by random chance alone...
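A quick way to see the effect is a toy simulation. This is only a sketch under made-up assumptions: two hypothetical models whose per-task scores are drawn from the same distribution (the `mean` and `noise` values are arbitrary, not measurements of any real model). If you look only at the tasks where model A happened to score below its own average and then "switch" to model B on those tasks, B looks better even though the two are identical by construction.

```python
import random
import statistics


def simulate_switch(n_tasks=10_000, mean=0.7, noise=0.1, seed=0):
    """Toy model: A and B have the same true quality, only per-task noise differs.

    All parameters are illustrative assumptions, not real benchmark numbers.
    Returns (A overall mean, B overall mean, A mean on A's bad tasks,
    B mean on those same tasks).
    """
    rng = random.Random(seed)
    a = [rng.gauss(mean, noise) for _ in range(n_tasks)]
    b = [rng.gauss(mean, noise) for _ in range(n_tasks)]

    a_mean = statistics.fmean(a)
    b_mean = statistics.fmean(b)

    # Select the tasks where A happened to do worse than its own average,
    # i.e. the cases where a user would be tempted to switch models.
    bad = [i for i in range(n_tasks) if a[i] < a_mean]

    a_on_bad = statistics.fmean([a[i] for i in bad])
    b_on_bad = statistics.fmean([b[i] for i in bad])
    return a_mean, b_mean, a_on_bad, b_on_bad


if __name__ == "__main__":
    a_mean, b_mean, a_on_bad, b_on_bad = simulate_switch()
    print(f"overall:      A={a_mean:.3f}  B={b_mean:.3f}")
    print(f"A's bad tasks: A={a_on_bad:.3f}  B={b_on_bad:.3f}")
```

The overall means come out nearly equal, while on the subset selected for A doing badly, B scores close to the overall mean and so appears to "fix" the problem. That doesn't show the models in the article are equal, only that an improvement after switching is not by itself evidence that the second model is better.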


Anthropic says Opus is better, benchmarks and evals say Opus is better, and Opus has more parameters, which limit how much a NN can learn.

Maybe Opus just is better.


Even if it's better on average, that doesn't mean it's better for every possible query.



