Hacker News

How does it perform when you run it multiple times?

LLMs are non-deterministic, so I think benchmarks should be based on averages over N runs rather than on single-shot experiments.
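A minimal sketch of what reporting over N runs could look like, assuming each run of the benchmark yields a single numeric score (for example, a pass rate). The helper name `summarize_runs` and the scores below are hypothetical placeholders, not real benchmark results. It reports the mean together with the standard error, so readers can see how much the result moves between runs:

```python
import statistics


def summarize_runs(scores: list[float]) -> tuple[float, float]:
    """Return (mean, standard error of the mean) for N repeated runs.

    Hypothetical helper: assumes each run produces one numeric score.
    """
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.mean(scores)
    # Sample standard deviation divided by sqrt(N) gives the standard error.
    sem = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, sem


# Hypothetical pass rates from five repeated runs of the same benchmark.
runs = [0.62, 0.58, 0.65, 0.60, 0.61]
mean, sem = summarize_runs(runs)
print(f"mean={mean:.3f} ± {sem:.3f} (N={len(runs)})")
```

A single run would report only one of those five numbers, which could be anywhere from 0.58 to 0.65 here.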


