
I agree that these benchmarks don’t mean as much anymore, since it’s highly likely they were already present in the training set, but I also believe these tools will be significantly better in a few research cycles.



A significant number of bugs just turn out to be 'stupid mistake I didn't notice' or 'weird behaviour with a fix described on SO/docs/forum post'. Current-day LLMs are much better positioned to solve these issues than humans are.



