I agree that these benchmarks don't mean as much anymore, since it's highly likely they were already present in the training set. But I also believe these tools will be significantly better within a few research cycles.
A significant share of bugs boil down to either 'a stupid mistake I didn't notice' or 'weird behaviour with a fix already described on SO, in the docs, or in a forum post'. Present-day LLMs are much better positioned than humans to solve these.
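To be concrete about the first category, here's an illustrative sketch (a hypothetical example, not from any real bug report): Python's classic mutable-default-argument mistake, whose fix is described in the official docs and countless SO answers, and which is exactly the kind of thing an LLM flags immediately.

    # "Stupid mistake I didn't notice": a mutable default argument.
    # The default list is created once, at function definition time,
    # so it is shared across every call that doesn't pass its own list.
    def add_tag(tag, tags=[]):   # buggy: tags persists between calls
        tags.append(tag)
        return tags

    print(add_tag("a"))  # ['a']
    print(add_tag("b"))  # ['a', 'b']  <- state leaked from the first call

    # The well-documented fix: default to None and create the list inside.
    def add_tag_fixed(tag, tags=None):
        if tags is None:
            tags = []
        tags.append(tag)
        return tags

    print(add_tag_fixed("a"))  # ['a']
    print(add_tag_fixed("b"))  # ['b']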