Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Reward hacking is literally just overfitting with a different name no?


They're different concepts with similar symptoms. Overfitting is when a model doesn't generalize well during training. Reward hacking happens after training, and it's when the model does something that's technically correct but probably not what a human would've done or wanted; like hardcoding fixes for test cases.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: