Reward hacking is literally just overfitting with a different name no? | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		klysm 5 months ago \| parent \| context \| favorite \| on: Claude 4 System Card Reward hacking is literally just overfitting with a different name no?

n2d4 5 months ago [–]

They're different concepts with similar symptoms. Overfitting is when a model doesn't generalize well during training. Reward hacking happens after training, and it's when the model does something that's technically correct but probably not what a human would've done or wanted; like hardcoding fixes for test cases.

Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact