Hacker News

I wonder if you could do this with multiple alignment training passes, where you extract the refusal direction each time, and suppress it in future training passes.
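To make the idea concrete, here is a rough sketch of what I have in mind. It assumes the refusal direction can be estimated as the difference between mean activations on refused and complied prompts, and that "suppress" means projecting that direction out of the activations. Both are assumptions on my part, and every name below (`refusal_direction`, `project_out`, `directions_over_passes`) is made up for illustration. The toy lists stand in for real activations, and the actual training step between passes is left out entirely.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def unit(v):
    """Scale v to unit length; fail loudly if there is no direction left."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero vector: no direction to extract")
    return [x / norm for x in v]


def project_out(v, direction):
    """Remove the component of v along a unit-length direction."""
    dot = sum(a * b for a, b in zip(v, direction))
    return [a - dot * b for a, b in zip(v, direction)]


def suppress_all(v, directions):
    """Remove every previously extracted direction from v."""
    for d in directions:
        v = project_out(v, d)
    return v


def refusal_direction(refused, complied):
    """Assumed estimator: normalized difference of mean activations."""
    diff = [a - b for a, b in zip(mean_vector(refused), mean_vector(complied))]
    return unit(diff)


def directions_over_passes(passes):
    """For each pass, suppress earlier directions, then extract a new one.

    `passes` is a list of (refused_activations, complied_activations) pairs,
    one per (hypothetical) alignment training pass.
    """
    found = []
    for refused, complied in passes:
        refused = [suppress_all(v, found) for v in refused]
        complied = [suppress_all(v, found) for v in complied]
        found.append(refusal_direction(refused, complied))
    return found


# Toy 3-d activations: refusal shows up on axis 0 in the first pass,
# then re-emerges on axis 1 in the second pass.
toy_passes = [
    ([[2.0, 0.0, 0.0], [4.0, 0.0, 1.0]], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]),
    ([[1.0, 2.0, 0.0], [1.0, 4.0, 1.0]], [[1.0, 0.0, 0.0], [1.0, 0.0, 1.0]]),
]
directions = directions_over_passes(toy_passes)
cleaned = suppress_all([5.0, 7.0, 9.0], directions)
```

Because each new direction is extracted from activations that already have the earlier ones projected out, the directions come out mutually orthogonal, so suppressing them one after another is safe. Whether real training passes would keep producing a clean single direction like this is exactly the open question.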

