
> RLHF

Reinforcement Learning from Human Feedback

Aren't these systems already trained to score good things higher and bad things lower, as dictated by human feedback?
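
As I understand it, the standard recipe does exactly that: a reward model is trained on pairwise human preferences so the chosen response scores above the rejected one. A minimal sketch of that step, with a toy PyTorch encoder standing in for a real LM backbone (all names and data here are illustrative, not any particular system's code):

    import torch
    import torch.nn as nn

    class RewardModel(nn.Module):
        def __init__(self, hidden_size: int = 16):
            super().__init__()
            # Stand-in encoder: in practice this would be a pretrained LM backbone.
            self.encoder = nn.Linear(hidden_size, hidden_size)
            self.score_head = nn.Linear(hidden_size, 1)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            # Map a response representation to a single scalar "goodness" score.
            return self.score_head(torch.tanh(self.encoder(features))).squeeze(-1)

    model = RewardModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Toy pair: features of the response a labeler preferred vs. the one they rejected.
    chosen = torch.randn(4, 16)
    rejected = torch.randn(4, 16)

    # Bradley-Terry style pairwise loss: push the preferred response's score
    # above the rejected one's -- the "score good things higher" part.
    loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
    loss.backward()
    optimizer.step()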



Personalized RLHF is the keyword.
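
One way to read "personalized" (my interpretation, not a specific paper's method): condition the reward model on a per-user embedding, so the same response can score differently for different users instead of reflecting only aggregate labeler preferences. A hypothetical sketch, with illustrative names:

    import torch
    import torch.nn as nn

    class PersonalizedRewardModel(nn.Module):
        def __init__(self, num_users: int, hidden_size: int = 16, user_dim: int = 8):
            super().__init__()
            self.user_embedding = nn.Embedding(num_users, user_dim)
            self.encoder = nn.Linear(hidden_size + user_dim, hidden_size)
            self.score_head = nn.Linear(hidden_size, 1)

        def forward(self, features: torch.Tensor, user_ids: torch.Tensor) -> torch.Tensor:
            # Concatenate a learned user vector with the response features,
            # so scores depend on who is being asked.
            user_vec = self.user_embedding(user_ids)
            joint = torch.cat([features, user_vec], dim=-1)
            return self.score_head(torch.tanh(self.encoder(joint))).squeeze(-1)

    model = PersonalizedRewardModel(num_users=100)
    features = torch.randn(4, 16)          # toy response representations
    user_ids = torch.tensor([0, 0, 7, 7])  # two different users
    scores = model(features, user_ids)     # per-user scores for the same kind of input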



