
Reinforcement learning with human feedback. What you're describing is the alignment problem.


That’s just a supervised fine-tuning method to skew outputs favorably. I’m actually working with it on biologics modeling using laboratory feedback. The underlying inference structure is not changed.
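
For what it's worth, here's a toy sketch of that point (a simplified REINFORCE-style update in plain PyTorch, not PPO and not my actual biologics pipeline; every name below is made up for illustration): the "RLHF" step only nudges the existing weights with a reward-weighted loss, while the forward pass used at inference is left exactly as it was.

    import torch
    import torch.nn as nn

    class TinyLM(nn.Module):
        # Stand-in "policy" language model; its forward pass is identical
        # before and after the reward-driven fine-tuning step.
        def __init__(self, vocab=32, dim=16):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.head = nn.Linear(dim, vocab)

        def forward(self, tokens):
            return self.head(self.embed(tokens))  # logits over the vocabulary

    def rlhf_step(model, opt, tokens, rewards):
        # One REINFORCE-style update: raise the log-probability of sampled
        # tokens in proportion to a (human-derived) reward signal.
        logits = model(tokens[:, :-1])
        logp = torch.log_softmax(logits, dim=-1)
        chosen = logp.gather(-1, tokens[:, 1:].unsqueeze(-1)).squeeze(-1)
        loss = -(rewards.unsqueeze(-1) * chosen).mean()  # reward-weighted NLL
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    model = TinyLM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    tokens = torch.randint(0, 32, (4, 8))   # stand-in for sampled completions
    rewards = torch.randn(4)                # stand-in for human-feedback scores
    rlhf_step(model, opt, tokens, rewards)  # weights move; the architecture does not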



