
Reinforcement learning with human feedback. What you're describing is the alignment problem.


That’s just a supervised fine-tuning method to skew outputs favorably. I’m actually working with it on biologics modeling using laboratory feedback. The underlying inference structure is not changed.
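
For what it's worth, here's a toy sketch of that point (a simplified REINFORCE-style update in plain PyTorch, not PPO and not my actual biologics pipeline; every name below is made up for illustration): the "RLHF" step only nudges the existing weights with a reward-weighted loss, while the forward pass used at inference is left exactly as it was.

    import torch
    import torch.nn as nn

    class TinyLM(nn.Module):
        # Stand-in "policy" language model; its forward pass is identical
        # before and after the reward-driven fine-tuning step.
        def __init__(self, vocab=32, dim=16):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
            self.head = nn.Linear(dim, vocab)

        def forward(self, tokens):
            return self.head(self.embed(tokens))  # logits over the vocabulary

    def rlhf_step(model, opt, tokens, rewards):
        # One REINFORCE-style update: raise the log-probability of sampled
        # tokens in proportion to a (human-derived) reward signal.
        logits = model(tokens[:, :-1])
        logp = torch.log_softmax(logits, dim=-1)
        chosen = logp.gather(-1, tokens[:, 1:].unsqueeze(-1)).squeeze(-1)
        loss = -(rewards.unsqueeze(-1) * chosen).mean()  # reward-weighted NLL
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    model = TinyLM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    tokens = torch.randint(0, 32, (4, 8))   # stand-in for sampled completions
    rewards = torch.randn(4)                # stand-in for human-feedback scores
    rlhf_step(model, opt, tokens, rewards)  # weights move; the architecture does not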



