
ChatGPT is GPT-3.5 plus some reinforcement learning from human feedback[1] to steer it away from the things that tanked Tay. Meaning they had a bunch of humans test the thing out and rate responses, then fed those ratings back into future training.

    [1]: https://en.wikipedia.org/wiki/ChatGPT#Training
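
To make the "rate responses, feed that back into training" step a bit more concrete, here is a rough sketch of one common way ratings are used: turn them into (preferred, rejected) pairs and score a reward model with a pairwise loss. This is an illustration under assumptions, not OpenAI's actual code; the function names `pairs_from_ratings` and `preference_loss` and the numeric rating scale are made up for the example.

```python
import math


def pairs_from_ratings(ratings: dict[str, float]) -> list[tuple[str, str]]:
    """Turn per-response human ratings into (preferred, rejected) pairs.

    Responses with equal ratings produce no pair, since no preference was expressed.
    """
    ranked = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (better, worse)
        for i, (better, r_better) in enumerate(ranked)
        for worse, r_worse in ranked[i + 1:]
        if r_better > r_worse
    ]


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss, -log(sigmoid(r_chosen - r_rejected)).

    Small when the reward model scores the human-preferred response higher,
    large when it scores the rejected response higher.
    """
    diff = reward_chosen - reward_rejected
    return math.log1p(math.exp(-diff))
```

In a setup like this, the reward model would be trained to minimize the loss over many such pairs, and the chat model would then be tuned to produce responses the reward model scores highly.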

