
I'm reading it to mean having working knowledge of intra-personal structures in a way that is contingent on historical context. These would be the social, economic, religious, familial, and political patterns of relation that groups of people exist in.

The assumption GP is making is that the incentives, values, and biases impressed upon folks providing RL training data may systematically favor responses along a certain vector that is the sum of these influences, and that these influences don't cancel out because the sample isn't representative. The economic dimension, for example, is particularly difficult to unbias because the people in the sample create the dataset as an integral part of their job. The alternative would be collecting RL training data from people outside the context of work.

While that may not be feasible or even possible to counter, the difficulty or impossibility doesn't resolve the issue of bias.


