
> The issue is likely that left-wing people are attracted to professions whose primary output is words rather than things. Actors, journalists and academics are all quite left-biased professions whose output consists entirely of words, and so the things they say will be over-represented in the training set.

Yet the 'base' models which aren't chat fine-tuned seem to exhibit this far less strongly, though their different behavior makes an apples-to-apples comparison difficult.

The effect may be because instruct fine-tuning radically reduces output diversity, greatly amplifying a small pre-existing bias. But even if that's all it is, it shows how fine-tuning can be problematic.
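
A rough sketch of that amplification mechanism, with made-up numbers (the 55/45 lean and the temperature knob are just stand-ins for whatever the tuning actually does):

    # Toy illustration: a small lean in a base distribution becomes a large
    # lean once the distribution is sharpened, the way preference tuning
    # tends to sharpen it. Numbers are invented; only the mechanism matters.
    import numpy as np

    base = np.array([0.55, 0.45])  # hypothetical base-model split: framing A vs framing B

    def sharpen(p, temperature):
        # Low temperature concentrates probability mass on the already-favoured option.
        logits = np.log(p) / temperature
        q = np.exp(logits - logits.max())
        return q / q.sum()

    for t in (1.0, 0.5, 0.2, 0.1):
        print(t, sharpen(base, t))
    # t=1.0 -> ~[0.55 0.45], t=0.1 -> ~[0.88 0.12]: the mild lean now dominates the output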

I have some doubt about your hopes for synthetic correction: you seem to be describing a positive feedback mechanism, which tends to increase bias, and I think it would here if we assume the bias is pervasive. E.g. the model won't just produce biased outputs, it will also judge its own biased outputs more favorably than it should.
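
Something like this toy loop is what I mean (the numbers and the update rule are invented; it's only meant to show the direction of the feedback):

    # Toy positive-feedback loop: a self-judge rates same-lean outputs slightly
    # higher, the filtered data is used to "retrain", and the lean compounds.
    # Entirely invented numbers; only the direction of travel is the point.
    p = 0.55            # hypothetical share of the favoured framing in outputs
    judge_bonus = 0.10  # self-judge accepts same-lean outputs this much more often

    for step in range(10):
        kept_favoured = p * (0.5 + judge_bonus)     # favoured outputs survive filtering more often
        kept_other = (1 - p) * (0.5 - judge_bonus)  # the rest survive less often
        p = kept_favoured / (kept_favoured + kept_other)
        print(step, round(p, 3))
    # p climbs toward 1.0 instead of washing out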



Well, RLHF is nothing but synthetic correction, in a sense. And modern models are trained on inputs that are heavily AI-curated or AI-generated, so there's no theoretical issue with it. ML training on its own outputs definitely can lead to runaway collapse if done naively, but the more careful approaches being used now work fine.
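
The naive failure mode is easy to see in a toy setting (invented setup, not anyone's actual pipeline; the point is just that estimation error compounds when you fit to your own samples):

    # Toy version of naive self-training collapse: repeatedly fit a Gaussian
    # to a small sample of its own generations and the variance shrinks
    # toward zero over generations.
    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 0.0, 1.0
    for gen in range(1, 201):
        data = rng.normal(mu, sigma, size=20)  # next "model" trains on the last model's outputs
        mu, sigma = data.mean(), data.std()
        if gen % 25 == 0:
            print(gen, round(float(sigma), 4))
    # sigma keeps shrinking; careful curation and filtering are what avoid this spiral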

I suspect that in the era when base models were made available, much more explicit bias was being introduced via post-training. Modern models are a lot saner when given trolley-style questions than they were a few years ago, and the internet hasn't changed much, so that must be due to adjustments in the RLHF. Probably the absurdity of the results caused a bit of a reality check inside the training teams. The rapid expansion of the AI labs would have brought in a more diverse workforce too.

I doubt the bias can be removed entirely, but there's surely a lot of low-hanging fruit. User feedback and conversations have to be treated carefully, as OpenAI's recent rollback shows, but in theory they're a source of text that should reflect the average person much better than Reddit comments do. And it's possible that the smartest models can be given an explicit theory of political mind.



