Hacker News

I'd think any natural language model would have the same biases we see from real humans.


Are there really no moderated forums that the data can be taken from? Even HN-based training data would be much more civil


A model trained on HN would spit out a 5 paragraph story about how minorities provide a negative ROI for cities. Or how the homeless need to be removed from society.


Don't forget that it must also generate, at some point regardless of the topic, a new terminal emulator, and an extremely positive or extremely negative opinion about how blockchain can solve a problem.


Sure, but it would never do something actually bad, like raising the possibility that sexual harassment might, sometimes, be an issue, or questioning the value of phrenology.


Note that HN is included in the training data; see page 20.


Go figure (8)!


I'd think the training data is something that could be curated. Eliminating all bias might be impossible, but GIGO applies.
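To make the "curation" point concrete, here's a minimal sketch of what filtering a raw corpus before training might look like. Everything here is hypothetical: the scoring function is a crude blocklist ratio standing in for a real learned classifier, and the threshold is arbitrary.

```python
# Hypothetical curation step: drop documents whose toxicity score
# exceeds a threshold before they ever reach the training set (GIGO).

def toxicity_score(text: str) -> float:
    """Placeholder for a learned classifier: fraction of words that
    appear on a blocklist. The blocklist tokens are dummies."""
    blocklist = {"slur1", "slur2"}  # placeholder tokens, not real data
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in blocklist for w in words) / len(words)

def curate(corpus: list[str], max_toxicity: float = 0.1) -> list[str]:
    """Keep only documents at or below the toxicity threshold."""
    return [doc for doc in corpus if toxicity_score(doc) <= max_toxicity]

docs = ["a perfectly civil comment", "slur1 slur1 slur1"]
print(curate(docs))  # only the civil comment survives
```

Real pipelines use learned quality/toxicity classifiers rather than blocklists, but the shape is the same: score, threshold, drop. This can't eliminate bias (the scorer has its own), which is the commenter's point about "impossible."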





