Hacker News

I'd think any natural language model would have the same biases we see from real humans.


Are there really no moderated forums that the data can be taken from? Even HN-based training data would be much more civil


A model trained on HN would spit out a 5 paragraph story about how minorities provide a negative ROI for cities. Or how the homeless need to be removed from society.


Don't forget that it must also generate, at some point regardless of the topic, a new terminal emulator, and an extremely positive or extremely negative opinion about how blockchain can solve a problem.


Sure, but it would never do something actually bad, like raising the possibility that sexual harassment might, sometimes, be an issue, or questioning the value of phrenology.


Note that HN is included in the training data; see page 20.


Go figure (8)!


I'd think the training data is something that could be curated. Eliminating all bias might be impossible, but GIGO applies.
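To make the "curation" point concrete, here's a minimal sketch of what filtering a raw corpus before training might look like. Everything here is hypothetical: the scoring function is a crude blocklist ratio standing in for a real learned classifier, and the threshold is arbitrary.

```python
# Hypothetical curation step: drop documents whose toxicity score
# exceeds a threshold before they ever reach the training set (GIGO).

def toxicity_score(text: str) -> float:
    """Placeholder for a learned classifier: fraction of words that
    appear on a blocklist. The blocklist tokens are dummies."""
    blocklist = {"slur1", "slur2"}  # placeholder tokens, not real data
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in blocklist for w in words) / len(words)

def curate(corpus: list[str], max_toxicity: float = 0.1) -> list[str]:
    """Keep only documents at or below the toxicity threshold."""
    return [doc for doc in corpus if toxicity_score(doc) <= max_toxicity]

docs = ["a perfectly civil comment", "slur1 slur1 slur1"]
print(curate(docs))  # only the civil comment survives
```

Real pipelines use learned quality/toxicity classifiers rather than blocklists, but the shape is the same: score, threshold, drop. This can't eliminate bias (the scorer has its own), which is the commenter's point about "impossible."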





