Hacker News

There is another fix, but it will have to wait for GPT-5. They could reword articles, summarizing them in different words and analyzing their contents to create sufficiently different variants. The ideas would be kept, but the original expression stripped out. Then train GPT-5 on this data. The model can't regurgitate copyrighted text it never saw during training.
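A minimal sketch of that pipeline shape, assuming a paraphrasing step sits between the source articles and the training corpus. The tiny synonym table here is only a stand-in for a real paraphrasing model, and all the names (`paraphrase`, `build_training_corpus`, the sample article) are made up for illustration.

```python
# Toy stand-in for a real paraphraser; a production pipeline would
# call a language model to reword each article instead.
SYNONYMS = {
    "launched": "lifted off",
    "cheered": "applauded",
}

def paraphrase(text: str) -> str:
    """Reword the text. Here a trivial substitution; in practice an LLM call."""
    for word, replacement in SYNONYMS.items():
        text = text.replace(word, replacement)
    return text

def build_training_corpus(articles: list[str]) -> list[str]:
    """Return only reworded variants; the verbatim originals never enter."""
    return [paraphrase(article) for article in articles]

articles = ["The rocket launched at dawn. Engineers cheered."]
corpus = build_training_corpus(articles)
# corpus[0] == "The rocket lifted off at dawn. Engineers applauded."
```

The point of the structure is that the training set is built entirely from the paraphraser's output, so verbatim source text never reaches the model.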

This could be further coupled with search: use GPT to read multiple sources at once and report on them. It's what humans do as well; we read the same news in different outlets to get a more balanced take. Some sources may contradict each other, some may have inaccuracies or biases. That analysis could be kept for training the models, which would also improve the training set.
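The multi-source step might look like the sketch below: pull the same story from several outlets, keep the claims they agree on, and record the contradictions for later analysis. The structured per-source "claims" dicts are an assumption; a real system would first extract them from raw articles with an LLM, and the outlet names and fields are invented for the example.

```python
def merge_reports(reports: dict[str, dict[str, str]]) -> tuple[dict, dict]:
    """Compare per-source claims about one story.

    Returns (consensus, contradictions): claims every source agrees on,
    and claims where sources disagree, keyed by outlet.
    """
    consensus: dict[str, str] = {}
    contradictions: dict[str, dict[str, str]] = {}
    all_keys = {key for claims in reports.values() for key in claims}
    for key in all_keys:
        values = {src: claims[key] for src, claims in reports.items() if key in claims}
        distinct = set(values.values())
        if len(distinct) == 1:
            consensus[key] = distinct.pop()   # sources agree
        else:
            contradictions[key] = values      # keep the disagreement for analysis
    return consensus, contradictions

reports = {
    "outlet_a": {"event": "factory fire", "injuries": "3"},
    "outlet_b": {"event": "factory fire", "injuries": "5"},
}
consensus, contradictions = merge_reports(reports)
# consensus == {"event": "factory fire"}
# contradictions == {"injuries": {"outlet_a": "3", "outlet_b": "5"}}
```

Keeping the contradictions rather than discarding them is what makes the comparison itself usable as training data.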




