
My hard drive can - bit for bit - recall video files. If I serve them to other people on the internet without permission of the copyright holder, that’s called piracy.


Yeah, but LLMs can't. They aren't big enough to contain every byte of every NYT article, even with the best-known compression algorithms. Rather, they pick up and remember the same patterns that humans use when they write. The authors of the articles did that too, so the two algorithms (human writer, LLM inference) can end up with the same result. (That doesn't preclude large chunks of text that are actually memorized, though. We humans have large chunks of verbatim text floating around in our brains too: passwords, phone numbers, "I pledge allegiance to the flag...", etc.)

Anyway, like I said, I don't think OpenAI will win this. Someone will produce one verbatim article and the court will make OpenAI pay a bunch of money as though every article could be reproduced verbatim, and AI in the US will be set back that many billion dollars. It probably doesn't matter in the long run; it preserves the status quo for as long as the judge is judging and the newspaper exec is newspaper exec-ing. That's all they need. The next generation will have to figure out how to deal with AI-induced job loss... and climate change. Have fun, next generation!


But is it still piracy if you compress them and serve only a likeness of the original?


If 20% of a NYT article is recalled correctly, does that mean I can publish 20% of a movie as long as it's surrounded by junk? What if I do that 5 times over?


Yes.



