My hard drive can - bit for bit - recall video files. If I serve them to other p...

jrockway · on Dec 28, 2023

Yeah, but the LLMs can't. They aren't big enough to contain every byte of every NYT article, even with the best-known compression algorithms. Rather, they pick up and remember the same patterns that humans do when they write. Authors of the articles also did that, and so the two algorithms (human writer, LLM inference) end up with the same result. (That doesn't preclude large chunks of text that are actually remembered, though. We humans have large chunks of verbatim text floating around in our brains. Passwords, phone numbers, "I pledge allegiance to the flag...", etc.)

Anyway, like I said, I don't think OpenAI will win this. Someone will produce one verbatim article, the court will make OpenAI pay a bunch of money as though every article could be reproduced verbatim, and AI in the US will be set back by that many billions of dollars. It probably doesn't matter in the long run; it preserves the status quo for as long as the judge is judging and the newspaper exec is newspaper exec-ing. That's all they need. The next generation will have to figure out how to deal with AI-induced job loss... and climate change. Have fun, next generation!

ninjinxo · on Dec 28, 2023

But is it still piracy if you compress them and serve only a likeness of the original?

hsbauauvhabzb · on Dec 28, 2023

If 20% of a NYT article is recalled correctly, does that mean I can publish 20% of a movie if surrounded by junk? What if I do that 5 times over?

madeofpalk · on Dec 28, 2023