
I'm of a similar mind. I take the more expansive view that everything created is part of our common property and that something like an LLM should be able to yield summaries of, and references to, those creations. As I've said elsewhere, LLM systems might be our first practical example of an infinite number of monkeys typing and recreating Shakespeare (or the New York Times).

I understand that copyrights and patents are vehicles for ensuring a creator gets paid for their work, but they are flawed in that they don't reward multiple parallel creations and they last too long.



An LLM is just a hugely lossy, compressed version of its training data, an abstraction of it.

Much in the same way that when you read a book, your brain doesn't become a pirated copy of the text: you only store a hugely compressed version of it afterwards, a feeling for the plot, mental images, and so on.
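
To put rough numbers on that intuition, here's a tiny back-of-envelope sketch in Python. The token count, model size, and bytes-per-token figures are all assumptions picked only to show the order of magnitude, not measurements of any particular model.

    # Rough comparison of training-data size vs. model size.
    # All figures are illustrative assumptions.
    train_tokens = 10e12      # assume ~10 trillion training tokens
    bytes_per_token = 4       # assume ~4 bytes of text per token
    params = 70e9             # assume a 70B-parameter model
    bytes_per_param = 2       # assume 16-bit weights

    train_bytes = train_tokens * bytes_per_token
    model_bytes = params * bytes_per_param

    print(f"training text: ~{train_bytes / 1e12:.0f} TB")
    print(f"model weights: ~{model_bytes / 1e9:.0f} GB")
    print(f"ratio: roughly {train_bytes / model_bytes:.0f}:1")

Under those assumed figures the weights are a few hundred times smaller than the text they were trained on, which is why "hugely lossy compression" is a reasonable mental model.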


That matches what I've gathered from my various readings about LLM systems. I'm guessing the kerfuffle from the New York Times and other shortsighted organizations comes down to copyright letting them control how their content is used. With humans it's simple: the content is read and misremembered. Using it for LLM training calls for a different model. It should probably be a RAND fee system based on the volume of training data because, as you say, the training data is converted into an abstract form.
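
As a toy illustration of what a volume-based RAND fee could look like, here's a hypothetical Python sketch that splits a licensing pool pro rata by tokens contributed to the corpus. The pool size, corpus size, and per-source token counts are all made-up numbers, not a proposal for actual rates.

    # Hypothetical volume-based RAND fee: each rights holder is paid in
    # proportion to how many tokens of their work appear in the corpus.
    fee_pool = 100_000_000        # total licensing pool in dollars (assumed)
    corpus_tokens = 10e12         # total training tokens (assumed)

    contributions = {             # tokens contributed per source (assumed)
        "nytimes.com": 5e9,
        "example-blog.net": 2e6,
    }

    rate_per_token = fee_pool / corpus_tokens
    for source, tokens in contributions.items():
        print(f"{source}: ${tokens * rate_per_token:,.2f}")

With those assumed numbers the large publisher gets tens of thousands of dollars and the small blog gets pocket change, which is the point of tying the fee to training volume rather than to downstream outputs.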




