
I'm of a similar mind. I take the more expansive view that everything created is part of our common property and that something like an LLM should be able to yield summaries of, and references to, those creations. As I've said elsewhere, LLM systems might be our first practical example of an infinite number of monkeys typing and recreating Shakespeare (or the New York Times).

I understand that copyrights and patents are vehicles for ensuring a creator gets paid for their work, but they are flawed in that they don't reward multiple parallel creations and they last too long.



An LLM is just a hugely lossy, compressed version of its training data, an abstraction of it.

Much in the same way that when you read a book, your brain doesn't become a pirated copy of the text: you only store a hugely compressed version of it afterwards, a feeling for the plot, mental images, and so on.
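
To put rough numbers on that intuition, here's a tiny back-of-envelope sketch in Python. The token count, model size, and bytes-per-token figures are all assumptions picked only to show the order of magnitude, not measurements of any particular model.

    # Rough comparison of training-data size vs. model size.
    # All figures are illustrative assumptions.
    train_tokens = 10e12      # assume ~10 trillion training tokens
    bytes_per_token = 4       # assume ~4 bytes of text per token
    params = 70e9             # assume a 70B-parameter model
    bytes_per_param = 2       # assume 16-bit weights

    train_bytes = train_tokens * bytes_per_token
    model_bytes = params * bytes_per_param

    print(f"training text: ~{train_bytes / 1e12:.0f} TB")
    print(f"model weights: ~{model_bytes / 1e9:.0f} GB")
    print(f"ratio: roughly {train_bytes / model_bytes:.0f}:1")

Under those assumed figures the weights are a few hundred times smaller than the text they were trained on, which is why "hugely lossy compression" is a reasonable mental model.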


That matches what I've gathered from my various readings about LLM systems. I'm guessing the kerfuffle from the New York Times and other shortsighted organizations comes down to copyright letting them control how their content is used. With humans it's simple: the content is read and misremembered. Using it for LLM training calls for a different model. It should probably be a RAND fee system based on the volume of training data because, as you say, the training data is converted into an abstract form.
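
As a toy illustration of what a volume-based RAND fee could look like, here's a hypothetical Python sketch that splits a licensing pool pro rata by tokens contributed to the corpus. The pool size, corpus size, and per-source token counts are all made-up numbers, not a proposal for actual rates.

    # Hypothetical volume-based RAND fee: each rights holder is paid in
    # proportion to how many tokens of their work appear in the corpus.
    fee_pool = 100_000_000        # total licensing pool in dollars (assumed)
    corpus_tokens = 10e12         # total training tokens (assumed)

    contributions = {             # tokens contributed per source (assumed)
        "nytimes.com": 5e9,
        "example-blog.net": 2e6,
    }

    rate_per_token = fee_pool / corpus_tokens
    for source, tokens in contributions.items():
        print(f"{source}: ${tokens * rate_per_token:,.2f}")

With those assumed numbers the large publisher gets tens of thousands of dollars and the small blog gets pocket change, which is the point of tying the fee to training volume rather than to downstream outputs.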




