Copyright infringement is not avoided by changing some text so it isn’t an exact clone of the source.
Determining whether a work violates a copyright requires holistic consideration of the similarity of the work to the copyrighted material, the purpose of the work, and the work’s impact on the copyright holder.
There is no algorithm for this; cases are decided by people.
There are algorithms that could detect obvious violations of copyright, such as the one you suggest, which looks for exact matches to copyrighted material. However, there are many potential outputs, or patterns of output, that would constitute copyright infringement and yet would not be caught by such a trivial test.
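To make that concrete, here is a minimal sketch of the kind of exact-match test being described. The function name, the 50-character threshold, and the sample text are all illustrative assumptions on my part, not any real detection system; the point is only that a light paraphrase sails straight past it.

```python
# Minimal sketch of the "trivial test" described above: flag an output only if
# it reproduces a long verbatim run of the protected text. The function name
# and the 50-character threshold are illustrative assumptions, not a real API.

def contains_verbatim_copy(output: str, protected: str, length: int = 50) -> bool:
    """Return True if `output` contains any `length`-character run of `protected` verbatim."""
    snippets = {protected[i:i + length] for i in range(len(protected) - length + 1)}
    return any(output[i:i + length] in snippets
               for i in range(len(output) - length + 1))

if __name__ == "__main__":
    source = "It was the best of times, it was the worst of times, " * 3
    # Exact regurgitation is caught...
    print(contains_verbatim_copy(source[:80], source))       # True
    # ...but a light paraphrase evades the check entirely.
    paraphrase = source.replace("best", "finest").replace("worst", "bleakest")
    print(contains_verbatim_copy(paraphrase[:80], source))    # False
```

A model that swaps a few words, reorders sentences, or closely tracks the structure of a work can still infringe, and nothing like the check above would notice.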
I certainly don't think it's impossible, but I think it is a hard problem that won't be solved in the immediate future. Creators of the data used for training are right to seek to limit the wide availability of LLMs that regurgitate information they worked hard to obtain.
I think it will be a bit easier than you believe. The reason it hasn't been done yet is that there hasn't been a compelling economic incentive to do so.