
So, if someone applies a filter to a video or audio track, it is no longer a "copy" of the original work? (No, it is still protected.) AI could still produce exact or extremely similar reproductions of the material it was trained on.


It's not analogous to a filter, because that's applied to the actual work. The model does not keep the work, so what it does isn't like applying a filter. It's more like being able to reproduce a version of the work from memory and what it learned from that work and others about the techniques involved in crafting it, e.g. art students doing reproductions.

And if OpenAI were selling the reproductions, that would be infringement. But that's not what's happening here. It's selling access to a system that can do countless things.


> AI still could produce exact or extremely similar results of stuff it learned on.

Can it do so more than a human can?

I think that's the key here. If an AI is no more precise than a human telling you about a news article they read today, then ChatGPT's learning process probably can't morally be called copying.


So, if someone decompiles a program and compiles it again, it would look different. "It is not copying"; we just did some data laundering.

Feeding someone else's data into your system is usually a copyright violation, even if you have a very "smart" system that transforms and obfuscates the original data.


> Feeding someone else's data into your system is usually a copyright violation

In some circumstances, yes, but often it's not, especially if you're not continuing to store and use it (which OpenAI isn't).


I'm regularly feeding other people's data into my "system" (brain) in order to produce my outputs.

So I'm a living, breathing copyright violator. As a person, I should be banned.

Fortunately, copyright is a bullshit fictitious right with no basis in natural law. So I don't lose much sleep over it.


Computers are deterministic: giving the same inputs and the same training procedure would produce the same model, so the comparison with a brain is incorrect. You could add noise to the input data during training, which would more or less mimic real learning, but it could produce less usable models as a result.
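A minimal sketch of the determinism claim, using a hypothetical toy linear model (not anything resembling OpenAI's actual pipeline): two training runs with the same seed and data produce bit-identical weights, while injecting input noise changes the result.

```python
import numpy as np

def train(X, y, seed, noise=0.0, steps=200, lr=0.1):
    # Seeded RNG makes the whole run reproducible.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])            # seeded weight init
    Xn = X + noise * rng.normal(size=X.shape)  # optional input noise
    for _ in range(steps):
        # Gradient of mean squared error for a linear model.
        grad = Xn.T @ (Xn @ w - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w1 = train(X, y, seed=42)
w2 = train(X, y, seed=42)             # identical rerun
w3 = train(X, y, seed=42, noise=0.1)  # noisy inputs

assert np.array_equal(w1, w2)      # same seed, same data: same model
assert not np.array_equal(w1, w3)  # noise changes the learned weights
```

The same idea scales up: large-model training is only nondeterministic when seeds, data ordering, or hardware-level floating-point behavior are allowed to vary.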

The court could ask to see the training dataset.





