Fair use is something Wikipedians dance around a fair amount, which means I've done a lot of reading about it.
It’s a four part test. Let’s examine it thusly:
1. Transformative. Is it? It spits out informative text and opinion. The only "transformation" is that it's generative text. IMO that's a fail.
2. Nature of the copyrighted work - it's being trained partially on editorial, which is creative enough that I think any judge would find it problematic. (And the use is commercial, which cuts against them on the first factor too.) Fail on this criterion.
3. Amount. It looks like they trained the model on all of the NYT articles. Oops, definite fail.
4. Effect on the market. Almost certainly negative for the NYT.
You're getting mixed up. When applying the four factors, you need to analyze each use separately: the fair use test must be repeated for every alleged type of infringement. So the scraping from the public internet into OpenAI's dataset storage cluster is one instance where the full four-factor analysis must take place, the training itself is another full analysis, the distribution of model outputs another one, and so on.
Why so? From the point of view of the company alleging damage, the separation of processes is irrelevant: it all adds up to massive copyright infringement.
It is not the NYT making the claim of Fair Use, it is OpenAI.
Because fair use is an affirmative defense to each claim, not to the general accusation. So if someone sues you for "copyright infringement" broadly speaking, but the actual complaint contains four claims under four sections of Title 17 of the U.S. Code, you can raise a fair use defense to two of them and a different defense to the other two, or no defense at all for those last two and simply settle them, while still defending the first two.
Sure, but my original point still stands. OpenAI has a much better chance with how fair use actually works than with how you described it in your original comment.
IMO, OpenAI cannot successfully claim fair use.