Fair use is something Wikipedians dance around a fair amount, which means I've done a lot of reading about it.
It’s a four part test. Let’s examine it thusly:
1. Transformative. Is it? It spits out informative text and opinion. The only "transformation" is that it's generative text. IMO that's a fail.
2. Nature of the copyrighted work - it's being trained partially on editorial, which is creative enough that I think any judge would find it problematic. (And the use is commercial, which cuts against them on the first factor too.) Fail on this criterion.
3. Amount. It looks like they trained the model on all of the NYT articles. Oops, definite fail.
4. Effect on the market. Almost certainly negative for the NYT.
You're getting mixed up. When applying the four factors, you need to analyze each use separately: the fair use test must be repeated for every alleged type of infringement. So the scraping from the public internet into OpenAI's dataset storage cluster is one instance where the full four-factor analysis must take place, the training itself is another full analysis, the distribution of model outputs another one, and so on.
Why so? From the point of view of the company alleging damage, the separation of processes is irrelevant: it all adds up to massive copyright infringement.
It is not the NYT making the claim of Fair Use, it is OpenAI.
Because fair use is an affirmative defense to each claim, not to the general accusation. So if someone sues you for "copyright infringement" broadly speaking, but the actual complaint contains four claims under four sections of Title 17 of the U.S. Code, you can raise a fair use defense to two of them and a different defense to the other two, or no defense at all for those last two and simply settle them, while still defending the first two.
Sure, but my original point still stands. OpenAI has a much better chance with how fair use actually works than with how you described it in your original comment.
IMO, OpenAI cannot successfully claim fair use.