
I've been arguing since ChatGPT came out that LLMs should fall under fair use as a "transformative work". I'm not a lawyer and this is just my non-expert opinion, but it will be interesting to see what the legal system has to say about this.


Suit claims that GPT reproduced passages from NYT almost verbatim.


I'm sure the NYT uses dictionaries, encyclopaedias and style books verbatim as well. And they don't invent the facts they write about. As journalists they are compiling and passing along other knowledge. You usually don't get a piece of their income when a journalist quotes you verbatim (people usually don't get paid for interviews).


If the NYT reproduces other content verbatim too much, it will get in trouble.


NYT doesn't reproduce the contents of the dictionary or encyclopaedia.

And even if they did, it would be fine, because those sources allow for it.

The point is that OpenAI never asked NYT for permission to use their data.


I don’t doubt it does. It’s easy to get it to spit out long answers from Stack Overflow verbatim, I’ve done it. Maybe some of the “transformative” nature of the LLM output is the removal of any authorship, copyright, license, and edit history information. ;) The point here is to supplant Google as the portal of information, right? It doesn’t have new information, but it’s pretty good at remixing the words from multiple sources, when it has multiple sources. One possible reason for their legal woes wrt copyright is that it’s also great at memorizing things that only have one source. My college Markov-chain text predictor would do the same thing and easily get stuck in local regions if it couldn’t match something else.
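The Markov-chain comparison is apt. Here's a minimal sketch (hypothetical, not the commenter's actual college code) of a word-level bigram predictor: wherever a context has only one recorded continuation — which is most contexts when there's a single source — generation is forced to replay the training text verbatim.

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Build a bigram table: each word maps to the list of words seen after it."""
    model = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, length, seed=0):
    """Walk the chain. Where a word has exactly one recorded successor,
    the output deterministically reproduces the training text."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = model.get(out[-1])
        if not successors:
            break
        out.append(rng.choice(successors))
    return " ".join(out)

# With a single source, most words have a unique successor, so the chain
# "gets stuck in a local region" and replays the source verbatim.
source = "the quick brown fox jumps over the lazy dog"
model = train_bigram(source)
print(generate(model, "brown", 4))  # "brown fox jumps over" -- a verbatim run
```

With many overlapping sources, each context accumulates multiple successors and the output becomes a remix; with one source, it's a memorizer. LLMs are far more sophisticated, but the single-source failure mode the commenter describes is the same in spirit.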


I don't think these can replace search engines.


Precisely.

This tired 'fair use' excuse from AI bros rings hollow when GPT has reproduced the article text verbatim, word for word, and monetized it without permission from the copyright holder and source (NYT). That is copyright violation 101. Full stop.

Again, just like Getty v. Stability, this copyright lawsuit will end in a licensing deal. Apple played it smart by negotiating licensing deals to train its models [0]. But this time, OpenAI knew it could get a license to train on NYT articles and chose not to.

[0] https://9to5mac.com/2023/12/22/apple-wants-to-train-its-ai-w...


The four factors considered in a fair use test:

    the purpose and character of the use
    the nature of the copyrighted work
    the amount and substantiality of the portion taken
    the effect of the use upon the potential market.
Literally every single one of these factors has very complicated precedent and each one is an open question when it comes to AI. Since fair use is a balancing test this could go any way.

Stability took the easy way out because they didn't have billions of dollars to play around with and Microsoft to back them. Let's see what OpenAI does but calling everyone who disagrees with your naive interpretation of fair use "AI bros" is doing everyone a disservice.


> AI bros

What (or whom) do you consider to be an "AI bro?"

This sort of ad hominem generalization usually accompanies a weak argument.


I generally tend to downvote comments that use "x bros" for pretty much any x on sight for that reason. It's exceedingly rare for such a comment to be much more than a thinly veiled insult with little substance. Sometimes I might even agree with the insult, but it's still rarely appropriate here.


Young males that wear Tensorflow branded muscle tank tops and drive Mitsubishi Eclipse convertibles with the vanity plate OVERFIT. They are everywhere these days.



The text generation is getting quite decent. The limbs disappearing into the car are somewhat less impressive.


Thank you for the absurd visual. The vanity plate, especially, was worth saving for last. Somehow, the car is well suited, also. Love how they prefer Tensorflow over Pytorch, too.


Not saying I agree with this labeling, but it means approximately the same thing as “crypto bro”, but for AI


It seems to be used by people who've previously used the term "tech bro."


This seems like a reasonable opinion when you think about the training data size and imagine that any given output is some kind of interpolation of some unknown large number of training examples all from different people. If it's borrowing snippets from tens or hundreds or thousands of sources, then whose copyrights are being violated? Remixing in music seems to be withstanding some amount of legal scrutiny, as long as the remix is borrowing from multiple sources and the music is clearly different and original.

It gets harder to stand behind a blanket claim that LLMs or any AI we've got falls under fair use when they keep reproducing complete and identifiable individual works and clearly violating copyright laws in specific instances. The models might be remixing and/or transformative most of the time, but we have proof that they don't do that every time… yet. Maybe the lawsuits will be the impetus we need to fix the AIs so they don't reproduce specific works, and thus make the fair use claim solid and actually defensible?


It's inevitable that this question ends up at the supreme court. And the sooner the better IMO. It's clearly fair use. Generative agents will be seen legally as no different than a human artist leveraging the summation of their influences to create a new work.


>It's inevitable that this question ends up at the supreme court. And the sooner the better IMO. It's clearly fair use. Generative agents will be seen legally as no different than a human artist leveraging the summation of their influences to create a new work.

Why do you think the architecture is important? If I have a computer program and it outputs an entire copyrighted poem, then the answer to "is this a copyright violation?" should not depend on the architecture of the program.


Clearly fair use? What if I pay ChatGPT to give me the NYT article it sourced verbatim as stored (i.e. without referring me to the NYT source)?


It's not stored in ChatGPT actually, unlike Google's web search cache where it is stored verbatim, can be recalled perfectly, and is still fair use.

Fair use has nothing to do with reproducibility. LLMs are more clearly fair use than a search engine cache and those court cases are long settled. There's no world in which OpenAI doesn't win this entire thing.


What if I ask ChatGPT to print the article verbatim as sourced, from its own dataset?


It doesn't have database access to its own training dataset; it only has access to the weights it lossily-compressed that training dataset into.
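"Lossily compressed into weights" can be made concrete with a toy illustration (hypothetical, and nothing like GPT's actual mechanics): if "training" only keeps aggregate parameters, the exact corpus is unrecoverable, because distinct corpora can collapse to identical parameters.

```python
from collections import Counter

def train(corpus):
    """'Training' here just aggregates word counts -- the 'weights'.
    The corpus itself is discarded afterwards."""
    return Counter(corpus.split())

# Two different documents produce identical parameters: the mapping
# from corpus to weights is many-to-one, i.e. lossy. You can't query
# the weights for the original text, even though some of it survives.
w1 = train("dog bites man")
w2 = train("man bites dog")
print(w1 == w2)  # True: word order is unrecoverable from these weights
```

The legal wrinkle in the thread is that "lossy" doesn't mean "non-infringing": as the Markov example upthread shows, a lossy model can still emit memorized passages verbatim when the training data for a context came from a single source.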


Paywalled content as well?



