The NYT is preparing for a tsunami by building a sandcastle. Big picture, this suit won’t matter, for so many reasons. To enumerate a few:
1. Next gen LLMs will be trained exclusively on “synthetic”/public data. GPT-4V can easily whitewash its entire copyrighted training corpus into something unrecognizably distinct (say, reworded by 40%, with authors/sources stripped). Ergo there will be no copyrighted material for GPT-5 to regurgitate.
2. Research/hosting/progress will proceed. The US cannot stop this, only choose to be left behind. The world will move on, with China gleefully watching as their biggest rival commits intellectual suicide all to appease rent seeking media companies.
3. Models can share weights, merge together, cooperate, ablate, and evolve over many generations (releases). Copyright law is woefully ill-equipped to chase down violators in this AI lineage soup, annealed with data of dubious or unknown provenance.
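To make point 3 concrete: the simplest form of model merging is plain parameter averaging (the "model soup" approach), sketched below with toy NumPy arrays standing in for real weight tensors. The function and layer names are hypothetical; real merges are more elaborate, but the provenance problem is the same: the merged weights retain no record of either parent's training data.

```python
import numpy as np

def merge_weights(model_a, model_b, alpha=0.5):
    """Linear interpolation of two models' parameters, keyed by layer name.
    The merged tensors carry no record of which training data produced
    either parent -- the provenance problem described above."""
    return {name: alpha * model_a[name] + (1 - alpha) * model_b[name]
            for name in model_a}

# Two toy "models" with identical architectures (same layer names/shapes).
a = {"layer0": np.zeros((2, 2)), "layer1": np.ones(3)}
b = {"layer0": np.ones((2, 2)), "layer1": 3 * np.ones(3)}

merged = merge_weights(a, b)  # each tensor is the element-wise midpoint
```

After a few generations of such merges (plus fine-tuning on outputs of other models), attributing any particular capability or memorized passage to a specific copyrighted source becomes genuinely hard.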
I could go on, but the point is that, for better or worse, we live in a new intellectual era. The NYT et al are coming along for the ride, whether they like it or not.
I'm sorry, but this is such a bad take. Nice appeal to consequences. In my view, the New York Times is entirely justified in pursuing legal action. They invested time and effort in creating content, only to have it used without permission for monetary gain. A clear violation.
Analyzing the factors involved for a "fair use" consideration:
Purpose and Character of the Use:
While the argument for transformation might hold in the future, as you point out, the current dispute revolves around verbatim use, which is clearly not transformative. Commercial use is also less likely to be ruled fair use.
Nature of the Copyrighted Work:
Works that are more factual are more likely to be considered fair use, but I would argue that NYT articles are at least as creative as they are factual.
Amount and Substantiality of the Portion Used:
In this case, the entirety of the articles was used, leaving no room for a claim of using an insignificant portion.
Effect on the Market Value:
NYT isn't getting any money from this, and it's clearly not helping their market value if people are checking on ChatGPT instead of reading a NYT article.
IANAL, but in my opinion NYT is well within its rights to pursue legal action. Progress is inevitable, but as humans, we must actively shape and guide it. Otherwise it cannot be called progress. In this context, legal action serves as a necessary means for individuals and organizations to assert their rights and influence its course.
I don’t think the original point was that the NYT wasn’t justified in bringing the action. The point was that the suit will be ultimately meaningless in the long term even if it succeeds in the short term: there is a potentially more significant future risk that this suit cannot protect against, for the reasons the author enumerated. The author is speculating, but the law struggles to adapt to technological change, which makes the prediction useful: it highlights coming problems that can’t be readily mitigated through legal precedent.
> it's clearly not helping their market value if people are checking on ChatGPT instead of reading a NYT article.
People are not using ChatGPT as a replacement for current news, and because of hallucinations, no one should be using it for past news either. I wouldn't remotely call ChatGPT a competitor of NYT traffic, like I would Reuters or other news outlets.
The intended result is clearly to supplant other information sources in favor of people getting their information from ChatGPT. Why should it matter to legality that the tech isn't good enough for the goal?
> Why should it matter to legality that the tech isn't good enough for the goal?
Because if it is not good enough, then it is not a market substitute.
The law cares whether it is a market substitute and whether there are damages. If it sucks, then there aren't damages, which matters for the fourth factor of fair use.
Definition of Transformative Use: The legal concept of transformative use involves significantly altering the original work to create new expressions, meanings, or messages. AI models like GPT don't merely reproduce text; they analyze, interpret, and recombine information to generate unique responses. This process can be argued as creating new meaning or purpose, different from the original works.
In the case of the famous screenshot, the AI simply relayed information it found on the web; the text wasn't coming from its training data.
Nope, it doesn't work that way. The fact that the LLM can regurgitate original articles doesn't rule out the possibility that training is considered transformative work, or more generally that using copyrighted material for training is considered fair use.
Rather, verbatim reproduction is proof that copyrighted material was used. The court then has to evaluate whether that use was fair. Without verbatim reproduction, the court might simply say there is not enough proof that the Times's work was important for the training, and dismiss the lawsuit right away.
Instead, the jury or court now will almost certainly have to evaluate OpenAI's operation against the four factors.
In fact, I agree with the parent that ingesting text and creating a representation that can critique historical facts using material that came from the Times is transformative. An LLM is not just a set of compressed texts; people have shown, for example, that some neurons fire when you are talking about specific historical periods or locations on Earth.
However, I don't think the transformative character is enough to override the other factors, and therefore in the end it won't/shouldn't be considered fair use, IMHO.
It doesn't matter: if everything else stays the same, what matters is what it's used for. If it's used to make money, that would certainly hurt a claim of fair use, maybe not for those who do the training, but for those who use it.
Rent seeking is an awful term that was from the beginning intended to describe anyone pursuing a political or legal goal that deviates from a pure free market economy. As Econlib writes:
> ”Rent seeking” is one of the most important insights in the last fifty years of economics and, unfortunately, one of the most inappropriately labeled. Gordon Tullock originated the idea in 1967, and Anne Krueger introduced the label in 1974. The idea is simple but powerful. People are said to seek rents when they try to obtain benefits for themselves through the political arena. They typically do so by getting a subsidy for a good they produce or for being in a particular class of people, by getting a tariff on a good they produce, or by getting a special regulation that hampers their competitors. Elderly people, for example, often seek higher Social Security payments; steel producers often seek restrictions on imports of steel; and licensed electricians and doctors often lobby to keep regulations in place that restrict competition from unlicensed electricians or doctors.
No, it dates back to Adam Smith’s conception of rents derived from land-ownership as a parasitic drag on economies (about which he was entirely correct). This concept was later extended to a whole host of other forms of monopolization, some state-granted and some market-derived. In the case of U.S. copyright, we can look at its original terms (quite limited) and see that its current incarnation is more harmful than beneficial to most people.
The New York Times is a dying company that is rent seeking here.
A long time ago, their content was valuable; now you can't even give it away to researchers.
I know because they tried to make a deal with my company, we passed because social media data is infinitely more valuable.
You don't want to seriously tell me that garbage on Twitter in 280 characters is more useful to me than actual journalism, do you?
Maybe their data isn't as valuable to, e.g., advertisers as the data their audience shouted into the internet themselves (guess what), but the thing they've actually been selling for a long time now, journalism, can't be dying that fast, considering we're both on a website that in large part consists of discussing journalism.
To me, your comment only reinforces the point that NYT's content is actually valuable, rather than that the NYT is rent seeking. But maybe you can give a bit more detail.
> 2. Research/hosting/progress will proceed. The US cannot stop this, only choose to be left behind. The world will move on, with China gleefully watching as their biggest rival commits intellectual suicide all to appease rent seeking media companies.
Sorry, is this the same China that has already introduced their own sweeping regulations on AI? Which in at least one instance forced a Chinese startup to shut down their newly launched chatbot because it said things that didn't align with the party's official stance on the war in Ukraine?
I don't disagree that research/hosting/progress will continue, but I'm not so sure that it's China who stands to benefit from the US adding some guardrails to this rollercoaster.
Are media really rent-seeking? They create new content and analysis, for which they want to be compensated. It seems quite different to hoarding natural resources or land, for example.
> It seems quite different to hoarding natural resources or land
Indeed, it is quite different, because those things are scarce physical things in the real world. Intellectual property is a scam, and killing it once and for all will be one of the best things to come out of the current AI hype cycle. Nobody will "own" ideas, pieces of information, or strings of bytes.
About your first point: you can't possibly know that future models can be trained exclusively on synthetic data without any hit to performance. It is also not easy to reword an entire copyrighted training corpus without introducing errors or hallucinations. And you state this as if it were fact?
Your second point reminds me a bit of 'War with the Newts' where humanity arms a race of sentient salamanders until they overthrow humanity. How could we not arm our newts if Germany might be arming theirs?
I also think basically everything else you wrote is wrong.
If Microsoft doesn't get royalty free rights to resell access to everyone's content on demand, China will become the powerhouse of interference-free media? Rrrrrright....
I think it can be simultaneously true that the NYT is accurate in its complaint, that it has no legal remedy for this, and that there shouldn't be one.
There are plenty of large companies in other sectors that acknowledge there are limited legal remedies for them if someone copies some aspect of their business or name.
This is the actual truth. Where it falls short is in citing the data, but GPT-4 doesn't do that to begin with unless the text comes directly from a web result rather than from the weights.
> GPT-4V can easily whitewash its entire copyrighted training corpus to be unrecognizably distinct
Is that just by increasing the temperature, tweaking the prompt, etc.? If you can operate on the raw weights and recreate the original text, copyright infringement still applies.