
Seems reasonable - they probably broke the TOS of the site


Did OpenAI agree to those ToS? If not, I think (IANAL) LinkedIn was kind enough to give precedent that it's irrelevant.

( https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn )


On the other hand, the NYT website willingly gave out all the information without imposing limitations. Seeing the terms of service requires visiting a separate page; they aren't shown immediately upon visiting the site. Understanding and accepting the terms also requires human interaction.

robots.txt on nytimes.com now disallows crawling by GPTBot, so there's an argument against automated information acquisition from that point on, but before then they weren't explicitly against it.
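
For what it's worth, a minimal sketch of checking that rule with Python's standard-library robotparser; the article URL below is a hypothetical placeholder:

    # Check whether nytimes.com's robots.txt lets the "GPTBot" user agent fetch a page.
    # Minimal sketch; the article URL is a hypothetical placeholder.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.nytimes.com/robots.txt")
    rp.read()  # download and parse robots.txt

    # Returns False once the site adds a "User-agent: GPTBot" / "Disallow: /" rule.
    print(rp.can_fetch("GPTBot", "https://www.nytimes.com/2024/01/01/example.html"))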


Seems weird to argue that you have to speak up if you don’t want something done to you or else you consent to everything.

I do think that’s the case for some things, but especially for new things that doesn’t seem like a common-sense understanding of the world.


> Seems weird to argue that you have to speak up if you don’t want something done to you or else you consent to everything.

If you don't want people on your land, setting up even a small fence creates an explicit indication of the limitation, just like the robots.txt entry I mentioned earlier.

The New York Times also doesn't limit the article text if you just request the HTML, which is typical for automated clients; the limits are only imposed on users viewing the pages in a browser with JavaScript, CSS and everything else enabled (sketched below, after this list). So they clearly:

1. Have a way to determine the user's eligibility for reading the full article on server side.

2. Don't limit the content for typical automated cases on server side.

3. Have a way to track the activity of users who aren't logged in, determining their eligibility for access. So it's reasonable to assume that they had records of repeated access from the same origin, but didn't impose any limitations until some point.

So there are enough reasons to think that robots are welcome to read the articles fully. I'm not talking about copyright violations here, only about the ability to receive the data.
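
A sketch of what that looks like in practice, assuming (as claimed above) that the full article body is already in the server's HTML response and the paywall is only applied by client-side JavaScript/CSS; the URL and User-Agent are hypothetical placeholders, and this says nothing about whether fetching it is permitted:

    # Fetch the raw HTML without executing any JavaScript and print the <p> text.
    # Assumes the claim above that the server sends the full article body.
    # The URL and User-Agent are hypothetical placeholders.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get(
        "https://www.nytimes.com/2024/01/01/example-article.html",
        headers={"User-Agent": "ExampleBot/1.0"},
        timeout=10,
    )
    soup = BeautifulSoup(resp.text, "html.parser")

    # No client-side code runs here, so whatever paragraphs the server sent are all we see.
    print("\n\n".join(p.get_text() for p in soup.find_all("p")))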


What if they OCR’d the newspapers? No ToS there.


I’m pretty sure the physical newspaper is still covered by copyright as well.


For the paper or the author? What exactly was the licensing agreement for Op-Ed authors in 1962?


Read the article. It's not difficult to get ChatGPT to regurgitate recent, obviously copyrighted articles, verbatim.


It would be equally easy to get ChatGPT to rewrite copyrighted content so that the output is materially different enough that a copyright claim would fail.


Then ChatGPT should do that.


It's at least partially a copyright claim, isn't it? So the method -- OCR or scraping -- doesn't matter, I think.



