
Seems reasonable - they probably broke the TOS of the site


Did OpenAI agree to those ToS? If not, I think (IANAL) LinkedIn was kind enough to give precedent that it's irrelevant.

( https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn )


On the other hand, the NYT website willingly gave out all the information without imposing limitations. Seeing the terms of service requires visiting a separate page; they aren't shown immediately upon visiting the site. Understanding and accepting the terms also requires human interaction.

robots.txt on nytimes.com now disallows crawling by GPTBot, so there's an argument against automated information acquisition from that point on, but before then they weren't explicitly against it.
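
For what it's worth, a minimal sketch of checking that rule with Python's standard-library robotparser; the article URL below is a hypothetical placeholder:

    # Check whether nytimes.com's robots.txt lets the "GPTBot" user agent fetch a page.
    # Minimal sketch; the article URL is a hypothetical placeholder.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://www.nytimes.com/robots.txt")
    rp.read()  # download and parse robots.txt

    # Returns False once the site adds a "User-agent: GPTBot" / "Disallow: /" rule.
    print(rp.can_fetch("GPTBot", "https://www.nytimes.com/2024/01/01/example.html"))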


Seems weird to argue that you have to speak up if you don’t want something done to you or else you consent to everything.

I do think that’s the case for some things, but especially for new things that doesn’t seem like a common-sense understanding of the world.


> Seems weird to argue that you have to speak up if you don’t want something done to you or else you consent to everything.

If you don't want people on your land, setting up even a small fence creates an explicit indication of the limitation, just like the robots.txt entry I mentioned earlier.

The New York Times also doesn't limit the article text if you just request the HTML, which is typical for automated clients; the limits are only imposed on users viewing the pages in a browser with JavaScript, CSS and everything else enabled (sketched below, after this list). So they clearly:

1. Have a way to determine the user's eligibility for reading the full article on server side.

2. Don't limit the content for typical automated cases on server side.

3. Have a way to track the activity of users who aren't logged in, determining their eligibility for access. So it's reasonable to assume that they had records of repeated access from the same origin, but didn't impose any limitations until some point.

So there are enough reasons to think that robots are welcome to read the articles fully. I'm not talking about copyright violations here, only about the ability to receive the data.
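
A sketch of what that looks like in practice, assuming (as claimed above) that the full article body is already in the server's HTML response and the paywall is only applied by client-side JavaScript/CSS; the URL and User-Agent are hypothetical placeholders, and this says nothing about whether fetching it is permitted:

    # Fetch the raw HTML without executing any JavaScript and print the <p> text.
    # Assumes the claim above that the server sends the full article body.
    # The URL and User-Agent are hypothetical placeholders.
    import requests
    from bs4 import BeautifulSoup

    resp = requests.get(
        "https://www.nytimes.com/2024/01/01/example-article.html",
        headers={"User-Agent": "ExampleBot/1.0"},
        timeout=10,
    )
    soup = BeautifulSoup(resp.text, "html.parser")

    # No client-side code runs here, so whatever paragraphs the server sent are all we see.
    print("\n\n".join(p.get_text() for p in soup.find_all("p")))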


What if they OCR’d the newspapers? No ToS there.


I’m pretty sure the physical newspaper is still covered by copyright as well.


For the paper or the author? What exactly was the licensing agreement for Op-Ed authors in 1962?


Read the article. It's not difficult to get ChatGPT to regurgitate recent, obviously copyrighted articles, verbatim.


It would be equally easy to get ChatGPT to rewrite copyrighted content so that the output is materially different enough that a copyright claim would fail.


Then ChatGPT should do that.


It's at least partially a copyright claim, isn't it? So the method -- OCR or scraping -- doesn't matter, I think.



