
The article states that they initially did this through ChatGPT, but that seems to have been fixed in the meantime, at least for the very simple queries that used to work ("the first paragraph of the Carl Zimmer article on old DNA" in ChatGPT used to return the exact text from the NYT, and "next paragraph" could then be used to get the following ones). Even if this has been fixed, it still proves that ChatGPT encodes exact copies of NYT articles in its weights, which may be a violation in itself, even if it is prevented from returning them directly. That would be especially true if they ever started distributing the trained model.

Additionally, even the use through Copilot is very debatable. They are not returning the NYT link, which requires a subscription; they are returning its contents, even to non-subscribers. And they are doing this in a commercial product, not a nonprofit like the Internet Archive, which has some arguments for fair use.



If it had exact copies, they would have shown it could recall the 8th paragraph or something. Even Google and the NYT release the first paragraph for free.


They say that asking for "next paragraph" used to return the next paragraph up until the whole article had been returned.



