>But if you simply ask for a summary of a specific article by, say, just name and date, and you get a copy of it, it's clear that GPT is storing the original data in some way, and thus it has copied the NYT's protected works without permission.
In this particular case they were using it via Bing, which actively made an HTTP request to the specific article to extract its content. So GPT hadn't memorised it verbatim; instead it fetched it, much like a human using a search engine would.
The article states that they initially used it through ChatGPT, but that seems to have been fixed in the meantime, at least for the very simplistic queries that used to work (asking ChatGPT for "the first paragraph of the Carl Zimmer article on old DNA" used to return the exact text from the NYT, and "next paragraph" could then be used to retrieve the following ones). Even if this has been fixed, it still shows that ChatGPT encodes exact copies of NYT articles in its weights, which may be a violation in itself, even if the model is now prevented from returning them directly. Especially if they ever started distributing the trained model.
Additionally, even the use through Copilot is very debatable. Rather than returning the NYT link, which requires a subscription, they are returning the article's contents even to non-subscribers. And they are doing this in a commercial product, not a nonprofit like the Internet Archive, which has some arguments for fair use.
If it had exact copies, they would have shown it could recall the 8th paragraph or something. Even Google and the NYT release the first paragraph for free.