On the other hand, the NYT website willingly gave out all the information without imposing limitations. The terms of service are only visible on a separate page; they aren't shown immediately upon visiting the site. Understanding and accepting them also requires human interaction.
robots.txt on nytimes.com now disallows crawling by GPTBot, so there's an argument against automated information acquisition from that point on, but before then they weren't explicitly against it.
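A minimal sketch of how one could check that robots.txt rule with the Python standard library; the path below is just an example, and the result reflects whatever the live file says at the time of the request:

```python
# Check whether nytimes.com's robots.txt currently allows GPTBot to fetch
# a given URL. Uses only the standard library; the target path is a
# placeholder chosen for illustration.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.nytimes.com/robots.txt")
rp.read()

# True means the live robots.txt does not disallow GPTBot for this path.
print(rp.can_fetch("GPTBot", "https://www.nytimes.com/section/technology"))
```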
> Seems weird to argue that you have to speak up if you don’t want something done to you or else you consent to everything.
If you don't want people on your land, putting up even a small fence creates an explicit indication of the limitation, just like the robots.txt entry I mentioned earlier.
The New York Times also doesn't limit the article text if you just request the raw HTML, which is typical for automated access (see the fetch sketch after the list). The limits are only imposed on users viewing the pages in a browser with JavaScript, CSS and everything else. So they clearly:
1. Have a way to determine, on the server side, a user's eligibility to read the full article.
2. Don't limit the content server-side for typical automated access.
3. Have a way to track the activity of users who aren't logged in and to determine their eligibility for access. So it's reasonable to assume they had records of repeated access from the same origin, but didn't impose any limitations until some point.
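To illustrate the "just request the HTML" case, here is a rough sketch of such a plain fetch with no JavaScript execution. The article URL and User-Agent string are placeholders, and whether the full text actually appears in the response depends entirely on how the server behaves at the time:

```python
# Fetch a page the way a simple automated client would: one HTTP GET,
# no JavaScript, no CSS, no client-side paywall scripts.
import urllib.request

# Hypothetical article URL and User-Agent, used only for illustration.
url = "https://www.nytimes.com/2023/01/01/example-article.html"
req = urllib.request.Request(url, headers={"User-Agent": "example-bot/0.1"})

with urllib.request.urlopen(req, timeout=10) as resp:
    html = resp.read().decode("utf-8", errors="replace")

# If the server doesn't gate content server-side, the article text is
# already present in this HTML, before any client-side limit could apply.
print(len(html))
```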
So there are enough reasons to think that robots were welcome to read the articles in full. I'm not talking about copyright violations here, only about the ability to receive the data.