The example below extracts all article URLs for a specific section of the paper.
#!/bin/sh
test $# = 1 || exec echo "usage: $0 section"
curl -o 1.json "https://static01.nyt.com/services/json/sectionfronts/$1/index.jsonp"
exec sed '/"guid" :/!d;s/",//;s/.*"//' 1.json
I guess SpiderBytes could be used for older articles?
Personally, I think a format like netstrings or bencode is better than JSON because it better respects the memory resources of the user's computer.
Every proposed protocol will have tradeoffs.
To me, RAM is sacred. I can parse netstrings in one pass, but I have been unable to do the same with a state machine for JSON: because JSON nests arbitrarily, I have to impose an arbitrary limit on the number of states or risk a crash. Just as it is easy to exhaust a user's available RAM with Javascript, so too can it be done with JSON. Indeed, they go well together.
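A minimal sketch of that one-pass idea, in Python rather than shell for readability. Each netstring declares its payload length up front ("5:hello,"), so the parser's memory needs are known before any payload is read; the function name and error handling here are my own, not from any particular library.

```python
def parse_netstrings(buf):
    """One-pass netstring parser.

    A netstring is "<len>:<payload>," -- e.g. b"5:hello,".
    The declared length tells us exactly how much to read,
    so there is no unbounded state to track, unlike nested JSON.
    """
    items = []
    i = 0
    while i < len(buf):
        colon = buf.index(b":", i)          # end of the length prefix
        n = int(buf[i:colon])               # declared payload length
        start = colon + 1
        if buf[start + n:start + n + 1] != b",":
            raise ValueError("missing trailing comma")
        items.append(buf[start:start + n])
        i = start + n + 1                   # skip payload and comma
    return items
```

For example, parse_netstrings(b"5:hello,5:world,") yields the two payloads b"hello" and b"world" in a single left-to-right pass.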