Thanks for your comment.
I realize now that archive.org's "archive string" (here 20220719195142if_) is updated
automatically. So if I use this string + some other URL,
then I get redirected to a current snapshot of that other site,
e.g.
I suppose the string consists of the date (YYYYMMDD), the time in hhmmss format, and if_?
Anyhow, it looks like arbitrary strings (e.g. 19991230225818if_)
also get redirected to the next existing snapshot, counting
from that string. This is really nice and simple for text browser scripts.
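A minimal Python sketch of that URL scheme for such scripts, assuming the number is a 14-digit UTC timestamp (YYYYMMDDhhmmss) followed by the optional if_, as in the example string. The helper names `wayback_url` and `parse_timestamp` are hypothetical, not part of any archive.org library; the redirect to an existing snapshot happens on archive.org's side, so the client only has to build the URL.

```python
from datetime import datetime, timezone
from typing import Optional

# Prefix taken from the archive.org URLs in this thread.
WAYBACK_PREFIX = "https://web.archive.org/web/"
# Assumption: 14 digits, YYYYMMDDhhmmss, in UTC.
TS_FORMAT = "%Y%m%d%H%M%S"


def wayback_url(target: str, when: Optional[datetime] = None,
                hide_toolbar: bool = True) -> str:
    """Build a Wayback Machine URL for `target` at (or redirected near) `when`."""
    if when is None:
        when = datetime.now(timezone.utc)
    stamp = when.strftime(TS_FORMAT)
    flag = "if_" if hide_toolbar else ""  # if_ hides the toolbar
    return f"{WAYBACK_PREFIX}{stamp}{flag}/{target}"


def parse_timestamp(archive_string: str) -> datetime:
    """Turn e.g. '20220719195142if_' into a naive datetime (assumed UTC)."""
    digits = archive_string[:14]
    if len(digits) != 14 or not digits.isdigit():
        raise ValueError(f"not a 14-digit timestamp: {archive_string!r}")
    return datetime.strptime(digits, TS_FORMAT)


if __name__ == "__main__":
    print(parse_timestamp("20220719195142if_"))
    print(wayback_url("https://example.com/", datetime(1999, 12, 30, 22, 58, 18)))
```

The second call prints https://web.archive.org/web/19991230225818if_/https://example.com/, which a text browser like w3m can open directly.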
Is there some straightforward way to list all of archive.org's snapshots
(of a particular site) without a javascript-enabled browser?
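One option that needs no JavaScript is the Wayback Machine's CDX server API, which returns one row per capture as plain text or JSON (plain curl or wget on the query URL works too). Below is a minimal Python sketch; the endpoint and the `url`, `output`, `fl`, `from`, `to` and `limit` parameters are those of the CDX server as I understand it, while the helper names are hypothetical.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site: str, since: Optional[str] = None,
                  until: Optional[str] = None, limit: Optional[int] = None) -> str:
    """Build a CDX query listing captures of `site` as JSON.

    `since`/`until` are timestamp prefixes such as "2019" or "20220719".
    """
    params = {"url": site, "output": "json",
              "fl": "timestamp,original,statuscode"}
    if since:
        params["from"] = since
    if until:
        params["to"] = until
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body: str) -> List[Dict[str, str]]:
    """The first JSON row holds the field names; each later row is one capture."""
    if not body.strip():
        return []
    rows = json.loads(body)
    if not rows:
        return []
    header, *captures = rows
    return [dict(zip(header, row)) for row in captures]


def list_snapshots(site: str, **kwargs) -> List[Dict[str, str]]:
    """Fetch and parse the capture list (needs network access)."""
    with urlopen(cdx_query_url(site, **kwargs), timeout=30) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))
```

For example, `list_snapshots("example.com", since="2022", limit=5)` should return up to five captures from 2022 onward, each with the timestamp needed to build a /web/<timestamp>if_/ URL.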
FWIW, below is a quick and dirty script I use for a variety of purposes, such as accessing www search result URLs so I do not have to (a) use sites that do not support TLS1.3, (b) use sites that require SNI or (c) use DNS. I will call this script "www".
NB. I used curl here because this is an example for HN. That does not mean I am a curl user.
I also have a small script I use for the Common Crawl archives. They also use CDX but the results are WARC files compressed with gzip. I wrote a small program in C to extract the gzip'd results after HTTP/1.1 pipelining. For retrieving results without pipelining (i.e., many TCP connections), I modified tnftp to accept a Range header.
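A rough Python sketch of the two Common Crawl pieces, not the C program or the tnftp change described above. It assumes each WARC record is stored as its own gzip member (so back-to-back members are still valid gzip) and that an index entry supplies `filename`, `offset` and `length`. The `https://data.commoncrawl.org/` prefix and the helper names are assumptions, and `split_gzip_members` expects the concatenated response bodies with the HTTP headers already stripped.

```python
import gzip
import zlib
from typing import List
from urllib.request import Request, urlopen

# Assumption: public HTTPS mirror of the Common Crawl bucket.
DATA_PREFIX = "https://data.commoncrawl.org/"


def range_header(offset: int, length: int) -> str:
    """HTTP byte ranges are inclusive, so the last byte is offset + length - 1."""
    return f"bytes={offset}-{offset + length - 1}"


def fetch_record(filename: str, offset: int, length: int) -> bytes:
    """Fetch one gzip-compressed WARC record with a Range request (needs network)."""
    req = Request(DATA_PREFIX + filename,
                  headers={"Range": range_header(offset, length)})
    with urlopen(req, timeout=60) as resp:
        return gzip.decompress(resp.read())


def split_gzip_members(data: bytes) -> List[bytes]:
    """Decompress back-to-back gzip members separately, one result per member."""
    members = []
    while data:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        out = d.decompress(data) + d.flush()
        if not d.eof:
            raise ValueError("truncated gzip member")
        members.append(out)
        data = d.unused_data  # bytes following this member's trailer
    return members


if __name__ == "__main__":
    two = gzip.compress(b"first record") + gzip.compress(b"second record")
    print(split_gzip_members(two))  # one bytes object per member
```

Keeping the members apart (rather than calling `gzip.decompress` on the whole stream, which would merge them) preserves the record boundaries after pipelining.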
I don't know why it didn't occur to me that the Wayback Machine might also have an API. :/ Also, there's lots of interesting stuff for things like the above on Common Crawl: https://commoncrawl.org/the-data/examples/
I guess my text-only browsing just got a bunch of extra batteries (thus far it was simply w3m plus a few wget etc. scripts).
The number is a timestamp, and the if_ just hides the toolbar; it's optional. It presumably stands for "iframe", since that's what it's used for: rewriting iframe src attributes so they don't show the toolbar.
https://web.archive.org/web/20220719195142if_/https://diziet...
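Going the other way, a small Python sketch that splits such a URL into its timestamp, optional flag and original URL, and adds or drops the toolbar-hiding if_. The regular expression assumes flags look like if_ (two lowercase letters plus an underscore) and that timestamps may be shortened prefixes; both are assumptions, as are the helper names, and the example.com URL is hypothetical since the one above is truncated.

```python
import re
from typing import NamedTuple

# Assumed shape: /web/<1-14 digit timestamp><optional xx_ flag>/<original URL>
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{1,14})([a-z]{2}_)?/(.+)$")


class WaybackRef(NamedTuple):
    timestamp: str
    flag: str      # "" when absent, e.g. "if_" to hide the toolbar
    original: str


def parse_wayback_url(url: str) -> WaybackRef:
    """Split a Wayback Machine URL into timestamp, flag and original URL."""
    m = WAYBACK_RE.match(url)
    if m is None:
        raise ValueError(f"not a Wayback Machine URL: {url!r}")
    return WaybackRef(m.group(1), m.group(2) or "", m.group(3))


def with_flag(url: str, flag: str = "if_") -> str:
    """Rebuild the URL with `flag` ("" shows the toolbar again)."""
    ref = parse_wayback_url(url)
    return f"https://web.archive.org/web/{ref.timestamp}{flag}/{ref.original}"


if __name__ == "__main__":
    u = "https://web.archive.org/web/20220719195142if_/https://example.com/page"
    print(parse_wayback_url(u))
    print(with_flag(u, ""))
```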
FWIW, I use a non-corporate browser.