The quick summary is that PDFs are automatically downloaded, hosted locally, and links rewritten to point at the local PDF; for other URLs, after a delay, I call the CLI version of https://github.com/gildas-lormeau/SingleFile to run headless Chrome and dump a snapshot, which I manually review & improve as necessary, and then the links get rewritten to the snapshot HTML. The snapshots get some no-crawl HTTP headers and robots.txt exclusions to try to reduce copyright trouble.
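To make the snapshot step concrete, here is a minimal sketch in Python (not the actual gwern.net build code, which is Haskell; the delay, directory layout, and filename scheme here are made up for illustration) of shelling out to the single-file CLI:

    import hashlib, subprocess, time
    from pathlib import Path

    def snapshot(url: str, out_dir: str = "snapshots", delay_s: int = 60) -> Path:
        """After a delay, run the single-file CLI (headless Chrome) and save
        a one-file HTML snapshot of `url`."""
        time.sleep(delay_s)                               # grace period before archiving
        name = hashlib.sha1(url.encode()).hexdigest()[:16] + ".html"  # arbitrary naming scheme
        out = Path(out_dir) / name
        out.parent.mkdir(parents=True, exist_ok=True)
        # `single-file <url> <output>` is the basic documented invocation;
        # browser-path/crawl flags are omitted here.
        subprocess.run(["single-file", url, str(out)], check=True)
        return out   # the snapshot is then hand-reviewed before links get rewritten to it

The real pipeline also serves the resulting files with no-crawl headers (e.g. X-Robots-Tag) and robots.txt exclusions, which is a web-server config matter rather than part of this script.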
I still use archiver-bot etc., they're just not how I do the on-site archives. See https://github.com/gwern/gwern.net/blob/master/build/LinkArc... https://github.com/gwern/gwern.net/blob/master/build/linkArc... for that.