Given that it's mostly text, it's curious that no Content-Encoding is applied to the HTTP responses. Compression would reduce bandwidth by something like 70-90% in most cases.
Though it's hard to say what the best configuration is without understanding the hardware context.
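To give a rough sense of where the 70-90% figure comes from, here's a minimal sketch using Python's stdlib gzip on a made-up, repetitive HTML-ish body (the sample text is invented for illustration; real listings are less uniform, so real ratios land lower, but text this structured still compresses heavily):

```python
import gzip

# Hypothetical sample: a repeated directory-listing row, standing in
# for the kind of text-heavy response body being discussed. Real
# content varies more, so this overstates the ratio somewhat.
body = ("<tr><td><a href='/books/title-000.pdf'>title-000.pdf</a></td>"
        "<td>2.4M</td></tr>\n" * 500).encode()

compressed = gzip.compress(body)
savings = 1 - len(compressed) / len(body)
print(f"original: {len(body)} B, gzipped: {len(compressed)} B, "
      f"saved: {savings:.0%}")
```

On a server this is usually just a config toggle (gzip/brotli) rather than application code, but the ratio math is the same.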
There might be a case for self-hosting something like this. The value of the project is its extreme longevity, and the half-life of external hosting services is probably something like 5 years.
You imagine correctly. I cruise open directories regularly, usually looking for books to add to my growing collection (note: books I will actually read/use), generally taking only 2 or 3 books with me when I leave the site. At least once a month, I head to an OD I like to grab a new book or two, only to find the host shut the gates because it got posted on r/opendirectories and a bunch of people did pointless site rips, as though they'll read 40k pdf files or whatever. I often wonder if they do it because data caches like that are small currency on other parts of the Internet, sort of the way you had to maintain a ul/dl ratio to stay on a well-maintained BBS back in the day. Who knows.
They're certainly not rehosting the libraries in any useful way, I can say that much.
I think a lot of it is just that many people have a pretty deeply ingrained data hoarding impulse where collecting files--any files--for themselves is the end goal.
I don't know if it's always hoarding. A lot of stuff on the internet simply disappears. Especially when it's hosted by a single person, or by an org that might not be around in 3 years, or when it gets hit with takedowns (legitimate or not).
If you find something you'd like to keep accessing, downloading it all in advance can be a smart move.
So download specific items, as the parent says (and as you suggest). But a reflexive "download it all just in case" probably isn't helpful, especially if it's just for you.
I'm not disagreeing with downloading stuff that you want, maybe even site sub-sections. But a lot of the time it turns into just doing a mass download.
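For what it's worth, grabbing just a sub-section is easy to do politely. A sketch with standard wget flags, assuming a hypothetical /books/ path on an example host (not a real URL):

```shell
# Mirror only one sub-tree, only PDFs, without climbing to the parent
# directory, with a delay between requests so the host isn't hammered.
wget --recursive --level=2 \
     --no-parent \
     --accept '*.pdf' \
     --wait=1 \
     https://example.com/books/
```

That's the difference between "keep the things I'll use" and a full site rip: --no-parent and --accept scope the crawl, and --wait spreads the load.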