Given that it's mostly text, it's curious that no Content-Encoding is applied to the HTTP responses. Compression would reduce bandwidth by something like 70-90% in most cases.
Though it's hard to say what the best configuration is without understanding the hardware context.
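To give a rough sense of where the 70-90% figure comes from, here's a minimal sketch using Python's stdlib gzip on a made-up, repetitive HTML-ish body (the sample text is invented for illustration; real listings are less uniform, so real ratios land lower, but text this structured still compresses heavily):

```python
import gzip

# Hypothetical sample: a repeated directory-listing row, standing in
# for the kind of text-heavy response body being discussed. Real
# content varies more, so this overstates the ratio somewhat.
body = ("<tr><td><a href='/books/title-000.pdf'>title-000.pdf</a></td>"
        "<td>2.4M</td></tr>\n" * 500).encode()

compressed = gzip.compress(body)
savings = 1 - len(compressed) / len(body)
print(f"original: {len(body)} B, gzipped: {len(compressed)} B, "
      f"saved: {savings:.0%}")
```

On a server this is usually just a config toggle (gzip/brotli) rather than application code, but the ratio math is the same.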
There might be a case for self-hosting something like this. The value of the project is its extreme longevity, and the half-life of external hosting services is probably something like 5 years.
You imagine correctly. I cruise open directories regularly, usually looking for books to add to my growing collection (note: books I will actually read/use), generally taking only 2 or 3 books with me when I leave the site. At least once a month, I head to an OD I like to grab a new book or two, only to find the host shut the gates because it got posted on r/opendirectories and a bunch of people did pointless site rips, as though they'll read 40k pdf files or whatever. I often wonder if they do it because data caches like that are small currency on other parts of the Internet, sort of the way you had to maintain a ul/dl ratio to stay on a well-maintained BBS back in the day. Who knows.
They're certainly not rehosting the libraries in any useful way, I can say that much.
I think a lot of it is just that many people have a pretty deeply ingrained data hoarding impulse where collecting files--any files--for themselves is the end goal.
I don't know if it's always hoarding. A lot of stuff on the internet simply disappears. Especially when it's hosted by a single person, or by an org that might not be around in 3 years, or when it gets hit with takedowns (legitimate or not).
If you find something you'd like to keep accessing, downloading it all in advance can be a smart move.
So download specific items, as the parent says (and as you suggest). But a reflexive "download it all just in case" probably isn't helpful, especially if it's just for you.
I'm not disagreeing with downloading stuff that you want, maybe even site sub-sections. But a lot of the time it turns into just doing a mass download.
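For what it's worth, grabbing just a sub-section is easy to do politely. A sketch with standard wget flags, assuming a hypothetical /books/ path on an example host (not a real URL):

```shell
# Mirror only one sub-tree, only PDFs, without climbing to the parent
# directory, with a delay between requests so the host isn't hammered.
wget --recursive --level=2 \
     --no-parent \
     --accept '*.pdf' \
     --wait=1 \
     https://example.com/books/
```

That's the difference between "keep the things I'll use" and a full site rip: --no-parent and --accept scope the crawl, and --wait spreads the load.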