> “Thank goodness she did that because [otherwise] we would have no records of ...

Bluestein · on Aug 4, 2024

Archive.org is such a godsend.-

The entire information 'substrate' of society is ephemeral, if digital, and no one (or at least not enough people) seems to have noticed.-

InsideOutSanta · on Aug 4, 2024

I wrote a book in 2010. It had a references section with links to about 100 websites. When I wrote the second edition only about five years later, 50% of those links no longer worked.

What we're doing right now is borderline insane. We're putting all of this information on the web, but almost every individual bit of information depends on either a company or a human being keeping it online. It's inevitable that companies change their minds and humans die, so almost all of the information that is online right now will just disappear in the next 80 years.

And we essentially only have one single entity that tries to retain that information.

walterbell · on Aug 4, 2024

> references section with links to about 100 websites.

Books deserve a GitHub repo with PDF web archives of referenced links, the same way that Wikipedia mirrors the content of cited links.

SirMaster · on Aug 4, 2024

But wouldn't that be a big waste if everyone who references the same thing keeps their own copy of it?

walterbell · on Aug 4, 2024

> big waste

Storage cost has fallen exponentially for decades, https://ourworldindata.org/data-insights/the-price-of-comput...

JohnFen · on Aug 5, 2024

Redundancy isn't really a waste.

progbits · on Aug 4, 2024

Better many copies than none. References are usually written text and maybe some figures; the cost of storage keeps going down, so we can afford the duplication.

Bluestein · on Aug 4, 2024

> everyone who references the same thing is then keeping a copy of it.

... and it would serve as a form of redundancy, imitating the fungible nature of physical media: in order to cite, for example, the latest copy of a manuscript, you needed to own a physical copy. The existence of these copies has enabled the survival of works that would otherwise have been lost, or even their reconstruction through ecdotics (textual criticism).-

throwup238 · on Aug 4, 2024

One person’s waste is another person’s resilience.

InsideOutSanta · on Aug 5, 2024

There are all kinds of publisher and legal issues. Trust me, I did my best.

walterbell · on Aug 7, 2024

Could Wikipedia or Archive.org offer references-as-a-service to book publishers for a small fee? They already have the infrastructure and legal cover.

ekianjo · on Aug 4, 2024

certainly not GitHub

walterbell · on Aug 4, 2024

What would you recommend instead?

Intralexical · on Aug 5, 2024

Copyright permitting, big QR code containing the plain text.

throwaway48476 · on Aug 4, 2024

That entity goes out of its way to hide information if you're friends with the owners.

jazzyjackson · on Aug 4, 2024

All the more reason not to rely on it. Still, it's hard to complain when no one else is willing to do what they do.

Bluestein · on Aug 6, 2024

> wrote a book in 2010. It had a references section

Out of curiosity, may I ask what it was about?

Bluestein · on Aug 4, 2024

> so almost all of the information that is online right now will just disappear in the next 80 years.

> And we essentially only have one single entity that tries to retain that information.

Will future ages find ours a dark age, a gap in their records, a void ...

... up until the point - if ever - where a sufficiently advanced solution for permanence is found and comes online?

Intralexical · on Aug 5, 2024

> ... up until the point - if ever - where a sufficiently advanced solution for permanence is found and comes online?

Like the laser printer?

The cost of permanent, physical preservation is pennies. People just don't do it for most things. And it doesn't guarantee accessibility, which has hosting costs.

Bluestein · on Aug 5, 2024

> Like the laser printer?

Sure. Whatever works.-

But I meant one that is systematically and systemically and widely used.-

JohnFen · on Aug 5, 2024

> Will future ages find ours a dark age, a gap in their records, a void

I think this is a very likely future, yes.

Bluestein · on Aug 5, 2024

Grim, indeed.-

makin · on Aug 4, 2024

Two big issues with Archive.org: 1. it's a single point of failure, since they don't encourage mirror sites to emerge; and 2. they keep using the "brand" to fight unwinnable battles, like hosting books they don't own online, risking the whole endeavor.

I still appreciate it, but just imagine if it goes down due to a lawsuit. Now that Google no longer shows cached results, an entire historical record would be gone.

stateofinquiry · on Aug 4, 2024

It's surprising that archive.org is the only such outfit I have encountered. Just as we have had libraries since ancient times, why are there so few digital libraries? There must be others, but nowhere near the number (or awareness) that we should have.

Heck, existing paper-based libraries should probably each include a digital archiving department.

Maybe this is already happening or already exists, and is trivial to those studying library science or something. I can hope, anyway.

Intralexical · on Aug 5, 2024

There are lots of web archiving projects out there:

https://en.wikipedia.org/wiki/List_of_Web_archiving_initiati...

But the web is large. And public sector or academic librarian teams tend to be small. The IA's the one that people have coalesced around.

walterbell · on Aug 4, 2024

Excellent question.

Local neighborhood libraries could have their own curated digital archive, as cache for fast local search, and archival backup for long-term resilience.

Intralexical · on Aug 5, 2024

> I still appreciate it, but just imagine if it goes down due to a lawsuit. Now that Google no longer shows cached results, an entire historical record would be gone.

Or somebody accidentally `rm -rf`'s an empty variable. Or The Big One hits San Fran. Or somebody in crisis breaks in with a crowbar, matchbook, and jug of gasoline.

They're a rather old-school shop. Own their own servers, all in one location I think. Bare metal admin stuff, and data's only mirrored across two disks per file IIRC. Keeps costs down. It's what makes the whole operation possible. But I also wonder sometimes.

medstrom · on Aug 4, 2024

https://github.com/internetarchive/dweb-mirror

makin · on Aug 4, 2024

If you check the issues, you'll learn this is not a supported project anymore (and honestly, it hardly worked even back then).

Bluestein · on Aug 4, 2024

> Now that Google no longer shows cached results,

That was also the "end of an era" of sorts right there.-

> they don't encourage mirror sites to emerge,

Something over BitTorrent or blockchain would work well here, methinks. As a baseline substrate.-