
> “Thank goodness she did that because [otherwise] we would have no records of the early years of the first Women’s Hockey League in Canada,” Azzi said.

A few years ago, Canada digitized many older television shows, https://news.ycombinator.com/item?id=35716982

  With the help of many industry partners, the [Canada Media Fund] CMF team unearthed Canadian gems buried in analog catalogues. Once discovered, we worked to secure permissions and required rights and collaborate with third parties to digitize the works, including an invaluable partnership with Deluxe Canada that covered 40 per cent of the digitization costs. The new, high-quality digital masters were made available to the rights holders and released to the public on the Encore+ YouTube channel in English and French.
In late 2022, the channel deleted the entire Encore+ YouTube archive of Canadian television, with two weeks' notice. A few months later, half of the archive resurfaced on https://archive.org/search?query=creator%3A%22Encore%20%2B%2.... If anyone independently archived the missing Encore+ videos from YouTube, please mirror them to Archive.org.


Archive.org is such a godsend.

The entire information 'substrate' of society is ephemeral, if digital, and no one (or at least not enough people) seems to have noticed.


I wrote a book in 2010. It had a references section with links to about 100 websites. When I wrote the second edition only about five years later, 50% of those links no longer worked.

What we're doing right now is borderline insane. We're putting all of this information on the web, but almost each individual bit of information is dependent on either a company or a human being keeping it online. It's inevitable that companies change their minds, and humans die, so almost all of the information that is online right now will just disappear in the next 80 years.

And we essentially only have one single entity that tries to retain that information.


> references section with links to about 100 websites.

Books deserve a GitHub repo with PDF web archives of referenced links, the same way that Wikipedia mirrors the content of cited links.
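
A rough sketch of the first step, checking whether each cited link already has a Wayback Machine snapshot, using the Internet Archive's public availability endpoint. The helper names are made up for illustration, and the response shape (`archived_snapshots` / `closest`) is assumed from the documented API, so treat this as a starting point rather than a finished tool:

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability lookup URL for one cited link."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url})}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Pull the closest snapshot URL out of a decoded response, if any.

    Assumes the response looks like
    {"archived_snapshots": {"closest": {"available": true, "url": ...}}}.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_reference(url: str, timeout: float = 10.0) -> Optional[str]:
    """Query the endpoint over the network for a single reference."""
    with urlopen(availability_query(url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Running `check_reference` over a book's reference list would show which links have no snapshot yet and still need to be archived before publication.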


But wouldn't that be a big waste if everyone who references the same thing is then keeping a copy of it.


> big waste

Storage cost has fallen exponentially for decades, https://ourworldindata.org/data-insights/the-price-of-comput...


Redundancy isn't really a waste.


Better many copies than none. References usually mean written text and maybe some figures, cost of storage is going down, we can afford the duplication.


> everyone who references the same thing is then keeping a copy of it.

... and it would serve as a form of redundancy, imitating the fungible nature of physical media: in order to cite the latest copied manuscript (for example), you needed to own a physical copy. The existence of these copies has enabled the survival of works that would otherwise have been lost, or even their reconstruction through ecdotics.


One person’s waste is another person’s resilience.


There are all kinds of publisher and legal issues. Trust me, I did my best.


Could Wikipedia or Archive.org offer references-as-a-service to book publishers for a small fee? They already have the infrastructure and legal cover.


certainly not GitHub


What would you recommend instead?


Copyright permitting, big QR code containing the plain text.


That entity goes out of its way to hide information if you're friends with the owners.


All the more reason not to rely on it alone. Still, it's hard to complain when no one else is willing to do what they do.


> wrote a book in 2010. It had a references section

Out of curiosity, may I ask what it was about?


> so almost all of the information that is online right now will just disappear in the next 80 years.

> And we essentially only have one single entity that tries to retain that information.

Will future ages find ours a dark age, a gap in their records, a void ...

... up until the point - if ever - where a sufficiently advanced solution for permanence is found and comes online?


> ... up until the point - if ever - where a sufficiently advanced solution for permanence is found and comes online?

Like the laser printer?

The cost of permanent, physical preservation is pennies. People just don't do it for most things. And it doesn't guarantee accessibility, which has hosting costs.


> Like the laser printer?

Sure. Whatever works.

But I meant one that is systematically, systemically, and widely used.


> Will future ages find ours a dark age, a gap in their records, a void

I think this is a very likely future, yes.


Grim, indeed.


Two big issues with Archive.org are that 1. it's a single point of failure, they don't encourage mirror sites to emerge, and 2. they keep using the "brand" to fight unwinnable battles like hosting books they don't own online, risking the whole endeavor.

I still appreciate it, but just imagine if it goes down due to a lawsuit. Now that Google no longer shows cached results, an entire historical record would be gone.


It's surprising that archive.org is the only such outfit I have encountered. We have had libraries since ancient times, so why are there so few digital libraries? There must be others, but nowhere near the number (or awareness) that we should have.

Heck, existing paper-based libraries should probably each include a digital archiving department.

Maybe this is already happening or already exists, and is trivial to those studying library science or something. I can hope, anyway.


There are lots of web archiving projects out there:

https://en.wikipedia.org/wiki/List_of_Web_archiving_initiati...

But the web is large. And public sector or academic librarian teams tend to be small. The IA's the one that people have coalesced around.


Excellent question.

Local neighborhood libraries could have their own curated digital archive, as cache for fast local search, and archival backup for long-term resilience.


> I still appreciate it, but just imagine if it goes down due to a lawsuit. Now that Google no longer shows cached results, an entire historical record would be gone.

Or somebody accidentally `rm -rf`'s an empty variable. Or The Big One hits San Fran. Or somebody in crisis breaks in with a crowbar, matchbook, and jug of gasoline.

They're a rather old-school shop. Own their own servers, all in one location I think. Bare metal admin stuff, and data's only mirrored across two disks per file IIRC. Keeps costs down. It's what makes the whole operation possible. But I also wonder sometimes.



If you check the issues, you'll learn this is not a supported project anymore (and honestly, it hardly worked even back then).


> Now that Google no longer shows cached results,

That was also the "end of an era" of sorts right there.

> they don't encourage mirror sites to emerge,

Something over BitTorrent or blockchain would work well here, methinks. As a baseline substrate.
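
Part of why that kind of substrate suits mirroring is that content is identified by its hash, so anyone can verify a mirror's copy without trusting the mirror. A minimal sketch of the idea, assuming a plain SHA-256 digest of the whole file (this is not BitTorrent's actual piece-hashing scheme, and the function names are hypothetical):

```python
import hashlib


def content_id(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archive files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror_matches(original_path: str, mirror_path: str) -> bool:
    """True if the mirrored file is byte-identical to the original."""
    return content_id(original_path) == content_id(mirror_path)
```

If the archive published a list of such digests, any independent mirror could be checked against it file by file.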



