Century Scale Storage (law.harvard.edu)
38 points by worik 10 months ago | hide | past | favorite | 9 comments


I’m surprised that this piece mentioned Microsoft, but didn’t touch on Microsoft’s solution to this problem: Project Silica (https://www.microsoft.com/en-us/research/project/project-sil...), which stores data on etched pieces of quartz glass that are supposed to be able to reliably store data for thousands of years. Of course, you still need to solve the dispersal problem, and need to make sure that the knowledge of how to read the glass tablets is passed down, but hey, nothing’s perfect!


It would be neat if they had a process for interactively "zooming in" on a given artifact with a suitably equipped human. For example, on the slab print human-legible instructions for building a microscope. Then, when you look at the slab again with the microscope you built, there are more instructions on building a better microscope, a computer, and a decoding algorithm. This can continue, such that the instructions at level N-1 allow the human to read the artifact at level-N density, unlocking further instructions, until you reach "bottom". I assume there are people expert in theory-crafting a message that self-teaches the language in which it is written. This also assumes there is no meaningful interaction between representations at different levels, or only interactions that can easily be accounted for.


>Answers and Non-Answers

>I have mostly been beating around the bush here for 12,000 words.

You don't say. Anyway, long story short: the only data that is guaranteed to survive is either data that does not require software, or data that requires software that can be easily/rationally reverse-engineered.


Software is not the problem here, hardware is: Software for storage is typically a rather thin wrapper around mathematics (for example forward error correction). The maths here tells you how to compensate for whatever decay/defect/loss you predict. The software can be made 100% correct. The hardware is the thing that will rot.
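To make the point concrete, here is a minimal sketch of the kind of mathematics that FEC software wraps: a Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. (Illustrative only; real storage systems use stronger codes such as Reed-Solomon or LDPC, but the principle is the same and the software side can be proven correct.)

```python
# Hamming(7,4): 4 data bits protected by 3 parity bits.
# Corrects any single bit flip in the 7-bit codeword.

def encode(d):
    # d: list of 4 data bits; codeword layout is
    # [p1, p2, d1, p3, d2, d3, d4] (positions 1..7)
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def decode(c):
    # Recompute each parity check over the received bits.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # covers positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # covers positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # covers positions 4,5,6,7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-indexed error position; 0 = clean
    if syndrome:
        c = c[:]
        c[syndrome - 1] ^= 1         # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]  # extract the data bits

data = [1, 0, 1, 1]
word = encode(data)
word[3] ^= 1                 # simulate one bit of hardware rot
assert decode(word) == data  # the mathematics recovers it exactly
```

The decoder either returns the original data or it doesn't, deterministically, for every predicted error pattern; that's why the software layer can be made 100% correct while the hardware underneath rots.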


Abstracting away from the problems with supports and standards and encodings which will invariably evolve, preserving information in time requires error correction, which in turn requires redundancy of some kind.
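The simplest form that redundancy can take is plain replication with a majority vote; a minimal sketch, assuming three copies that decay independently:

```python
# Redundancy by replication: keep 3 copies, recover each bit
# by majority vote. Survives any single-copy corruption per bit.

def majority(replicas):
    # replicas: list of equal-length bit lists
    return [1 if sum(bits) >= 2 else 0 for bits in zip(*replicas)]

original = [1, 0, 1, 1, 0]
copies = [original[:], original[:], original[:]]
copies[0][2] ^= 1   # each copy rots independently...
copies[2][4] ^= 1   # ...in different places
assert majority(copies) == original
```

Replication is wasteful (3x storage for 1-bit-per-position protection), which is why real archives prefer erasure codes, but any scheme ultimately trades extra bits for tolerance of loss.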


Reader mode was the only way for me to read the article.

I sure hope most web pages don't last 100 years


What prevented you from reading the page?


The sparkling bit rot show at the top of the page was far too distracting


> In 2021, lawyer and FOIA expert Michael Ravnitzky filed a request for copies of video footage of a lecture by legendary computer scientist Admiral Grace Hopper that were present in the National Security Agency’s archives. The NSA denied the request in May of 2024, stating that the agency no longer owned a machine capable of playing back the AMPEX video tapes in their collection.

This is no longer correct. Once the denial was issued in May, it was shared widely via HN and other outlets. This attention resulted in the video being digitized and put online in August 2024 (https://news.ycombinator.com/item?id=41356528). As I and many others observed in May, there are quite a few specialist video firms, as well as archivists, technology museums, and private collectors of vintage video gear, who have working 1-inch Type C video player/recorders. It was by far the most common broadcast video tape format for well over a decade. Almost every broadcast TV station and post-production house had several decks, so it's not even considered a rare or challenging format by any serious video archivist. The NSA's pro forma rejection was due to bureaucratic silos, budget and lack of motivation - not any technical inability to play that video format today.

The staff charged with issuing the legally required responses to FOIA requests don't care about archiving, preservation or history. They are measured by clearing the request backlog and the fastest (+ cheapest) way to do that is to find any legitimate grounds for denial and check that box on the form - which they obviously did. Of course, the NSA does have other staff who are measured on preserving and disseminating the organization's history. Once the opportunity to preserve this media was brought to their attention, the right thing happened - and in this happy case it did so with surprising speed.

The article should be updated, because there are much better examples of vintage media being lost forever, such as acetate film reels, including masters and camera negatives from early cinema. Although in that case, as with many others, the loss is most often due to improper storage accelerating the media's chemical decomposition - not any technical inability to read the media. In fact, I suspect there aren't yet many examples of loss purely due to no longer being able to technically parse a digital storage format. All the losses I'm aware of are due to things like bit rot, fading magnetic pulses and chemical degradation preventing recovery of the raw signal in the first place. As evidenced by the heroic recovery and reconstruction of some of the earliest experimental television recordings ever made, if the bits, pulses or grains still exist and there's motivation to recover them, we can usually find a way to capture the raw bits, pulses or waveforms and decode them into a modern format.



