A lot of people seem to be conflating backups and archives. They are very much not the same thing. An archive has an archivist and an index. It has structure that makes it useful long after the system that created it has disappeared.
Some years ago, when I was a Unix sysadmin responsible for backups, I had a call from a draughtsman trying to find a rather important drawing. I asked him when he had seen it last. He said he wasn't sure, but that it must have been at least two years ago. He was totally flabbergasted when I pointed out that we recycled the backup tapes after twelve months and that there was very little chance the file would be recoverable. It had never occurred to anyone in the company that an archive would be a good idea; all the files were simply kept on live disks. Quite remarkable really, when you consider that the products often cost a million dollars, are expected to last half a century, and are meant to be repairable so they last even longer.
I have seen this almost everywhere I go. Large companies used to have libraries and librarians, but now such things and such people are regarded as non-productive and hence something to be eliminated.
One of the companies I worked for had an undersized file server and way too much data to store (this was before cheap cloud storage). The managers insisted the old data was still business critical, so IT needed to 'archive' it, which consisted of copying the data in triplicate onto a bunch of tapes... and that was it.
It didn't occur to them that to make it useful we needed to know what was actually on the media. After the only people who knew anything about the data's contents and purpose left, the archive became an ever-expanding data/tape junkyard that occupied one of the rooms in the basement.
Another useful distinction when talking about long timeframes is the amount of ongoing attention needed to maintain the data. A true archival medium is one that, if stored in a reasonable environment, can be ignored and still be read at the end. By contrast many of the "obvious" ways of keeping data involve copying or regenerating it periodically as media degrade.
Given exponentially decreasing costs for storage, there's an argument for archiving everything of potential value and low liability risk. Ditto for storing it widely and testing recoverability.
Exponentially decreasing cost of storage is a myth. Storage is super expensive. It may be tempting to consider only some metrics (e.g. the acquisition price of the hardware), but the operational costs aren't getting much lower, if at all.
At sufficient scale the per-GB (or per-TB/per-PB/etc) price of the storage dwarfs all other costs. A small business might have trouble archiving everything, but most large businesses should see and be able to take advantage of this difference.
Does your company make Docker images? -- They start at somewhere around 1GB apiece. If, say, you use them for testing, you will have to create roughly an image per commit in your repository. 1K commits per project is about the output of a small (<5 person) team working for under a year. We are already at 1TB. Now, these images generate logs when run... Some of them may generate entire test datasets... And we haven't even touched the data we actually want to store yet!
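A rough back-of-envelope sketch of that growth, using the same illustrative numbers (1GB per image, one image per commit, plus an assumed multiplier for logs and generated test data):

    # Back-of-envelope estimate of CI artifact growth.
    # All numbers are illustrative assumptions, not measurements.
    GB = 1
    TB = 1024 * GB

    image_size_gb = 1 * GB      # a typical-ish Docker image
    commits_per_year = 1000     # roughly a small (<5 person) team's yearly output
    overhead_factor = 3         # assumed: logs, generated test datasets, etc.

    yearly_gb = image_size_gb * commits_per_year * overhead_factor
    print(f"~{yearly_gb / TB:.1f} TB per team per year, before any real data")
    # -> ~2.9 TB per team per year, before any real data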
Now, consider this:
You need to store your Docker images somewhere, right? The registry. This registry needs backups every now and then, otherwise it will... well, one day you'll lose data, perhaps the entire registry full of images.
So, you might think incremental snapshots are the answer, but then there's a problem with incremental snapshots: what if one of them is damaged? So, you'd want independent snapshots... And then you want to store them on something that's resilient to failure and can have its parts replaced on demand, so, something like RAID5, which also adds something like 30% to your storage.
Another thing to consider: what if whatever you want to store isn't composed of independent images that can be reproduced on demand... what if it's something like a Web server with session state? Well, then you might need consistency groups that store your container's state together with the state of the database and whatever else it is using across, potentially, multiple computers... Oy vey! Now we are looking into buying an enterprise-level storage system from something like NetApp... We are probably spending six figures yearly just on this single Web server with a database sharded across three servers... And we still don't have geographical redundancy for contingencies like the datacenter being hit by a tsunami, etc.
If you only have a 5-person team of SDEs, then your company is among the "small businesses" that might not be able to store everything, like I mentioned.
Also, if you aren't deduping Docker images by layers, you're doing it very very wrong.
Where did I say that the company has only 5 people? I gave it as a unit of measurement. Typically, hierarchies in software companies are built on smallish teams; in my experience, 5 would be the average team size. While there's usually some way to aggregate across teams (in terms of resources), largely, the resources a team needs are independent of other teams.
In other words, if one team uses X resources, then, roughly, two teams will use 2X resources. But resources per programmer don't make sense, in the same way man-hours don't (because you end up with fractional people).
And the truth is that even though large businesses have the potential to save through aggregation, developing this ability is yet another cost that they need to pay all the while the business is running. And the more efficient they want this aggregation to be, the more they need to invest in it. Which means that large businesses are bound to be more wasteful than small businesses.
So, to give a more concrete example of why resource waste tends to take this form: for N teams, where resource use would be X per team, the cumulative resource use is (N + e)X, where e is small but is a function of N, perhaps something like log(N).
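A toy sketch of that model (the log(N) overhead term is just the guess above, not a measured law):

    # Toy model: N teams, X resources per team, plus an aggregation
    # overhead e(N) ~ log(N). Purely illustrative.
    import math

    def cumulative_use(n_teams, x_per_team=1.0):
        e = math.log(n_teams) if n_teams > 1 else 0.0
        return (n_teams + e) * x_per_team

    for n in (1, 10, 100, 1000):
        total = cumulative_use(n)
        print(f"{n:4d} teams -> {total:8.1f} units ({total / n:.3f} per team)")
    # The waste per team shrinks as N grows, but the absolute overhead keeps rising.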
Suppose your company's artifacts are PyPI-style Python packages (hopefully, wheels). For a very small company, you might get away with just having a Git repo with your code and not even producing a wheel: just do code checkouts, add the current source to PYTHONPATH, and you are good to go.
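A minimal sketch of that "no packaging" stage, where the checkout path and module name are made up for illustration:

    # Make a sibling checkout importable without building a wheel.
    # Roughly equivalent to exporting PYTHONPATH=/home/dev/checkouts/teamlib
    import sys
    sys.path.insert(0, "/home/dev/checkouts/teamlib")  # hypothetical path

    import teamlib  # hypothetical module, resolved straight from the checkout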
Suppose your company now grows another team that needs to use the code written by the first team. Well, now they need to agree on their artifacts, the format, the location, so that the two teams may independently create and access artifacts. The packages are relatively small and don't require much auditing of external dependencies, though, so they simply pay for a small cloud instance to store their packages (e.g. GitHub Releases or whatever it's called).
Suppose your company grows to ten or so teams. Now you start getting conflicting requirements between individual artifacts of different teams, and you can no longer use the simple package hosting provided by services like GitHub, because you discovered you also need to use patched versions of third-party dependencies, and you have increased audit requirements. You now buy services from something like Artifactory, where you store and mirror a whole lot of Python packages you don't develop.
And, as you grow bigger, you'll realize that solutions like Artifactory also have their limits. At some huge scale, you'll get into the business of running your own datacenters and, perhaps, even your own power plants to power those datacenters. At each level on this growth path, you'll have to adjust your infrastructure to generate less waste compared to what would've happened if you'd kept the methodology of the previous level plus the infrastructure necessary to bridge between multiple groups. However, each such step will never completely eliminate the price of aggregation; it will simply attempt to make it manageable.
----
In other words: the price of servicing 1GB of storage for Google is lower than it would be for a small startup, but Google also has a lot more storage to manage per team than a small startup does. This is why someone like Google is in a good position to sell storage-management services to small shops, for example.
Is storage that cheap? I can get a slightly used 18TB drive for, IIRC, $250 USD, but that's not actually useful on its own. I need two to actually have 18TB of (mirrored) space, so $500.
One 18TB drive isn't an 18TB drive, it's an 18TB ticking time bomb of catastrophic data loss. I think that cost is kinda ignored when people talk about storage costs.
(Two isn't enough for some... RAID-Z2 with two-drive failure survivability is what I most often hear recommended, because resilvering after one drive failure has a high chance of killing another drive, at which point you're just fucked.)
This doesn't include the costs for everything else needed: electricity, the other hardware, etc. Plus, if you want 18TB, you actually need 3 drives: 2 for live use, one as a spare. If any of the live drives dies, you plop in the spare, rebuild the RAID, move the data off and replace the drives (since it might be impossible to acquire another 18TB drive).
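To put rough numbers on that, using the ~$250-per-used-18TB-drive figure from above and ignoring electricity and the rest of the machine:

    # Cost per usable TB under a "2 live (mirrored) + 1 cold spare" scheme.
    # Drive price is the figure quoted above; everything else is ignored.
    drive_price_usd = 250
    drive_capacity_tb = 18

    drives_needed = 3              # 2 mirrored + 1 spare on the shelf
    usable_tb = drive_capacity_tb  # a mirror gives you one drive's worth
    total_cost = drives_needed * drive_price_usd

    print(f"${total_cost} for {usable_tb} TB usable -> ${total_cost / usable_tb:.2f}/TB")
    # -> $750 for 18 TB usable -> $41.67/TB, vs $13.89/TB for a bare single drive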
Guys - you're painting the bike shed here. Whether it's a disk or a tape cartridge, if storage costs halve, then that means half as much of everything else (devices, machines, cages, machine rooms, data centers, etc.) is needed to store the same bytes, which is ultimately reflected in TCO.
It's not enough to have the data on a drive somewhere - someone has to know which drive and how to access it.
That problem gets harder and harder the more data you have.
Especially as we move away from "everyone in the building can see the archives" style libraries to "Zero trust architectures" where you often can't see even a reasonable portion of the entire data store, only someone with the right credentials and access can.
And this is a challenge right now... I've worked closely with clients that have data lakes they are struggling to pull value out of, because they can't find the data they need, or the data has so many security considerations (banking) involved that the decision about giving access to someone requesting data takes WEEKS to resolve.
Essentially - just keeping the data is insufficient. You need the ability to take advantage of it, and that usually costs much more than simple storage.
but we're talking about cost reduction (TCO) and the argument holds: at scale, if you halve the cost-per-bit for storage, then the TCO should drop dramatically and very close to half. The argument isn't about indexing, security, auditing, or whether Hawaiian pizza is an abomination to humanity.
only if the cost-per-bit of storage made up the bulk of the variable costs of backup.
My point is that I don't think that holds. That cost is likely not trivial, but I think the other variable costs make up a large percentage of total costs of a backup (again - because data by itself on a drive in storage somewhere is useless - it's the ability to access and use that data that organizations are paying for)
I mean, there is an argument for it, but the counterpoint is that you have to have an archivist and an index. Storing data for the Long Haul, such that it is findable, is not a simple matter in many cases.
What I found deeply fascinating, as a techie working with/for librarians, is that these are people who have a solid and agreed-upon understanding of where to place physical objects which contain information, but when it came to storing data? They might as well be anyone off the street. Everyone had their own idiosyncratic methodology, which wasn't even consistent over time.
M-DISC Blu-Ray discs are still one of the few affordable options for long term storage if the amount of data you want to preserve matches the medium's capacity of 100GB per BD-XL disc. They should last 100 years or more ("up to 1000 years"). The 100GB BD-XL discs are quite expensive however (14€ each). They also seem to get harder to find.
If you have less data you can go with the 25GB M-DISC Blu-ray discs (more drives can read these) or even with their DVD-R discs.
Make sure to keep an external optical drive with USB-C around as well. I know that none of my PCs from the last 5 years has had a built-in optical drive.
If you have more data, consider LTO drives and copy them every 5-15(?) years.
Every year I burn an identical set of discs with the past year's photos, and store them in separate locations. I use a mix of 25GB and 100GB discs, depending on the previous year's photo activity (the Covid-19 years fit on 25GB discs :D )
I also maintain a couple of external drives at the same locations that are surface scanned/updated/rotated yearly.
I use no form of encryption or archiving software, and while some discs have a PAR2 file "embedded", this is not my regular practice. I don't want to be in a situation 20 years from now where I've forgotten the encryption password to my archive (though in 20 years today's encryption will probably be "trivial" to break).
> They also seem to get harder to find
I have noticed the same, and I hope the industry realizes that there is a need for long-term archiving at a consumer-level price. Considering that family photos these days are essentially all digital, there is either the option to archive them, or to make physical copies, which is something I have considered as well.
When it comes to documents, I make normal backups, but I don't archive them. The archive is disaster recovery, and if everything else has failed (including 3-2-1 backups), I doubt my 10+ year old documents are worth anything to me. Anything important exists in government databases and/or as physical copies. This may of course not be the case for everybody.
I don't understand the lengths people go to, only to discover they can't read what they've saved because they failed to read the future. Just read the present.
SSDs are big enough now that one can hold everything I want. If you want to go bigger, for that downloaded movie collection you're never going to watch, then whatever your scale, OK, you have that many 14TB hard disks.
Two years from now, the newer model SSDs and hard disks are going to be much bigger. I'm about to jump to 4TB. Buy them. Copy your stuff to them. The old ones? There's your offsite backup. No bit rot, because you're going to do it again, soon.
Lather, rinse, repeat.
Interfaces and formats change? You won't even notice. Too expensive? Don't use the latest, most expensive size; use the previous model each time. Yes, it's slightly smaller, but you said you wanted to save money.
Really long-term storage: keep doing it; it takes a few hours each year. Need something fancier? Visit it every month.
Hmm... IIRC SSDs need to be powered up regularly. So storing them for years at a time might work, but given the rise of QLC SSDs I wouldn't trust them for longer-term offline storage.
The only effective way to preserve is to transcribe.
Think of it - how many ancient texts survived because we found papyrus in a jar in some cave? The Ancient Greek works and other things were mostly preserved by monks and others making copies over time.
I consulted with an archive a few years ago. They had acquired records to be preserved from another entity, delivered on circa-1997 DAT tapes. Those were challenging to recover circa 2017; in 2030 it will be very difficult and expensive. Ditto for M-DISC.
As the saying goes, the information wasn't lost when the Library of Alexandria burned down; it was lost because no one had bothered to make a copy before the fire.
Can confirm. SSDs are volatile storage. It's pretty much the worst option for long-term backups.
I believe the spec only demands persistence if they're powered on at least once every 90 days; the same applies to USB sticks, by the way.
The data isn't lost after 90 days, but bit rot occurs at a highly accelerated rate. So you might not notice or care if there is a gray pixel in the image you've stored on your USB stick, but that's what it means to use volatile storage.
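One low-effort way to at least notice that kind of silent corruption is to keep a checksum manifest next to the archive and re-verify it whenever the medium gets plugged in. A minimal sketch (the paths and script name are placeholders); it only detects rot, it doesn't repair anything:

    # Build or verify a SHA-256 manifest for an archive directory.
    import hashlib, json, pathlib, sys

    def sha256(path, bufsize=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    def build(root, manifest="manifest.json"):
        root = pathlib.Path(root)
        sums = {str(p.relative_to(root)): sha256(p)
                for p in root.rglob("*") if p.is_file()}
        pathlib.Path(manifest).write_text(json.dumps(sums, indent=2))

    def verify(root, manifest="manifest.json"):
        root = pathlib.Path(root)
        sums = json.loads(pathlib.Path(manifest).read_text())
        bad = [name for name, digest in sums.items()
               if sha256(root / name) != digest]
        print("OK" if not bad else f"CORRUPTED: {bad}")

    if __name__ == "__main__":
        # Usage: python checkarchive.py build /mnt/usb_stick
        #        python checkarchive.py verify /mnt/usb_stick
        {"build": build, "verify": verify}[sys.argv[1]](sys.argv[2])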
Basically all digital storage mediums suffer from mid- to long-term longevity issues:
* Magnetic media such as HDD and tape will eventually lose magnetization.
* Ink-based optical media such as CD-R and DVD-R will eventually suffer chemical degradation.
* NAND media such as SSD and flash will eventually lose electrical charge.
* All digital data and media will eventually lose methods of access.
The only way to guarantee digital data long-term is to re-archive on a consistent and frequent basis so that the passage of time doesn't render the data inaccessible or illegible.
Digital data simply isn't durable, unlike analog data.
I agree with everything you said up to the last sentence.
> Digital data simply isn't durable, unlike analog data.
What I would say is: the most popular digital storage hardware, like HDDs and SSDs, is significantly less durable than historically proven analog media like paper and stone.
From a theoretical point of view, digital data is vastly more durable than analog data. Think of what the word means - digital means digits, from a finite alphabet, strung in a sequence; analog means continuous values on a continuous medium. Written language is digital. Transcriptions of ancient texts survive to the current day and can be copied and recopied indefinitely into the future. Black-and-white barcodes created today can be copied losslessly as many times as we want.
One of the problems with digital data is that there's too much of it. If a photo is 1 MB, can we realistically print ~8 million black/white dots arranged in a grid on a piece of paper that we can quickly and reliably read back? It's hard. That's why analog solutions are so appealing, because viewing an analog photograph doesn't require special technology, and the loss of thousands of dots is completely inconsequential.
Analog media can indeed be very volatile. Look at photos stored on film - will the chemicals degrade? Will the colors change? Will the film get scratched and dusty? And unlike digital, you cannot make a perfect copy of film. You can't duplicate every atom and position them exactly. This goes for all other analog media as well. You cannot duplicate the atoms of ink and paper. You cannot duplicate a magnetic tape. You cannot duplicate a marble sculpture.
> Can confirm. SSDs are volatile storage. It's pretty much the worst option for long-term backups.
You're making a fair point that should be noted, but you're ignoring what I said, which I also think is a fair point that should be noted.
I did not say SSDs are good long-term backups. I said (you should read the original, but to summarize) SSDs are reliable in the short term and unbelievably convenient, and serially using them in the short term lets you use them for as long as you wish in the long term.
I didn't mention it, but it's good practice to test backups to make sure that they are still backups. Choose the time interval you are comfortable with and visit your offsite backups; make sure they match your current storage.
I did not recommend "longer term"; I suggested swapping your old drive for a new one every two years, and that's the length of the term. I do it every year, because that's how often I buy stuff, but I didn't want others to say "too expensive".
Or just save yourself the trouble and pay for cloud storage, avoiding the time spent on maintenance, data checks and off-site storage (in case your house burns down, etc.). The value proposition is there unless you value your time very little, really have major security concerns, or have extreme amounts of data.
Notably this article is from 2008, which is when Dropbox was first founded.
Umm... NO.
1. The datacentre has limited responsibility - if your data is lost, a refund is the best you can wish for. Frankly, they don't care beyond that. And even if they did, there's usually nothing they can do.
2. Disk-as-a-service - you pay, you get, you stop, you lose. Everything.
3. Oops, your cloud provider blocked your account. For the great justice, for porn, because their stupid government decided that your stupid government is their new enemy, or because it's just Google.
4. You don't have Internet anymore. Earthquake caused by either a volcano or drunk neighbors disrupted your only line. And they're too busy to fix it.
There is forever.com; they promise to keep data for at least 100 years. I paid $160 for 10GB of storage. They even promise to transcode your videos for you as file formats become obsolete.
We don’t like to think about this, but when my dad passed we were eventually able to find some of his writing and other things that were important to him and us, that we didn’t really know about.
His Google Drive kept his digital stuff safe, but stopped doing so after the credit card was shut down.
Cloud storage is an important option, among several others. It complements, but does not replace, other forms of backing up stuff that you really don't want to lose.
(It goes without saying that you have to have more than one place to store your backups.)
Cloud storage is useful for personal backup, but it's tricky on the millennium timescale of the article since no company will survive very long.
If you kept duplicate copies at the top N cloud providers at any given time, then you could probably go for 1000 years if N>=3. But this would require an active effort to keep your data alive as companies came and went. A true archival copy can be ignored and still be readable.
> Make sure to keep an external optical drive with USB-C around as well. I know that none of my PCs from the last 5 years has had a built-in optical drive.
I think finding a device to read the media is likely to be a bigger challenge than finding media that lasts 100 years. I strongly doubt a normal Blu-ray player would last that long unless it was kept in a controlled environment.
Yes. I kept my external Blu-ray drive in bubble wrap in a drawer for the last 5 years with no use for it. When I tried plugging it in last year, it was no longer working (a clicking noise, probably from one of the motors). In 20 years, absolutely no one except maybe specialized IT companies will have a working optical drive at their disposal (just like tape players and floppy drives).
Like cassette drives, unless they're particularly high-end units (unlikely), they will most likely have rotted or deformed belts and seized lubricants by the time you want to recover your data.
As long as optical drives are still being built, you should be good (and perhaps ~5 years after that).
Once it's obvious that the most recently built drives that can read the media are getting unreliable (due to their age), it's time to switch to a new medium.
I bought an NEC DVD-RW ND-4570A unit back in the day (the MFG date is Jan 2006).
Somewhere between 2014 and 2018 I needed to read a CD-R, so I plugged it in and... nothing. I tested other discs, including factory-made ones, but it looks like the optics in it are dead. Sure, at the time I could read the CD on a ThinkPad X301 I had lying around, and I think I used it again circa 2020, but it still has a chance of just dying, and then I wouldn't be able to read any of my not-so-big but still real collection of CDs/DVDs.
Sometimes I fancy buying a USB DVD/BD drive... but just as I'd need to search the whole apartment to find that X301 (I have no idea where it is now), I would not only need to find that drive (and it is even easier to lose than a 13" laptop), but it could also just die like my NEC one.
So if you really need some data, not only should you use appropriate media, you should also have the means to read it back, and those means should be redundant. The same applies to LTO: while you can still find anything from LTO-3 onwards on eBay, there is no guarantee it will work at all, because the drives are quite finicky and have a limited lifespan in terms of read/write cycles.
Does that matter for me, who doesn't know which end of a soldering iron to hold? *grin*
But considering everything else worked (it actually tried to read, it just couldn't), it's something in the optics or its electronic parts, so it doesn't really change anything for me - it's unrepairable by me and, without a spare/donor unit, by anyone else.
Agree. I've been researching data retention for many years and, except for Rosetta-style projects, there are no better user-friendly ways of storing digital data than M-Discs. But there's a "but". You need a reader. If you don't have one, you need a microscope and deep knowledge of how the data is stored at every level, from laser frequency to JPEG compression algorithms. In that case Rosetta-style analog microprinting wins.
When picking the archival medium, you have to also optimize for how people will read it in the future, yes.
No point in having a disc survive 300 years if no one has working Blu-ray readers at that point. You would have been better off with microscopic text etched into metal, or plain old paper. Or maybe some variation in between, like metal sheets with stamped-out QR codes that use light compression reversible by humans manually into English and numbers.
Discs: M-Discs are made only by the Verbatim/Millenniata factory. The quality of the usual BD-Rs varies. Sometimes you can get decent discs from CMC, but not LTH ones.
About the storage... I actually had an idea of finding a stainless steel cylindrical lunchbox (hermetic, thermally isolated by a vacuum inside the walls), but forgot about that :) Putting a spindle of discs inside such a container would be enough to protect them for decades, I think. It could even survive flooding. Not lava, though :)
(If anyone finds such a 12+cm container in Chinese stores, I'd be very grateful.)
This is exactly my strategy as well. Unattended, I don't think it's viable at century+ scale (as indicated by comments below re: formats, drives breaking, etc.), but I do believe it will work for decades (2? 5? not sure). The stability is a benefit compared to cloud-company or SSD/HDD lifespans, but it clearly does not solve all the problems.
This continues to be a good point, and for years my dad, an architect, has told the story of a wise(?) records department he consulted with that demanded their budget be spent on good passive temperature and humidity control for a bigger paper records storage space rather than power/cooling for a small data center.
BUT.
I would not give up hope on old digital storage mediums. I have recently read a casually stored CD-R containing code I wrote 25 years ago, surprisingly with no issues. I know friends who have read the 40-year-old floppies from their childhood after discovering them in the garage. My ROM-based game cartridges also read fine after 2 decades in an unheated barn. CuriousMarc (a YouTuber) and friends have read the runtime state of the memory of Apollo-era computers, among other things.
Correspondingly, nearly all of the paper records I've come in contact with over the years have been disposed of, damaged or lost (many of them pictures and memorabilia). I think the issue we face is a lot more about choosing what to preserve than about what medium to preserve it in.
> I have recently read a casually stored CD-R containing code I wrote 25 years ago, surprisingly with no issues.
Glad to hear it. The opposite happened to me: I archived a bunch of music I had made to CD-R in 2004. They're now unreadable. I do still have the final renders of the tracks I finished, but the source files are gone, along with a bunch of sketches, notes and pictures from that time. I'm not sure what I'd do with them, but I wish I still had them.
If you still have them on hand, give IsoBuster a try. I used it in the past to read data from old CDs that were unreadable otherwise and had very good results. https://www.isobuster.com/
I can attest to 40 year old floppies stored in a barn as a viable storage mechanism. I was able to retrieve almost all of the code that I wrote in my childhood this way. I had no idea that those old Apple 2 and Commodore 64 disks would actually still work!
Related: does anyone have ideas or experience on backing up and preserving the digital artifacts of someone deceased? I've been meaning to do an "Ask HN", but this post is along the same lines.
I’ve thought of putting the photos, videos, and document scans on archive.org. Also wondered about using IPFS or long-duration DVDs. And of course miniature archiving as in this post (But it sounds awfully expensive.)
IPFS is a content distribution system that doesn't really solve the problem of archiving and preserving data. It can only be relied on to "store" data as long as at least one node has the data pinned, so really it's just pushing the problem around.
Digital storage is something you should definitely only do if you have a plan (and the resources) to ensure the integrity of the data and the storage media. And think about the formats stored. Try opening a realplayer video file today and you know what I mean. Assuming you will be able to open that down the line is dangerous.
As of now, experience shows that the media that survives abandonment the best is physical. Paper (not every kind of paper, though), film and vinyl survive being in an attic for 100 years.
> Try opening a realplayer video file today and you know what I mean.
I keep seeing this idea pop up from time to time and I just don't get it. Even if the format is somehow still undocumented despite once being popular, the original players/viewers for it must still exist. Due to the popularity, sure enough there will be enough copies of that software around forever, including on places like archive.org. From there, you have options:
- Run the original software on hardware from that time
- Run the corresponding OS in an emulator and run the software on it
- Reverse engineer the software, document the format, and build a player/viewer for those files for your modern platform
In the particular case of RealPlayer videos though, I'm sure I could play one on my M1 Mac in IINA or VLC with no trouble.
Giving quality examples for this sort of thing can be difficult, because things that are obscure enough to be nigh-unreadable won't be recognisable to most readers.
After all, if I said "try opening a Generations Family Tree .cht file and you'll know what I mean" - would you know what I mean?
Perhaps the most famous example is the BBC Domesday Project [1] which paid homage to a 900-year-old book in the form of... a LaserDisc. The data isn't entirely lost - but only thanks to the effort of computer history museums.
> After all, if I said "try opening a Generations Family Tree .cht file and you'll know what I mean" - would you know what I mean?
No, I've never heard of either this program or its file format. Strangely, I can't even find a copy I could download, but there's a manual for it on archive.org and several sites that could sell me a CD...
But even if the program that made the file is lost, you could try reverse engineering the format by collecting as many files as you could find. It's harder, but it's still worth giving a shot.
But then again, how popular was that program?
> the BBC Domesday Project [1] which paid homage to a 900-year-old book in the form of... a LaserDisc.
I read the wiki article and it's wild how it depended on one particular computer and disc reader model. I didn't even know there was a standard for encoding digital data onto LaserDiscs (granted, I've never seen a LaserDisc in person to begin with).
But my point is that your examples are all from the early days of widespread computing. You can't just extrapolate like that. There weren't established standard formats for things back then. Also, a lot more software was proprietary AND platform-specific. Today, even if a file format is "unknown", poke at it with a hex editor or the `file` utility and there's a high probability that it's actually something standard like a zip archive or an SQLite database. It's not like we'll ever lose the knowledge of how to read and display a JPEG image from a DVD, unless our entire civilization somehow suffers a collapse so severe that we go back to the stone age.
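For instance, a handful of the most common "unknown container" signatures can be checked in a few lines; this is only a crude stand-in for the `file` utility, and the filename is a placeholder:

    # Crude format sniffing by magic bytes, a tiny subset of what `file` does.
    MAGIC = {
        b"PK\x03\x04": "zip archive (also docx/xlsx/jar/apk...)",
        b"SQLite format 3\x00": "SQLite database",
        b"%PDF": "PDF document",
        b"\x89PNG\r\n\x1a\n": "PNG image",
        b"\xff\xd8\xff": "JPEG image",
    }

    def sniff(path):
        with open(path, "rb") as f:
            head = f.read(32)
        for magic, name in MAGIC.items():
            if head.startswith(magic):
                return name
        return "unknown; time for a hex editor"

    print(sniff("mystery_file.dat"))  # placeholder filename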
> Try opening a realplayer video file today and you know what I mean
Any FFmpeg-based player can play it, including the bundled ffplay. And lots of others. I actually have a hard time finding a player that does not play .rm/.rmvb files.
Granted, maybe RealPlayer was the wrong example, but I know film conservationists who can tell you that there is a huge gap in their archive that stems from obscure digital formats that nobody has the funds to reverse engineer.
It is great if today's VLC plays your file. Just make sure to regularly check, over the course of the next 50 years, whether that is still the case. If stable archival is your goal, there are not many things that beat lasering something onto physical film, where in the worst of situations you'd even be able to look at the pictures with your own eyes. Rebuilding a projector after the collapse of society is probably easier than rebuilding a computer running an OS that reads that disk with software that plays that file, of which you might know precisely nothing without being able to decode it.
This is why e.g. national archives use film for storage.
Not that this is a feasible solution for a private person, but sometimes considering a low-tech solution might be the least complicated "do and forget" option.
If your descendants find your Blu-ray discs or some old hard drives, they might face some challenges:
- they might not know what those are
- if they have an idea what those are, they might not have the connectors or devices needed to play them back
- if they have the devices needed the device might not be able to read the file system because it is not supported anymore
- if the filesystem is supported, the file format might not be
This is a lot of hoops to jump through for something where you don't even know what is on it.
Meanwhile, an old picture book can just be looked at.
This is maybe a weird corner case, but I had no luck with the RealMedia files here: https://chance.dartmouth.edu/ChanceLecture/AudioVideo.html
on a Mac, particularly the Susan Holmes lecture on "Probability by Surprise". It's some format that has video + synced web pages.
I eventually downloaded the zip and made it into an ISO that I could mount in a Windows VM that had RealPlayer installed. Failed attempts included VLC and some other Mac video player (Elgato?), and trying to navigate to the page from a Windows VM (no https). I would love to hear of a less tedious solution.
VLC can play the video! Open HOLME.RM and choose the second video and second audio track using the menu.
--
ffplay plays it just fine by default, but neither VLC nor mpv does, that's rare! Funny I found this thread while searching for a particular discussion about ffplay.
Looking at its output, libav thinks the file has 15 streams: 5 data, 3 audio, and 7 video. Of these only the mentioned two are playable. Evidently ffplay uses ffmpeg's probing and stream selection, while the other players just try to play the first of each type.
Of the other streams, ffmpeg `-c copy -f data` can only dump two of the data streams. These contain the names of the HTM files and what looks like more compressed data. Searching turns up nothing but other university websites describing how to use it - this synchronized slideshow format (apparently not SMIL) looks lost indeed.
note to self: mpv can be started with `--aid=2 --vid=2` (1-indexed), but VLC has only `--audio-track=1` (0-indexed) and lacks a video track option despite having an open issue since 2009. For completeness, ffplay would use `-ast 6 -vst 9` (undifferentiated and 0-indexed).
As someone who teaches electronics and studied film (including the handling of the actual material), I am more confident that the first roll of film I ever shot will survive the next 200 years than I am in the survival of the ProRes .mov of my latest film that I stored on two hard drives.
As far as I know film is what the National Archives have chosen for the archival of all relevant movies even if they have been shot digitally and I guess they know a thing or two about archiving.
A computer that plays back a Prores file has so many more "moving" parts it is not even funny. You have the spinning rust that may or may not be faulty, then you have the read head that may or may not be faulty, then the spinning rust has a disk controller on it, that may or may not be faulty, then you have a connector on that drive that may or may not exist anymore, then you have a filesystem on that disk that may or may not be supported anymore, and then you get the container format, that may or may not be supported in that version anymore and you get the CODEC which may or may not be supported anymore.
But from the outside it just looks like a hard drive. Unless someone really made good labels and they are readable, and you trust the label you won't even know what is on there and if it is worth all the hassle. Meanwhile with a roll of film or a photograph you just see the thing that is on there. It might have scratches, be half rotten, faded colors and all, but in many cases such things are still somewhat "readable" after a century of bad storage.
Archiving of movies on film is often done with three films, one each for red, green and blue. That way you don't have to worry about fading colors; it's just silver, which is insanely stable in non-humid conditions.
and the documentation of how to run it and two or three backup systems just like it (no joke, this is how some museums have to store early digital computer art and they still have to frankenstein things together from multiple units).
I came here to ask a similar question, but for one's own self:
What's the best way to preserve the artifacts of your own life and ensure they're disseminated after you're dead, or at least available, especially if you're a solitary person with no foreseeable descendants?
GitHub?
Why isn't there a service like that? or is there? Something that makes your files public based on some condition.
You could give them to the public via the Internet Archive and a license that makes them publicly available for everyone at no cost, only keeping authorship in place.
Might be that solid writing skills are required, so that your written artifacts are seen as valuable for future generations.
> Might be that solid writing skills are required, so that your written artifacts are seen as valuable
That would be a barrier for many people, especially those with Impostor Syndrome who would never consider anything they do "valuable", but would still like to leave something behind.
Basically a snapshot of your digital footprint at the time of your death.
> What's the best way to preserve the artifacts of your own life and ensure they're disseminated after you're dead, or at least available, especially if you're a solitary person with no foreseeable descendants?
Do it yourself while you’re alive.
Failing that curate everything in the ‘release’ state and then have a law firm execute your last will and testament with everything they need to make it public.
Make sure it's a 125 degree fire safe. And I'd trust tape over flash, even brand new flash, if it's going to be offline for years.
And since you need a second copy to be even close to safe, I'd try to prepay a historically reliable web host for many years of service. And/or get a safe deposit box.
You should know that safe deposit boxes are no longer put into newly built banks and aren't a profit center, so banks are closing their boxes and shipping the contents to self-storage places, often with items going missing in the shuffle. Wells Fargo seems to be the worst for theft. But the bank can and will open your box and move it to self storage if it saves them a cent.
Surprised nobody has mentioned the use of artificial DNA [0] in archival (e.g. the Adaptive DNA Storage (ADS) codec). Companies [0] exist to create the DNA today. It fits in a small volume and lends itself to an approach of high redundancy with multiple copies (errors in reading are overcome by reading many copies and resolving them probabilistically). Still a ways to go, but DNA within biology has already proven itself resilient over long periods of time (the oldest sample ever sequenced was 1.65M years old [2]); it is very compact in physical form for storage and easy to distribute/store in many redundant copies.
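The "read many copies and resolve probabilistically" idea can be illustrated with a simple per-position majority vote over redundant reads; real DNA storage codecs layer proper error-correcting codes on top, so this is only a toy:

    # Toy consensus decoder: recover a sequence from several noisy copies
    # by majority vote at each position (reads assumed aligned, equal length).
    from collections import Counter

    def consensus(reads):
        return "".join(Counter(column).most_common(1)[0][0]
                       for column in zip(*reads))

    noisy_reads = [
        "ACGTTAGC",
        "ACCTTAGC",   # one substitution error
        "ACGTTAGG",   # another substitution error
        "ACGTTAGC",
    ]
    print(consensus(noisy_reads))  # -> ACGTTAGC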
I've personally wondered what other types of biological processes could be mimicked to help with encryption (perhaps using some sort of enzymatic process to act as a gatekeeper to reading the DNA - this is totally outside of my expertise, so could be way off track). But fun to think about, nevertheless :-)
I have been following The Long Now Foundation for around 20 years now; the clock was still just an idea when I first found out about them, and the first model had yet to be built. From the first time I found out about them I wanted to contribute in some physical way; I wanted something that I made to be a part of it, but for whatever reason I never pursued it. And now, for whatever reason, I decided to finally pursue it: I just wrote them a letter, and that letter is now in the mailbox waiting for the mailman to pick it up.
I think they are right in the idea that humanity lost something important when it ceased building monuments and I hope I will be able to be a part of their monument in some way beyond the rather antithetical monetary fashion.
The thing I want to make for them, more than anything, is the actual keycap on the button people press or the knob on the lever they pull to get the time, that thing which people will touch for 10,000 years to come. It is a tiny and ultimately insignificant piece of what they are doing, and I would be thrilled to make anything related to the movement, but to make that insignificant piece that everyone who wants the time has to touch, yet has no real impact on the purpose, seems more meaningful than building the clock itself.
I don't expect to get anything from them beyond a form letter, but maybe I will get to make the completely useless knob that sits on top of the lever which allows people to get something as banal as the current time for generations to come, and that is their point. They succeeded in their goal when it comes to me: they got me to see humanity beyond what it is for me, and instead in terms of what I can offer it 10,000 years down the line, even if all that is is a fleeting touch at the push of a button, or a bench where people can rest their weary legs after hours of waiting in line to pull a lever or push a button so they can know what time it is.
This is a remarkable feat, no doubt. Without any prior knowledge about the project, however, it seems to be a geeky (and very fun) challenge rather than a real world problem. IOW it's a solution for a non-existent problem.
First of all, humans have already solved this problem successfully with the invention of oral traditions. If you combine a stubborn social practice with a sufficiently effective and fault-tolerant stenography, you should have storage that can reach 50K years (Australian Aboriginals boast 60K+ years of records). IOW, a record-keeping written language plus a social practice (like a religion) would have the practical survivability of oral traditions but a much higher storage capacity.
Second, and perhaps this should be first, why are we assuming our knowledge is even remotely relevant to the distant future? Absolutely none of Ancient Egyptian science exists as practice today; their entire cultural horizon is dead (a corpse in Nietzschean terms), and the only ones with a devoted interest are an elite few in a science of archeology that, to our knowledge, has only existed once in humankind. That future cultures will have the social or biological capacity or motivation for archeology, let alone science, rests upon some rather huge and unfounded assumptions.
The only thing I can think of that should be transferred 1000 years into the future which they also have a high probability of caring about regardless of what motivates them would be e.g. where we store nuclear waste.
Yup, I was gonna say: we keep everything very well organized like worker ants, but then in 300 years, who cares who married whom, who worked where, who thought what, beyond passionate historians? There simply is no value in the minute details of the past if you kinda get and preserve the big picture.
And even if you do, future people are too entrenched in their beliefs to understand past people.
Using Genesis doesn't sound like such a good idea; a better approach might be some neutral text with no religious or cultural significance. How do you keep it from getting destroyed intentionally? Do you just hide it well enough in enough places and hope the right people may find it in the future?
> Using Genesis doesn't sound like such a good idea
That was also my first reaction, but thinking about it a bit more, there might be a practical reason for it: they included Genesis 1-3 in 1,500 languages. I bet there aren't many texts available outside the Bible that have that many translations, if any.
The point is the language, not the content, I assume. Although dictionary + grammar textbooks would have been better.
I'm also not sure about thousands of languages (the goal being all of them). Throughout history, there may have been a hundred lingua francas. A few hundred, tops. We also see a constant desire of people to leave writings in the most popular language of the era. It makes sense, too: they want the largest percentage of people to be able to consume their writing.
Creating dictionary pairs + grammar textbooks + science textbooks for a (still) large but limited number of languages popular now and in the past would be more beneficial to future linguists.
They are still going; their clock is under construction and, if memory serves, they have already hollowed out the mountain. Their website is not great for providing information.
I hope there are suitable caveats around the Genesis text being mythology - I'd hate to imagine far-future generations scratching their heads at how otherwise technologically sophisticated people could imagine that this stuff was true.
Lots of people believe it literally, though, and the majority of us are religious; we are exactly those otherwise technologically sophisticated people you are describing.
If you're a Christian, then creation/young earth is very important to protect; without it you have no "fall of man" and therefore no reason for a saviour. Accepting death before sin (as you must if you accept evolution) also complicates matters significantly.
That, and Jesus' legitimacy as a saviour is based on a bunch of prophecy and lineage from some of the earlier texts, including Genesis.
Tangentially related, but I highly recommend Long Now seminars. The caliber of guests and the variety of topics are really, really good.
https://longnow.org/seminars/
My great grandfather was born in 1845. As interesting as it would be to see photos and video from his time period, I don't think I want to spend thousands of hours going through his stuff. On the other hand if he kept a journal I would love to spend a few weeks reading about his life. I guess what I am getting at is I think we are keeping too much stuff and a curated set of things in a journal format would be a lot easier to digest. Maybe a future business will be creating documentaries from our ancestral archives.
I journal basically daily. Once a year I do a "year compass" and part of the process is skimming all my journals. It's extremely time consuming but very valuable.
Other than that, I rarely look at old journals unless I need to find something I forgot. I think I'm just too busy and focused on the now. That's probably OK.
I've been visiting my friend who was recently hospitalized unexpectedly and he says he's really bored all day. I feel like in his shoes that would be a good time to read and reflect on my life a bit, especially if I was nearing the end. I don't know, it doesn't really matter, the process of writing alone is valuable enough.
(Plus I journal in org roam and org roamify all my entries. Maybe one day that'll be useful to me. It already has once or twice when I'm looking up prior software evaluations)
In raw form your journals may be more than anyone would care to read, but AI tools could make them interesting to future people.
(a) Tools that synthesize down your life story and most interesting thoughts into N pages,
(b) AIs that train on all that data to create a virtual version of you after you're gone (several companies already do this).
There is/was the Lunar Library [0]. Sadly, their first mission crashed, and the contents spilled out onto the surface of the Moon. I remember reading that one of the organizers of the project was optimistic that the discs were likely to have (mostly) survived.
>We can not say the same for digital storage. Pages stored on plastic DVDs are neither stable over the very long term, nor readable over the long term.
I haven't used a DVD in at least 5+ years, so this statement is becoming reality very quickly.
Encrypted backup to two cloud services (e.g., AWS Glacier) is the best option in my view. Keep a copy of the backup software binary (one that will run on any x86 hardware). Test every year. The encryption password must be written down and stored.
Archival at home is time consuming and expensive. If you absolutely must, consider hot storage, say, a ZFS array with ECC RAM, that runs periodic scrubs. You have to attend to the storage system, and migrate the pool every 5-10 years to new technology. It’s a pain, frankly.
> Today, any information stored only on a floppy disk is essentially gone. Imagine the incompatibility of today’s DVD in 1,000 years.
It depends. If it's some unimportant personal data, sure. If it's the only backup of the entire Wikipedia, I'm sure we can find a way to read it off whatever it's on.
I like where they are going with this but I think they should have embedded it in a type of lens system where you might shine high intensity light on one side and project the micro-etchings onto another surface rather than require a microscope.
Does this object also include instructions on building a microscope to read the rest of the text? It might be useful for bootstrapping a civilisation from the stone age.
Copying from one digital medium to another is a very easy operation (contrary to copying from paper to paper). Given equal "care", I don't see why digital backups should be worse than paper ones.
It's easy to interpret their leading zero to mean "use 5-digit representation of years", when (after Y2K and similar problems) I would've expected a better message to be "don't limit the time points that can be represented".
Maybe they have an additional assumption: that humanity won't ever need to consider years past 99,999 in a Gregorian calendar.
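The difference is easy to see in code: zero-padding fixes a minimum display width, it doesn't cap what can be represented (a toy illustration):

    # Long Now style zero-padded years vs. an unconstrained value.
    for year in (2024, 11024, 102024):
        print(f"{year:05d}")  # pads to at least 5 digits, never truncates
    # -> 02024, 11024, 102024 (nothing breaks past year 99,999;
    #    the padding just stops mattering)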
Yet they say things like "Our hope is that at least one of the eight headline languages can be recovered in 1,000 years". Come on, we can read text in almost any language from 1,000 years ago, and they didn't care at all about leaving durable texts. A couple of days ago there was this story about Norwegian carved graffiti from around 1,000 AD in Venice, still readable. They should aim for 10,000 years.
I think they mean the medium being readable. Carved graffiti (or paper) is more durable than a CD-R. As the media they're using for the "Very Long-Term Backup" hasn't been fully tested yet (we'll need to wait a thousand years for that), they're hoping it will work, but they still aren't sure.
According to the linked article, their whole purpose was/is to "think 10,000 years in the future instead of quarterly", so they only need 5 digits for that goal.
This was the first I'd heard of Long Now, but upon reading the article, I had assumed the leading 0 was representative of the infinite continuum of 0's in-front. But all the repetitive 0's had been slimmed down to one, like :: in IPv6