Several people have mentioned the Internet Archive. They're doing god's work and you should give them money [0].
But I think that dapper Vint "One is glad to be of service" Cerf is referring to how difficult it will be for far-future historians to piece together records of daily life in the 21st century, especially from the view of individuals. Think of Da Vinci's notebooks, the personal journals of arctic explorers, correspondence from great artists and statesmen. Now imagine that those things are stored on floppy disks in Office 95 .doc format. We'd have a hard time viewing that media today. In 1,000 years it might be impossible. A lot more will be lost to history than what the Internet Archive is currently storing.
I can't think of a great solution. When I watch historical documentaries, it seems like every other scene is based on a quotation from a letter or an old photograph. We don't print those things out any more. They'll be lost, completely lost, when email services and social networks finally shut down. And even if people download personal backups, most file and physical storage formats have a shelf life measured in decades at most. Letters can sit around in attics for lifetimes, undisturbed. A file formatted for a Commodore 64 word processor might as well be written in a lost language and then one-way encrypted. I'm not sure what to do about it, but it seems like a damn shame.
Art institutions take these problems seriously. XFR STN at the New Museum is just one example of dozens of projects which attempt to rescue obsolete digital content. I think the problem is less serious than you think where file formats are concerned. It is very severe with a platform like Twitter, though. An informed public would reject the useless level of threading offered by Twitter...
Do we need to preserve every tweet, though? The important ones get mentioned in articles and other places. 99% is just noise, although I guess it would be interesting for future people to listen in.
It doesn't seem that important of a thing to preserve though. Sure, if we could hear daily conversations of people from the year 1000 we'd learn a lot. But that's only because we don't have access to much other cultural information from back then, unlike what we'll have now with webpages and news articles.
I don't see what makes Twitter worth preserving any more than everyday conversation, or packaging labels, or billboards, or whatever other junk information there is. Just because it would be somewhat easier to preserve a digital thing doesn't make it more important to, does it?
I mean I'm sure it'll be helpful to a subset of researchers. But I don't see why it's Twitter's fault for not making it easy to preserve, as much as it is Microsoft's fault for not preserving WordArt bake sale announcements.
I'm not sure if you are joking, but I have a linguist friend who mines Twitter for things like evolving language usage. She shares some pretty interesting things that she learns from it, too. I expect that being able to do that with a hundred-year (or multi-decade) corpus could be really interesting.
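As a toy illustration of the kind of thing she might do (the corpus and function here are entirely hypothetical, not her actual pipeline), tracking how often a word shows up per year takes only a few lines of Python:

```python
from collections import Counter

# Hypothetical corpus: (year, text) pairs standing in for timestamped tweets.
corpus = [
    (2010, "that movie was sick"),
    (2012, "sick beat, love it"),
    (2012, "home sick with the flu"),
    (2014, "this track is sick"),
]

def usage_by_year(corpus, word):
    """Count how often `word` appears per year - a crude proxy for usage trends."""
    counts = Counter()
    for year, text in corpus:
        counts[year] += text.lower().split().count(word)
    return dict(counts)

print(usage_by_year(corpus, "sick"))  # {2010: 1, 2012: 2, 2014: 1}
```

With a multi-decade corpus, the same counting approach (plus context windows to separate the slang sense from the literal one) is how shifts in meaning become visible.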
I think it ends up being the responsibility of individual families to preserve their digital history as much as their old physical stuff.
For example, I have digitized probably over 5k photographs and several dozen home movies (that I had to capture from VHS tapes and even some old film reels), and several thousand pages of documents - report cards, letters, tax forms, etc - of my extended family over the years. That entire archive is now around 20GB (I keep "new" stuff and "old" stuff, pre- and post-2000, separate) and I have it backed up in the cloud, on three personal computers, on my own external hard drives, and burned to sets of DVDs in three places (two homes and one storage unit). I made sure it was all formatted with open standards (extension-less PDFs, PNGs, JPGs, Theora, Vorbis, ODFs, etc). The archives themselves are all UDF formatted.
This kind of stuff is not going to magically become unusable. The formats aren't going anywhere. The transport interfaces might, but someone will be able to read DVDs, plug in USB, and plug in SATA for several decades at least. And I fully intend to update that collection (probably around 2020) and include the next decade in it in whatever formats are most appropriately modern at the time. At least as long as I still have something like gstreamer to pipe all the audio/video through to transcode it from Speex/Vorbis to Opus or whatever else turns up.
Most families should do something like that. I doubt many of them are.
>Now imagine that those things are stored on floppy disks in Office95 .doc format
I really don't think this is a big problem, in the historical context. Sure, the average person won't be able to casually open those files, but neither can the average native English speaker casually read the original Beowulf!
If a catastrophe wipes out an ancient civilization and all that survives is one laptop-sized cuneiform tablet, historians don't get much from that. If a catastrophe wipes out modern civilization and all that survives is one laptop, historians get gigabytes, maybe terabytes, from that.
I think there's a middle ground though. For example, Egyptian hieroglyphics survived but were not understood until the Rosetta Stone was found. Without a "digital Rosetta Stone" those terabytes would be similarly useless.
There's also some historical value in the journals of everyday folks, e.g. soldiers' notes from major battles or accounts from people present at historical events.
Well, cracking 100 years from now will probably be much harder. For example, could CSS be as easily cracked if you didn't know it was for DVD video and didn't have access to a working reference implementation?
Either way, we're surely capable of leaving a better legacy than what we currently have.
Even if DVD codecs are 100% opaque to future historians, recall that no video of any format existed until the 20th century, and widespread ubiquitous video recording didn't exist until the 21st. If we're entering a dark age now, what does that make former ages?
I agree that the "dark age" designation is not the best (and perhaps sensational), but the basic issue of how to maintain digital data isn't worth writing off...
How many letters do you have from your great grandparents? For me, it's zero. For records to be preserved, someone has to care about preserving them. For digital data, that means copying it forward to newer media now and then.
I'm sorry, but there's a huge difference in preserving a letter and a file. I say this as someone who has worked in the field of preservation (mostly digital) for close to a decade.
Really simplified... For a letter to be preserved: put it in a box, take it with you when you move, and hope that the house doesn't burn down. For a file to be preserved: keep backups to avoid data corruption and media deterioration. Also: manage physical media obsolescence and format obsolescence. Most people don't have a clue about these things. If our cultural heritage institutions don't get their hands on some of these files, most of them will likely be lost.
And no, it's not only about our great grandparents. It's also about the Turings and Einsteins. Their day-to-day correspondence has taught us quite a lot. Turing might not have been a good example though... :)
The problems are simply different; neither medium is strictly better than the other. With paper and film, making backups is tedious and expensive and rarely happens. They get lost, mildewed, thrown away, looted, burned, flooded, blown up, eaten by insects, etc. Much of my family history got lost that way. Any piece of paper surviving 200 years is a miracle.
Paper archives have the unfortunate characteristic of concentrating treasures together. 9/11 apparently destroyed quite an archive of photos - we don't even know what all was there. The Vatican Library concentrates a huge collection of paper, and one little event could take it all away.
With digital, copies are cheap. I've been copying forward my old stuff for 40 years now (though I did lose my old IBM punchcard decks and paper tapes!), and it does get easier. With cheap terabyte drives, one drive will hold it all, and I can make copies to make it resistant to catastrophe.
The Vatican Library really needs to make it a priority to scan all those papers.
> Would future civilizations go through the trouble of understanding our video codecs? Machine language?
I would say yes; we go to great lengths to extract information from hard-to-read ancient sources, so future civilizations probably would too: http://www.bbc.com/news/world-europe-31087746
True, but how many people do we have working on that problem?
Compare that to the number of people and years of effort that went into building our modern computing platforms. How much research and investment in a very lucrative field? A quick look at my university's finances shows they spend twice as much money per diploma on the sciences as on the arts. (Honestly surprised it's that much, though it does exclude medicine, which costs twice as much again as the sciences.)
Then consider the hardware and engineers. Just getting to a bit stream is very complex.
I estimate decoding a modern piece of data would be similar to rebuilding our entire computerized technology field very nearly from scratch. In my mind that's several orders of magnitude more effort.
But this is just a property of acceleration. If languages had started to change at the speed file formats have, or writing media had changed as quickly, those millennia-old letters might no longer have been usable either. It's definitely worth being conscious of the value of interpreted artifacts, but this is a continuation of a very old tension.
I feel like http://archive.org/ deserves a mention here for their efforts of preserving digital history. Personal photos or other data are not in their scope, but they do wonderful things with web pages and other stuff that's publicly available. I just read Brewster Kahle's interview from Founders at Work this week and it blew my mind how determined to build the Internet Archive he's been for practically his whole career.
Last summer I discovered, by accident, a little computer museum[1] in the Kvarner Gulf. It's run by a single guy, who has collected an incredible amount of old hardware over the years and exhibits it in a couple of rooms.
It rained the day I visited the place, and since it was an old building, it actually rained a bit inside. The guy took it with humour, but it was actually a pity. It is a great place and on some days the visitors can even run old programs on those ancient machines or play games.
From their site:
Opened on 22nd of September 2007, Club PEEK&POKE is one of the few permanent displays of vintage computing technology in Europe. Located in the centre of the city of Rijeka and spread across 300m2 of space, it contains more than 1000 exhibits of the world and local computer history, ranging from very early calculators and game consoles to rare and obsolete computers from the nineties.
I've been to a couple of these museums as well. What makes them fun to go to are:
- you may touch everything, often including the parts inside the computers themselves
- the people working there (or, as is often the case, the one guy working there) are knowledgeable, willing to talk about it in depth, care about the subject, and you can have a real conversation with them.
- there's a lot of surprising hardware around I had no idea existed (from failed computer companies, to specialized I/O of decades past, to special-purpose hardware)
You cannot get the same full experience with most regular museums, as then you have to follow their narrative instead of following your curiosity.
On the other hand, these museums are often not much more than an erudite collection of historical artifacts they got their hands on. To preserve is to select with a plan, not just collect everything.
Technical/science museums in general often seem to be closer to this ideal than other museums, because while some of the artefacts may be valuable, more often they are trying to showcase how things worked rather than specific objects.
True. Although the more popular/populous museums tend to offer models you can interact with, whereas the smaller ones more often offer the real thing.
If you're ever offered a visit to the archives/warehouse of a museum, accept. Showing (large) stuff is expensive in space, so most exhibitions are only the tip of the iceberg of the enormous amount of stuff the museum has collected and stored. Often an exhibition is a mix of well-known artifacts that visitors expect to be there (and are therefore included, but are not very interesting as you already know them), some artifacts that are good specimens overall (but not necessarily that interesting), and some truly interesting pieces (which you have probably never seen before). I found that when you visit the archive/warehouse of a museum, curators tend to show artifacts from that last category in particular.
The Deutsches Museum in Munich is the world's largest and IMHO best museum of science and technology. Especially the mechanical engineering, cars, and airplane departments are one of a kind.
Though, the computer history department is a bit too small, and similar in size to the equivalent at the Science Museum in London. In both you find a Cray supercomputer, the first Zuse computers, and many older mainframes and terminals, etc. But everything is dead; every historical computer just sits there. There is no interactivity - what a shame. They should at least restore a Cray 1 or 2 supercomputer to working order and let visitors play around on its terminal - that would be awesome.
Amazing to hear that there is a computer museum in Rijeka. My family spent nearly three months staying right outside of Opatija while working in Croatia. We'd travel to Rijeka regularly when we needed to do any larger shopping.
My wife and I originally planned to organize a workshop or attend some meetups while there, but the tech scene is so tiny it just wasn't feasible.
We'll be going again this spring; this will be on our list of places to visit.
"A company would have to provide the service, and I suggested to Mr Cerf that few companies have lasted for hundreds of years."
Maybe it's just that I'm an academic, but this sounds less like a job for a company and more like a job for a library. (Or more robustly, a job for the world's interconnected network of libraries.) The whole point that I take from this article is "we must preserve our heritage for the common good", and that sounds awfully close to a library's core mission.
Unfortunately, libraries (and museums and other cultural institutions) nowadays are forced to compete on the market - which erodes their ability to preserve and give access to our cultural heritage. If this trend continues, soon there won't be any functional difference between a library and a company.
Ever since Vint Cerf started working for Google, I've been less than impressed with his speeches. His speeches have taken a pretty obvious turn towards promoting stuff that's good for Google, rather than people per se.
He's now basically saying "You're all going to lose your data...better give it to Google to save it for you!".
Librarians and archivists are working on this challenge, yes. I work for a company (Artefactual Systems) that, informed by the work being done in academia, develops open source tools related to this (extracting metadata about digital content, etc.).
We observed this problem first hand when the inventor of PowerPoint emailed us to ask for help in opening old presentations which he could no longer view!
When the guy who wrote the software which produced the file can't view it 20 years later you know you've got a problem.
This is why I'm a firm believer in open, free standards. Of course they don't necessarily guarantee you'll be able to open those files, but it's a great step in the right direction.
We shouldn't have to rely on reverse engineering to preserve what's ours.
Unfortunately public administrations don't realize (or don't want to realize) this, at least where I live. I even discussed this with information professionals and they wouldn't understand! We're right in the middle of some big change and it will be too late when they finally realize.
It feels weird to me to see Vint Cerf propped up in this title by being associated with Google. It seems the guy's name holds enough merit on its own; I didn't even realize he was working with Google these days!
It is Hacker News, not Computer Engineer/Scientist news. It was always supposed to tilt towards Silicon Valley. I don't think that's a good thing, but it's a free service from Y Combinator, so it's very understandable.
Well, since it was pg's idea, I'd guess "hacker" would slant more towards MIT-style hacks than anything to do with the valley.
I think the lack of Vint Cerf stories has more to do with the temporal nature of news than any biases that people might have. I'd chalk it up to him just not making news lately. Any text on the history of the Internet, TCP/IP, and networks in general liberally mentions him.
It's interesting to examine which historical figures have been remembered and which have been forgotten. Money is part of it but not the full story. Political leaders during times of chaos are remembered. Artists are remembered, many of whom died penniless. Mass murderers are remembered. I think in general people just need a simple story to attach to a single person to remember them. Building a computer and making a billion dollars is a simple story. Walking on the moon is a simple story. Designing packet switching protocols is not. It's not a tragic situation, it's just interesting to note.
Interesting... maybe an age thing? I'm 39 and Vint is right up there in my mind with the heavyweights of computer science.
That said, I wasn't aware he was with Google either. I imagine his place in my mind was solidified back in college during the .com days, when internet history was a common topic of discussion.
He was supposed to give a talk here at Goddard Space Flight Center recently, but it was snowed out. I've heard him speak before, on his Interplanetary Internet work, and he's a good speaker.
I see a lot of comments amounting to saying that contemporary digital content isn't worth much, and its loss wouldn't be a big hardship. This isn't the only kind of digital content though. What I am worried about is older, pre-digital matter which is thrown out because digital copies now exist (often crappy scans of marked-up, faded pages, but that's another rant.) When the digital copies disappear, we will no longer have the durable pre-digital copies to revert to.
Indeed. Every time we scan a document and throw the original away, we're betting on the continued existence of our current technological civilization. If it collapses, the things we digitized and threw away are the things future generations will have lost from the cultural heritage. The Dark Ages may end up extending far beyond the beginning of the information age.
While I think that Vint Cerf is correct in talking about the dangers of losing the ability to read files in various formats, my hope for long term access lies in open and standardized file formats and services like the fantastic archive.org (http://web.archive.org/web/*/markwatson.com is their history of my little web site, starting in 1996).
I think that HTML files in a standard character set like UTF-8 could be readable a thousand years from now if human civilization has not destroyed itself.
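One reason for that optimism: even if every browser is lost, a crude tag-stripper can recover the text from a UTF-8 HTML file. A small sketch of that graceful degradation (the sample document is made up):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Crude tag-stripper: no browser needed, the text itself survives."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

# A hypothetical archived page, stored as raw UTF-8 bytes.
raw = "<html><body><p>Dear diary, it rained today.</p></body></html>".encode("utf-8")

extractor = TextExtractor()
extractor.feed(raw.decode("utf-8"))  # UTF-8 is self-synchronizing and well documented
print("".join(extractor.chunks))     # prints "Dear diary, it rained today."
```

The markup carries structure, but the payload is plain human-readable text, which is exactly the property that makes a format easy to decipher centuries later.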
I hold out less hope for formats like various ogg formats, TIFF, JPEG, MPEG, etc. Software like computer games is even more problematic.
I am hopeful that the technology will improve for archiving digital assets. New storage technologies will become more reliable, much more information dense, and less expensive both to build and provide power for.
I don't think so. There will always be usage cases involving a couple orders of magnitude less capacity or bandwidth than the leading edge. Lossy compression will always be appropriate for these.
Anything involving wireless is a good start. There's a hard physical limit to the amount of data you can cram over 4G or 802.11 spectrum. It's physically impossible to losslessly stream 4K 60fps video over these, so that's why we use lossy compression (currently the MPEG family) and always will.
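The back-of-envelope arithmetic behind that claim, assuming uncompressed 8-bit RGB (an assumption of this sketch; real pipelines use chroma-subsampled YUV, which lowers the figure somewhat):

```python
# Back-of-envelope: uncompressed 4K 60fps video bandwidth, assuming 8-bit RGB.
width, height = 3840, 2160
bytes_per_pixel = 3          # 8-bit R, G, B
fps = 60

bits_per_second = width * height * bytes_per_pixel * fps * 8
print(f"{bits_per_second / 1e9:.1f} Gbit/s")  # 11.9 Gbit/s
```

Roughly 12 Gbit/s sustained, which is well beyond what real-world 4G or 802.11 links deliver, hence lossy codecs.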
You may be confused about TIFF. Long after JPEG dies a horrible death, some medical facility and some state-run organization are going to have a system that reads TIFFs and has been running for 30 years.
No one ever considers the volume of photos and documents we are now creating. My parents probably took dozens of photographs when I was younger, and kept maybe one or two albums. Nowadays, most people have hundreds if not thousands of photographs and videos. So it seems to me that the converse will actually be true. If you've got a 100 fold increase in the number of documents and photos being created, and only 1% of those make it, you're still preserving the same number of documents. And frankly, I think that's a conservative number.
As some of the other posters here mentioned, with the advent of cloud based services and easy backup systems, retrieval is getting easier. So if storage and retrieval is solved, that leaves format evolution problems. But somehow I doubt in 20 years or even more that JPEG is somehow going to be harder to read than it is now.
A couple of years ago, Kodak's online photo storage service (in the UK at least) shut down. We just barely managed to get copies of the hundreds of pictures we had stored there in time.
Other recent (non-photo specific) examples of vast amount of data disappearing:
Geocities. Yes, large chunks of it were archived at the last minute. MegaUpload.
And we have Rapidshare on its way to disappearing.
Cloud services can and will shut down, and it is not at all a given that we manage to preserve the data.
This creates a relentless churn where some proportion of older data disappears every day. All we really can do is to fight to keep the churn rate low enough, because we have no realistic prospect of saving everything all the time.
I don't think this invalidates the point though. In those alone we may have already lost more pictures than were ever taken before 1990 but at the same time the amount we have left is staggering. If you go back 100 years, there are famous people that we have maybe 1 picture of. No matter how sloppy we are with the majority of internet content today, I can't imagine that 100 years from now, they'll only be able to find a couple pictures of Obama and Putin.
But that's not really any different than any other medium. If there exist two copies of a book in different libraries and then one of the libraries burns down, it's not any different. I mean, we still talk about the library of Alexandria 2000 years later.
"The project was stored on adapted laserdiscs in the LaserVision Read Only Memory (LV-ROM) format, which contained not only analogue video and still pictures, but also digital data, with 300 MB of storage space on each side of the disc. Data and images were selected and collated by the BBC Domesday project based in Bilton House in West Ealing. Pre-mastering of data was carried out on a VAX-11/750 mini-computer, assisted by a network of BBC micros. The discs were mastered, produced, and tested by the Philips Laservision factory in Blackburn, England. Viewing the discs required an Acorn BBC Master expanded with a SCSI controller and an additional coprocessor, which controlled a Philips VP415 "Domesday Player", a specially produced laserdisc player. The user interface consisted of the BBC Master's keyboard and a trackball (known at the time as a trackerball). The software for the project was written in BCPL (a precursor to C), to make cross-platform porting easier, although BCPL never attained the popularity that its early promise suggested it might."
And I believe the content on the original Laserdiscs has been reverse engineered more than once. Hopefully once content is on the web (unless it's behind robots.txt) it gets sucked up by the Internet Archive.
That of course doesn't take care of any rendering issues.
I feel like we are already experiencing that. I have a drawer full of pictures given to me by my parents, some of them 100 years old, but I can't open pictures I took with my digital camera 10 years ago, because the CDs I burnt them to are unreadable. Obviously the answer is that I should have printed at least some of them, but I don't know anyone who prints pictures nowadays. These days I back them all up to Picasa (Google Plus albums), but I have no idea what happens if I can't pay for monthly storage anymore. Or if I die and no one knows my Google password? All of that data will be gone permanently.
I know it doesn't answer the problem at large, but if you are worried about this (or even if you're not and you just want your google data to survive you), Google released an inactive account manager a couple of months ago:
In the event your Google account becomes inactive for a period of time you choose, you can trust specific contacts with access to specific data. It has other niceties. I highly recommend setting it up.
>> "I don't know anyone who prints pictures nowadays"
Really? Most of the big supermarkets here (UK) and quite a lot of pharmacy chains have machines where you connect your device/USB/SD, select the pictures you want, and press print. It's relatively inexpensive too and I know people who use them all the time. Personally I don't actually mind if all of my digital photos are unreadable so long as I have the important ones in a physical format so this is a nice solution and takes away the pain of owning a good printer, buying lots of ink and photo paper.
Oh maybe I didn't phrase it correctly. I meant that I don't know anyone who prints out their own pictures - most of my friends only keep their pictures on their phones/digital cameras, physical copies are only ever printed as presents and such, never to archive anything.
>but I can't open pictures I took with my digital camera 10 years ago,because the CDs I burnt them to are unreadable.
We still had jpegs 10 years ago. All of my digital pics from 2005/6/7 till now are still on multiple hard drives and multiple systems. Each time I get a new computer I transfer them over.
Why are they unreadable? It sounds like you put them in a proprietary format or burned them with some obsolete picture software.
I think it's absurd that you are paying for picture storage and think it's your only option. Amazon Prime customers can upload pics for free now [1]. Why not use one of the free ones like Dropbox, Google Drive, SkyDrive, etc.?
They're unreadable because CD-Rs aren't an archival mechanism. They break down rather rapidly. Their shelf life is similar to a floppy disk's. After 10 years, the odds of being able to read a CD-R that wasn't stored in perfect conditions are pretty low.
Are you sure it would be as quick as 10 years? I have music CDs and mixes easily from the early 2000s that still work. A few years ago I found a Jurassic Park soundtrack that still worked.
My parents have CDs from the 90s that still work.
Would there be a difference in a music CD you bought from a store and one that burned yourself?
CD-Rs, and especially older CD-Rs, turned out not to have as long a shelf life as initially assumed. I recently found a bunch of old CDs I burnt ages ago, back when I got my first CD burner, and only one of them was still completely readable.
>>Amazon prime customers can upload pics for free now [1]. Why not use one of the free ones like Dropbox, Google Drive, SkyDrive etc...
All of them have space limitations - and I have over 400GB of pictures backed up to Picasa. So if I am not paying Google, I would be paying Amazon or Microsoft or Dropbox or someone else. And my main point was - what happens to data on all of these paid-for services once I am dead? Amazon Prime backup is "free" as long as you keep paying for Prime.
And yeah, CDs become unreadable after sitting in their envelopes for 10 years. Not all of them, but mine certainly did.
DRM also plays a role in this. Obsolete DRM techniques whose algorithms have long been forgotten will make it difficult to migrate DRM-protected data onto new storage media.
It sort of is a big issue when you consider that preserving the history of something relies on lawbreaking, motivated by piracy, to enable it. Beyond the legality, it requires sufficiently motivated attackers and insufficiently good DRM.
The status quo is probably stable for games and big name software packages for the time being, but it doesn't inspire confidence for the long term.
You can virtualize a PC (or replicate the filesystem like Dropbox), but it is harder to virtualize a distributed system like Facebook's or Google's. And most mobile platforms are DOA without their vendor's signing and cloud services.
It's always been hard to migrate out of social systems--convincing a well-connected user of a photo service to move away from the place where they've accumulated comments and tags is really hard. That metadata is not portable because identity is not yet portable, and it's what we're spending time on (lots more time than we spend making spreadsheets).
I think we might manage to keep "JPG as file" alive for 30-50 years, but there is lots more to manage.
We are getting better at archiving files. And thanks to a combination of people dedicated to emulation and the rise of virtual machines, there are few popular pieces of hardware we can't emulate in excruciating detail.
But many of the services I used a decade ago are already gone. And pretty much all the services I used two decades ago have completely disappeared, with some very few notable examples. And with them, vast amounts of data.
Some of it the Internet Archive have at least captured static snapshots of (and they really should have magnitudes more funding), but ten times that - or more - was data in walled gardens, behind logins or otherwise restricted in ways that means it is lost forever unless we're lucky and it turns out some admin held onto backup tapes they weren't really meant to keep.
And the problem there is not creating a snapshot of a single server - as you say, distributed systems are far harder, even recent ones. Twice I've been contracted to help companies take over infrastructure involving systems I'd worked on and try to "package it up", and it was incredibly hard: no matter how much you try to tear down and bring up individual servers or groups of servers and automate deployment, very few places running complex services ever try - or could afford to try - to tear down and bring up a full copy of their entire infrastructure.
Suddenly all kinds of nasty interdependencies and bootstrap problems nobody had needed to think about show up.
If there's trouble with image, video or audio files, it's probably going to be because of DRM. The number of formats for those is small relative to the number of items stored.
Text documents in obscure formats can be more troublesome. There were many early word processors, and many file format versions. Those can be hard to convert, and there will be obscure text documents some historian will want to see a century from now. Converting stored text documents into some self-explanatory form like XML for archiving purposes is helpful. Even if the software doesn't survive, the text will still be there and someone can probably figure out the encoding.
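A minimal sketch of that kind of self-explanatory archival wrapper (the element names here are illustrative, my own invention, not any preservation standard):

```python
import xml.etree.ElementTree as ET

def archive_as_xml(text, source_format, title):
    """Wrap recovered text in a small self-describing XML envelope.

    The point is that the payload stays human-readable plain text,
    with the provenance recorded alongside it, even if this script
    and the original software are both lost.
    """
    doc = ET.Element("archived-document")
    meta = ET.SubElement(doc, "metadata")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "original-format").text = source_format
    ET.SubElement(doc, "body").text = text
    return ET.tostring(doc, encoding="unicode")

xml = archive_as_xml("Minutes of the meeting...", "WordStar 4.0", "Board minutes")
print(xml)
```

Even with no schema documentation, a future reader confronting this output sees tagged, labeled text rather than an opaque binary blob.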
Structured graphics files from old programs are a real problem. This is a big problem in the CAD world. CAD files aren't just pictures any more. They're detailed descriptions of physical objects and how they're made. Moving them from one present-day CAD program to another is tough. Going back 20 or 50 years will be tougher. People will have a real need to do that; buildings, aircraft, and industrial machinery last that long. The present compromise is to export such things in well-known formats that are viewable, but not necessarily editable.
Cerf is talking about preserving execution environments, so you can run old software years later. With so much "cloud based" stuff, and network oriented DRM, that's not going to work once the servers have gone away.
I've been thinking about this too, but I believe that Vint Cerf's solution is not feasible. Or, rather, it's a typical technologist "solution" that only prolongs the problem.
Imagine you're an archaeologist in the year 4500 or so, by our calendar. What we know of as modern Western civilization collapsed thousands of years prior; all you have are physical artifacts dug out of the ground as you attempt to reconstruct the history of this lost civilization. What would you see?
Circa the late 20th century you'll notice a precipitous decline in the volume of any surviving cultural material--meaning printed books, magazines, business papers of various sorts. You'll find fewer sound recordings, ticket stubs, even purchase receipts. The various detritus of daily life will appear to rapidly dry up and virtually disappear, around the world more or less simultaneously.
What conclusion would you draw? The population at the time seemed stable or growing, as measured by the ruins of settlements, yet they seemed to be doing less or producing less. Perhaps there was a crisis in education and illiteracy became rampant? Maybe there was an ecological disaster and paper itself became scarce?
A huge proportion of the culture we take for granted--not just pop culture, cat gifs, etc., but substantial business and scientific research information as well--would be completely lost to these future historians. On a more trivial level, when was the last time you saw a comprehensive, printed guide to iOS 8 (for example)? You could consider that a significant cultural/artistic artifact from our time, yet how will it be preserved in any meaningful way for future scholars?
I'm pretty sure any future archaeologist would quickly figure out the overwhelming number of rounded rectangles with glass screens had something to do with the decline of other media.
I'm not sure I agree with your hypothetical archeologist's assessment. We haven't seen any kind of real drop in the amount of garbage being produced[1] (it's barely leveled off in the last decade, and that doesn't include commercial waste). Sure, the kind of garbage is changing, but that happened in the past as well (from bronze garbage to iron garbage), and it indicated a change in technology, not a failure of the civilization. And an indicated change in technology is the pointer our future archeologist will need to start looking for something else.
There seem to be at least two future scenarios people are discussing here:
1) A future where society loses the ability to build and maintain tools to process digital data at scale, and as such, are only reliant on analog tools to reconstruct the past.
2) A future where due to the extreme advancement of technology, old file formats are "forgotten", and therefore they are difficult to decode.
I think Vint's statements are to be taken in the context of scenario 2, although I kind of think that even if we forgot a format, we'd be able to rediscover it with enough analysis and computing power, which would be a non-issue in scenario 2.
For scenario 1, I don't think we have answers short of burying long lasting dormant computing devices in bunkers all over the world, along with maps of how to find them cast in a very stable medium.
Strange - I would call these preoccupations BS if not for the source. This was a common concern some 20 to 30 years ago, when the internet was practically nonexistent and people still used old floppy disks and CDs. Today, a good part of our personal files are stored in the cloud, and storage devices have become more standard through the use of universal interfaces. Some of our personal data will be lost (whatever is left on the hard drives of old computers and not backed up), but the trail of information the world is leaving behind is so huge that what worries me instead is that time seems to have frozen. Everything we leave behind us - documents, pictures - looks as fresh today as the day it was produced. And is as easily retrievable.
On another thread people were saddened by a rapidshare-like website being shut down because a lot of content was hosted there. Let's hope the cloud stays up for long. So far most 80s data I see on the web comes from... magazines. Ha, paper.
The trick with digital data (including that stored in cloud services) is to keep it in motion. Movement from one service to another, leaving behind a copy that may eventually get deleted but just might stick around a while, is the best way to keep data available. Whatever disappeared with rapidshare would have been just fine if people had moved it to the various "latest and greatest" systems when they popped up; the only way it gets lost is if it stops moving.
What happens when one dies - who keeps moving that data? Companies eventually close. Millennia-old documents made their way to us without anybody caring about them. Digital data seems to be intrinsically different from analog data in that continuous care is needed. What you propose is a solution, but I can't see who's going to do it for free the way walls and caves did for paintings and inscriptions.
Not necessarily. Analog data decays too. The issue is being blind to the fact that neither technology is timeless and perfect. People are giving in too easily and too deeply to digital data nowadays.
Sure, but at what rate? Most 20-year-old CDs are still fine, to say nothing of paper stored with a sliver of care. What percentage of web companies from 1995 are still around?
Consider this: for perhaps a hundred years prior to digital technology consuming just about everything, we collectively produced gigatons of paper output. How much of that is available today? A thousandth of a percent, perhaps? Almost all of it has been lost, and most of what does remain is available because it was related to someone famous or some famous event, or because it was widely replicated. Digital data is easy to replicate and spread widely. Given how cheap it is to store, and how it gets cheaper every day, it is not inconceivable that every bit that is generated and "donated to the public" will survive forever.
Not so fast. If you abandon your house, anything in it will be trashed within 20 years - humidity, fungus, animals, whatever. The data you mention has been wiped, or the medium has been destroyed. I have some old Seagate 6GB IDE hard drives that are still working properly. So maybe those web companies' data could have survived if their hard drives had been stored in a nice location.
It's not BS, and you've laid out why. There are games and software on 5.25" and 3.5" disks that we can't access now. Even if we could, the software was written for a world that doesn't exist anymore, and we'd have to write other software to adapt them to our current machines. This isn't a problem that is just going to go away; formats and OSes and hardware will come and go as our technology needs and abilities change.
First, you're talking about a very short timespan right at the beginning of the digital age. Even if we were to lose everything, a missing 20-year timespan (1975-1995) of videogames wouldn't amount to a digital middle age.
I don't know about digital dark ages, but before GOG this was a serious problem. It's not that games were not accessible at all, it's that it was so hard to access them that most people would not bother. Are you really "preserving" history if it's just some CD in a box no one ever opens?
You are. Most of our history has been preserved in books that were inaccessible to almost everybody and that nobody opened for hundreds of years. The important thing is that things are preserved somewhere, even if to read them you need a fully equipped laboratory and boffins in white gowns. That's how we read our most ancient documents anyway. Once the information has been retrieved and copied, it's safe.
I think you are confusing history with archeology. They are obviously related, but they are not the same thing. Preserving history requires keeping it alive. Those books you speak of were endlessly copied and read. I am willing to bet that most of what you know about history you have learned from articles or books by people who read other books, and so on. Archeology steps in when history ends up as a CD-in-a-box or tablet-in-a-tomb. And even then we often lose context and meaning.
I work for a science museum that is struggling to keep its twenty-year digital collection (web, media, art, etc.) archived. Resources seem to be the common factor. It takes time and energy to keep old projects from falling into disrepair or losing them altogether, and many institutions don't have the bandwidth or resources to dedicate to the long term. It's a problem, and I'm grateful to Vint Cerf for putting a spotlight on the issue.
The only long term answer is to detach the data stores from the applications which process them. It is the migration between apps that leads to this data orphaning.
This also leads to the conclusion that a standard mechanism for interfacing to the data from multiple apps on multiple data backends is needed. The Android storage framework is probably the best effort at this so far, but it's far from clear how widely used it is.
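As a sketch of that detachment idea, storing records in a plain, versioned, open format such as JSON ties them to a documented schema rather than to any one application. The schema below is invented purely for illustration:

```python
import json

# Hypothetical app-neutral record: the field names are invented for
# illustration, not taken from any real application.
note = {
    "schema": "note/v1",          # version the *format*, not the application
    "created": "2015-02-13T10:00:00Z",
    "title": "Meeting notes",
    "body": "Discussed archive strategy.",
}

# Any future program that can parse JSON can read this back, with no
# dependency on the program that originally wrote it.
serialized = json.dumps(note, indent=2, sort_keys=True)
restored = json.loads(serialized)
assert restored == note
```

The point is not JSON specifically but the decoupling: when the data format outlives the app, migrating between apps stops being an act of data rescue.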
I was expecting more obvious suggestions to move all our stuff to google cloud "for the greater good". Apparently it was just a preparation…
But seriously, these complaints are dubious at best. Changing hardware? Well, maybe. Still, we won't migrate to newer hardware unless we can bring our stuff with us. Maybe only if 2D pictures are eventually considered obsolete and fall out of use entirely, but I doubt that as well. Changing software? I'm struggling to imagine how text documents could become unreadable. Even on a completely new architecture it won't be hard to write a translator. In the same way, I cannot imagine bitmap images becoming obsolete, and every single format we use is just a moderately complicated compression algorithm wrapped around a bitmap, which every curious historian will be able to recreate by himself. The same holds true for wav/flac, ogg, mp3, etc.
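As a toy illustration of that point, here is roughly how little a curious historian would need in order to get pixels back out of an uncompressed 24-bit BMP, working from the raw bytes alone. This is a deliberately simplified sketch that ignores palettes, compression, and most header fields:

```python
import struct

def make_bmp(width, height, pixels):
    """Build a minimal uncompressed 24-bit BMP in memory.

    pixels: list of rows (bottom-up), each a list of (b, g, r) tuples.
    """
    row_size = (width * 3 + 3) & ~3            # rows are padded to 4 bytes
    image_size = row_size * height
    header = struct.pack("<2sIHHI", b"BM", 54 + image_size, 0, 0, 54)
    info = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24,
                       0, image_size, 2835, 2835, 0, 0)
    body = b""
    for row in pixels:
        raw = b"".join(bytes(px) for px in row)
        body += raw + b"\x00" * (row_size - len(raw))
    return header + info + body

def read_bmp(data):
    """Recover dimensions and pixels from the raw bytes alone."""
    offset = struct.unpack_from("<I", data, 10)[0]   # where pixel data starts
    width, height = struct.unpack_from("<ii", data, 18)
    row_size = (width * 3 + 3) & ~3
    rows = []
    for y in range(height):
        start = offset + y * row_size
        row = [tuple(data[start + 3 * x: start + 3 * x + 3])
               for x in range(width)]
        rows.append(row)
    return width, height, rows

bmp = make_bmp(2, 1, [[(255, 0, 0), (0, 0, 255)]])
print(read_bmp(bmp))  # (2, 1, [[(255, 0, 0), (0, 0, 255)]])
```

A few dozen lines recover the image with no library support at all, which is the sense in which simple raster formats are "just a wrapper around a bitmap".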
I can imagine Adobe swf becoming obsolete, and it might be hard to find software to open it in 100 years. Or Microsoft Office slideshows. But that feels almost right.
Camlistore seems relevant - its whole intention is to last (at least) 100 years, primarily by making the data format simple and making data migration (i.e., moving between providers, between hard disks, etc.) a common thing. https://camlistore.org/
That reminds me of the Encyclopedia Galactica http://en.wikipedia.org/wiki/Encyclopedia_Galactica. I wonder if this is a scheme to get comp scis to a remote location and eventually start a new galactic empire.
When I think of future civilizations poring over our archives, I imagine they'll have invented new technologies that give them capabilities to discover facts in ways we can't imagine. So long as the bits are still correct, I could envision an AI that can basically "crack" the codes and bring about a human-readable version. It's sort of like cryptanalysis, in that you have a bunch of "cipher" texts (e.g., .DOC files) and maybe you even have some known plaintexts (.txt files with similar file names). Yes, I realize .txt is ASCII and that is a cipher in itself, but I am presuming that a 1-1 keyless mapping will be easy for them to figure out.
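That known-plaintext idea can be sketched as a toy: given pairs of encoded bytes and their known plain equivalents, infer a 1-to-1 byte mapping. The shifted "forgotten encoding" below is invented purely for illustration:

```python
def learn_mapping(pairs):
    """Infer a 1-to-1 byte mapping from (encoded, known-plaintext) pairs.

    A toy model of the known-plaintext idea: returns None if the samples
    contradict each other (i.e., it isn't a simple substitution).
    """
    mapping = {}
    for encoded, plain in pairs:
        for e, p in zip(encoded, plain):
            if mapping.setdefault(e, p) != p:
                return None   # inconsistent: not a simple substitution
    return mapping

def decode(encoded, mapping):
    """Apply a learned byte mapping to recover plaintext."""
    return bytes(mapping[b] for b in encoded)

# Hypothetical "forgotten encoding" that is just ASCII shifted by one.
shifted = bytes(b + 1 for b in b"hello world")
mapping = learn_mapping([(shifted, b"hello world")])
print(decode(bytes(b + 1 for b in b"held"), mapping))  # b'held'
```

Real legacy formats mix structure with text, so this is far from a full solution, but it shows why a handful of paired samples goes a long way toward recovering a simple encoding.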
Encrypted stuff may be totally unreachable, though, short of brute force.
I think this is one reason to keep the 8-bit computing world alive. The act in and of itself is worthy from the perspective that a lot of the older software is still perfectly useful, fun, and applicable to the modern world, and as well will inspire the perpetuation of platforms as we fall ever further over the abyss of hardware relevance. On one end is 'how relevant is the new hardware' and the other is 'how relevant is the old hardware' .. as soon as these become equivalent, we have a stable progression of human digital culture. But, we don't have that: everyone upgrades their new iDevices as soon as they can, and even in just the last few short years we start to see apps that just don't run any more. This is a given.
Which is why I think that the resuscitation of 8-bit computing, specifically, is such a valuable thing to do: it provides context. When you've spent the evening actually having fun with 30-year-old software, the urge to splurge on soon-to-be-redundant new gear is de-composed. Eventually, a person can understand that all computing architectures over the Age, So Far, are of use. That's how they got to be a working program in the first place: someone found it useful.
I recently downloaded a PDF of 80 or so BASIC programs, written to be as compatible as possible with the plethora of machines that were available in the 80s. What a joy it was to see linked lists, self-modifying code, and competent optimization of program space, while also using simplified interfaces to be as cross-platform as possible. A modern comp-sci student can, even today, learn a lot of very important lessons about computers by reading such archives and going through 30 or so years of history. It factually is not a long amount of time. All those lost floppy disk collections out there in the dumping grounds, or even the ones still working, hidden in the closet, have the potential to be just as relevant in 100 years as they were on the very first day of publication.
I urge anyone with an 8-bit stash to dig it out, soon enough, and find your active community. There are few 8-bit machines out there which don't have a thriving scene.
The irony is that Google has also participated in acqui-shutdowns and data deletion. They still have the Deja archive online, but I think it's reasonable to worry whether they might just bin it at short notice, like Google Reader or Wave.
How is this different from any other point in history? Throughout time, the vast majority of records and artifacts were lost, and only some given percentage survives. Let's say it's ~2% for argument's sake... actually, the amount of material that survives would decay over time - 40% after 10 years, 20% after 50 years, etc.
I have no idea what the rate of decay for digital records, information, and archives would be, but I would think it would be higher than information stored as hard copies of paper, books, etc. Of course, we are also producing orders of magnitudes more information than we were in the past.
Who pays the cost of storage? It's not a simple question at all, especially when you consider that most of the information being stored is only marginally useful now and the future value is a complete unknown.
I quote a page of Stross' Glasshouse which I'm reading right now.
"We know why the dark age happened [...] Our ancestors allowed their storage and processing architectures to proliferate uncontrollably, and they tended to throw away old technologies instead of virtualizing them. For reasons of commercial advantage, some of their largest entities deliberately created incompatible information formats and locked up huge quantities of useful materials in them, so that when new architectures replaced old, the data became inaccessible."
The Long Now folks have been warning of this for at least twenty years. Their definition of a "Dark Age" is one for which we have no extant original records. Which describes the state of digital data precisely. Their concern is how to encode the data alongside the clock for distant descendants to read without access to the originating technology. From their exposition I learned to back up complete computers, not just workfiles, not even just the hard drives. Still not enough, even for decades, let alone millennia.
"A company would have to provide the service, and I suggested to Mr Cerf that few companies have lasted for hundreds of years. So how could we guarantee that both our personal memories and all human history would be safeguarded in the long run?"
Meta: I know this is the Internet, and we're all supposed to complain all of the time. And I am aware of HN's submission guidelines. But do we really need to refer to him as Google's Vint Cerf? The guy is famous in our community. It's not like he needs another adjective, and it's not like Google owns him. How about just his name? I find this phraseology a bit disconcerting.
I find this strange. I know a lot about my parents, a few bits and pieces about my grandparents, and essentially nothing about my great-grandparents. Very, very little of those days was kept, so how can we be entering a "dark age"? Heck, archaeologists are digging through trash heaps in Jamestown trying to figure out how they lived.
This is valuable (as I understand it, they are using a metaphor for virtual machines); however, the need for this could in most cases be obviated by using open data formats. The problems of physically degrading storage media, "bit rot", and the ability to interface with obsolete technology are also a big deal.
This is why proprietary binary formats are evil.
We hear so much about IoT now, but what we're really going to get is MS's IoT, Samsung's IoT, Apple's IoT. We are never really going to have a true IoT (assuming we even need an IoT).
"Vint Cerf is promoting an idea to preserve every piece of software and hardware so that it never becomes obsolete - just like what happens in a museum - but in digital form, in servers in the cloud."
The point is that the description/emulation would need to be a virtual machine, so it can be transported and duplicated, as long as you have the basic system to run it on. Otherwise the problem is just as bad when those "regular servers" die.
There are at least a few different things to be considered here, and I'm instantly reminded of a big stinker of a post by Mark Pilgrim about proprietary apps and data formats as well as another old article about how stuff that was archived by some government agency was not even readable or accessible later on due to technology changes (it's not one of those referenced in the Wikipedia article for "digital dark age").
Firstly, the medium of storage, the encoding and the interface form one angle to look at. Just as floppies and CDs are now (almost) obsolete, there will be a future when there won't be any machines that recognize a USB storage device. The same holds true for other technologies that we have quickly run through - to mention a few: PATA, SATA, PCI, PCI-X, PCIe and so on. Can you read an MFM hard drive today with any computer that you have? With adequate care, data can be moved from one medium to another, like how people learned to move music from tapes to CDs and then to flash drives and hard drives, then to the truly nebulous thing called "cloud", etc.
Next, consider the data formats themselves and the applications that support them. This is where proprietary formats, especially those that are not widely popular, hurt users. So if you're using, for example, Apple's document formats in Pages/Numbers/Keynote, it's likely that those files will soon become obsolete (as some already have, where Apple does not support older formats in the newer iWork). Commonly used and supported formats that don't change rapidly, like JPG and PDF, are safe for a much longer time because there are many applications to process those documents with, both proprietary and FOSS. Even for the web pages stored by archive.org, or any doc or xls files lying around from about 20 years ago - how many more years do you think browsers, word processors and spreadsheet programs will keep bloating up (like they have been so far) just to support older versions of doc, xls, ppt, html, etc.? At some point, the bloat will have to be cut and a decision made that older formats will not be rendered as they used to be. That would leave clobbered text or gibberish, or both, for any interested future humans to figure out whether it's worth preserving and to convert to a newer format if they care.
Now, assume the data format is a long-surviving one, like, say, mp3 or jpg. How do you protect it from bit rot wherever it's stored? Just because you put it on Amazon or iCloud or Dropbox does not mean you can't lose part of the data to corruption of different kinds, or lose all of it to system failures. Among the people who do regularly back up, perhaps only a fraction of a percent actually verifies their backups (if at all). With consumer-level cloud options, there's not a lot of hope for data longevity for non-tech-savvy people.
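One hedge against silent corruption that anyone can apply today is a checksum manifest: hash each file at archive time, then re-hash and compare on every later pass. A minimal sketch, with illustrative file names:

```python
import hashlib
import os
import tempfile

def fingerprint(path, chunk=1 << 20):
    """SHA-256 of a file, read in chunks so large archives fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Record checksums alongside the archive...
with tempfile.TemporaryDirectory() as d:
    doc = os.path.join(d, "letter.txt")
    with open(doc, "wb") as f:
        f.write(b"Dear Ada, ...")
    manifest = {os.path.basename(doc): fingerprint(doc)}

    # ...and on every later verification pass, re-hash and compare.
    # A mismatch means the bytes changed: bit rot, truncation, tampering.
    ok = fingerprint(doc) == manifest["letter.txt"]
    print("intact" if ok else "CORRUPTED")  # intact
```

This doesn't repair anything by itself; in practice you'd keep multiple copies and restore a damaged file from a copy whose hash still matches the manifest.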
If you're dreaming up a beautiful future in the cloud for all data storage, how can you be sure that your data just doesn't vanish or that it doesn't diminish in quality? People have had that happen to their precious photos by sites that had sneaky terms and conditions about deleting old photos (or if photo prints are not ordered regularly), didn't allow full downloads at any point in time, and sites that went out of business. We can see people treating social networks as reliable cloud storage for their photos as well, disregarding the risks of making such an assumption.
Ignore data integrity and data format issues for a moment, and imagine a distant future where 64K displays are common. All your current and older photos and videos would either look terrible on those or may even be completely indistinguishable. Of what use would these artifacts be then for anyone?
Considering that a lot of data in the cloud is actually insignificant at the level of the human species (like rants and comments on social networks, LOLcats, etc.), and also the huge amount of data being created every second, how would someone in the future even sift through all this? It's somewhat similar to the NSA/GCHQ looking for a needle in a haystack the size of a huge mountain, except that this haystack is meant to preserve history, culture, etc. The haystack currently being built would need a lot of archivists from different backgrounds working continuously to separate the wheat from the chaff, and also to consolidate (or "pack") the information concisely (gathering summaries, sentiments, trends, statistics), if we're ever to have an archive that future generations would even want to look at (say, centuries ahead). Leaving it to governments alone or corporations alone is not the solution, since each would shape the archive in its own image.
This is a very complex topic for most individuals to deal with, and the above points didn't even touch upon cultural and linguistic shifts that happen over time for any data to be usable. I'm sure I've missed many other aspects about prolonging the life of (usable and useful) data.
P.S.: The best everlasting format, as many tech savvy people know, is plain, unencrypted text for textual content (this still assumes that the media can be read, because encoding may play spoilsport).
P.P.S.: All LOLcats may actually be cultural items to preserve for eternity! :P
I, for one, feel that humanity would benefit from this generation's digital presence being wiped from existence. Everyone on this planet is now dumber for having experienced it. I fully support and will tell my congressman to vote for a digital dark age.
The whole point he's making that we do not know what might be valuable to future civilizations. We simply cannot foresee the way they would study our culture, in the same way the ancient civilizations would not be able to comprehend our current civilization.
We don't preserve muddy door mats today "for posterity." That we can't know exactly what will be important in the future doesn't mean we can't reasonably predict what won't be.
Google won't, that's the fact. That completely useless snapshot will live till the end of the digital age, whenever it will be. That's why I don't understand Cerf's concerns.
[0] https://archive.org/donate/index.php