Let's Stop Talking About "Backups" (joelonsoftware.com)
83 points by JoelSutherland on Dec 14, 2009 | 82 comments


It's not us you need to tell, Joel; it's your business partner. And if this is your way of telling him, isn't it a little passive-aggressive to do it in a blog post?


I think that a lot of people think they have backups, but they've never restored them, so I thought it would be good practice if everyone started thinking in terms of "have we restored" rather than "are we backing up." Of course, there's no question that the thought process started when Jeff Atwood's personal blog was lost, but don't think for a minute that the only way I communicate with him is by blogging... we talk all the time, over Skype, over email, over FogBugz, and sometimes, when there's something other people can learn, in public on the internet.


The weird part is not that you made a blog post about this issue now.

The weird part is that you didn't mention the 400 pound gorilla in the room. ("I've been thinking about this for a long time, but it really hit home recently when...")

To my mind, that's why the post seems odd or passive aggressive. (It's a relatively short post as well. It just feels clipped somehow. The reader inevitably says, "What's not here?")


It may be difficult to remember now that Twitter is all the rage, but essayists often aim for that "timeless" quality. You want the essay to seem as relevant three years from now as it is today.

The web has more than enough content that feels stale after a week. Better to aim for something with a slightly longer shelf life.


Yea, but how is "my friend/business partner recently lost his entire blog due to poor backups, and it got me thinking..." not 'timeless?'


To be honest, I don't give a shit about his friend/business partner.

But I do like his idea. So I appreciate that he omits pointless personal details and just sticks to the meat of the article.

When a writer expects me to read a bunch of self-indulgent cliche boilerplate I usually hit the back button.


Not poor backups, poor restores. You're missing the point of the article!


It's also a poor backup. They thought the backup was being handled by their hosting company, while in reality it was failing silently.


A good backup is one that can be restored.


A poor backup becomes a poor restore, while a good backup isn't necessarily a good restore. It's like quibbling over whether I said, "his car was completely totaled in the race" rather than, "he didn't win the race." No one thinks that not crashing your car implies that you've won a race, but you can't win a race with a crashed car.


This doesn't make any sense. Unless you have some other reason to suspect there's animosity between them, why would you invent drama between the lines?


Check the URL in your browser....and the author of the article being discussed. Sufficient explanation for anything that will transpire here.


It's generally better if you try to avoid putting your business partners or employees down in public.


"Let’s stop asking people if they’re doing backups, and start asking if they’re doing restores."

No, because restores wouldn't have helped Jeff Atwood. You can back up a database onto the same server that hosts the database and still successfully restore it later.

We should instead be talking about the concept of Continuity of Business. Decide how important it is for your data to continue to be available (constant availability vs. several days downtime vs. don't care if it disappears entirely?), then make a plan that gives you that availability.

What I'm describing isn't necessarily complicated. For example, I keep all my backup data (photos, music, etc) on a USB drive. Every morning it's rsynced to a hot spare. Every few weeks, I swap the spare with another one in a safe deposit box at the bank. (This means that we can recover from both immediate accidental deletions, and also ones that we don't catch for a few days, and disasters like theft or fire). I check every few days to make sure the drives are still working, and my wife knows how to switch the drives and has a key to the safe deposit box, just in case something happens to me.
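
For the curious, a minimal sketch of the morning rsync-to-hot-spare step described above, assuming the primary archive is mounted at /mnt/archive and the spare at /mnt/spare (both paths are placeholders, not the actual setup):

    #!/bin/sh
    # Mirror the primary backup drive onto the hot spare each morning.
    # --delete keeps the spare an exact copy; drop it if the spare should
    # also retain files deleted from the primary.
    rsync -a --delete /mnt/archive/ /mnt/spare/ \
        && touch /mnt/spare/.last-sync-ok   # timestamp makes the "is it still working?" check trivial

The trailing slashes matter: "/mnt/archive/" copies the drive's contents rather than creating an extra /mnt/spare/archive directory.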

Anyway, we feel that these are appropriate steps for protecting the data that we don't want to lose. But the important point is that everyone should make a plan that's appropriate for them.


In the article, I wrote

"If you’re running a web service, you need to be able to show me that you can build a reasonably recent copy of the entire site, in a reasonable amount of time, on a new server or servers without ever accessing anything that was in the original data center."


Yes, in many ways, Jeff "took one for the team" in a big way, in that he prompted a lot of people to think, "Er... I'll just check my backups and make sure they're working."

We're tech types, but e.g. I know a programmer who never backed up his laptop (with important stuff) because he'd never had a hard drive go bad on him. You'll never guess what happened...

The hard part is getting 'normal' people to back up. It's really hard. They have all their data/documents on that laptop, and all the digital photographs from the first 18 months of their baby's life or whatever, but if you advise them to back up, you sound like some crazy lady who throws cats at people in the street.


Lots of people want to, but it can seem overwhelming. I'm quite technical as a programmer, but even I avoid IT muck whenever possible.

That's probably why companies such as Dropbox are doing so well. Making backups/restores easy is sweet.

(ps - thanks superduper! http://www.shirt-pocket.com/SuperDuper/SuperDuperDescription... "heroic system recovery for mere mortals")


[ ] Your normal people do use Mac OS X Leopard
[X] Your normal people do not use Mac OS X Leopard

Time Machine is the solution: everyone gets it (at least the normal people I introduced it to).


Also, I can see your business partner saying that the chance of a fire in the datacenter is so small that he might as well worry about meteors at that point. I don't think that event is all that rare, though; just recall the last one in Seattle...


What's ironic (and somewhat irksome) is the preachy tone of your post.


Good blog post, particularly since it agrees with something I noted a little while ago: http://news.ycombinator.com/item?id=990903


Unless I've missed further developments in the story, his business partner trusted the "backups" of the hosting company, which did not restore properly.

So this is aimed one step farther up the chain.

Or, rather, it's aimed at all of us. It is a lesson that a lot of people need to learn.


Well, Jeff was asking his host "do you back up?", not "can you restore?". That's what the post is about. (Reading comprehension FTW!)


I think this was directed more at his business partner's hosting company.


Who chose them and what due diligence was performed? This is business 101.


Give me a break! (Hosting) companies will throw more references and specs at you than you know what to do with, and you have no choice but to take them at their word. The only thing you can do on your part is keep your own backups, independent of them.


> The only thing you can do on your part is keep your own backups, independent of them.

That's just common sense. What happens if your hosting company decides to be dicks about some billing dispute and holds your data hostage? Even if the hosting company's backups work, you're still hosed without your own.


Exactly. Companies can say one thing and do another. My argument was against the accusatory tone of "whoever did the due diligence".


Doing your own backups isn't rocket science. Why you would outsource such a valuable thing is beyond me.


With backups being such a common thing, and something you probably could do but would rather not worry about, I don't see why you wouldn't outsource them.

Simple backups are easy. Understanding when something is sufficiently backed up, and how to know whether it is backed up, can be quite complex, particularly if large amounts of data are involved.

USB drives work great for a desktop with 1 TB, but not so well for 20 TB. Then there is the question of how you know whether everything is sufficiently backed up, whether it is free from corruption, and how you will move the backup data to a live server in the event of a restore.
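
As a rough illustration of the "free from corruption" part: one common approach is to keep a checksum manifest next to the backup and verify the restored copy against it (the paths here are made up):

    # Record a relative-path checksum for every file before backing up
    # (run from the directory being backed up).
    cd /data && find . -type f -print0 | xargs -0 sha256sum > /backups/manifest.sha256

    # After a test restore, verify the restored tree against the manifest;
    # --quiet prints only the files that fail.
    cd /restore-test && sha256sum -c --quiet /backups/manifest.sha256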


I find it disturbing how many people decided to comment or up-vote on the Joel-Atwood angle, or insult each other.

Remember: great people discuss ideas, normal people discuss events, shallow people discuss other people.


What I find somewhat charming/quaint about Joel's posts on "Operations" over the six years that I've been reading him is that he is, slowly but surely, discovering the "Art of Operations", albeit at a glacial pace.

Most people who work in production operations environments of any scale discover, in the first two or three years of their career, what has taken Joel the better part of a decade.

I almost feel like that airplane passenger sitting beside Brooks Jr.: Brooks saw him reading his book, "The Mythical Man-Month", and asked the guy (who had no idea who he was sitting beside) what he thought of it. The gentleman responded that it was basically a summary of things he knew already. Joel is a giant in the industry, but he does have a tendency to discover/restate the obvious.

"It's not backups, but the restores that matter" - is kind of the mantra of every single person who has ever been responsible for backups.

Then you go to _any_ class on running a production environment and you discover things like RPO (Recovery Point Objective), RTO (Recovery Time Objective), dress rehearsals, etc., and the whole "it's restores that matter" begins to look quaint.


What I find amusing about the whole thing is that it's a microcosm of how developers always think operations is trivial and unimportant... until they have to do it themselves! :-)


Totally agree on these points. The Joel-Atwood experience is not that of two programmers starting a company. It's a story of two programmers learning about System Administration.


the "Art of Operations" suggests a book or similar that you are referring to - is there one, or am I reading too much into some double quotes?


Yes there is a book: The Practice of System and Network Administration, Second Edition. This is the best Operations book I've ever bought. Worth every penny.



This is good food for thought. Let's also add a concept from a different realm: everybody has at least two DNS servers listed in their /etc/resolv.conf, right? The reason is that if one of them goes down, there is still the other one.

So that seems like a good lesson to apply to backups. Maybe keep several copies? One with your hosting provider, one at tarsnap, one on a separate DAT tape, one on a USB stick?

A good point, though, is that even something as big as a DAT tape looks pretty small by the standards of what we need to back up today.
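
A rough sketch of what keeping several independent copies might look like in practice; the paths and hostnames are placeholders, and the tarsnap line assumes a key and cache directory are already configured:

    #!/bin/sh
    today=$(date +%Y-%m-%d)

    # Copy 1: local USB drive
    rsync -a /srv/data/ /media/usb-backup/data-$today/

    # Copy 2: off-site, encrypted, via tarsnap
    tarsnap -c -f "data-$today" /srv/data

    # Copy 3: a machine in a different building
    rsync -a /srv/data/ offsite-host:/backups/data-$today/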


It's helpful to boost the signal on this message. It's an old message, but planning and practicing system restores can be as expensive in terms of equipment and manpower as actually making the backups, which leads to a lot of neglect.


Also, a similar take on the horror of backups gone wrong: http://www.penny-arcade.com/comic/2005/8/10/

Don't blindly rely on your partner to do it... Trust, but verify.


...shouldn't it be common practice to test your backup system to make sure that the restore procedure meets the requirements of the client (company, etc.)?

The IT company that I work for creates a backup system based on the requirements of our clients and then demonstrates the whole backup and restore procedure to make sure that it falls in line with what the client actually wants. It's really not difficult to do. Sure, some of the restore procedures may be slower (depending on other requirements, such as cost), but the client knows that will be the case and signs off on it.


Common sense and obvious. In the 1980s I worked on a large DARPA project where a huge hit was taken because our admins never tried to restore from backups. It is the kind of lesson that is (hopefully) learned with just one bad experience.

This is another reason why I like EC2 deployments: it is fairly easy to take your backups (automated deployment scripts, application, data) and spin up another copy of your whole system (except for flipping the DNS). Make sure those EBS-backed EC2 AMIs are really bootable and functioning :-)
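
As a rough illustration, a "spin up a copy and prove it works" drill might look something like this; note it uses the present-day AWS CLI rather than the EC2 tools of the era, and the AMI ID and key name are made up:

    # Launch a fresh instance from the backup AMI and wait until it's running.
    instance_id=$(aws ec2 run-instances \
        --image-id ami-0123456789abcdef0 \
        --instance-type t3.micro \
        --key-name restore-drill \
        --query 'Instances[0].InstanceId' --output text)

    aws ec2 wait instance-running --instance-ids "$instance_id"

    # Grab its public address and poke the app to confirm it actually serves traffic.
    host=$(aws ec2 describe-instances --instance-ids "$instance_id" \
        --query 'Reservations[0].Instances[0].PublicDnsName' --output text)
    curl -fsS "http://$host/" > /dev/null && echo "restore drill OK"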


What happens if Amazon runs out of machines, has a system-wide corruption that was noticed too late or decides to be dicks about something or other?


Good questions.

I would think that if they are making money with AWS then they will keep buying more servers.

I usually trust S3 for backups and restores but periodically back my own data off of AWS to local storage.

I expect both Amazon and Google infrastructure services to experience outages from time to time. However, they have far more resources and expertise than I do to provide scalable services for a low cost.


I was thinking about making a setup where my S3 backups are automatically mirrored to Rackspace or something like that. That would be a really neat setup.


I tend to be a little paranoid about backups, so I have a few different disks backing up my main desktop machine. I also use one of my backups to sync data to my laptop; it's not quite a full "restore" due to a big size difference in the respective drives, but generally speaking the two machines are in sync and I can be sure that at least one of my backups works reasonably well.


This whole ordeal is getting me motivated to actually buy a cloud backup service (personal use, not business use). I was thinking of Carbonite or Backblaze. Anyone have any experience with those?


I used Carbonite in early 2006 and the software was horrible. Eventually I tried to uninstall it and the process failed. I got a half-installation that wouldn't work and couldn't be removed. I didn't try too hard after that, because I was planning to format and start over.


Full disk image backups are a good solution for this problem. No worries about partial backups or a complex restoration process. It's totally inefficient but storage is cheaper than man hours.


I'm assuming you're talking about some kind of atomic file- or block-level backup such as LVM snapshots? Large files such as databases can change while you're reading them over a long period of time, so a standard disk image or file copy wouldn't be reliable for a live system.
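
If LVM snapshots are indeed what's meant, the usual pattern is roughly the following (volume names and paths are placeholders; a database would still need to be flushed or locked around the snapshot for full consistency):

    # Take a point-in-time snapshot of the live volume.
    lvcreate --size 5G --snapshot --name data_snap /dev/vg0/data

    # Back up from the frozen snapshot while the real volume stays in service.
    mount -o ro /dev/vg0/data_snap /mnt/snap
    rsync -a /mnt/snap/ /backups/data-$(date +%F)/

    # Clean up.
    umount /mnt/snap
    lvremove -f /dev/vg0/data_snap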


zfs send/receive is amazing. I wish other filesystems had it.
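
For anyone who hasn't seen it, the basic send/receive pattern looks roughly like this (pool, dataset, and host names are placeholders):

    # Snapshot the dataset, then stream it to another machine.
    zfs snapshot tank/data@nightly-2009-12-14
    zfs send tank/data@nightly-2009-12-14 | ssh backuphost zfs receive backup/data

    # Subsequent nights only send the delta between snapshots.
    zfs snapshot tank/data@nightly-2009-12-15
    zfs send -i tank/data@nightly-2009-12-14 tank/data@nightly-2009-12-15 \
        | ssh backuphost zfs receive backup/data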


As if I didn't lose enough sleep pondering backups already.


Testing your (off-site) backups is an obvious first item in many lists: http://watsec.com/article/49


What he's saying is, "We failed miserably at having good process and procedures. Because of this we are going to lecture others, and point the blame at everybody but ourselves, in hopes that they'll stop pointing out how much credibility we lost over this."


Pointless write-up about linguistics. He says "restore" is the important thing, not the "backup". Well, duh.


Not really. That's actually one of my favorite "health check" questions: when was the last time you restored?

Most places have very reliable backup procedures. Most of those have very poor restore procedures - I'd say about half fail when put to the test.
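
A minimal sketch of what such a periodic restore check can look like, assuming nightly mysqldump files in /backups and a throwaway database on a scratch host (the hostnames, table name, and paths are all made up):

    # Load the most recent dump into a scratch database...
    latest=$(ls -1t /backups/*.sql.gz | head -n 1)
    gunzip -c "$latest" | mysql -h scratch-db -u restore_test -p"$RESTORE_PW" restore_check

    # ...and run a sanity query; an empty or truncated dump fails loudly here.
    mysql -h scratch-db -u restore_test -p"$RESTORE_PW" \
        -e "SELECT COUNT(*) FROM users;" restore_check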


To me that is like saying, "You can put money into your bank account, but you should make sure you can get it back." Of course you have to test your backups.


Running a restore is also a very effective way to discover that your backup procedure wasn't actually very reliable after all.


If you think this post is about linguistics, you're missing the point.

Since the restore is the important thing, that's the one you have to test. And if you haven't tested restoring, your backup is (quite possibly) worthless.


It's a "well, duh," but still most people fail to test restores. This may be the greatest thing about distributed version control: every clone is a restore. A limited restore, but still a restore.


Backups are for suckers. Keep the data on a few different spinning disks. If you can solve data sync between two sites, just keep your data synced.

It's much better to ask yourself how long it takes to replicate your existing system than how to back up. PXE boot to a kernel that you can install over the network with, use bcfg2 to get the thing up to spec, and start copying data.

A lot of machines can be back up and configured in 5 minutes.

That said, I'm not you. I don't have terabytes of data to do statistics on. Maybe there are other horrible details I'm forgetting. Fast rebuilding is a pretty awesome strategy for a lot of cases.


Maybe there are other horrible details I'm forgetting

Yes.

Why should I bother to write this? I'll outsource the task to the authors of High Performance MySQL, Second Edition, page 475:

Backup Myth #1: "I Use Replication As a Backup"

This is a mistake we see quite often. A replication slave is not a backup. Neither is a RAID array. To see why, consider this: will they help you get back all your data if you accidentally execute DROP DATABASE on your production database? RAID and replication don't pass even this simple test. Not only are they not backups, they're not a substitute for backups. Nothing but backups fill the need for backups.
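
For contrast, a minimal sketch of a backup that does pass that DROP DATABASE test, provided the dump ends up somewhere the database server itself can't destroy (paths and hostnames are placeholders):

    # Consistent logical dump of every database (no long table locks on InnoDB),
    # recording the binlog position so the binary logs can roll it forward.
    mysqldump --single-transaction --all-databases --master-data=2 \
        | gzip > /backups/mysql-$(date +%F).sql.gz

    # Ship it off the machine; a copy that lives only on the db server isn't a backup.
    scp /backups/mysql-$(date +%F).sql.gz backuphost:/backups/mysql/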


This is the incremental backup script I use on my Linux box at home, a quick-and-dirty imitation of what Time Machine does. Obviously $HOME/backup is a different physical disk. Feel free to improve on this.

--------

    #!/bin/bash
    # Incremental backup of /home and /etc onto a second physical disk mounted
    # under $HOME/backup. --link-dest hard-links unchanged files against the
    # previous snapshot, so each run only costs the space of what changed.

    HOME=        # value elided in the original; set to the directory that holds the backup mount

    date=`date "+%Y-%m-%dT%H:%M:%S"`

    rsync -aP --link-dest=$HOME/backup/current /home $HOME/backup/back-$date
    rsync -aP --link-dest=$HOME/backup/current /etc $HOME/backup/back-$date

    # repoint the "current" symlink at the snapshot we just made
    rm $HOME/backup/current
    ln -s back-$date $HOME/backup/current

    # see if the backup disk (sdb1) is getting full: grab the Use% column from df
    USED=`df -lk | grep sdb1 | awk -F" " '{print $5}' | awk -F"%" '{print $1}'`

    # alert me if the backup disk is more than $T percent full
    # ($myaddress should be set to your alert address; it was left undefined in the original)
    T=80
    if [ "$USED" -gt "$T" ]
    then
        df -lk | mail -s "disk alert: backup disk over $T% used" $myaddress
    else
        echo "backup disk is less than $T% full"
    fi


You should think about keeping an offsite backup if possible. A second disk won't necessarily help if your computer gets dropped in a pool, your house catches fire, etc.


Completely reasonable response, forgive me for being an inarticulate noob in my post.

I thought delayed replication was one of the main strategies they advocated in that book. I don't have it on hand. My mistake.


That is very interesting; I thought replication was as near to a realtime "backup" as one could get. Is there any other way to do incremental, realtime backups for, say, MySQL/Postgres?
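
For what it's worth, the standard answer here is periodic full backups plus continuous log archiving; a rough sketch under that assumption (file names and paths are illustrative only):

    # MySQL: periodic full dumps plus the binary log for point-in-time recovery.
    #   my.cnf:  log-bin = mysql-bin
    mysqldump --single-transaction --all-databases --flush-logs > full.sql
    # ...later, roll forward from the dump by replaying the archived binlogs:
    mysqlbinlog mysql-bin.000042 | mysql

    # PostgreSQL: continuous WAL archiving.
    #   postgresql.conf:  archive_mode = on
    #                     archive_command = 'cp %p /backups/wal/%f'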


Obligatory xkcd link: "Bobby Tables" http://xkcd.com/327/


Replication doesn't defend you against deleting the wrong file, or messing up an update to your database, or any of a large class of PEBKAC issues.

It's also not useful if your main files get corrupted and you diligently propagate the corruption. See: ma.gnolia


Setting aside the database, file problems are fairly easily solved via svn, .snapshot or a good configuration engine.

As far as the database goes, I'm big on stored procs + archive tables, but I'll leave that to the grown-ups ;)


...until you forget to check in important changes, or neglect to add a seemingly trivial configuration file to your SCM.

Whereas, a policy of automatically backing up everything except for your exclude list would have saved your bacon.


What if you hit rm -rf * and that gets synced?

I don't mean to be rude, I don't know anything about you, but if you were a system administrator working for me, today would be your last day.


Of course it would be his last day, he's a sysadmin and he just typed rm -rf * on /


You know, you tell yourself that you understand rm -rf. And yet somehow you always find a way to type it on something you shouldn't. Computers are perverse objects.

It's like reading an airline crash report, which I'm told often ends up sounding like a comedy of errors. Most airline crashes have an entire handful of causes, all of which are individually innocent, but on the one extremely rare occasion when they all happen at the same time they add up to disaster.


I fat-fingered the destination directory in a script once and accidentally shrunk my entire photo archive to 128x128 thumbnails. Thankfully I did have backups of all but a handful.

I'm always amazed at the number of people who spend their time thinking about replication strategies without understanding that data can be accidentally deleted in production too. I guess backups look "easy", so it's not as sexy an area of architecture planning.


People just don't always grasp the big picture. People who see RAID as a backup strategy are only looking at "hard drive failure" as a potential scenario. The same goes for machine replication: they are only trying to prevent what happens if "the machine dies." As long as they realize that this doesn't protect against "I accidentally deleted the files," then it's all good.


In a previous job I had a so-called system administrator laugh at my paranoia when I suggested that his awesome backup system -- protecting the equivalent of several dozen person-years of scientific data, millions of dollars' worth -- wouldn't protect us against a fire in the building. Or, for that matter, a thief who liked to steal computer hardware.

Job security note for sysadmins: When someone suggests a disaster scenario, don't open your response with a laugh.


Well, you know that you're talking to someone that doesn't plan for the future when, "What happens if the building burns down?" is responded to with, "You're just being paranoid." I guess the appropriate response would be to point out that using his logic tons of taxpayer money could be saved by nixing the fire departments (and tons of corporate money could be saved by not paying for fire protection -- alarms, detectors, escapes, etc).


If you can solve data sync between two sites, just keep your data synced.

LOL! And if you get a corrupt block on your primary site, what're you going to do? All your standbys are instantly tainted!

Better leave this one to the grownups.


Replication is not backup. It only protects against hardware failure.

Spinning disks are good though. A lovely spot for backups. Just put them in a different building.

Edit: In the time it took me to write this 5 other people also lambasted this poor fellow. Ouch.


It doesn't matter if your data is replicated across 3 different continents. If all of your data is accessible behind the same admin login, or if it is all vulnerable to the corruption in the original, then it still has only 1 point of failure. A proper "spinning" backup is an immutable journaling system that can't be trimmed except via extremely secure methods (e.g. physical access to the machine).


A real-time mirror won't help if your data gets corrupted (via a fat-fingered shell command, errant script, or a compromised system). When it's appropriate, it's awesome, but I don't think it's a replacement for a static "offline" backup.


Sorry, this is not only WRONG but actually HARMFUL advice.

Redundancy is not a backup. If someone with full admin control to your system can destroy all of your data then you do not have backups. A proper backup is physically separate from your primary data and, preferably, can't be destroyed with mere admin access to the system. The number of sites that have had catastrophic data loss due to relying on mirroring instead of true backups is quite significant.
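
A rough sketch of one way to get that separation: have a dedicated backup host pull from production, so that admin access to the production box alone can't reach the copies (hostnames, paths, and the key name are placeholders):

    #!/bin/sh
    # Runs on the backup host. Production has no credentials for this machine,
    # and the SSH key used here only permits reading the exported directory.
    dest=/backups/prod-web/$(date +%F)
    mkdir -p "$dest"
    rsync -a -e "ssh -i /root/.ssh/pull_only_key" prod-web:/var/www/ "$dest/"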



