My apartment got robbed the day after I got a brand new external to back up my MacBook. Guess where it was at the time of the robbery? Plugged into the laptop. Luckily, I had a second, much crappier-looking external with most of my stuff that the thief didn't take. But I still lost tons of pictures and music. I've been Dropboxing ever since. But would love recommendations on secondary online storage providers in case Dropbox ever whiffs.
I use Backblaze. I have 3 accounts with them, one of which is linked to a Mac Mini that I use as a general storage machine. My setup at home is:
3 x Laptops
1 x media centre (Mac mini)
1 x family machine (Mac mini) attached to 2 x 1TB software-RAID drives.
Everyone backs up to the Mac Mini, and this is backed up to the 2 drives. This machine also backs up to Backblaze.
I too had an experience where I thought I'd lost all my photos of my kids (5 years' worth) when a HD died. Luckily I had an earlier backup and I lost around 3 months' worth in the end.
Serious question - I'm a photographer so generating 5-10GB of new data in a batch is far from unheard of, I've got hundreds of GBs stored and it's all on external drives. Mozy's FAQ doesn't seem designed for me ;-)
When I've looked into this in the past, every provider has had issues with the new and archived content volume, and sometimes even the use of external rather than internal drives (!) that has made it unviable. I couldn't see from Mozy's site whether this was the case with them as well? If not and I can genuinely back up several hundred GBs of data (phone calls from ISP notwithstanding) and potentially add tens to low hundreds of GBs new data per month, from whatever drive source I choose, this sounds like what I want. Would this be the case from your experience?
Mozy was a big sponsor of a photography conference I attended last year so they definitely want photographers. I use them myself. Backed up a couple hundred gigs without any trouble (it did take a while though). I don't use external drives so I can't comment on how well Mozy works with those.
Mozy will indeed work for the scenario you described. I easily have over 100GB of photos and 300GB of home videos and have had no problems. When I purchased my Mozy account, it was for unlimited data (not sure what it is now). Syncing of new data is transparent and works very well (though sometimes I've had 5000 small mozy_temp files in my TEMP folder that never got cleaned up). Restore from Mozy will not be fast but they do offer DVD burning service at a decent price. Mozy has an online login but don't expect Dropbox.com quality browsing. I also use Dropbox for personal documents and it works perfectly fine with Mozy.
If you're adding 50GB/month, that's at least 1GB of upload per night. You'll definitely have to have a good internet connection. I haven't had any network speed issues with Mozy personally.
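To put rough numbers on that, here is a minimal sketch of the arithmetic, assuming a 30-day month and a steady uplink. The 50GB/month figure comes from the thread; the 300GB initial backup and the 1 Mbit/s uplink are made-up example values, so substitute your own.

```python
# Rough upload arithmetic for an online backup.
# Assumptions (not from the thread): 30-day month, decimal units
# (1 GB = 8000 megabits), no protocol overhead, no provider throttling.


def gb_per_night(gb_per_month: float, days_per_month: int = 30) -> float:
    """Average amount that has to go up each night to keep pace."""
    return gb_per_month / days_per_month


def upload_hours(gigabytes: float, uplink_mbit_per_s: float) -> float:
    """Hours needed to upload `gigabytes` at a sustained uplink speed."""
    megabits = gigabytes * 8000
    seconds = megabits / uplink_mbit_per_s
    return seconds / 3600


if __name__ == "__main__":
    uplink = 1.0          # Mbit/s, hypothetical home uplink
    initial_backup = 300  # GB, hypothetical size of the first full backup

    nightly = gb_per_night(50)
    print(f"Nightly average: {nightly:.2f} GB")
    print(f"Hours per night at {uplink} Mbit/s: {upload_hours(nightly, uplink):.1f}")
    print(f"Days for the initial backup: {upload_hours(initial_backup, uplink) / 24:.1f}")
```

The first full backup usually dominates, which is why throttling or a slow uplink hurts far more at the start than in steady state.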
As a counter-point, I had Mozy for several years and eventually dropped it because they throttled the upload and it took forever to get my data onto their servers. If you are aiming to back up a lot of data then I would suggest you find the cheapest all-you-can-eat service to use as a backup to your real backup process and keep a mirror drive that you archive to every week and take offsite. An online backup service will save your ass if you get lazy or forgetful, but it may have a hard time keeping up and will be a PITA if you ever do suffer a catastrophic event and need your data back quickly. Think belt and suspenders, with the regular practice of mirroring your important drive being the primary backup plan.
Thanks to you both; sounds worth a try. This wouldn't be my primary backup because I've already got a (semi-) regular mirror to a backup drive in my workflow, but domestic practicalities make getting an offsite copy complicated so that's what I'm really looking for here and Mozy sounds like a good fit.
I should stress too - 50GB / month is a worst case scenario, not a typical usage! Raws and processed JPEGs for a couple of big events can hit that together but I don't often have a couple of big events in the same month.
The only feature I'm still waiting for is the ability to specify different backup sets to archive to your local HD and to their online service, but that feature should be released with their next update. Other than that I love CrashPlan, especially the feature that you cannot disable it for longer than 24 hours.
You can also seed your initial backup, that way you don't have long upload times.
Side-bar: I sort of feel like a sitting duck walking around the city or subway at night with my white Apple ear-buds, especially the ones with the mic, screaming I have an iPhone in my pocket.
I've been a happy customer of SpiderOak for a few months now. Windows, Mac, and Linux clients, with file syncing between your computers, and public sharing of any files you've backed up. Data is encrypted on their servers, and restoring files is very easy.
I did this over 5 years ago (Fujitsu fi-5110EOX w/Acrobat 6.0 Standard running on WinXP). It's been a complete success, but we've also been a victim of its success: I read about all the cool new features of the newer ScanSnaps and am very jealous (chiefly because either automatic OCR + indexing isn't part of the setup we have, or I'm too dumb to have figured out how to enable it). This one feature is ALMOST enough to get me to buy a new ScanSnap, but (a) the old one works just as well as it ever did, and our "filing system", such as it is, is "good enough" (to let us retrieve needed docs), and (b) new or old, these devices aren't cheap (and I'm not clear how much benefit automatic OCR + indexing would be). Anyway, it seems it should be a software only feature; if only I could use the old scanner with newer software... but I investigated a year ago and my scanner is in the "obsolete" category as far as the vendor is concerned (sigh; is there any open source software that can interface to any of the ScanSnap scanners?). Also when I read the swooning reviews of the new ScanSnaps, it seems the platform is usually Mac, which I have no plans to move to (are all the coolest features also present on the Windows version?). And yes, shredders have been far less reliable than the scanner: we must have gone thru at least 4 so far...
If you already have PDFs, isn't OCR simply a PDF --> PDF+metadata conversion? Why wouldn't some open source OCR software work, where you feed it your old PDFs and it outputs OCRed PDFs? Is this some kind of tactic for scanner manufacturers to sell newer versions of scanners, or am I understanding this wrong? (Maybe OCR happens at the pre-PDF level where the scanner can read the document in a higher resolution?)
You can indeed do OCR at any stage.
The only potential issue I see is that JPEG compression (as used by PDF) might interfere with some OCR algorithm and make it less reliable, so OCRing directly on the raw output of the scanner might yield a somewhat better recognition.
Then again, I know nothing of the protocols used by scanners over USB; it might well be that scanners send their result already compressed, or that the resolution is so high compared to the letter size that this doesn't matter.
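As a sketch of the "feed it your old PDFs" idea: one open source tool that does this kind of PDF-to-searchable-PDF conversion is OCRmyPDF. Nobody in this thread mentions it, so treat the tool choice as my assumption, and the folder names below as hypothetical. The script only wraps the command-line form `ocrmypdf --skip-text input.pdf output.pdf`, writing results to a separate folder so the original scans stay untouched.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def ocr_command(src: Path, dst: Path) -> list[str]:
    """Build the OCRmyPDF command line for one file.

    --skip-text leaves pages that already contain a text layer alone.
    """
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def ocr_folder(src_dir: Path, dst_dir: Path) -> list[Path]:
    """OCR every PDF directly inside src_dir, writing copies into dst_dir."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.pdf")):
        dst = dst_dir / src.name
        # check=True raises CalledProcessError if OCR fails for this file.
        subprocess.run(ocr_command(src, dst), check=True)
        written.append(dst)
    return written


if __name__ == "__main__":
    # Hypothetical folder names; adjust to wherever the old scans live.
    for path in ocr_folder(Path("scans"), Path("scans_ocr")):
        print("wrote", path)
```

Whether recognition on already-compressed PDFs is good enough is exactly the open question raised above, so it is worth spot-checking a few outputs before converting a whole archive.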
One security consideration that I didn't see addressed -- if someone unauthorized accesses your electronic docs, it's easier for them to do it without your knowledge. If someone breaks into your house, you'll usually know about it.
This may not apply to government docs which are presumably stored somewhere electronically, anyway.
This is a good point, and one that I hadn't considered. However, like you said, most of my docs probably also exist in the databases of companies and the government. Ultimately, it's not something I worry about that much; I'm probably more concerned with losing access to my documents than someone else gaining access to them.
This is a very dangerous viewpoint. You should not ignore security simply because the data exists somewhere else. If data exists in multiple locations it is all the more reason to try and keep it safe.
Some simple steps can be taken to protect data integrity and security:
-Don't leave the data connected to the Internet 24/7.
-Power off the storage medium when you are done with it.
-Keep it in a physically safe location when not in use.
-Encrypt the data or the volume.
-Have a backup of your backup on a different medium and if possible different location.
-Don't advertise the data.
These are activities which are easy to set up and can help ensure a more pleasant digital lifestyle. You could go to more extremes but then it becomes invasive. Not worrying about it much because you find it unlikely or just don't care is bad. Not worrying about it much is what you want to do because you have taken the important but easy steps to keep yourself safe.
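For the "backup of your backup" step, a copy is only useful if it actually matches the original. Below is a minimal sketch of an integrity check that compares SHA-256 checksums between a source folder and its mirror. The function names and the example paths are hypothetical, and it checks in one direction only (files in the source that are missing or different in the mirror).

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos or videos don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_mirror(source: Path, mirror: Path) -> dict[str, list[str]]:
    """Report source files that are missing from, or differ in, the mirror."""
    report: dict[str, list[str]] = {"missing": [], "different": []}
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        copy = mirror / rel
        if not copy.is_file():
            report["missing"].append(rel.as_posix())
        elif sha256_of(src_file) != sha256_of(copy):
            report["different"].append(rel.as_posix())
    return report


if __name__ == "__main__":
    # Hypothetical locations of the working folder and the mirror drive.
    result = compare_mirror(Path("Documents/scans"), Path("/Volumes/Mirror/scans"))
    print(result)
```

Running something like this after each mirror run catches a silently failing copy before the day you need the backup.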
You're missing my point. It's not that I'm ignoring security; I was responding to JabavuAdams' pointing out that if someone accesses my electronic records, I likely wouldn't know about it, while I'd find out if someone broke into my home and accessed my physical records. Since there are other places that my data can be accessed electronically, it seems short-sighted to avoid digitizing my copies to avoid undetectable access to my documents, since that can happen anyway.
Having said that, I really am not worried about someone accessing most of these documents, and certainly not enough to go through the hassles of what you outline in your comment. If someone wants to go through the hassle of breaching my security so they can pore through my phone records for the last five years, or peer at my receipts for lunch at the restaurant around the corner, they're welcome to. What exactly are we worried about here?
One difference between the threat model of the original query and your response is that if someone hacks into BigCo and accesses the digital records between you and BigCo then they know about all of your transactions with BigCo. If they hack your home document system then they know about your transactions with BigCo and with every other company you have ever done business with. The level of paranoid protection being suggested is probably not warranted, but unless you can think of a good reason why you would need to access these docs remotely it is probably a good idea to keep the drive with this data separated by an air gap from your systems when it is not in use...
You mentioned rental properties (mortgage docs?) and tax papers, both of which likely have SSNs, and maybe you have credit card numbers and bank acct numbers in these PDFs. I'd think you'd want this stuff encrypted. Thanks for a great post though, I'm totally inspired.
Meh. I was in the military, so my SSN has been spread far and wide. Plus I probably get 2-3 letters a year from companies who managed to lose this info. I keep an eye on my credit report for issues, but nothing so far.
Credit cards and bank account numbers are slightly more of an issue, but in the US your liability for theft from these sources is quite limited, even if you disclose the info. Plus, my bank account number is on the bottom of every check I write. What kind of scheme is that, anyway?
Don't get me wrong, I shred any paperwork with personal info, and I use encrypted services for backups, etc. But I don't lose any sleep over the worst case scenario of someone gaining access to this data.
Another option to consider is NeatReceipts. The base version comes with a very slim USB-powered scanner that can travel around with you, and they sell a higher-end version with a sheet-fed scanner. It can also be set up to work with certain third-party scanners, I believe the ScanSnap is one of them. The default is that each page goes into its own PDF, which is pretty annoying, but you can merge files together to make bigger documents.
What NeatReceipts offers over bulk PDF is the ability to capture and extract lots of metadata that is specific to the document type (it supports three types: document, receipt, and contact). You can use it to extract and export contacts from your big stack of business cards, or to create a formatted expense report from a pile of travel receipts.
What you lose is portability and accessibility, since all of the files live in a "cabinet" on your computer.
I'm in the early stages of de-papering, and I'm considering using NeatReceipts for business cards and receipts, and dumping all of the docs to PDF's that I tag and organize using Punakea (a great tagging program for MacOS).
I bought that same scanner. Freaking amazing. My intention is to scan 8 years of business documents, just need a plan to organize them for backup purposes. I did manage to scan my dad's novels he wrote on a typewriter in the 70s and 80s for him in record time. That alone probably justified the cost.
When I moved from Ontario to BC about a year ago, I scanned about 95% of my books using a similar system. Carrying them in my computer (+ Time Machine + Carbonite) was much easier than hauling them across Canada in boxes.
I kept some books: my Tufte books (e.g. Visual Explanations), an old book of poetry that I inherited, my Physics 101 textbook (of sentimental value)...
I didn't run the OCR. I found it too slow and the final results didn't impress me. OCR can't handle mathematical equations. I suppose I could have used OCR on the novels, but I didn't bother.
Scanning my books was a gamble because there was no tablet or e-Reader that seemed good enough, but there were rumors of an Apple tablet coming soon. Thankfully the rumors were true!
One thing I really like about his approach is he keeps it simple: instead of going crazy with a folder structure he relies on search. Instead of using some complex software product, he uses Spotlight. Very nice.
I did the exact same thing a year back, except for the books. Spotlight for the searching, Dropbox for the backups and a ScanSnap for the PDFs. It's working great, whenever I want to look something up I just search for the key words and there it is. It's a pity there is no search functionality in the Dropbox site. I don't think I'll part with my books though, but being able to search through them would be awesome.
Same here, but used the ScanSnap-Evernote integration feature, so Evernote takes care of both the OCR and the searching, and if anybody is feeling particularly motivated they can also tag.
I'm working on similar goals right now, but focusing on making a bookscanner (see http://diybookscanner.org ) instead, mainly because many of the books I have are out of print. I figure I can later just use the bookscanner for documents as well.
The PDFs it created were pretty small and the result of the project fits into a very small chunk of my Dropbox account (50GB). It's great having all my files for the past 20 years available everywhere I go.
I'd really love to do this too, but I just finished my studies and feel bad about spending 300 Euros on a scanner. Does anybody have good alternatives?
The problem with selling it is that society as a whole hasn't gone paperless, so you still have boatloads of inbound paper, which means you continuously have to scan stuff.
I don't have anything this elaborate. I have a cheap 3-in-1 printer that I scan some stuff with, and a cheap ~$70 digital camera that I use for stuff that it would be inconvenient to scan. The images are good enough to read on the computer though not great.
I went through my only filing cabinet the other weekend and realized I had no reason for holding onto 99% of the crap that was in there. I threw it all away, and I have one small file folder with stuff in it, and most of it sentimental. Good riddance.
Why doesn't someone make a good scanner like the ScanSnap that also has a cheapo laser printer attached for when you need to print out something like an airline boarding pass or a cheatsheet etc?
That's kinda like asking "why don't more toasters have coffee pots built in?" They're two separate devices, which do two separate things. (Actually, you could come up with an argument for the toaster+coffeepot thing, since they're at heart both resistance-coil devices...)
Most MFPs are horrible amalgamations, basically discrete devices held together with cheap plastic shells and shitty driver software.
Back in the 90s I remember seeing some dot-matrix printer addons that turned the printer into a scanner, and there might have been similar gadgets for inkjets, but modern scanners and printers wouldn't share many parts. There's little to be gained from an engineering standpoint by combining the two.
I have a Canon MX850 all-in-one. It's almost two years old. It's a good printer, scanner and fax. The printer has two paper trays and a cradle for printing CD/DVD labels. The scanner has a flatbed scanner and an automatic document feeder with double-sided colour scanning. It's listed at US$275 on Amazon.
I've used it with ABBYY FineReader Express[1] on my Mac to achieve the same result as the author for myself and my mum.
If you are after a toaster with a kettle, you can get one from Breville[2] and other manufacturers, including one with an egg fryer.
And from an Amazon review (November 30, 2009): Also of interest is that the scanner works under Ubuntu GNU/Linux using SANE drivers and gscan2pdf. However, in my experience the scanner controls are not as complete under SANE and gscan2pdf as with Fujitsu's proprietary drivers on a Mac and PC. Others seem to have had better experiences than I. Ultimately, Linux users should know that it is possible to use this scanner; however, it may take some tweaking. But if you use Linux, you are most likely used to this.
I don't collect a ton of business cards, and when I do, I usually just type it into my address book, or increasingly, friend them on Facebook, Twitter, and LinkedIn.
Windows 7 comes with Windows Search as part of the system, and it indexes PDFs and is excellent (unlike the prior iterations of Windows Search/Indexing Service)
The ScanSnap Organizer that comes with the ScanSnap is actually pretty good. Other than that, Windows 7 will index OCR'ed PDFs so you could do the same sort of thing as the article without needing "extra" software.