I was going to do RAID for disk-failure tolerance, either with a consumer NAS box, or with a hardware-RAID disk enclosure attached to something very low-power like a Raspberry Pi Zero 2 W.
Isn't RAID parity slightly more space efficient than versioned backups? Or is there a better way to do redundancy that doesn't involve just replicating entire files to multiple disks? Or some kind of automated manager that puts each individual file on N different disks out of M?
I mostly do embedded, so reliable data storage isn't generally something I deal with; we usually leave that to the cloud or to the user, and I'm not quite familiar with what's out there.
>Isn't RAID parity slightly more space efficient than versioned backups?
It depends on your storage array: the more drives, the more space-efficient RAID parity becomes, since the parity overhead is amortized across the whole array. But RAID is still only a single copy of your data.
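For a rough sense of scale (a back-of-the-envelope sketch, not tied to any particular controller):

    # Usable fraction of raw capacity: single-parity RAID (RAID 5 style)
    # vs. keeping a full second copy of everything.
    def raid5_usable(n_drives: int) -> float:
        return (n_drives - 1) / n_drives   # one drive's worth of parity

    for n in (3, 4, 8, 12):
        print(f"{n} drives: RAID5 {raid5_usable(n):.0%} usable, mirror 50%")
    # 3 drives:  RAID5 67% usable, mirror 50%
    # 12 drives: RAID5 92% usable, mirror 50%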
>Or is there a better way to do redundancy that doesn't involve just replicating entire files to multiple disks
Most of the industry is using erasure coding these days (https://blog.min.io/erasure-coding/), which lets you spread your data and parity across multiple disks or even multiple sites. Erasure coding usually runs a layer above the filesystem, whereas RAID typically runs below the filesystem (SnapRAID, mergerfs, and others excepted).
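To make the idea concrete, here is a toy sketch of the simplest erasure code: k data shards plus one XOR parity shard, so any single lost shard can be rebuilt from the rest. Real systems like MinIO use Reed-Solomon so they can survive multiple simultaneous losses, but the principle is the same:

    # Toy erasure coding: k data shards + 1 XOR parity shard.
    # Any single missing shard (data or parity) can be reconstructed.
    from functools import reduce

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data: bytes, k: int) -> list[bytes]:
        shard_len = -(-len(data) // k)                 # ceiling division
        padded = data.ljust(k * shard_len, b"\0")
        shards = [padded[i*shard_len:(i+1)*shard_len] for i in range(k)]
        return shards + [reduce(xor, shards)]          # append parity shard

    def recover(shards: list) -> list:
        missing = shards.index(None)                   # assume exactly one loss
        shards[missing] = reduce(xor, (s for s in shards if s is not None))
        return shards

    shards = encode(b"hello erasure coding", k=4)
    shards[2] = None                                   # simulate a dead disk
    print(b"".join(recover(shards)[:4]))               # original data is back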
My personal "backup vault" is a Raspberry Pi 4 with a single 4TB external drive attached. The RPi runs MinIO, and all backups go through the S3 interface or SFTP/SMB. It is not the fastest box in the world, but an incremental backup of ~2TB finishes in about 30 minutes, which is "fast enough".
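Since it speaks S3, any S3 client can push backups to it. A minimal sketch with boto3, where the hostname, bucket, and credentials are placeholders for whatever your box uses:

    # Upload one archive to a self-hosted MinIO box over its S3 API.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://backup-pi.local:9000",  # hypothetical RPi hostname
        aws_access_key_id="minio-user",              # placeholder credentials
        aws_secret_access_key="minio-secret",
    )
    s3.upload_file("photos.tar.zst", "backups", "photos/photos.tar.zst")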
It consumes on average 4W, which means even with worst case electricity prices of €1/kWh (which we saw last winter), it costs less than €3/month.
For comparison, my NAS consumed around 50W, and at €1/kWh, that would cost €37/month in electricity alone, and then you need to add the cost of the actual hardware itself.
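The arithmetic, for anyone wanting to plug in their own wattage and tariff:

    # Monthly electricity cost: watts -> kWh per month -> euros.
    def monthly_cost_eur(watts: float, eur_per_kwh: float, hours: float = 730) -> float:
        return watts / 1000 * hours * eur_per_kwh

    print(monthly_cost_eur(4, 1.0))    # RPi:  ~2.9 EUR/month
    print(monthly_cost_eur(50, 1.0))   # NAS: ~36.5 EUR/month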
I switched off the NAS, and purchased ~10TB of cloud storage (main storage and backup storage at two different locations) for €20/month, and keep sensitive stuff encrypted with Cryptomator.
ZFS has RAIDZ1, RAIDZ2, RAIDZ3, and ditto blocks, which achieve much the same thing, albeit a bit differently.
My point was that even if you have 4 copies of your data, you still only have a single machine where your data is stored, and you're essentially just one flood/lightning strike/house fire/burglary away from all of it being gone. Or one bad power supply away from 4 dead drives.
With versioned backups, you have higher latency on restoring data in case a disk dies, but your data is also safer.
As I initially stated, RAID is for availability. It is great for making sure that data is available 24/7, but that is rarely what the average home user needs. Most home users access their data infrequently and would be perfectly fine waiting a couple of hours while restoring from a backup.