My largest ZFS pool is currently ~64TB (3 × 10 × 3TB raidz2 vdevs).
The pool has ranged from 85% to 95% full (it's around 85% now and used mostly for reads).
Resilvering one drive usually takes < 48 hours. Last time took ~32 hours.
Something else cool:
When I was planning this out I wrote a little script to calculate the chance of failure.
With a drive APF of 10% (which is pretty high) and a rebuild speed of 50MB/sec (very low compared to what I typically get), I have a 1% chance of losing the entire pool over a 46.6 year period. If I add 4 more 10 × 3TB raidz2 vdevs, that period would drop to 3.75 years.
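(Not the actual script, but here's a minimal sketch of this kind of estimate. It assumes an exponential per-drive failure model, treats APF as the annual per-drive failure probability, and all the names are made up, so it won't exactly reproduce the numbers above.)

    import math
    from math import comb

    HOURS_PER_YEAR = 24 * 365

    def p_fail_within(hours, annual_fail_prob):
        """Chance a single drive fails within `hours`, exponential failure model."""
        rate = -math.log(1 - annual_fail_prob) / HOURS_PER_YEAR
        return 1 - math.exp(-rate * hours)

    def vdev_loss_per_year(drives, parity, drive_tb, annual_fail_prob, rebuild_mb_s):
        """Rough yearly chance of losing one raidz vdev: a drive fails, then at
        least `parity` of the remaining drives fail before the resilver finishes."""
        rebuild_hours = drive_tb * 1e6 / rebuild_mb_s / 3600
        p_during_rebuild = p_fail_within(rebuild_hours, annual_fail_prob)
        p_cascade = sum(
            comb(drives - 1, k)
            * p_during_rebuild ** k
            * (1 - p_during_rebuild) ** (drives - 1 - k)
            for k in range(parity, drives)
        )
        return drives * annual_fail_prob * p_cascade

    def pool_loss_probability(vdevs, years, per_vdev_yearly):
        """The pool is lost if any vdev is lost in any year (vdevs independent)."""
        return 1 - (1 - per_vdev_yearly) ** (vdevs * years)

    # 3 vdevs of 10 x 3TB raidz2, 10% annual failure rate, 50 MB/s rebuild
    per_vdev = vdev_loss_per_year(10, 2, 3, 0.10, 50)
    print(pool_loss_probability(3, 46.6, per_vdev))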
What is APF? And why not use the typical URRE rate to calculate your stats?
I'm always mystified at how stupid our storage systems are. Even very expensive SAN solutions from EMC and the like are just... stupid. We've got loads of metrics on every drive, but figuring out that those things should be aggregated and subjected to statistical analysis just seems to have not been done yet.
What I really want is a "pasture" system - a place I can stick old drives of totally random sizes and performance characteristics and have the system figure out where to put the data in order to maintain a specific level of reliability. Preferably backed by an online database that tracks the drive failure rate of every drive on every deployed system, noting patterns in 'bad batches' of certain models and the like. If one of my drives would have to beat 3-standard-deviation odds to survive the next week, move the damn data somewhere better. And if you've got two 150GB drives and one 300GB drive, then each block on those drives has a rating of 2.0 - adjusted for the age and metrics of the drive.
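To make the two 150GB + one 300GB example concrete, here's a rough sketch of how such a system could work out how much data it can hold at a given copy count. The names are hypothetical, placement is a simple greedy water-fill, and it ignores the per-drive age/metric weighting described above:

    import heapq

    def usable_with_n_copies(sizes_gb, n, chunk_gb=1):
        """How much unique data fits if every chunk must be stored on `n`
        distinct devices. Greedy: always place the next chunk's copies on
        the n drives with the most free space (a water-filling strategy)."""
        free = [-s for s in sizes_gb]  # negate for a max-heap
        heapq.heapify(free)
        usable = 0
        while True:
            picked = [heapq.heappop(free) for _ in range(min(n, len(free)))]
            if len(picked) < n or any(-f < chunk_gb for f in picked):
                for f in picked:
                    heapq.heappush(free, f)
                return usable
            for f in picked:
                heapq.heappush(free, f + chunk_gb)  # negated, so this consumes space
            usable += chunk_gb

    # two 150GB + one 300GB at 2 copies per chunk -> 300 GB of unique data,
    # i.e. every block ends up with the "rating of 2.0" described above
    print(usable_with_n_copies([150, 150, 300], 2))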
Oh well, maybe when I retire in 30 years storage systems will still be as stupid as they've remained for the past 30 years and I'll have another project to add to the pile I don't have time to work on now.
btrfs at least offers the flexibility to use a collection of mismatched drives and ensure that N copies of the data exist across separate devices. Once its parity-based RAID modes are stable, it should be able to retain that flexibility while offering redundancy for cases where keeping N=2 copies wastes too much space. That, combined with regular scrubbing, should suffice for moderately sized arrays.
Treating drives differently based on their expected failure rates seems like it would only matter for very large arrays trying to keep the redundancy as low as possible.
It really depends on the load and whether your raidz has 1, 2, or 3 parity drives. From what I've experienced, resilvering the new drive in a raidz seems to happen at about 30% of the maximum ZFS throughput under medium load. And it does even better than many RAID5/6 implementations - it only resilvers enough to cover your data, not the entire drive.
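A quick back-of-the-envelope of what resilvering only the allocated data buys you (the fill level and effective rate here are assumptions, not measurements; ~22 MB/s is simply what lines up with the ~32 hour resilver mentioned earlier):

    def resilver_hours(drive_tb, fill_fraction, effective_mb_s):
        """ZFS rewrites only allocated blocks, so the copy size is roughly
        fill_fraction * drive size rather than the whole drive."""
        return drive_tb * 1e6 * fill_fraction / effective_mb_s / 3600

    # 3TB drive, 85% full, ~22 MB/s effective under load -> roughly 32 hours
    print(resilver_hours(3, 0.85, 22))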