ZFS is also crazy good at surviving disks with bad sectors (as long as they still respond quickly). Check out this paper: https://research.cs.wisc.edu/wind/Publications/zfs-corruptio...

It even spreads the metadata across the disk by default. I'm running on some old WD Greens with 1500+ bad sectors and it's cruising along with RAIDZ just fine.

There is also failmode=continue, where ZFS doesn't hang when it can't read something. If you have a distributed layer above ZFS that also checksums (like HDFS), you can get pretty far even without RAID and with quite broken disks. There is also copies=n. When ZFS broke, the disk usually stopped talking or died a few days later. btrfs and ext4 just choke and remount read-only quite fast (probably the best and correct course of action), but you can tell ZFS to just carry on! Great piece of engineering!
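A minimal sketch of those knobs, assuming a pool called "tank" and a made-up dataset name (failmode is a pool property, copies is a per-dataset property):

  # return errors on failed I/O instead of hanging the whole pool
  zpool set failmode=continue tank

  # keep two copies of every data block in this dataset, even on a single disk
  zfs set copies=2 tank/some-dataset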




Pretty fascinating. But just based on this comment, I reckon these drives with 1500+ bad sectors aren't worth your time. So, why? Is it just that you wanted to play with all these options and don't really care about the data on these drives, or do you actually believe it's reasonable bang for the buck?


I forgot the disclaimer that you should not do this, ever :)

We had a cluster for Hadoop experiments at uni and no resources to replace all the faulty disks at the time (20-30% were faulty to some degree according to the SMART values, out of more than 150 disks). So this was kind of an experiment. All the data in use was available and backed up outside of that cluster. The problem was that with ext4, certain disks always switched to read-only after running a job, which was a major hassle because that node then had to be fixed by hand. HDFS is 3x replicated and checksummed, and the disks usually kept working fine for quite a while after the first bad sector.

So we switched to ZFS, ran weekly scrubs, only replaced disks that didn't survive the scrub in reasonable time or with a reasonable failure rate, and bumped up the HDFS checksum reads so that everything gets verified about once a week. The working directory for the layer above (MapReduce and the like) got a dataset with copies=2 so that intermediate data stays intact within reason. This was for learning and research purposes, where top speed or 100% integrity didn't matter and uptime and usability were more important. Basically, the metadata on disk had to be sound, and the data on any single disk didn't matter that much. It was quite a ride, and the whole thing has long since been replaced.
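Roughly what that looks like in commands, as a sketch (the pool/dataset names and the exact HDFS scanner period are assumptions, not the cluster's real config):

  # weekly scrub, e.g. from a per-node cron job
  zpool scrub tank

  # dataset for MapReduce intermediate data with extra redundancy
  zfs create -o copies=2 tank/mapred-local

  # HDFS side (hdfs-site.xml): shorten the DataNode block scanner period
  # so every replica's checksum is verified roughly once a week
  #   dfs.datanode.scan.period.hours = 168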

Just thought it's interesting how far you can push this. In the end it worked, but it turned out there is no magic: disks die sooner or later and sometimes take the whole node with them.

Don't go to eBay and buy broken disks believing that they will work with ZFS. Some survive a while, most die fast, some exhibit strange behavior.

That RAIDZ is more or less for "let's see where this goes" purposes; backups are in place, and it's not a production system.


Hah, thanks for the story.

It seems that limited resources often lead to some interesting solutions (and to learning new things). A factor that is not very common in VC-backed companies.



