"A relevant observation from our Operations team on the Seagate drives is that they generally signal their impending failure via their SMART stats. Since we monitor several SMART stats, we are often warned of trouble before a pending failure and can take appropriate action. Drive failures from the other manufacturers appear to be less predictable via SMART stats."
~10 years ago, I remember Google Research put out a highly cited paper wherein they found that SMART stats were not a particularly strong indicator of impending drive failure (50% of drives had no SMART indication of a problem before failure). http://research.google.com/pubs/pub32774.html
Has this now changed (at least for Seagate)?
Reliability/longevity is nice but a signal of impending failure is far more valuable from an operations point of view.
Hi! Yev from Backblaze here -> Yes, we only report the stats of what we have in our environment. As much as we'd love to have a test of SSDs in a pod (augmented for SSDs of course) they're just not feasible from a cost per GB perspective. Hopefully sometime though :)
Input/Output rate, bandwidth and IO roundtrip delay.
* even the slowest SSDs have significantly higher I/O rates than the best mechanical drives, and the comparison between best-in-class mechanical and enterprise-class PCIe SSDs is just ridiculous: a 15K SAS drive will do 200 IOPS, a high end SSD will do a million
* 15K SAS drives will top out around 250MB/s on bulk sequential reads (that's a best-case scenario), high-end PCIe SSDs are in the 2.5GB/s range
* HDDs have a latency of 10~20ms, SSDs have a latency of 100~200µs (RAM has a latency of ~100ns)
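To put the three figures above side by side, here is a rough back-of-the-envelope sketch. The numbers are just the ones quoted in the bullets (taking the low end of each latency range), not benchmarks, and the dictionary keys are names I made up for illustration.

```python
# Figures quoted in the bullets above: orders of magnitude, not measurements.
hdd = {"iops": 200, "bandwidth_mb_s": 250, "latency_us": 10_000}       # 15K SAS, best case
ssd = {"iops": 1_000_000, "bandwidth_mb_s": 2_500, "latency_us": 100}  # high-end PCIe SSD


def ratios(slow, fast):
    """Return how many times better `fast` is than `slow` on each axis."""
    return {
        "iops": fast["iops"] / slow["iops"],
        "bandwidth": fast["bandwidth_mb_s"] / slow["bandwidth_mb_s"],
        # Lower latency is better, so the ratio is inverted.
        "latency": slow["latency_us"] / fast["latency_us"],
    }


result = ratios(hdd, ssd)
for axis, factor in result.items():
    print(f"{axis}: roughly {factor:,.0f}x")
```

So on these numbers the gap is about 5,000x on I/O rate, 100x on latency, and only 10x on sequential bandwidth, which is why the difference matters most for random I/O.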
Have you productized these learnings in a powertop-like tool for Linux?
Smartmontools is not intuitive enough for the layman to use in any meaningful way, and Backblaze has really built some serious learning here that could be of use to everyone.
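For what it's worth, a very small version of such a tool could look something like the sketch below. It reads the attribute table that `smartctl -A` prints and flags watched attributes whose raw value is nonzero. The watch list is my own illustrative assumption (the thread only says "several SMART stats"), the function name is hypothetical, and the sample text is hand-written to match the usual table layout, not output from a real drive.

```python
# Assumption: an illustrative watch list, not one taken from this thread.
WATCHED = {5, 187, 188, 197, 198}


def flag_attributes(smartctl_output, watched=WATCHED):
    """Return warnings for watched attributes with a nonzero raw value."""
    warnings = []
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # this also skips the "ID# ATTRIBUTE_NAME ..." header row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in watched and raw.isdigit() and int(raw) > 0:
            warnings.append(f"{name} (ID {attr_id}) raw value is {raw}")
    return warnings


# Hand-written sample in the usual `smartctl -A` layout (not real drive data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

for warning in flag_attributes(SAMPLE):
    print(warning)
```

In practice you would feed it the text from `smartctl -A /dev/sdX` instead of the sample. A real tool would also need per-model thresholds, which is exactly the kind of knowledge Backblaze has and most of us don't.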
I suspect rotating drives have a variety of failure modes, some of which can be predicted by SMART and others which are unlikely to be.
Each new model is probably bound to have a different pareto of failure modes.
Now, if only Seagate had human-readable SMART values.
(I say this having recently built a FreeNAS box with a combination of Seagate NAS and WD Red HDDs - the WDs make it easy to look at the SMART stats and know what's going on. The Seagate ones, not so much.)
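Part of the reason the Seagate numbers look so alarming is that some raw values appear to pack two counters into one field. A commonly reported community interpretation, which I have not seen confirmed in Seagate documentation and which should be treated as an assumption, is that for attributes such as Seek_Error_Rate the 48-bit raw value holds an error count in the upper 16 bits and a total operation count in the lower 32 bits. Under that assumption, a minimal decoding sketch (the function name is hypothetical):

```python
def split_seagate_raw(raw):
    """Split a 48-bit raw value into (errors, operations).

    Assumption: upper 16 bits are an error count and lower 32 bits are a
    total operation count. This is a community interpretation, not a
    documented format.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


# A large raw value can still mean zero errors under this reading.
print(split_seagate_raw(180_000_000))
# A value with the upper bits set would indicate actual errors.
print(split_seagate_raw((3 << 32) | 1000))
```

If that reading is right, a huge raw number on a healthy Seagate drive is mostly just the operation counter ticking up.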
"A relevant observation from our Operations team on the Seagate drives is that they generally signal their impending failure via their SMART stats. Since we monitor several SMART stats, we are often warned of trouble before a pending failure and can take appropriate action. Drive failures from the other manufacturers appear to be less predictable via SMART stats."