
This is interesting...

"A relevant observation from our Operations team on the Seagate drives is that they generally signal their impending failure via their SMART stats. Since we monitor several SMART stats, we are often warned of trouble before a pending failure and can take appropriate action. Drive failures from the other manufacturers appear to be less predictable via SMART stats."

~10 years ago, I remember Google Research put out a highly cited paper in which they found that SMART stats were not a particularly strong indicator of impending drive failure (50% of drives had no SMART indication of a problem before failure). http://research.google.com/pubs/pub32774.html

Has this now changed (at least for Seagate)?

Reliability/longevity is nice but a signal of impending failure is far more valuable from an operations point of view.




From our experience, we've found 5 SMART stats that are useful in predicting failure: https://www.backblaze.com/blog/hard-drive-smart-stats/

Many SMART stats aren't particularly useful in predicting failure as they simply correlate to the age of the drive in some fashion.

Also, here is our data on every single SMART stat for all of the drives we have: https://www.backblaze.com/blog-smart-stats-2014-8.html

Gleb (CEO, Backblaze)
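
For anyone who wants to try this on their own drives, here is a minimal sketch of the idea. The five attribute IDs are the ones listed in the linked Backblaze post (SMART 5, 187, 188, 197, 198). The `flag_drive` helper, the sample values, and the "any raw value above zero" rule are assumptions made for illustration, not Backblaze's actual tooling or thresholds.

```python
# The five attributes named in the linked Backblaze SMART stats post.
WATCHED = {
    5: "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    188: "Command Timeout",
    197: "Current Pending Sector Count",
    198: "Uncorrectable Sector Count",
}


def flag_drive(raw_values):
    """Return the watched attributes whose raw value is above zero.

    raw_values maps SMART attribute ID -> raw value. The "> 0" rule is an
    illustrative assumption, not a documented Backblaze threshold.
    """
    return {
        attr_id: (WATCHED[attr_id], raw)
        for attr_id, raw in raw_values.items()
        if attr_id in WATCHED and raw > 0
    }


# Hypothetical raw values for one drive. Attribute 9 (power-on hours) is
# ignored because it mostly tracks the age of the drive.
sample = {5: 0, 9: 12345, 187: 3, 188: 0, 197: 8, 198: 0}
flagged = flag_drive(sample)

for attr_id, (name, raw) in sorted(flagged.items()):
    print(f"SMART {attr_id} ({name}): raw={raw}")
```

In practice the raw values would come from something like smartmontools. The dictionary above stands in for that step.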


Gleb,

First, thanks for all your company's sharing of such data, as well as the pod open platform. Kudos.

Second, can you do a write-up specifically and only about SSDs?

Thanks


Hi! Yev from Backblaze here -> Yes, we only report the stats of what we have in our environment. As much as we'd love to have a test of SSDs in a pod (augmented for SSDs of course) they're just not feasible from a cost per GB perspective. Hopefully sometime though :)


As far as I know, they don't really use SSDs because of the higher cost/GB. So they probably don't have much to say about them :/


Is there any good use case for having SSDs in a data center, if you did not care about cost?


Input/Output rate, bandwidth and IO roundtrip delay.

* even the slowest SSDs have significantly higher I/O rates than the best mechanical drives, and the comparison between best-in-class mechanical and enterprise-class PCIe SSDs is just ridiculous: a 15K SAS drive will do 200 IOPS, a high end SSD will do a million

* 15K SAS drives will top out around 250MB/s on bulk sequential reads (that's a best-case scenario), high-end PCIe SSDs are in the 2.5GB/s range

* HDDs have a latency of 10~20ms, SSDs have a latency of 100~200µs (RAM has a latency of ~100ns)
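
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. The rates are the ones quoted in the list. The workload sizes (one million random reads, 1 TB of sequential data) and the helper names are assumptions picked for illustration.

```python
def seconds_for_random_reads(num_reads, iops):
    """Time to serve num_reads random reads at a sustained IOPS rate."""
    return num_reads / iops


def seconds_for_sequential_read(num_bytes, bytes_per_second):
    """Time to stream num_bytes at a sustained sequential throughput."""
    return num_bytes / bytes_per_second


READS = 1_000_000   # hypothetical workload: one million random reads
TERABYTE = 10**12   # hypothetical workload: 1 TB sequential scan

# Rates quoted above: 200 IOPS vs 1,000,000 IOPS.
hdd_random = seconds_for_random_reads(READS, 200)
ssd_random = seconds_for_random_reads(READS, 1_000_000)

# Rates quoted above: 250 MB/s vs 2.5 GB/s.
hdd_seq = seconds_for_sequential_read(TERABYTE, 250 * 10**6)
ssd_seq = seconds_for_sequential_read(TERABYTE, 2500 * 10**6)

print(f"random reads: HDD {hdd_random:.0f} s, SSD {ssd_random:.0f} s")
print(f"1 TB scan:    HDD {hdd_seq:.0f} s, SSD {ssd_seq:.0f} s")
```

With these figures the random-read gap (5000x) dwarfs the sequential gap (10x), which is why the IOPS number is usually the one that justifies the cost.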


They're used in DCs plenty when speed is required


Yep, I work for a very large government department here in AU and we have a tiny bit of SSD in our DC for the stuff that really needs it.

It's probably not 5% of our total storage, though.


IOPS


They had some in a few of their other reports, but it would seem the population was too small...


This is awesome!

Have you productized these learnings in a powertop-like tool for Linux?

Smartmontools is not intuitive enough for the layman to use in any meaningful way. And Backblaze has really built up some serious knowledge here that could be of use to everyone.


Will this turn into a tradition?


> Has this now changed (at least for Seagate)?

I suspect rotating drives have a variety of failure modes, some of which can be predicted by SMART and others which are unlikely to be.

Each new model is probably bound to have a different Pareto distribution of failure modes.


Now, if only Seagate had human-readable SMART values.

(I say this as I recently built a FreeNAS box with a combination of Seagate NAS and WD Red HDDs - the WDs make it easy to look at the SMART stats and know what's going on. The Seagate ones, not so much.)



