Apparently a few months ago it became known on the Chinese internet that the 980 Pro, 970 Evo Plus with new controller, and OEM versions are prone to getting unreadable sectors, where SMART 'Media and Data Integrity Errors' increases on every read attempt.
How I came across this: I ran into this last week(!) on a 6-month-old drive, but I'm not in China... hmm. Not just one bad batch? Interestingly, it's non-deterministic: the data is backed up, but when I try ddrescue it occasionally succeeds at reading a few kilobytes out of the roughly 5 MB, spread over several runs of 512-16384 bytes, that can't be read or written. Curious to see what happens with a firmware update and secure erase.
PS: I'm one of the victims, with a 970 Evo Plus. The company that provided after-sales service, Lobcom, did not want to provide any RMA service and claimed nothing wrong was found.
tl;dr: All 3 of my Samsung M.2 NVMe SSDs have failed in less than 3 years. 100% failure rate.
My first SSD was a 1TB Samsung 970 EVO. It failed after 2 years and 8 months. It was replaced under warranty with a 1TB 970 EVO Plus.
That replacement has now also failed after 1 year and 9 months.
I bought a 2nd 1TB 970 EVO Plus in May 2019. It has now also failed (2 years and 7 months).
Both are expected to be replaced under warranty.
The 2 970 EVO Plus SSDs clearly had hardware errors (that were not accurately reflected in SMART data) that caused everything from system hangs, game crashes to file corruption on OTHER drives. I couldn't believe it at first but after 5 days of testing and trial and error, I had it confirmed. As soon as I removed those SSDs, my PC was completely stable again.
In the meantime, I have bought a Kingston KC3000 1TB drive as I no longer trust Samsung M.2 NVMe SSDs. On the other hand, I have a Samsung EVO 850 SATA drive which has been rock-solid.
The article mentions issues with the 900-series drives. It seems like the 800-series drives are still rock solid (I've also been running them for a few years now without issue).
There may be multiple, different issues with Samsung parts at play here. The 900 series issues seem to have been addressed with a f/w update; the 870 EVO issues were - allegedly - caused by bad NAND and the devices needed to be replaced.
ofc part of the problem here is the lack of public acknowledgement / information from Samsung on these issues.
As an example, an old Asus board of mine has trouble with modern M.2 drives. A PCIe M.2 adapter solved the problem and the Samsung SSD worked without issues thereafter.
Worth checking if you have any thermal issues with it. Mine failed in a similar way, presumably due to a rookie mistake: forgetting to remove the thermal pad tape on the mobo.
It's not likely that thermal issues would cause bad reliability on these things. At worst you could expect intermittently bad performance. You can check for this condition with `nvme smart-log`. If your device was often overheated, it would have "critical composite temperature time" non-zero. My Samsung that has been in service for years and has no thermal solution has a value of 1 minute and I happen to know that is because I heated it with a hair dryer to find out what would happen if it crossed the critical temperature.
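If you'd rather script that check than eyeball the output, here is a rough Python sketch. It assumes you captured the log with `nvme smart-log <device> -o json` and that your nvme-cli version uses the key names `critical_comp_time`, `warning_temp_time` and `media_errors`; those names are my assumption about the JSON output, so check them against what your version actually prints. The function name and the sample values are made up for illustration.

```python
import json


def thermal_and_media_summary(smart_log_json: str) -> dict:
    """Summarize overheating time and media errors from a SMART log.

    Expects the JSON text of an NVMe SMART log. The key names below are
    assumptions about nvme-cli's JSON output; missing keys count as 0.
    """
    log = json.loads(smart_log_json)
    critical_min = int(log.get("critical_comp_time", 0))  # minutes above critical temp
    warning_min = int(log.get("warning_temp_time", 0))    # minutes above warning temp
    media_errors = int(log.get("media_errors", 0))        # media and data integrity errors
    return {
        "critical_temp_minutes": critical_min,
        "warning_temp_minutes": warning_min,
        "media_errors": media_errors,
        "overheated": critical_min > 0,
        "has_media_errors": media_errors > 0,
    }


# Illustrative sample only, not taken from a real drive.
sample = '{"critical_comp_time": 1, "warning_temp_time": 0, "media_errors": 0}'
print(thermal_and_media_summary(sample))
```

A non-zero `critical_temp_minutes` would point at overheating; a `media_errors` count that climbs between runs would instead match the unreadable-sector symptom described at the top of the thread.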
Ha, interesting! Makes sense, the drive is supposed to just throttle itself before it can reach unsafe temps. I’ll def try to check, didn’t know the drive recorded that - thanks for the tip. In any case, now I know RMA is in order
The controller is less thick than the NAND flash, so it doesn't make proper contact with the thermal pad. I just discovered mine is affected by this. After heavy reading the controller is at 67C while the NAND is at 42C.
Hmm, I'm going to need to check my Samsung SSD from Oct 2021 that failed the first week of Jan 2023. I had started noticing some quirks in spring 2022 but it wasn't a super important drive so I ignored it.
I have a similar issue. It started failing mid last year. Then it got more and more frequent toward the end of the year. Last month I got tired of reinstalling the OS for the 4th time and got a new system.
https://www.reddit.com/r/buildapc/comments/x82mwe/samsung_ss... https://www.reddit.com/r/DataHoarder/comments/x8arle/psa_sam...