Most HDDs are going to the server market, and from that point of view the capacity increase has been very slow.
Edit: It seems a lot of our computing components have reached a plateau, or will within this decade: HDDs, DRAM, NAND, chip process nodes, etc. That is not to say they won't improve, but their unit costs aren't dropping any more.
And for the server market there's a sweet spot, per workload, for how big you want your spinning disks to be. These disks aren't getting any faster, and rebuilding a 22 TB HDD is going to take a while, which imposes serious durability risks.
It's not just the rebuild: some storage software (Ceph, for example) also validates the data on the disks from time to time, and since IOPS are quite limited on spindles, it takes more and more of that IOPS bucket just to verify the data you already have on the disks.
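Rough back-of-envelope on that point (all figures are assumptions, not vendor specs: roughly 200 MB/s average sequential read, a 22 TB drive, one full verify pass per week):

    # How much of a spindle's time a periodic full-disk verify eats.
    # All numbers are assumptions, not measurements.
    CAPACITY_TB = 22
    SEQ_MB_S = 200            # assumed average sequential read rate
    SCRUB_PERIOD_DAYS = 7     # e.g. one deep-scrub pass per week

    scrub_hours = CAPACITY_TB * 1e6 / SEQ_MB_S / 3600
    busy_fraction = scrub_hours / (SCRUB_PERIOD_DAYS * 24)
    print(f"full verify pass: {scrub_hours:.0f} h")
    print(f"share of the drive's time spent verifying: {busy_fraction:.0%}")

Even a purely sequential verify pass keeps the spindle busy for well over a day per week, and deep scrubs are rarely purely sequential in practice.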
ZFS also periodically resilvers the disks to keep the filesystem in top shape. IIRC ZFS tries to resilver when traffic is low, but that's not always possible.
I bet that disks in the 16+ TB range will be used for the colder tiers of storage. They should also be useful in the OSTs of Lustre, since the random-read storm hits the MDT more severely.
> And the scrub has to be initiated somehow, typically cron job, it's not automatic in ZFS.
The devices were Oracle/Sun ZFS appliances (I tortured a 7320, we liked it and bought a full-out 7420), so maybe it was set up to scrub/resilver automatically in some cases.
As I mentioned, we're more of a Lustre shop and retired those systems some years ago.
Thank you. I'm no expert in ZFS, TBH (we use Lustre much more), but IIRC, when I was benchmarking the then-new Sun/Oracle ZFS 7320, it resilvered the disks at night after especially torturous loads.
Maybe it was specific to the appliances (our behemoth 7420 did the same), or something was wrong. I remember the Oracle/Sun guys jokingly asking me whether I had managed to make it resilver the disks, and hearing that it had indeed done so a dozen times visibly upset them. All they said was, "Pack it up, we need to go".
Since all heads are mounted on the same arm, only one head can lock onto a track at a given time. What if each head had an independent micro-actuator, so that all heads could lock onto their tracks at the same radius, with the data distributed across all heads? Wouldn't this improve throughput n-fold?
Edit: It seems modern hard drives already have micro-actuators on each head to overcome the precision and bandwidth limits of the main arm actuator, but none of them appear to offer a range of motion large enough to lock multiple heads simultaneously.
For a long time, mainframes had hard disks with a set of heads on opposite sides of the platters. I'm not sure the use cases for spinning disks these days justify that kind of investment: they aren't (or shouldn't be) used for random-write-heavy workloads such as frequently updated databases, but more for archival, and they often sit behind a flash disk acting as a cache. (I do that for my home server; a lot of write traffic never hits the disk because it's overwritten before being evicted from flash.)
Data verification to predict drive failure seems like a decent use case, though. The drive is always spinning, so you essentially gain an extra data path for read-verify-fix IOPS. Whether you can build this in cost-effectively is a big question. But with rebuild times climbing to upwards of a week, being unable to do continuous health monitoring starts to get really problematic.
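To put a number on "upwards of a week" (the rates are assumptions; the real figure depends on the RAID/erasure-code layout and competing foreground traffic):

    # Rebuild-time estimate for one large HDD at assumed effective rates.
    CAPACITY_TB = 24
    for rate_mb_s in (250, 100, 40):   # best-case sequential .. heavily throttled
        days = CAPACITY_TB * 1e6 / rate_mb_s / 3600 / 24
        print(f"{rate_mb_s:>4} MB/s -> {days:.1f} days")

Best case it's a bit over a day; once the rebuild is throttled to protect foreground I/O, a week is easy to hit.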
This would be cool, but it can be done with a single set of heads if the utilization is less than 100%.
Multiple sets of heads would be useful if the limiting factor is positioning.
BTW, I don't know how a multi-platter drive records its disk blocks. Is a block contained on a single platter, or is it spread across all platters and read/written by all heads at the same time?
Dual arms would be handy if the drive had a RAID-like checksumming scheme between platters. If a platter is corrupted but still readable/writable (i.e. it wasn't a head malfunction), the drive could rebuild itself without the help of a computer.
Even if it is a head malfunction, the data could be redistributed among the other platters, reducing the drive's capacity.
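A toy model of that idea, treating each platter surface as a column with one surface holding XOR parity (this is just a sketch of the proposal, not how any real drive works):

    from functools import reduce

    def xor_blocks(blocks):
        # byte-wise XOR across equally sized blocks
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    surfaces = [bytes([i]) * 8 for i in range(1, 5)]   # data on four surfaces
    parity = xor_blocks(surfaces)                      # stored on a fifth surface

    lost = 2                                           # pretend one surface failed
    survivors = [s for i, s in enumerate(surfaces) if i != lost] + [parity]
    assert xor_blocks(survivors) == surfaces[lost]     # rebuilt from the rest

The catch is the same as with RAID 4/5: every write has to touch the parity surface as well, which costs IOPS the spindle doesn't really have.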
The specifics of how data is physically arranged across multiple platters are undocumented, complicated, and vary between models. But with some clever benchmarks, much of that information can be inferred: https://blog.stuffedcow.net/2019/09/hard-disk-geometry-micro...
A hard drive will only use one head on one platter at a time. A single logical block will be contained within a single track on one platter. The next logical block will usually be on the same track or an adjacent track on the same platter. Seeking from one track to the next using the same head is generally a bit quicker than switching to a different head on a different platter and getting it lined up with a nearby track.
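The linked article's approach is essentially timing-based: read single blocks with O_DIRECT and look for latency steps that betray track and head switches. A minimal sketch of that idea (needs Linux, root, and a real device in place of the /dev/sdX placeholder; the real methodology is far more careful):

    import os, mmap, time

    DEV = "/dev/sdX"                  # placeholder block device
    BLOCK = 4096                      # assumed logical block size

    fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)
    buf = mmap.mmap(-1, BLOCK)        # page-aligned buffer, required by O_DIRECT

    prev = None
    for lba in range(0, 4096, 8):     # sample a small range of LBAs
        os.lseek(fd, lba * BLOCK, os.SEEK_SET)
        t0 = time.perf_counter()
        os.readv(fd, [buf])
        dt = (time.perf_counter() - t0) * 1e6
        if prev is not None and dt > 3 * prev:
            print(f"latency jump near LBA {lba}: {dt:.0f} us")
        prev = dt
    os.close(fd)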
If you've got slow HDDs, the usual solution is to RAID them together. At that point your limiter starts becoming how fast you can slurp data down the line. RAID-0 would sort of emulate what you're talking about here.
I think a big problem, too, is that SSD prices haven't come down fast enough. Basically the only reason spinning rust is still a thing is that SSDs are far more expensive.
Apples and oranges, right? You need ten thousand hard drives to match the IOPS of one SSD and even with the 10000 disks your service latency will still be three orders of magnitude worse. The other advantage of an SSD is in terms of bytes per volume, in case rack density matters to you.
Putting 24TB parts in a Backblaze Storage Pod 6.0 would presumably allow 1440TB in a 4U rack mount server.[1] In practice, how would you reach the same density with SSDs? (I haven't looked into it, just curious if you know that SSD options more dense than that exist, and whether they are equally openly documented.)
There aren’t many full open-source solutions available, but you can buy 100TB 3.5” SSDs for data center use.[1] At Storage Pod densities, that’s 6000TB in 4U. I’m not sure if some other factor comes in that limits density, but that’s a first-order estimate.
There are 1U servers with 32 EDSFF slots, which were advertised to reach 1PB with 32TB SSDs. But 16TB is more common, and that's still 2PB in 4U that you can buy today without getting into exotic pricing.
You can put 36 15TB NGSFF SSDs into a 1U height enclosure and only 12cm deep. SSD volumetric density is a lot higher than disk and has been for a few years now.
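Putting the numbers from this subthread side by side (drive counts and capacities as cited above, not verified spec sheets):

    # Rack density for the configurations mentioned in this thread.
    configs = {
        "Storage Pod 6.0, 60 x 24 TB HDD, 4U":  (60 * 24,  4),
        "Storage Pod 6.0, 60 x 100 TB SSD, 4U": (60 * 100, 4),
        "1U EDSFF, 32 x 16 TB SSD":             (32 * 16,  1),
        "1U NGSFF, 36 x 15 TB SSD":             (36 * 15,  1),
    }
    for name, (tb, ru) in configs.items():
        print(f"{name:40s} {tb:5d} TB  ({tb / ru:6.0f} TB per U)")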
SSDs are smaller, so you can pack more of them into the same volume. I'm not aware of really open designs (maybe there are some in the Open Compute Project), but just from a quick look at Supermicro's homepage:
That depends very much on the workload. Four hard drives can deliver an aggregate sequential bandwidth that exceeds any one consumer SATA SSD, or the sustained write bandwidth of most consumer NVMe SSDs. But when people discuss IOPS, the usual implication is non-sequential access at relatively small block sizes. For those workloads, the difference between consumer SSDs and hard drives is still measured in orders of magnitude.
So, what workloads did you have in mind when you said "generally"?
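For concreteness, with assumed (not benchmarked) per-device figures of roughly 250 MB/s and 150 random IOPS per HDD versus 550 MB/s and 90k IOPS for a SATA SSD:

    # Illustrative comparison only; all figures are assumptions.
    HDD_SEQ_MB_S, HDD_IOPS = 250, 150
    SSD_SEQ_MB_S, SSD_IOPS = 550, 90_000
    N = 4

    print(f"{N} HDDs sequential: {N * HDD_SEQ_MB_S} MB/s vs one SATA SSD: {SSD_SEQ_MB_S} MB/s")
    print(f"{N} HDDs random:     {N * HDD_IOPS} IOPS vs one SATA SSD: {SSD_IOPS} IOPS "
          f"(~{SSD_IOPS // (N * HDD_IOPS)}x)")

Four spindles win on streaming bandwidth and still lose by a couple of orders of magnitude on small random I/O.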
I'm not sure whether the smaller size is an asset or a liability here. There are likely some tradeoffs in manufacturing tolerances and/or thermals. (Physics people, speak up?)
Today you can fit (with off-the-shelf components) four 8TB M.2 drives (in RAID) in the volume of a 20TB spinning-disk drive. Yes, the cost is higher, but when it comes down ... Isn't the industry expecting solid-state density and cost to eclipse that of disk drives in the next 5(?) years?
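Quick sanity check on the volume claim, using nominal form-factor dimensions (the M.2 height with components is an assumption, and a real enclosure also needs a controller, connectors and airflow):

    # Raw volume of a 3.5" bay vs an M.2 2280 module (approximate, in mm).
    HDD_35 = 146.99 * 101.6 * 26.1      # standard 3.5" drive envelope
    M2_2280 = 80.0 * 22.0 * 3.8         # 2280 module, assumed 3.8 mm with components

    print(f'3.5" bay : {HDD_35 / 1000:.0f} cm^3')
    print(f"M.2 2280 : {M2_2280 / 1000:.1f} cm^3")
    print(f"modules per bay by raw volume: {HDD_35 / M2_2280:.0f}")

By raw volume alone dozens of modules would fit, so four plus a small RAID controller is comfortably within a 3.5" envelope.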
SSD/flash becomes more fragile with node shrinks, so flash has (currently) more or less hit the limit on shrinking, and vendors do "layering" to get more storage density.
I'm not sure how much they can keep pushing layering. I think consumer parts are already at 96 layers and are having problems above 128, but I haven't checked the state of SSDs in six months.
I suppose the investment cost will age out and SSD multilayer prices will continue to drop, but flash supply (and HDD supply) is effectively a cartel these days, with not that many players/competitors.
>"Seagate is confident that heating the media using laser (HAMR) is the best solution possible, while Toshiba and Western Digital believe that using microwaves to change coercivity of magnetic disks (MAMR) is more viable for the next several years. Furthermore, Western Digital even uses 'halfway-to-MAMR' energy-assisted perpendicular magnetic recording (ePMR) for its latest HDDs. Meanwhile, everyone agrees that HAMR is the best option for the long term.
HAMR requires new heads and an immediate transition to glass platters with an all-new coating, whereas MAMR only needs new heads and can continue using aluminum media with a known coating. Even if HAMR offers a higher areal density than MAMR, it is possible to increase platter count to expand the capacity of a MAMR drive to match that of a HAMR-based HDD. There is a catch though: thin MAMR platters will have to rely on a glass substrate."
This is interesting; it seems that we might be halfway between metal and glass hard drives...
Which brings up an interesting thought, phrased as a challenge, for all Physicists out there...
Given a piece of glass, ordinary glass, with no magnetic metal platter, my challenge is:
a) How do you write data to it?
b) How do you read that data back from it?
c) What's the smallest size / depth you can accomplish this at, and why?
d) Does a formula govern c, and if so, what is it?
e) What do you do about imperfections in the glass?
We currently have CDs, DVDs and Blu-ray discs that use lasers to write to chemical compounds inside those media.
My challenge is: can we do it without those chemical compounds? Could we use simple, pure glass, and if so, how?
(Note to Future Self: Work on this in the future... <g>)
I don't get why they should all vanish in one go. Unless there is a problem with all the heads and all the platters, you should still be able to recover a lot of data.
But what do I know: I returned an 8TB drive three weeks ago after it started developing bad sectors and soon died, with only 2TB of workload and less than 24 hours of runtime. First the OS couldn't see it, then the BIOS stopped seeing it, and now I don't think I'll ever see it again. Good to know it has a 5-year warranty, but it shattered my trust in the model.
One head crash kills the entire drive because the access arm is a single block of aluminum, with a head for each platter surface, that swings in and out as a unit. The heads do not move individually.
Imagine a vinyl record player. Stack six of them vertically and connect all the tonearms, at their counterweights, to a single bar running heightwise.