Backblaze's Hard Drive Stats for Q2 2017 (backblaze.com)
307 points by LaSombra on Aug 29, 2017 | 143 comments



I'm not a customer of Backblaze, but I really love how they publish this data for the public. I wish more companies did this kind of thing.


Yev here -> Thanks, so do we! Honestly we always thought that once we published what we see in our environment that others would follow suit, but that hasn't quite happened yet!


Again thanks for these stats. I love them and often refer to them when people ask for reliable hard disks, even for consumers.

I'm not really an expert on your file system. How easy is it for you to replace a faulty drive? Just pop it out and put in a brand new one, even if it's a different capacity and/or brand and/or model? Is firmware upgrade of hard disks supported?

Sometimes I wonder if it's possible to make a sort of small consumer (nas4free?) edition of the Storage Pod. Must be awesome to use almost any drive and still have a reliable big NAS at home.


It's Windows-only, but Storage Spaces is a pretty good match for what you describe.

There are still some rough edges, but it's overall a pretty nice setup for me. I set up the virtual drive to require at least 2 copies of the data on the underlying disks (but there are options for 3 copies and more I believe), and then you can add and remove disks to the array kind of whenever you want. They can be different speeds, sizes, whatever.


Yup yup! ^


Just throwing in to also express my appreciation. You guys are second only to okcupid in terms of awesome data and visualizations, and I think yours is way more useful. I suppose that depends a lot on one's current life priorities though!


I...uh...love those folks. <3


Are you planning on purchasing more Toshiba drives?


Only if they reach a price-point that we like! Right now we've been buying up a lot of Seagate drives because they work well and are at a pricing sweet-spot!


Which Seagate drives? With your reports, it seems Seagate can be good or terrible.


We have a large swath of 8TB drives going in that are performing well for us!


Their data[1] recently saved me from snapping up a "deal" on a Seagate 3TB.

Of the drives installed in 2012, by 2015 32% had failed, 62% were removed out of caution for failing some diagnostics, and 6% were still in place.

Not worth the risk to save $20. I have no way of knowing if Seagate fixed the issues and kept producing them or if they're still trying to dump old stock of bad drives.

Thanks, Backblaze!

[1] https://www.backblaze.com/blog/3tb-hard-drive-failure/


Recently? Are those faulty Seagate drives still being sold in 2017?

EDIT: Ah, I see that you meant you're turned off by 3TB Seagates in general. You could just look at the model number.


The bad model is ST3000DM001, and I believe the 001 at the end is a revision number. The one I was looking at was a higher revision number of the same model, but I didn't see any Backblaze data for it and didn't want to bet my data on Seagate having diagnosed/fixed the problem.

Got a decent deal on an HGST a few weeks later.


I recently got the ST3000DM008 model, so time will tell. :) I was aware of Backblaze's previous reports, but I doubt they are valid for consumers. A couple of cheap drives in a tower case with occasional read/writes and hundreds of them tightly packed together working 24/7 are two completely different scenarios.

Home users should just back up their important stuff and buy whatever they want. A super reliable drive won't save them from an accidental drop, a fire or theft.


It also depends on where you're keeping your drives. Their drives are in temperature-controlled data centers, mine are in a cheap consumer NAS in a home office that can breach 34°C in summer.

I imagine most of their drives don't even see much traffic if they're used for backups. Write once, read rarely. I wonder how B2 changes that equation.


Seems like their 4TB series was shitty to the point I wonder how it's possible. Larger drives seem ok though.


You are reading the data wrong. The 4TB drives have 5k hours for 400 drives, while the others have an order of magnitude more spin time. The other type of 4TB drive has a sample size too small to conclude anything (a single bad batch is possible). Given that the drive failure risk is a curve over time, you need more data. Drives are not reliable when new or very old, but usually OK in between.

That being said, I will never ever buy a Seagate HDD again. They joined Maxtor in my blacklist a long time ago. I lost every single one I ever bought (and RMAed) within a year.
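The sample-size point above can be made concrete with an annualized failure rate (AFR), which divides failures by drive-years of operation. The following is a minimal sketch: the two fleets are made-up numbers rather than figures from the report, and the interval uses a crude normal approximation to a Poisson count, so treat it as a rough guide only.

```python
import math

def annualized_failure_rate(failures: float, drive_days: float) -> float:
    """AFR in percent: failures per drive-year of operation."""
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years

def rough_interval(failures: float, drive_days: float, z: float = 1.96):
    """Crude 95% interval, treating the failure count as Poisson.

    Uses failures +/- z*sqrt(failures); unreliable for very small counts.
    """
    spread = z * math.sqrt(failures)
    low = annualized_failure_rate(max(failures - spread, 0.0), drive_days)
    high = annualized_failure_rate(failures + spread, drive_days)
    return low, high

# Hypothetical fleets with the same observed AFR but very different exposure.
small = (2, 36_500)        # 2 failures over 100 drive-years
large = (200, 3_650_000)   # 200 failures over 10,000 drive-years

for label, fleet in (("small", small), ("large", large)):
    low, high = rough_interval(*fleet)
    print(f"{label}: AFR {annualized_failure_rate(*fleet):.2f}% "
          f"(roughly {low:.2f}% to {high:.2f}%)")
```

Both hypothetical fleets show a 2% AFR, but the small fleet's rough interval is far wider than the large fleet's, which is why a low-hour model can look much better or worse than it really is.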


I was on CrashPlan but just recently decided to switch to B2, backing up a bunch of files from my Linux server (which I use as a NAS). I primarily knew of Backblaze from these postings.

I basically just have a cron job set up to run b2 sync at 2am, and then kill it at 8am. It generally doesn't take that long to run now, but when I was first syncing everything (couple hundred GB) this allowed it to take place over several days without affecting my daytime bandwidth.
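One way to get the "start at 2am, stop at 8am" behavior is a small wrapper that cron starts at 2am and that gives the sync a time budget ending at 8am. This is only a sketch of the idea, not the commenter's actual setup; the local path and bucket name in `SYNC_CMD` are placeholders.

```python
import subprocess
from datetime import datetime, timedelta

def seconds_until(hour: int, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of `hour`:00."""
    cutoff = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if cutoff <= now:
        cutoff += timedelta(days=1)
    return (cutoff - now).total_seconds()

def run_sync_until(cmd: list, stop_hour: int = 8) -> bool:
    """Run `cmd`, killing it if it is still running at `stop_hour`.

    Returns True if the command finished on its own, False if it was cut off.
    """
    budget = seconds_until(stop_hour, datetime.now())
    try:
        # subprocess.run kills the child process when the timeout expires.
        subprocess.run(cmd, timeout=budget, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False

# Hypothetical invocation; the local path and bucket name are placeholders.
SYNC_CMD = ["b2", "sync", "/srv/photos", "b2://my-backup-bucket/photos"]
```

A crontab entry such as `0 2 * * *` pointing at a script that calls `run_sync_until(SYNC_CMD)` would start the job at 2am. Because sync only transfers what is missing, a run that gets cut off can pick up the remainder the next night.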


Yeah, I had considered doing the same. Unfortunately, Backblaze still doesn't offer a Linux client for their backup offering. :(


One of their guys on Reddit said that they intentionally don't offer it because they don't want to attract the datahoarder crowd.


Yeah, they basically don't want people like me (or other CrashPlan users for that matter) as customers. I can't say that I blame them.


I've been very happy with Borg[0] combined with the B2 command-line tools for Linux backups. I switched from Duplicity + AWS and it reduced the cost as well as made the backup process simpler - Borg doesn't have the same full vs incremental difference to think about, you just take a full backup each time and it dedupes everything automatically.

0: https://borgbackup.readthedocs.io/en/stable/
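To illustrate why "take a full backup each time" can still be cheap with deduplication, here is a toy content-addressed store. It is a deliberate simplification and not how Borg is implemented: Borg uses content-defined chunking plus compression and encryption, whereas this sketch uses tiny fixed-size chunks and an in-memory dictionary.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow

def backup(data: bytes, store: dict) -> list:
    """Store `data` as chunks keyed by SHA-256; return the list of chunk ids.

    Chunks already present in `store` are not stored again.
    """
    manifest = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        store.setdefault(chunk_id, chunk)
        manifest.append(chunk_id)
    return manifest

def restore(manifest: list, store: dict) -> bytes:
    """Rebuild the original bytes from a manifest of chunk ids."""
    return b"".join(store[chunk_id] for chunk_id in manifest)

store = {}
monday = backup(b"AAAABBBBCCCC", store)
tuesday = backup(b"AAAABBBBDDDD", store)  # only the last chunk is new
```

Each backup has its own complete manifest, so either day can be restored independently, yet the second run adds only the one chunk that changed.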


Hmm, so they offer a Linux client? If they don't then how come you have access to command line tools for Linux?


They don't offer it for their personal backup solution. Only for B2 which is a cloud storage like S3 or google cloud.

Btw, they should include Amazon Glacier on the pricing page comparison, not just S3.


For my use I'm primarily concerned with an off-site copy of a folder full of files (mainly pictures/videos), so the basic sync works just fine for my case. It even keeps versions of files [1] in case you corrupt/delete/overwrite a file.

What is it missing that Crashplan and Backblaze Personal Backup have?

$5/mo of B2 is about 1TB of storage, so if you're above that then maybe cost is a factor?

[1] https://www.backblaze.com/b2/docs/file_versions.html
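The "$5/mo is about 1TB" estimate follows from the $0.005/GB per month storage price quoted elsewhere in this thread. A quick sketch of the arithmetic, assuming that price and ignoring any download or transaction charges:

```python
PRICE_PER_GB_MONTH = 0.005  # USD; the B2 storage price quoted in this thread

def monthly_storage_cost(gigabytes: float) -> float:
    """Storage-only cost in USD per month."""
    return gigabytes * PRICE_PER_GB_MONTH

def gigabytes_for_budget(dollars_per_month: float) -> float:
    """How many GB a monthly budget covers at the quoted price."""
    return dollars_per_month / PRICE_PER_GB_MONTH

print(gigabytes_for_budget(5))           # GB covered by $5/month
print(monthly_storage_cost(50))          # monthly cost of 50 GB
print(monthly_storage_cost(2000) * 12)   # yearly cost of 2 TB (as 2000 GB)
```

At that price $5/month covers about 1000 GB, and 50 GB costs about $0.25/month, so the break-even point against a flat $5 plan is around 1 TB of stored data.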


Someone linked this in a previous thread: https://www.time4vps.eu/storage-servers/

It's cheaper than Backblaze, slightly. It's not really comparable, because it's not distributed and there is real risk of data loss. However, for my needs, it works as part of a comprehensive backup plan. I get a bit more bang for my buck and more flexibility.


Same. I'm in the CrashPlan soon-to-be-migrating boat and can't use Backblaze for that same reason.


I'm currently testing 'rclone' with the backblaze b2 backend (+crypt).

Seems to work OK so far. Won't be as cheap as their unlimited plan but I'm OK with that.


Duplicacy + B2 works great, though.


CrashPlan is shutting down their consumer products and Backblaze is at the top of my list largely due to these great posts.


There are lots of issues with Backblaze also. You should search past threads to read the comments which folks have posted.

I'd have to say the 30-day retention policy is one of the main issues. I'd also add that, for me, the lack of restore in their client is a big issue.


I assume you recently switched because CrashPlan cancelled their Home plan for unlimited storage.


I love the fact that they kept doing this on a regular basis even more. It's like a gift.


Brian from Backblaze here.

> It's like a gift.

We're glad you like it! It isn't hard for us, and we get a little press out of it so it is TOTALLY worth it for us to do. Once we set up all the analysis scripts to pull the data, it's mostly just an automated system.

What baffles me is why nobody else reports drive failure stats? I mean, people hear about Backblaze and buy our product because we're providing this data, why don't any OTHER companies want this free stream of customers?


Hello,

I have no idea why and am wondering too. Maybe a tradition of being closed, or they didn't think it would matter to anyone. Also I think hard drives are a special case because drive reviews are rare if not nonexistent nowadays, and you need large numbers of drives to do analysis. Until companies like yours came along, that was probably relegated to big business only, unlike today where cloud storage is mainstream so the average Joe may be involved...

I think more sectors should have automated statistics like these; consumers struggle to assess the quality of goods, relying on ad hoc comments or tiny (paid) reviews...


Maybe IoT can be the answer, although not the security-hole kind where devices talk straight to their cloud, but the kind of IoT where devices talk to a central command and control device located inside the home. Then this controller can ping the devices/ask for their status and upload statistics on how often devices say "Repair required.". And obviously no manufacturer will volunteer to do this...


> What baffles me is why nobody else reports drive failure stats?

Amazon, Microsoft and Google probably have too much to lose from calling out the vendors of poorly performing drives and the ever-present risk of a lawsuit - which is probably the overriding concern: if Contoso Storage Ltd had a single bad batch that coincidentally Azure used for their storage operations, they'd report an on-the-whole inaccurate failure rate, and Contoso's revenue and stock price would dip accordingly.

Given the size, scale and marketing of AWS, Azure and Google's Cloud services respectively I don't think them publishing their hardware failure rates would positively affect their cloud services revenues any detectable amount - all for more work to analyse and publish the findings and the subsequent liability.


The other thing I would be extremely keen to hear about is SSD failure stats. But I don't know who would run a large enough fleet of SSDs to have meaningful stats.


AWS, GCP, Azure, etc. have enough drives for sure, but I doubt any would be willing to be this transparent.


Maybe DigitalOcean?


> What baffles me is why nobody else reports drive failure stats?

My wild-ass guess is that organisations with enough hard drives to make reliable estimates give performance/failure data to whoever installed/designed the arrays (who I imagine get some competitive advantage from knowing which drives fail) and so there's probably a pretty serious culture of secrecy around this stuff.


Not a culture of secrecy, they just have good relationships with the vendors, so it's professional courtesy not to publicly compare.

Where you do see it is when relations are strained, see for example YouTube bandwidth reports.


I'm glad to see that HGST have the lowest failure rates across the board for the 3rd(?) year in a row. I was concerned they'd lose their place when they were bought by Western Digital.


I started buying HGST drives exclusively based on the Backblaze reports. Not a single drive (out of 8) has failed on me so far.


Ditto. My new NAS has HGST drives because of Backblaze.


I bought Western Digital stock after seeing that they had bought HGST, that Backblaze found them to be among the most reliable, and that WD paid a not-bad dividend. The stock has more than doubled, and although I've sold out, I wouldn't have bought had it not been for BB's data!


Yev from Backblaze here -> Saw this go off in my twitter feed, I'll be here for any questions if you have them.


Big Backblaze fan. You guys have saved me a couple critical times!

I'm curious if your guys' view on NAS options is evolving at all?

My interest is this:

Here at Pixar we have several folks who I'd call "lazy power users" at home. Folks like us are familiar with computers, and we want a strong home network, but want to spend as little time as possible sysadmin-ing the thing. That generally means powerful, easy to manage wifi, proper firewalls, etc.....and networked storage/sharing & backup of all the family computers, from personal machines to spouse and kid setups.

For the circles I run in, this is a fairly common case, and no single service seems to fit the bill.

Backblaze seems so close (especially WRT "it just works"). If it could offer a "Home backup solution" as a service...oh man, I know of at least a hundred people who would sign up in a heartbeat.


How's this sound?

- MikroTik firewall, centrally monitored by "The Dude"
- Unifi wireless on a hosted controller if the size justifies, otherwise do MikroTik CAPSMan or just straight integrated wifi AP on the bridge.
- VPN tunnels on the MikroTik to HQ (or not).
- Synology NAS on-site in 2-5 bay config (hot-spare).
- Time Capsule the Macs.
- Windows File History the PCs
- rsync the lunix.
- Use the cloud connector to back it all up to a central Backblaze B2 bucket (straight from the Synology).
- Do more with Dude like alert to order toner when printers SNMP fires.

Multiply ad nauseam.


Sounds like a lot of work


> I'm curious if your guys' view on NAS options is evolving at all?

Our Backblaze "B2" product line was designed so that you get the exact same cost of storage of the online backup product line but you were free to write ANY policy you like (such as backup NAS boxes). Developers can use these APIs: https://www.backblaze.com/b2/docs/ And if you are a "lazy power user" who wants something that just works, maybe check out one of these 3rd party tools: https://www.backblaze.com/b2/integrations.html


> I'm curious if your guys' view on NAS options is evolving at all?

It's simple: Backblaze "home" doesn't work on NAS boxes, but Backblaze B2 does. Synology NAS supports it natively via the Cloud Sync package.

Backblaze home is $5 flat rate for a single machine.

Backblaze B2 has granular pricing but it's like <$20 a year cheap for several of my clients.


They have a pricing calculator on this page:

https://www.backblaze.com/b2/cloud-storage-pricing.html

For what I want (a fairly static 2 TB backup), it would cost around $130 / year. If my QNAP box supports it, I think I'm going to sign up.


Just to be accurate, Cloud Sync IS NOT a backup solution as one would typically imagine. You can set Sync to be one way (DSM -> B2) but B2 is not available for Hyper Backup, so it's not a real backup service.


Too bad my upload speeds to B2 are horrid. Maybe it is because I am uploading from Europe, but I have no such issues with Google Drive.


Increase simultaneous uploads or divide your backup into more parts and upload it simultaneously, works for me - uploading from OVH DCs, France.


I can max out my 100 Mbps connection if I use multiple parallel transfers for it. This is in East Europe.
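The parallel-transfer approach described in the replies above can be sketched with a thread pool. This is a generic illustration, not a B2 client: `upload_one` is a caller-supplied function, and `fake_upload` is a hypothetical stand-in so the example runs without any network access.

```python
from concurrent.futures import ThreadPoolExecutor

def upload_all(paths, upload_one, workers: int = 8):
    """Upload every path using up to `workers` concurrent transfers.

    `upload_one` is a caller-supplied function (hypothetical here) that
    uploads a single file and returns something describing the result.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even though uploads overlap in time.
        return list(pool.map(upload_one, paths))

def fake_upload(path: str) -> str:
    """Stand-in for a real upload call so the sketch runs anywhere."""
    return f"uploaded:{path}"

results = upload_all(["a.jpg", "b.jpg", "c.jpg"], fake_upload, workers=2)
print(results)
```

On a high-latency link a single transfer is often limited by round trips rather than bandwidth, which is why running several at once can fill the pipe. The right worker count is something to find by experiment.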


Hey Colin -> I'll echo what Brian said below (he's our CTO though didn't identify himself, I'll shame him later). Our B2 service is designed for that. Adding NAS support to our computer backup service would skew the math too much in the wrong direction, but we developed B2 to be flexible for the majority of use-cases and it's been integrated with a lot of popular NAS platforms as well.


I think it's interesting that you seem to innately recognize a Linux box is basically a NAS. I run a full Linux box at home that I use as a personal cloud / NAS.


> "Home backup solution"

Do you mean some kind of _managed_ on-prem NAS or a NAS hooked up to Backblaze online storage?


I remember learning (years ago in university) that hard drive failures follow a bathtub curve: https://en.wikipedia.org/wiki/Bathtub_curve

In other words it either fails early or it will last for a while.

Can you plot the time-to-failure distribution for various models to confirm/deny this rule of thumb? I think it'll be a good addition to the list of current charts you have since it's a bit more meaningful than the overall failure rate.
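For readers unfamiliar with the shape, a bathtub curve is often modeled as the sum of a decreasing "infant mortality" hazard, a constant random-failure hazard, and an increasing wear-out hazard. The sketch below uses Weibull hazards for the first and last terms; every parameter is an illustrative guess and is not fitted to any Backblaze data.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (k/scale) * (t/scale)**(k-1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t_years: float) -> float:
    """Sum of three failure modes; all parameters are illustrative guesses."""
    infant = weibull_hazard(t_years, shape=0.5, scale=20.0)   # decreasing
    random_failures = 0.01                                     # constant
    wear_out = weibull_hazard(t_years, shape=5.0, scale=8.0)  # increasing
    return infant + random_failures + wear_out

for age in (0.1, 3.0, 8.0):
    print(f"age {age:>4} years: hazard {bathtub_hazard(age):.3f} per year")
```

A shape parameter below 1 gives a falling hazard and one above 1 gives a rising hazard, so the sum is high early, low in the middle, and high again late. Whether real drive models follow this shape is exactly what the requested plot would show.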


Why do you think Backblaze is able to maintain cheap unlimited support in the wake of so many other providers recently jumping ship? Amazon, SOS and now Crashplan. I've heard stories of people using Amazon to store over 1PB of data which would be thousands of dollars monthly even at Glacial cost. Do the "ultra" users weigh down on your margins?


Backblaze is NOT able to maintain unlimited support:

'While files are expunged from the servers after 30 days if they're removed from a computer, your most recent backup snapshot will be retained for 6 months if your computer is completely unable to contact our servers (either it's shut off, or no internet connection). As long as your computer can contact Backblaze at least once every 6 months and perform a full Backblaze file scan operation, you don't delete or transfer the backup and you retain active billing, your most recent snapshot will be retained.'

https://help.backblaze.com/hc/en-us/articles/217664898-What-...


I think the question was about our unlimited model, not data retention/versioning, though if it was you've got the right answer :D


Hey there! Great question. We use our own storage infrastructure called Storage Vaults (https://www.backblaze.com/blog/vault-cloud-storage-architect...) that we build on top of our Storage Pods (https://www.backblaze.com/blog/open-source-data-storage-serv...). That means we can store a lot of data, at a lower cost per GB than a lot of folks, and we pass the savings on to our customers. Of course we have some customers that store WAY more than we break even on, but we also have A LOT more customers with a more manageable data-set.


~7 yrs at CrashPlan. Never backed up more than 30-50GB at a time and it's not gonna dramatically change at all in the near future.

I use it strictly for "personal" and critical data that I cannot recover or get from anywhere else once lost - personal pics, personal videos, my notes - diary, some mails. No, not all mails - most of them are left with Google, MS, on my VPS and if they are gone, I am not gonna miss them terribly. I don't even store my code on CrashPlan servers - for that there's GitLab and Bitbucket (mirrored there) and my external hard disk.

In fact I have a ~3GB folder in Dropbox that I have named "Emergency Backup" and if all is lost I might be happy with just that.

It's not that I am being a model "low storage" customer so that CrashPlan can function. But I want to keep my backup habit disciplined (in my own way, of course - many would find my backup strategy stupid for their own use cases and that's fine).

So please, for the love of proper backup, bring unlimited versioning and file retention for customers like me :-) Or float a cheaper plan where you limit the storage. Or hell, bring something like cold storage and dump my backup there (I know B2 but that's not what I am looking for; something baked in your main backup service). All I would do is periodically keep checking whether my backup is there or not and I will just leave it be. (Okay, if not really unlimited then something close to it).

We need a CrashPlan alternative. I am willing to stick with it and wait for almost a year, but after that I would like to move to a better and more trustworthy alternative, which you are - except, in all honesty, for that glaringly missing (or omitted) critical backup feature. Also, my backup is something I want to pay for and let someone else handle in a very solid way.

Here's a discussion I had with Brian a few days back, and there are some points I raised. I am not saying they are brilliant ideas; it's really a wish-list, but please have a look if you can - https://news.ycombinator.com/item?id=15074647


Thanks I'll take a look. Honest question, if you're not backing up more than 50GB of data, why not just use Backblaze B2 w/ another front-end GUI like ARQ or Cloudberry or Hashbackup? You'd be paying $0.25/month for the storage, much less than with our Computer Backup service, and you can make life-cycle rules that would keep versions for as long as you'd like.


Because I want one service that has everything built in. Like CrashPlan backup app, or your computer backup app. I have mentioned this in my comment.

I understand the price difference but I want the ease, peace etc, not something where I need to hook two or more things together.


I would rather see them leverage/contribute to existing cross-platform client-side backup software like ZManda[0] making the improvements needed to either make it work directly with B2 or integrate with Amanda.

Keep your focus on the storage angle. There are ways to accomplish this: I use a "glue box" (Synology NAS) to collect the data and fire it off to B2, but it could easily be a Linux machine running the b2 CLI tools[1].

[0]: https://wiki.zmanda.com/index.php/Zmanda_Windows_Client

[1]: https://www.backblaze.com/b2/docs/quick_command_line.html


I run a personal home server that acts as my cloud / NAS. I personally think RAID is overkill for most home users. Instead, I use my 2nd drive to mirror the data. This saves me from fat finger errors and such. I also use BTRFS so I can take snapshots and provide my own file versioning.

For offsite backup, I use a time4vps storage server, but I might transition to B2 in the future. I like what I'm hearing.
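The snapshot-based versioning mentioned above can be scripted around the `btrfs subvolume snapshot` command. This is a sketch of one possible approach, not the commenter's setup; the subvolume and snapshot directory in the usage note are placeholders, and actually taking a snapshot needs root and a Btrfs filesystem.

```python
import subprocess
from datetime import datetime
from pathlib import PurePosixPath

def snapshot_command(subvolume: str, snapshot_dir: str, now: datetime) -> list:
    """Build the command for a read-only, timestamped Btrfs snapshot."""
    name = now.strftime("%Y-%m-%d_%H%M%S")
    dest = str(PurePosixPath(snapshot_dir) / name)
    # -r makes the snapshot read-only, which suits point-in-time versions.
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, dest]

def take_snapshot(subvolume: str, snapshot_dir: str) -> None:
    """Run the snapshot command; needs root and a Btrfs filesystem."""
    subprocess.run(snapshot_command(subvolume, snapshot_dir, datetime.now()),
                   check=True)
```

Calling `take_snapshot("/data", "/data/.snapshots")` from a daily cron job would give one restorable version per day. Snapshots live on the same disk as the data, so they guard against fat-finger errors but not against drive failure, which is where the mirror and the offsite copy come in.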


Understood! Thank you for the additional color!


Are you able to dedupe common data between customers? I'd assume there's storage savings in storing a chunk of data (multiple copies of course) only once storage-system wide.


We don't do account-wide deduplication, but we do deduplicate on the machine before backing up.


Part of it might be because they don't support Linux, which tends to have a larger percentage of "ultra" users. https://news.ycombinator.com/item?id=15125543


Brian from Backblaze here.

> Part of it might be because they don't support Linux

Backblaze DOES support Linux through the B2 product line and third party applications. There is a list of applications that support Linux here: https://www.backblaze.com/b2/integrations.html (scroll down and look for pictures of Penguins).

Supported applications include Duplicity, which ships inside of several Linux distributions such as Ubuntu and Debian. So you may already have your local client application pre-installed on your Linux computer, ready to back up to Backblaze!


Right, I meant Linux with the Backblaze personal service, since that's what the GP was referring to:

> I've heard stories of people using Amazon to store over 1PB of data which would be thousands of dollars monthly even at Glacial cost. Do the "ultra" users weigh down on your margins?

(Obviously with a price that scales with the amount of data stored, "ultra" users wouldn't be a problem on B2.)


It sounds like Backblaze requires you to have local copies of the data, which would stop most people from storing a PB.


One reason that Backblaze is so cheap and B2 is much cheaper than Amazon is that they don't have multiple data centers. That has two consequences. The first is that you don't have the fault tolerance you have at Amazon:

If their building catches fire, you're SOL.

The second consequence is that you can't choose a data center that is geographically close to you.

That makes B2 unsuitable for off site primary storage for critical data and I wouldn't use it for more than a backup.

That being said, I'm a happy backblaze customer and if I ever get a NAS, I would definitely use B2 as the backend for my backup solution.


Brian from Backblaze here.

> If their building catches fire, you're SOL.

We call this the "meteor hits our datacenter" scenario. With the Backblaze Online Backup product, the hope is your laptop isn't hit by the same meteor so you still have a primary copy of the data.

But I'm a HUGE believer in having two lower redundancy backups stored with completely different technology backed up by two separate companies that don't share a single line of code. In two separate locations. For example, make a local copy onto an external USB hard drive, and use Backblaze for a remote copy in case your house burns down. It would also be Ok to put one copy in Amazon S3 and a separate copy in Microsoft Azure, and try to use two separate pieces of software to do each of those backups.

The main reason to use two different companies is in case a bug exists in the backup software. The same bug won't hit both backups at the same time.


> We call this the "meteor hits our datacenter" scenario. With the Backblaze Online Backup product, the hope is your laptop isn't hit by the same meteor so you still have a primary copy of the data.

Just reading this right now I realized, actually, usually for online services you want to use the server closest to you, but for backup I guess it's the opposite - you want the server that's furthest away from you!


100% :D


Amazon doesn't include multi-region redundancy with S3. What you pay for is redundancy within a region, just like Backblaze.

Source: I emailed Jeff Barr and asked.


It does include multi-AZ redundancy, which means that your data is spread across multiple physical locations within a region.

AZs are usually within something like a 50-mile radius, which doesn't get you meteor level separation but does get you fire level separation.


Can you provide a citation to that level of geographic diversity per region? My understanding is that all AZs are within the same physical facility, but independent networking and power.


AWS has a map that shows the physical location of all data centers. us-west-2 for example is split up among three cities in Oregon. Probably 60-70mi between each of them.


> all AZs are within the same physical facility

The TL;DR from AWS documentation [1]:

An Availability Zone is represented by a region code followed by a letter identifier; for example, us-east-1a. To ensure that resources are distributed across the Availability Zones for a region, we independently map Availability Zones to identifiers for each account. For example, your Availability Zone us-east-1a might not be the same location as us-east-1a for another account. There's no way for you to coordinate Availability Zones between accounts.

The long and confusing explanation:

At least not in us-east-1 and us-west-1-2, but I am pretty sure many of the large regions are also run in multiple physical facilities.

The so-called availability zone is an abstract and virtual concept. Let us use us-east-1 as an example.

Assume the following:

* Physical DC buildings: Queens, Brooklyn, Manhattan, Staten Island

* AWS accounts: Joe, Alice, Bob

* AZ: us-east-1a, us-east-1b, us-east-1c, and us-east-1d

Every AWS account in us-east-1 region is assigned three AZs. But for sake of this explanation, we assume only two.

* Joe: 1a, 1b

* Alice: 1a, 1b

* Bob: 1a, 1c

You now ask, "WTF?" but you let this go, thinking this is done for capacity reasons. So do we actually have four different physical facilities, one per each AZ? Nope.

So are 1a and 1b in the same facility? Not necessarily, but very possible.

So 1a and 1b in Queens, 1c in Brooklyn, and 1d in Manhattan? Nope.

So what the fuck is AZ? What is the relationship between AZ and physical facility?

Think about virtual memory address space.

Joe's 1a and Bob's 1a are in Queens, but Alice's 1a is in Manhattan. But Joe's 1a and Bob's 1a are on a different floor, different racks, while Joe's 1b and Bob's 1c are in Brooklyn and on the same floor. This is why certain customers run out of m3.xlarge in 1a but others don't in their 1a.

In essence, an AZ is a label and is unique per account. An AZ is very similar to how a virtual memory address works in an OS.

We learned this because our EMR failed due to low capacity in one account.

[1]: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-reg...
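The per-account labeling described above can be pictured as a lookup table from (account, AZ label) to a physical facility. The sketch below uses the made-up accounts and buildings from the example. Alice's 1b is not specified there, so its value is an arbitrary placeholder, and none of this reflects real AWS facility locations.

```python
# Per-account mapping from AZ label to physical facility, using the
# made-up names from the example above. Alice's 1b is not specified in
# the example, so the value here is an arbitrary placeholder.
AZ_MAP = {
    "Joe":   {"us-east-1a": "Queens",    "us-east-1b": "Brooklyn"},
    "Alice": {"us-east-1a": "Manhattan", "us-east-1b": "Staten Island"},
    "Bob":   {"us-east-1a": "Queens",    "us-east-1c": "Brooklyn"},
}

def facility(account: str, az_label: str) -> str:
    """Resolve an account's AZ label to the facility it points at."""
    return AZ_MAP[account][az_label]

def same_facility(a: tuple, b: tuple) -> bool:
    """True if two (account, label) pairs resolve to the same facility."""
    return facility(*a) == facility(*b)

print(same_facility(("Joe", "us-east-1a"), ("Alice", "us-east-1a")))
print(same_facility(("Joe", "us-east-1b"), ("Bob", "us-east-1c")))
```

The same label can resolve to different buildings for different accounts, and different labels can resolve to the same building, much as the same virtual address in two processes can map to different physical pages.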


I don't mean to be rude, but your answer was very handwavy without any citations or proof.

https://datacenterfrontier.com/inside-amazon-cloud-computing...

"Amazon initially said little about the physical layout of AZs, leaving some customers to wonder whether they might be different data halls within the same facility. The company has since clarified that each availability zone resides in a different building."

“It’s a different data center,” said Hamilton. “We don’t want to run out of AZs, so we add data centers.”

To make this work, the Availability Zones need to be isolated from one another, but close enough for low-latency network connections. Amazon says its zones are typically 1 to 2 milliseconds apart, compared to the 70 milliseconds required to move traffic from New York to Los Angeles.

“We’ve decided to place AZs relatively close together,” said Vogels. “However, they need to be in a different flood zone and a different geographical area, connected to different power grids, to make sure they are truly isolated from one another.”

So, distance of availability zones from each other is limited by speed of light in fiber optics (which is slower than through a vacuum or microwave wireless).

Based on this calculator: http://wintelguy.com/wanlat.html, availability zones can't be more than 0.5-1 miles apart (about) to retain their 1-2 millisecond network latency, so they're different buildings in the same industrial/business park. We could confirm this by pulling permits (public record) in Amazon's (or their subcontractor's) name.
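As a cross-check on the distance estimate, here is a propagation-only calculation. It assumes the quoted 1 to 2 ms is a round-trip time (the excerpt does not say) and uses a typical refractive index for silica fiber. It ignores switching, queuing, and non-straight cable routes, so it gives an upper bound on fiber path length, not a claim about where any facility actually is.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.47  # typical for silica fiber; an assumption

def max_fiber_km(round_trip_ms: float) -> float:
    """Longest fiber path whose propagation delay alone fits the round trip."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / FIBER_REFRACTIVE_INDEX  # km/s
    one_way_seconds = (round_trip_ms / 1000.0) / 2.0
    return speed_in_fiber * one_way_seconds

for rtt in (1.0, 2.0):
    print(f"{rtt} ms round trip: at most about {max_fiber_km(rtt):.0f} km of fiber")
```

On propagation delay alone, a 1 ms round trip allows roughly 100 km of fiber, so latency by itself does not force zones into the same business park. Equipment delays and indirect cable routes would shrink the practical distance, but by how much is not something this sketch can tell you.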


No worries, you are not being rude.

That is not a guarantee. AWS doesn't actually publish more than what I cited (well there are photos of the DC flooding around the Internet). But there are different physical facilities, and they are some miles apart. Like I said above, 1a for Joe and 1a for another customer don't have to be in the same building, or on the same floor.


He is talking about multi-region. He already acknowledged S3 is single-region redundant.


It helps that they only back up local storage - no NAS or network drive support. Obviously anyone storing 1PB of data in Amazon isn't exactly uploading that from the drive(s) in their laptops...


Been looking at using B2 to back up a few systems including a remote server. I've gone through the docs and didn't find anything on this so figured I'd ask here since this post popped up with perfect timing!

Can you have multiple API "keys" set up for different buckets within one account? And can these keys be permissioned to constrain an API key to bucket(s), and to control the operations possible within that bucket?

My use-case here would be for the remote server to have a B2 API key available to it which only permits data to be added to the bucket. That way, if the server were ever compromised, an attacker couldn't tamper with or erase the backups.

I'd use another (more privileged) API key to do any maintenance of deleting from buckets as-and-when it was needed.

The reason for asking about having separate keys for buckets was purely for "isolation", so a laptop backup bucket API key wouldn't be able to do anything to the desktop bucket (and vice versa etc.)
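The request above amounts to keys that carry a set of allowed buckets and a set of allowed operations. The following is a purely hypothetical model of that idea, written only to make the use case concrete; it is not B2's API, and the key names, bucket names, and operation names are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScopedKey:
    """Hypothetical key restricted to some buckets and operations."""
    name: str
    buckets: frozenset
    operations: frozenset

    def allows(self, bucket: str, operation: str) -> bool:
        """True only if both the bucket and the operation are in scope."""
        return bucket in self.buckets and operation in self.operations

# Upload-only key for the remote server: it cannot delete or list.
server_key = ScopedKey("remote-server",
                       frozenset({"server-backups"}),
                       frozenset({"upload"}))

# More privileged key kept off the server, used for maintenance.
admin_key = ScopedKey("maintenance",
                      frozenset({"server-backups", "laptop-backups"}),
                      frozenset({"upload", "list", "delete"}))

print(server_key.allows("server-backups", "upload"))   # in scope
print(server_key.allows("server-backups", "delete"))   # operation out of scope
print(server_key.allows("laptop-backups", "upload"))   # bucket out of scope
```

With a model like this, a compromised server holding only `server_key` could add data but could not erase or tamper with existing backups, which is the isolation the question is after.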


>Can you have multiple API "keys" set up for different buckets within one account? And can these keys be permissioned to constrain an API key to bucket(s), and to control the operations possible within that bucket?

B2 user here. No, and it's the one thing keeping me from migrating more data from S3 to B2.

One key at a time only, and there is no permission granularity.

Since the backblaze folks seem to read these threads, please make this happen! I'm ready to give you more money per month.


Thanks for your reply.

That's a real shame - I am sure I could segregate the different devices by making multiple accounts under the "groups" system, but that gets a bit messy!

I'd love to see this happen! Even if it was just a list of the API functions from the docs with a checkbox against each, hidden behind a "danger" screen, it would make me feel more confident using this! Even if it wasn't granular per-bucket, it would make it a little safer when leaving credentials in cronjobs or bash scripts.


Brian from Backblaze here.

> two separate credentials

It is literally at the very top of our list to do. I'm staring at my task list and helping out on that is my number one task. You should see it coming soon!


I'm staring at Brian staring at it...


Hey there! Short answer is: No. Long answer is: It's a high-runner on our task list, no ETAs, but we're aware there's a need!


Do you think you'll ever offer the $5 a month plan for desktop GNU/Linux users? I have an office full of Ubuntu users.


Hey Matt! Doubtful. For our consumer backup service, allowing servers, NAS boxes, or Linux would skew our unlimited model in an unfavorable direction. We've been sending folks over to our Backblaze B2 service (it's still relatively inexpensive at $0.005/GB - so depending on the amount of data it might actually be less expensive). We have a lot of integrators, and one of them, Duplicity, ships with a lot of Linux boxes already (https://help.backblaze.com/hc/en-us/articles/115001518354). It doesn't have a GUI, but Deja Dup (https://wiki.gnome.org/Apps/DejaDup) does offer one, though Backblaze B2 isn't integrated into it yet - we're crossing our fingers. There are some tools like Cloudberry, qBackup, and Duplicacy that do have GUIs you can use as well. I know it's not the same as an unlimited plan at a fixed cost, but it does offer a bit more flexibility for the end user.
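For anyone weighing the two options, here is a rough back-of-the-envelope sketch using only the two prices mentioned in this thread ($0.005/GB for B2 and the $5 plan). It assumes storage cost only and ignores any download or transaction fees; the function names are made up for illustration.

```python
B2_PRICE_PER_GB = 0.005   # USD per GB, figure quoted in this thread
FLAT_PLAN_PRICE = 5.00    # USD per month, the consumer plan mentioned in this thread


def b2_storage_cost(gigabytes: float) -> float:
    """Storage-only B2 cost in USD for the given amount of data."""
    return gigabytes * B2_PRICE_PER_GB


def break_even_gb() -> float:
    """Data size at which B2 storage costs the same as the flat plan."""
    return FLAT_PLAN_PRICE / B2_PRICE_PER_GB


if __name__ == "__main__":
    for size_gb in (128, 256, 1000, 8000):
        print(f"{size_gb:>5} GB -> ${b2_storage_cost(size_gb):.2f}")
    print(f"break-even at about {break_even_gb():.0f} GB")
```

Under these assumptions, anything below roughly 1TB of stored data comes out cheaper on B2 than on the flat plan.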


I understand why you don't support Linux. And while I wish you did, I respect the decision.

However, can you provide me with a backup client integrated with B2 that makes backups as simple as the regular home service? I don't want to deal with CLI or anything complicated.


Hey Zanedb! We don't have one on our roadmap, but we DO have a lot of B2 integrators that are actually quite simple to use, take a look at -> https://www.backblaze.com/b2/integrations.html (most of them are either free or are a one-time license cost + storage over time).


Why do you keep saying my home Linux laptops with 256 and 128GB are NASes? Why aren't Windows Servers some other people use with 12TB of storage marked that way?

I get that you don't want 150TB uploads, but why do you insult your potential customers by calling them hoarders just by the OS they use on their home devices?


I'm talking about just desktop users though. Nothing stopping someone plugging an 8TB drive into their Macintosh and using the service after all...


Completely understand, and we do back up Drobos and external drive bays as well, but opening it up to NAS, servers, and Linux would not work out well. That said, we're constantly running modeling on it, but for now we send folks to Backblaze B2 and most folks tend to be pretty happy with it. Plus with B2 + Backblaze Groups you can centrally manage those backups as well, so you still get the web-usability of Backblaze, you just need someone else's client to tie into B2 (or roll your own w/ CLIs and APIs).


If you ever start offering regular backup for $5 for desktop, we'd love to be a customer.


Sure, but the % of users with 8TB drives—and the connectivity to push that in the first place—would be low enough to still work out favorably for them.


I run Linux on my laptop. That's not a server. Just provide a GUI application so no one in their right mind would put it on a server.


Any backup client has to be headless, otherwise it will fail to maintain your backups while you are logged out or automatically resume after a reboot. It's a non-starter.


My home directory is encrypted then anyway.


2nd, I was about to upgrade my Crashplan service to multi user to add my Linux laptop. Now they're going away entirely so I'm back to square one.


3rd. Another Crashplan refugee. I loved using Backblaze when I was on Windows & OS X.


Now that the Phoenix DC is online, will there be geographical redundancy between Phoenix and NCal for Backblaze customers?


There's currently redundancy inside of Backblaze datacenters with our Vaults architecture (https://www.backblaze.com/blog/vault-cloud-storage-architect...). Georedundancy is on the roadmap for Backblaze B2, but not currently for the Computer Backup service.


Any plans to build a client for Synology NAS devices? I'd love to have an off-premise backup for my home/small-business data, and it'd be nice if I didn't have to go through a PC to get it there. And I'd be willing to pay a little more - I feel guilty dumping 18TB on you with a consumer plan.


Hey there! Don't feel guilty...use Backblaze B2! We're actually already integrated in to Synology's Cloud Sync service! Take a look at-> https://www.synology.com/en-us/knowledgebase/DSM/help/CloudS... (and more B2 info/integrators -> https://www.backblaze.com/b2/integrations.html)


Have you guys tried hardware RAID? I've been a hardware RAID user for about 0.5PB. Then I built one of your boxes from your kit and was shocked by the terrible performance of ZFS. About half the speed of my Adaptec builds, and much worse for other loads.


ZFS needs a lot of RAM for good performance. Like 10s of GBs. How were your boxes configured?


I'm by no means a ZFS expert, but much of what I've read from "authoritative" sources [1] suggests that this is a myth. The 1GB of RAM per TB of disk is largely a suggestion from the FreeNAS developers a while back that was specific to FreeNAS (not ZFS in general) and more of a gut feeling than something backed by measurements. ZFS deduplication can be memory-hungry, but it's more "adding memory helps" than "not having lots of memory is catastrophic".

[1] https://linustechtips.com/main/topic/738402-zfs-memory-requi...


Storage Pod 6.0, ~190 GB of RAM + ~48 cores.

30 drives gets like 600 MB/s on ZFS, compared to 1200 MB/s with Adaptec hardware RAID.

[ps] Also got a lot of bad stuff to say about backuppods. Mostly that they don't sell the correct number of wire harnesses. (They need to sell two but they're only selling one, I assume I'm the only person stupid enough to buy from them...)


First, ZFS doesn't need much RAM; you are fine with as little as 2GB.

You can still use hardware RAID with ZFS if you export each disk as its own device and assemble the RAID with ZFS. It is not the parity calculations that are slow, it is the additional I/O operations that are required, and ZFS writes these directly to disk. A hardware RAID controller writes these I/O operations to memory. Another advantage with ZFS is that if your data compresses well, you will get an additional speed benefit from that.


Frankly, I suspect the ZFS community is in denial, or optimizing for cost.

I got what I think is a correctly configured ZFS system and it's getting the same speeds as most people report online.

NOW the thing is that the ZFS system is doing about 2x worse for the dense sequential write workloads compared to the RAID6 HW solution. Not to mention eating a dozen or so GBs of RAM.


ZFS autotunes memory allocation based on how much RAM you have; if you have 128GB of RAM free, ZFS will use that. In fact, all operating systems eat GBs of free RAM just like ZFS.

Software raid has a larger write penalty, that is why you see slower write performance versus using hardware raid. As I said earlier, writing directly to memory helps. Recommended read: http://rickardnobel.se/raid-5-write-penalty/
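As a rough illustration of the write penalty idea, here is a small sketch of the textbook model for traditional parity RAID: each front-end write costs several back-end I/Os (commonly quoted as 4 for RAID 5 and 6 for RAID 6). The per-disk IOPS figure in the usage note is a made-up example, and this simple model mainly describes small random writes; it may not reflect how a particular controller or ZFS RAID-Z actually behaves.

```python
# Commonly quoted back-end I/Os per front-end write for each RAID level.
WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def effective_iops(iops_per_disk: float, disks: int,
                   write_fraction: float, level: str) -> float:
    """Estimate front-end IOPS for a mixed read/write workload.

    Each read costs one back-end I/O and each write costs `penalty`
    back-end I/Os, so front-end IOPS = raw / (reads + writes * penalty).
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    penalty = WRITE_PENALTY[level]
    raw_iops = iops_per_disk * disks
    return raw_iops / ((1.0 - write_fraction) + write_fraction * penalty)
```

For example, `effective_iops(100, 30, 1.0, "raid6")` models 30 disks at a hypothetical 100 IOPS each under a pure write load.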


So, basically, if you're building a $15,000 storage box, cough up $500 for a hardware RAID solution.


Have you guys considered offering cloud service backups, e.g. Google Apps backup?

There are providers like Spanning or Backupify, but I'd prefer to buy it from you because I've been a Backblaze customer since the early days and I trust you.


We've toyed with the idea, but honestly Google does a great job of making sure that its Apps are backed up, so while there's definitely a market, not sure we can spare the dev cycles to address it.


Looks like those Seagates are still disappointingly failure-prone, while HGST remains the most reliable.


Backblaze won me over because they publish this stuff. I'm actually working with the backblaze b2 api now because of some stats they published earlier.


I understand that backblaze probably uses spinnies all around, but would love to see an SSD version of this!


These guys need an Internet tip jar.


It's called, "using their service."


Yes, there's also that :D


Yev from Backblaze here -> I can send you my Paypal address? I..."promise" to distribute it evenly...


And I would keep Yev to that promise. :-)


HGST continues to be the most reliable HDD. I am not surprised but I hope WD doesn't fuck it up. Lenovo bought IBM's X-series, and while that continues to perform well, I have little confidence in Lenovo and its commitment.


Wow, what's the deal with Seagate's 4TB drives failing so often?


Seagate's recent history shows they haven't been great at producing reliable drives:

https://www.backblaze.com/blog/wp-content/uploads/2017/01/Al...

Notice the rows where the number of failures is higher than the number of drives. That means the replacements failed too.

Edit: the 3TB Seagate is apparently infamous enough to have its own Wiki page: https://en.wikipedia.org/wiki/ST3000DM001

Only the "large" Seagates (6TB and above) seem to be doing OK.
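For comparing models with very different drive counts, an annualized failure rate based on drive days is more useful than raw failure counts. Below is a minimal sketch of that calculation as I understand Backblaze describes it in their reports (failures divided by drive-years, as a percentage). The function name and the numbers in the usage note are made up for illustration and are not taken from the linked table.

```python
def annualized_failure_rate(failures: int, drive_days: int) -> float:
    """Return the annualized failure rate as a percentage.

    drive_days is the total number of days all drives of a model were
    in service; dividing by 365 converts it to drive-years.
    """
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365
    return failures / drive_years * 100
```

For instance, `annualized_failure_rate(10, 365_000)` describes 10 failures over 1,000 drive-years. Because failures accumulate over the whole period while the drive count is a snapshot, a failure count above the drive count is possible for a model that is being retired.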


Impressive performance by HGST..


Although WD bought HGST [1], it is interesting WD performs worse than them.

[1] https://www.wdc.com/about-wd/newsroom/announcements/wdc-acqu...


It makes sense, right? Keep selling decent quality WD branded drives to consumers who don't care for best-in-industry MTTF, often in external enclosures, but buy HGST so you also get money from super nerds.


The way Black & Decker bought deWalt.


They bought the Cadillac of shooting nails.


I have heard (third hand) that WD has been pretty hands off with the HGST part of the business thus far. It remains to be seen just how long that state of affairs can last.


My own Hitachi Travelstar 5K160 is now at Power_On_Hours=81569 (9.3 years) Load_Cycle_Count=16793410 It doesn't work very hard though.
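The arithmetic behind those two SMART values, as a quick sketch. It uses only the figures quoted above and assumes 365-day years; the helper names are made up.

```python
HOURS_PER_YEAR = 24 * 365

POWER_ON_HOURS = 81569        # SMART value quoted above
LOAD_CYCLE_COUNT = 16793410   # SMART value quoted above


def power_on_years(power_on_hours: int) -> float:
    """Convert SMART Power_On_Hours to years of powered-on time."""
    return power_on_hours / HOURS_PER_YEAR


def load_cycles_per_hour(load_cycle_count: int, power_on_hours: int) -> float:
    """Average head load/unload cycles per powered-on hour."""
    return load_cycle_count / power_on_hours


if __name__ == "__main__":
    print(f"{power_on_years(POWER_ON_HOURS):.1f} years powered on")
    print(f"{load_cycles_per_hour(LOAD_CYCLE_COUNT, POWER_ON_HOURS):.0f} load cycles per hour")
```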



