Backblaze is now a Terraform provider

mattjaynes · on March 11, 2021

If you'll be using B2 for live production data, a couple points:

First, B2's pricing is pretty amazing, especially compared to S3 and similar competitors: https://www.backblaze.com/b2/cloud-storage-pricing.html

Second, be aware those savings come with some downsides. The major one for us has been their maintenance window every Thursday from 2:00-3:00pm Pacific Time. Usually there's no outage, but sometimes there is. There's no warning - it's just down sometimes during that window. So, if uptime is important for your data, consider the cost of also implementing a fallback solution to cover your production use during those maintenance windows. https://www.backblaze.com/scheduled-maintenance.html

qw3rty01 · on March 11, 2021

Also keep in mind all the data is in a single datacenter, so if something like OVH happens to backblaze, you'll probably lose the data

I mainly only use them for backups instead of production data

EDIT: As of 2019 there are multiple datacenters, but it doesn't seem like the data is stored redundantly across them

jiofih · on March 11, 2021

They have four datacenters, three in the US and one in the EU. Details are not given regarding how the 20 shards that comprise your data are distributed geographically, but they state eight 9s of reliability.

spondyl · on March 11, 2021

Just to clarify, Backblaze states eight 9s of durability, not reliability.

Durability refers to the idea that your data will still be retrievable (ie no corruption) similar to S3's claim of eleven 9s of durability.

Reliability however would be say; the actual availability of the Web UI or API server that you download your data through. If it were down, that wouldn't impact the actual integrity of the data itself.

For anyone interested, any more than about five 9s of reliability is basically impossible anyway when it comes to human intervention. As an example, 6 nines of reliability would allow you 3 seconds of unavailability a year so 250 milliseconds of unavailability a month.

From a users point of view, being "unavailable" includes everything from going through a tunnel and having your mobile connection drop out to a shark biting one of the undersea cables in the middle of the ocean.

As you might imagine, with a human involved, they couldn't even get acknowledge an alert fast enough to meet that deadline let alone actually going about doing any repairs and diagnosis :)

It could be spread over multiple instances and redundant hardware as well but as with any system being touched by humans, it's near guaranteed that something will go wrong eventually.

andrewaylett · on March 11, 2021

At that scale, a complete outage is unlikely. I have services which haven't gone down _at all_ for longer than a year. But we lose requests every now and then -- during a deploy, or due to a bug. So we've moved from a time-based view of outage to a request-based view.

This helps, too, as it lets us build out services to be more reliable in combination, rather than less reliable. With retries and fail-over, an outage in an entire region may not necessarily result in any user requests failing.

For scale, pre-pandemic our published figures claimed >100M MAU.

I find https://andrewaylett.github.io/multi-burn-rate-calculator/ helpful for visualising error rates -- largely cribbed from the project it's forked from :) but with the tweakables switched around and the time between alert and error budget exhaustion in the tooltip.

It's worth noting that we only evaluate our alerts at most once a minute.

chrisandchris · on March 12, 2021

Said OVH. Then their datacenter burned to the grounds.

Said Oracle. Then their DNS was misconfigured and their whole cloud went offline for 2 hours [1].

Shit happens, always, at all scales.

[1] https://ocistatus.oraclecloud.com/incidents/qjxllgkywysj

Edit: typo

andrewaylett · on March 12, 2021

OVH didn't suffer a complete outage. If you were relying on that single DC, then you're probably not sufficiently large for this to apply to you.

But perhaps my point wasn't clearly enough made: a claim of "100% uptime" on a service level isn't particularly _useful_ when our users still only see a 99.9% success rate.

chaz6 · on March 12, 2021

I think the weak point is their domain name. I think cloud providers should have a second domain, with a different registrar and managed sompletely independently, so that if one is subject to a problem (hijacking, dns outage, etc) clients can fallback to the alternative domain.

sp8 · on March 12, 2021

That literally happened, they blogged about it recently. https://www.backblaze.com/blog/recent-outages-why-we-acceler...

myself248 · on March 12, 2021

You say that, but telephony used to be obsessed with uptime. These are the people who invented live patching of a running system, after all. Case in point, the #1 ESS telephone switch was designed for less than 2 hours of cumulative downtime over a 40-year service life. In practice, most achieved less than 1 hour. (How many nines is that?)

Which is to say, individual subscriber lines can have all sorts of faults, but they only affect that subscriber. Trunks between offices can fail, but they only affect a certain number of circuits. The central call processing ability, of the switch to react appropriately to lines changing state and numbers being dialed, for calls to be completed if both stations are available, is what had to meet that target.

I was skeptical when I first heard this, so next time I was in an office that still had a #1 (actually a 1A, it had been upgraded in the late 80s), I talked to the switchman for a bit, and he showed me the downtime counter. It's a mechanical thing like an odometer, and once a second when the switch is executing its main loop, it touches a register that keeps the counter from incrementing. If the processor halts or isn't processing calls for some reason, the counter starts counting.

The switch was installed in the mid-70s, and from the moment it took over for its crossbar predecessor, including an in-place processor upgrade, it had logged less than an hour. Most of that came a few seconds at a time when swapping between active processors during software upgrades, he said. At the time (this was around 2002 I think) it was slated to be replaced with a DMS100, but the replacement activity hadn't commenced yet. I don't know what sort of reliability numbers the DMS machines achieve, but they'd do well to match their predecessors.

cperciva · on March 12, 2021

To answer your question: 1 hour of downtime after 40 years is 5.5 "nines": 99.9997% availability.

irjustin · on March 12, 2021

So to the grandparent comment, 6x 9's of reliability is unrealistic to guarantee as an SLA.

If you get better, awesome, but SLA? Too unrealistic.

d1sxeyes · on March 13, 2021

Depends. You're only thinking of one half of an SLA, the metric, rather than the measurement period.

What's the SLA penalty vs the extra the customer is willing to pay? If I think I can achieve 6 x 9's on a monthly basis, but probably I'll only achieve it 10 months out of 12, I can offer the customer an SLA of 6 x 9's for 100 USD per month.

My penalty can be 50 USD for failing to meet, and then I as a supplier walk away with 10 x 10 + 2 x 50 USD = 1100 USD for offering something I knew I couldn't achieve (consistently).

cperciva · on March 13, 2021

Indeed, in the early days of Amazon S3, they failed their SLA every single month. And I got... a 10% refund every month.

gumby · on March 12, 2021

> For anyone interested, any more than about five 9s of reliability is basically impossible anyway when it comes to human intervention. As an example, 6 nines of reliability would allow you 3 seconds of unavailability a year so 250 milliseconds of unavailability a month.

Not sure what the qualifier “when it comes to human intervention” means, but if I ignore it then five nines is quite standard in certain sectors. For example phone switch SLA (back when I was in the biz) was measured in minutes per decade (as in “cumulative unavailability under 1/2 minutes per decade”). Large baseline power plants can and must run uninterrupted for decades.

Of course it’s a systems issue not a point solution.

spondyl · on March 14, 2021

Ah, apologies.

I realise in hindsight that I was rambling from the point of view of offering five nines for software that is inherently flaky/unreliable. Companies where developers cycle through and knowledge is lost, technical debt accumulates and systems are used almost counter to their intended purpose (ie Redis as a database)

In that sense, it's like constantly massaging applications to stay alive or at a reasonable level of service and so hence the assumption that someone will be paged and respond in order to preempt a failure or restore service during an outage.

Dylan16807 · on March 11, 2021

Given that the parity can only lose 3 shards it's easy to show that a datacenter loss will always result in data loss, so there's no reason to distribute and we should assume that all 20 shards are always going to be in the same place.

qw3rty01 · on March 11, 2021

Oh how new is that? Last time I was looking through they just said their datacenter was in a bunker so nothing outside of a major natural disaster would affect it.

jiofih · on March 11, 2021

Since 2019 it seems: https://help.backblaze.com/hc/en-us/articles/217667468-B2-Se...

dx034 · on March 12, 2021

A fire like that in SBG a few days ago is extremely rare. The data lost there compared to all data stored in data centers probably justifies eight 9s. The problem with these numbers is that it's extremely unlikely to lose data but if you do, the event is so severe that everything might be gone.

That's why I feel more comfortable storing data across two "unreliable" providers with a lot of physical space in between them rather than one super reliable provider.

You also have to consider that data loss can result of simple things such as an account that gets blocked for stupid reasons. If you want to be safe you always need to have your data with at least 2 providers.

dx034 · on March 12, 2021

The EU and US data centers are completely separated. You can't even use both from the same account. To change, you need to open a completely new account selecting EU at the start. So it's not really easy to use both at the same time and there's definitely no redundancy across data centers if you're a customer in the EU.

busymom0 · on March 12, 2021

A limitation I ran across when using B2 was that their pre-defined url generation doesn't allow you to set file-size limits nor does it allow you to set the file name in the pre-defined url. It simply gives you a url to upload it to. So if you are using b2 for storage for lets say image uploads from browser, some malicious user has the ability to modify the network request with whatever file name or file size they want. Next thing you know, you have a 5gb sized image uploads happening....

This pretty much prevents me from using B2 for now.

SmellTheGlove · on March 12, 2021

I ran into the same limitation! IIRC, there also wasn't a way to expire a signed upload URL sooner than whatever the default was, which was hours or maybe a day. I had the exact use case you mentioned, too - image uploads bypassing my backend server. I didn't want the generation of a signed url to, say, upload a profile photo, give carte blanche to create a hidden image host when combined with the limitation that you highlighted. All sorts of bad things could come of that. I ended up just going back to S3 - costs more, but still worth it.

vishnumohandas · on March 11, 2021

B2 claims to offer 99.9% uptime[1], and seems to have SLAs[2] similar to that of AWS S3[3].

[1]: https://www.backblaze.com/blog/cloud-storage-durability-vs-a...

[2]: https://www.backblaze.com/company/sla.html

[3]: https://aws.amazon.com/s3/sla/

mattjaynes · on March 12, 2021

In practical terms, 99.9% uptime means no more than ~44 minutes per month of downtime (~9 hours per year).

tedmiston · on March 11, 2021

What a strange time for a company with lots of US customers to schedule a maintenance window.

Twisol · on March 11, 2021

I'm not so sure. It's better than a late night window in that it forces their customers to actually deal with the engineering window, rather than just cross their fingers.

aequitas · on March 11, 2021

Also, your office is full of awake engineers at that time. Which is better than a handful of on-call sleepy engineers and crossing your fingers the rest of the team wakes up when you call them if something goes completely haywire.

bretpiatt · on March 11, 2021

With the B2 target use case of backup and archive data I suspect it's actually a good time to do it for their customer base (and it also then happens to be good for their engineering team too, awake and alert!).

johnjboren · on March 11, 2021

Don't push to production on a Friday, but Thursday at the latest, sure...

ericbarrett · on March 11, 2021

Thursday is the best maintenance day. If it goes haywire, you have a full business day to fix before losing a weekend. And if you can't, well, you've given your coworkers a "free Friday," which is far less likely to result in complaints than screwing up M-Th.

lupire · on March 12, 2021

What's a free Friday?

ericbarrett · on March 12, 2021

Ever go into work on Friday and "the system is down"? You can't get anything done, because the tools you use to do your job aren't working, and the fix is out of your hands. Your coworkers are all affected, too. First people are frustrated, because they have tasks deadlines, and those tasks won't get done and the deadlines won't be met. A few are really freaked out and start calling bosses and getting VPs to yell at other VPs. But soon, most people in the company realize that everybody else is in the same boat, and nobody will be meeting their OKRs this week, and the status reports probably won't be filed.

Then they relax.

If they're in offices, they group up, maybe in the break room. A rousing game of ping-pong breaks out.

Remote coworkers ping each other on Slack. Maybe a few start a round of Among Us. Bread dough is kneaded. Kids get a little more help with their schoolwork.

Everybody takes a very long lunch.

By 2pm, people realize the entire day is gone. Almost everybody has left or signed off by 3. Some roll out to bars; others go home to their kids, or to their gardens or garages or battlestations. Everybody beats the traffic.

Come Monday, the system is fixed. People are a little stressed out, since there's so much catching-up to do, status reports to be filed, widgets to be tracked and poked. But everybody agrees that was an amazing couple of days, and they got lots of rest, and it sure was nice. And hey, I had this great idea over the weekend—

monsieurbanana · on March 12, 2021

Sometimes I miss working in big companies.

preommr · on March 11, 2021

Digital Ocean has spaces with unlimited uploads, 250gB + (0.02/gb), 1TB of outbound transfer all at $5/mo.

That seems way cheaper than this.

For backups and large, long term storage, AWS has Glacier, that's really really cheap.

caymanjim · on March 11, 2021

I'm a big fan of Digital Ocean and run a bunch of droplets. B2 is way cheaper than Spaces for storage (1/4 the price). I tried using Spaces anyway, because I wanted something with faster throughput for streaming video, but Spaces was even slower than B2, even within the same Digital Ocean datacenter. All these S3 clone storage systems are clearly throttled, and there seems to be at least soft collusion to keep the bandwidth about the same between them, and just enough to prevent video streaming. I'll go sit in the corner and adjust my tinfoil hat now.

jjeaff · on March 11, 2021

$0.02/gb compared to $0.005 with b2. So not really in the same ballpark as far as price.

preommr · on March 11, 2021

That's just the storage cost per GB, if you have any kind of serious throughput, then bandwith costs make it more expensive.

brightball · on March 11, 2021

Aren't they in that free bandwidth alliance with Cloudflare?

divbzero · on March 11, 2021

Yes, it looks like Backblaze is part of that free bandwidth alliance. [1]

Thank you, I was not aware of this policy.

[1]: https://www.cloudflare.com/bandwidth-alliance/backblaze/

preommr · on March 11, 2021

Yea, that changes things, that's a pretty good deal then.

Especially because Cloudflare's pricing is "smoother" and detached from any one service.

atYevP · on March 11, 2021

Yev from Backblaze here -> yes we are!

pcnix · on March 12, 2021

Thank you for being our Yev!

dr-smooth · on March 11, 2021

Wait until you see how much it costs to retrieve your entire backup from Glacier.

kenmacd · on March 12, 2021

I guess it depends on how likely you are to need to do that. Looking at b2 vs glacier deep it seems as long as you don't need the data more than every 2y that glacier still works out cheaper even with the high bandwidth costs.

dx034 · on March 12, 2021

But glacier also has minimum storage duration. With S3, you'll need to use a tiered system unless you want to store all backups for several months (often that's only the case for weekly or monthly backups).

In the end, S3 can be cheaper but you have to make a lot of assumptions beforehand. Backblaze is cheap enough to just throw everything in there and work with their lifecycle rules. You don't need to make assumptions about download volumes or storage duration beforehand (esp if you can retrieve via cloudflare).

mike503 · on March 12, 2021

Spaces was not fully compatible with S3 at one point. It was nearly impossible to download your entire bucket when it was huge. Rclone was able to, but it was horribly slow. AWS CLI would only grab up to 1000 items. It seems like they did finally fix that though.

I’m actually a huge fan of Bunny now. The CDN piece is about as cheap as it gets (for any utility based service), it’s optimizer and other things work well, and it works seamlessly with their storage system too. Which is super cheap itself, allows you to control how much it’s replicated (and where) - just waiting for them to deliver S3 compatibility so all the existing tools that exist work, or some other type of CLI tool.

killingtime74 · on March 11, 2021

Linode has the same and allows using bandwidth pooled across all instances

mekster · on March 12, 2021

I wonder if this still applies about performance comparison.

https://github.com/gilbertchen/cloud-storage-comparison

dx034 · on March 12, 2021

Backblaze downloads are $.01/GB, not $.02/GB as stated there. And via Cloudflare they're free. That makes a big difference vs. AWS where you have no chance to get that data to another provider for free (unless you have a very special deal with them).

ddorian43 · on March 11, 2021

If you ever need those backups, your company will fail from the receipt to receive them.

For Digital Ocean, please look that their pricing is higher in both bandwidth & storage.

truetraveller · on March 11, 2021

Ouch. Does B2 Cloud actually go down during this time. That's a major show-stopper. Was actually counting on B2. Not sure anymore.

atYevP · on March 11, 2021

Yev from Backblaze here -> typically not - we've built most of our systems to be keep data up and flowing during those maintenance windows - if we anticipate longer windows or are doing things that can impact performance we typically announce it on our blog and twitter!

xoa · on March 11, 2021

>if we anticipate longer windows or are doing things that can impact performance we typically announce it on our blog and twitter!

As a customer is there any way to opt-into a more proactive notification of an anticipated delay, like an email? I understand such things are necessary sometimes, but "always pay attention to some blog or twitter for a rare occurrence" doesn't seem particularly busy-stressed-admin friendly :).

NewJazz · on March 11, 2021

RSS feed for maintenances would be nice I think.

ericvanular · on March 11, 2021

Thanks for building Backblaze. If you're able to share some feedback with the team - this is an extremely important factor. If you can ensure downtime is avoided during maintenance windows, it will make your service much more viable for production systems

barkingcat · on March 11, 2021

This is an unreasonable expectation. The whole point of a maintenance window is to allot an expected time when there might be downtime.

Otherwise, the maintenance window becomes 24hx365, since "ensuring downtime is avoided during maintenance window" means literally - make a maintenance window have the same uptime as non-maintenance window.

shock-value · on March 11, 2021

It's not necessarily unreasonable, it just depends on what kind of product they want to offer. S3, Google Storage, etc. do not have a maintenance window that I'm aware of. That's not to say they would never go down, but if they do you would expect an alert and an apology, at least. Many application require this kind of expected uptime, but of course there are others (backup, etc) that would not.

barkingcat · on March 11, 2021

Yes the request would turn into - please remove the maintenance window, instead of "make the maintenance window not have downtime"

NewJazz · on March 11, 2021

Can anyone be 100% sure that downtime is avoided during maintenance windows?

xchaotic · on March 11, 2021

If you have some umm other useful tips for that 100% uptime let the rest of the world know. I am more happy with realistic and upfront statements from B2 than some wishful thinking from potential users.

mattjaynes · on March 11, 2021

Not often, I only remember a few times in the last couple of years. If you're just hosting backups, you're unlikely to even notice. But if you're serving live production data from B2, it can bring down your whole service, which is quite painful especially if you have a large customer base.

seekbeak · on March 12, 2021

Same here. We're midway through a migration, and this has us re-thinking the whole move. I wonder if we had a duplicate copy of our data in the EU data center if we could do a fallback during US downtime, or if the entire 'cloud' goes down.

seekbeak · on March 13, 2021

Replying to my own comment as I just talked to their support about this: "The maintenance window does affect all our data simultaneously. As we push the updates through one data center to another." Welp, so much for that.

truetraveller · on March 11, 2021

Side question: Apart from this maintenance window, is B2 Cloud reliable? I've heard of problems with the S3 API. Is the "native" API more stable? Would love to know your insight, it will potentially save me a lot of time!

tedivm · on March 11, 2021

Several years ago Backblaze lost all of my wife's data. Their dashboard said it was all there, and we trusted their systems to be accurate. When attempting to download the data it turned out that none of it was there. When my wife contacted support they tried to blame her.

Obviously this was a few years ago, but a backup provider failing at their one job and then blaming the customer left a really bad impression that keeps me from using them.

jiofih · on March 11, 2021

You were probably a victim of their 30 day deletion policy. If for any reason (firewall, etc) you did not connect to the backup servers your data would just be purged without a grace period. For that reason I built my own backup sync using B2 directly instead of their backup service (and it’s a lot cheaper).

tedivm · on March 11, 2021

We actually checked that- the day we went to get the laptop repaired we confirmed that it was active and backed up, and a week later the restore failed.

Backblaze eventually admitted that their dashboards aren't realtime, and they had a bug which was showing us (and their client) files that didn't exist.

Spivak · on March 11, 2021

It is? I avoided it on my Linux laptop for ages since I assumed their flat pricing was a lot cheaper.

sgerenser · on March 12, 2021

Depends entirely on how much data you have. If it’s less than 1TB then $.005/GB/month is less than $5/month. These days most people would blow past 1TB pretty easily so for most people unlimited Backblaze backup would be cheaper (if your on Mac or Windows where it is even an option).

dx034 · on March 12, 2021

Do people really blow past 1TB easily? Even with all my pictures I don't really get past that mark. Many people I know only have a laptop and all their data on there, it's rare to see laptops with more than 1TB capacity. So apart from people with a high amount of pictures or videos I wouldn't expect many to have more than 1TB backup needs.

sgerenser · on March 12, 2021

Yeah, most is probably an overstatement. I know for me, between photos, videos, music, VM images, etc. I have way more than 1TB of data that I want to be backed up.

jiofih · on March 12, 2021

Surprisingly I’ve never crossed the 200GB mark, even backing up all photos from two phones and three cameras. Guess I don’t make enough videos.

fulafel · on March 12, 2021

I wonder what the version of the old "if you don't test restoring you don't have backups" rule is for these new services. Restoring costs money and laptop users don't have good ways of doing complete restores just for test as it's a lot of downtime.

Maybe it needs a kind of stochastic automated approach.. a program that finds sufficiently small (vs costs) sample of files on your computer (some old, some recently changed, etc) and tries restoring them and verifies.

pfranz · on March 12, 2021

> Restoring costs money and laptop users don't have good ways of doing complete restores just for test as it's a lot of downtime.

At least for the standard Backblaze service you can download for free. For a USB drive you float the cost of the drive (they reimburse you when you return it)--maybe you pay shipping?

tedivm · on March 13, 2021

Downloading my full backup would use the entire month's quota on Comcast, forcing me to pay more money to my crappy ISP.

pfranz · on March 13, 2021

Downloading rarely makes sense for a full restore, but is perfect for smaller restores or tests. Even if it didn't blow away your quota they only keep the packaged restores around for 7 days and I've found that difficult to restore a large amount of data with home Internet.

To restore it to a drive all you pay out of pocket is return shipping of the drive. The one time I had to use it I was slightly over the 30 days (I was waiting on a repair before I could restore the data) and it wasn't an issue.

Dylan16807 · on March 11, 2021

I haven't used a it a lot but it's been a solid endpoint for backups.

I did some medium-intensity benchmarking a while back and decided not to put certain server data on it because I was getting a few 20+ second timeouts per thousand read requests. I can handle server errors, and I have retry logic, but this was something where I needed to be able to access the data within a second or two. Maybe it would have worked better if I set a very aggressive timeout, I'm not sure. Deeper testing is something I'll worry about some other time if the data actually grows past a couple hundred gigabytes.

This was mostly with the S3 API, I don't remember if I ever succeeded in getting the program to use the native one.

mattjaynes · on March 11, 2021

Have had live production data on there for a couple of years and it's been very solid outside the maintenance windows.

(The exception being the recent outages GoDaddy caused for them, but since they've moved to using Cloudflare as their registrar, I don't anticipate further issues there: https://news.ycombinator.com/item?id=26119619 )

truetraveller · on March 11, 2021

Okay, that's great to hear.

fukmbas · on March 13, 2021

Lol who has a maintenance window in the middle of the day?

nikisweeting · on March 11, 2021

I love B2's free ingress and egress when you use CloudFlare, they utterly destroy the competition on price.

But my true dream would be for backblaze to someday offer ZFS as a service.

I want to `zfs send -i my_pool@2020-03-11 | b2zfs recv some_bucket_id`, then be able to view my snapshots and files within in the backblaze web UI, and restore with `b2zfs send -i some_bucket_id/my_pool@2020-03-11 | zfs recv my_pool`.

You can already mount B2 as a FUSE filesystem with something like ExpanDrive, then write ZFS raw file vdevs to the B2 FUSE mount, but it's horrifically slow and probably too janky for any real use.

vluft · on March 11, 2021

rsync.net's zfs as a service is really great for that, though a bit more expensive than b2 is.

bpye · on March 11, 2021

Rsync.net looks great but the storage cost is 5x B2 unfortunately

EDIT: As mentioned below this is for the most expensive (lowest capacity) tier. I’d been comparing for my own home use and so I would be unlikely to exceed 10TB but if you’re looking at higher capacity then maybe the calculus is different.

rsync · on March 11, 2021

... for small quantities, yes.

For large quantities, it is 3x, and actually less since there are no charges for ingress/egress.

Further questions/comments over email, please, since this is BBs HN thread and I don't want to butt in.

mhio · on March 11, 2021

Without b2 adding anything zfs specific, that should be possible with a regular large file upload if you can deal with each upload part being buffered in a file/memory locally before upload.

kgog · on March 11, 2021

maybe you need borg with borgbase.

philsnow · on March 11, 2021

The wording is a little clunky; the "is" in "Backblaze is now a Terraform provider" makes me think that BB is competing with Terraform Cloud.

"Backblaze now has a Terraform provider" or "Backblaze released a Terraform provider" makes more sense to me.

OJFord · on March 11, 2021

But unfortunately plugins cannot provide remote backends, and it's a little clunky to use the S3 one:

https://github.com/hashicorp/terraform/issues/27304

javajosh · on March 11, 2021

It's interesting because it feels like a small turf war. The "Amazon Provider Team" wants to own the s3 provider backend, and you want to make it pluggable. Is this to avoid an explicit "b2" provider? I can see how it might be confusing to use terraform and see an s3 provider and then have it...not use s3. :)

ArchOversight · on March 11, 2021

Terraform doesn't allow any backends other than the ones built-in. So even if you wanted to provide a "b2" backend, it'd have to be built into to terraform.

Using the S3 backend with B2 should work fine since B2 makes a S3 compatible API available, but its made more difficult because the "Amazon Provider Team" is the one that maintains the "s3" backend for Terraform, and they want to do additional validation that matches AWS's expectations.

renewiltord · on March 11, 2021

Very cool! Great job, guys.

Do you use a similar technique to https://poweruser.blog/embedding-python-in-go-338c0399f3d5 to embed the Python SDK?

Also, was there a reason it's not against the B2 API? Not a judgment, just curious about the design tradeoffs in a professional-talking-to-professional sense.

AYBABTME · on March 11, 2021

That design choice prevents their provider from working with official Terraform container images. They should make the API calls directly in Go, it's just weird to do it via Python... They have very few resources exposed on their API so it's not like writing a Go client would be all that hard.

renewiltord · on March 12, 2021

Why would the official container images not work? The Python interpreter and libs are bundled into the (massively fat) Go plugin binary, right?

And appropriately linked?

vbsteven · on March 11, 2021

Terraform is an amazing tool and it’s always great to see more infrastructure providers jumping on the bandwagon.

atYevP · on March 11, 2021

Yev from Backblaze here -> Yea, it's pretty great! They're a great tool and we're glad to be involved!

rattray · on March 11, 2021

I sure wish this post had included a few code snippets, just to make it extra-clear what using Backblaze through Terraform might look like.

vbsteven · on March 11, 2021

Inside the article is a section "How to get started using Backblaze B2 in Terraform" with a link [1] to the getting started guide which should have everything you need.

I can see why they put it on a separate page to not clutter up the article.

[1] https://help.backblaze.com/hc/en-us/articles/1260803375989

rattray · on March 11, 2021

Nice, thanks!

I was less interested in the code sample as a "how-to" and more for skimmability – rather than reading a bunch of words I just wanted to see what it would look like.

Eg, from that link:

    terraform {
      required_version = ">= 0.13"
      required_providers {
        b2 = {
          source  = "Backblaze/b2"
          version = "~> 0.2"
        }
      }
    }

    provider "b2" {
    }

    resource "b2_application_key" "example" {
      key_name     = "test-b2-tfp-0000000000000000000"
      capabilities = ["readFiles"]
    }

    data "b2_application_key" "example" {
      key_name = b2_application_key.example.key_name
    }

    output "application_key" {
      value = data.b2_application_key.example
    }

solatic · on March 12, 2021

> Terraform is an open-source infrastructure as code (IaC) tool

Maybe it was in the beginning, but Terraform is far more powerful than that now. Terraform is a monad that neatly separates pure declarative configuration from the I/O (side effects) that are factored out into providers. Terraform used at its most powerful is not limited to infrastructure, it also sets up the platforms and applications running on that infrastructure for you, by separating the configuration of the platforms and applications from the generic API calls that apply that configuration. Terraform's dependency graph ensures that the calls are made in the right order, no matter if they are made to infrastructure APIs, platform APIs, or APIs belonging to layers further up the stack

1vuio0pswjnm7 · on March 11, 2021

For large downloads, does BB support the Range header. If the user is on a connection that that is not suitable for long downloads, could the Range header be used to download a large file in several parts.

0xbkt · on March 11, 2021

They support byte serving.

nhoughto · on March 12, 2021

Popped in to say backblaze built out a tf provider at our request and they were great about it! Got a quick early build out to test, GA build a few months later and are very responsive to feedback. Pleasure to deal with

atYevP · on March 13, 2021

Yev from Backblaze here -> That's really nice of you, glad we could work together!

majormjr · on March 11, 2021

I was just looking at this new Terraform provider yesterday how timely. Nice to see the quickstart guide this will be helpful for managing the buckets and application keys.

atYevP · on March 11, 2021

Yev from Backblaze here -> That's great! Glad the quick-start makes it simple.

satyrnein · on March 11, 2021

Anyone have any experience with the Backblaze compute partners? I want to put data up on B2, but then I still need to do things with it.

snicker7 · on March 11, 2021

Terraform is great, but I wish more cloud providers had Guix/Nix integration.

ethnt · on March 11, 2021

You could use NixOps[0] for Nix but I'm not sure you can directly compare Terraform and Guix/Nix? My set up involves Terraform for infrastructure and Nix for provisioning, and it's working for me so far.

[0] https://github.com/NixOS/nixops

aseipp · on March 12, 2021

I think this too, but at the same time the reality is a tool like Terraform is really complicated to implement well, and, importantly, it always has to work. All the time. There are really high standards for this, and in my personal experience, alternative solutions like NixOps don't quite stack up in reliability or broad utility versus Terraform. The design is good in theory, but it needs just a huge amount of work to be trusted.

For the most part, I provision things with Terraform and then instantiate the servers with NixOS/Nix itself, and this mostly works. For bonus points you can use Nix to generate the HCL that Terraform reads in (because Nix can write JSON, and HCL is just JSON in a trenchcoat) if you want to put some veneer on it.

emptysongglass · on March 11, 2021

You can deploy and manage NixOS on Big Cloud using Terraform: https://nix.dev/tutorials/deploying-nixos-using-terraform.ht...

NewJazz · on March 11, 2021

Are there any providers that to have such integration? I have only lightly used nix. Having a hard time understanding what a cloud provider integration would look like.