
Dear customer,

This email is a follow-up to the previous one we sent on this topic (on January 8th, 2020). As a reminder, yesterday we experienced an incident on a storage unit at our LU-BI1 datacenter in Luxembourg.

Despite the replication systems in place, and the combined efforts of our technical teams throughout the night, we were unable to recover the data that was lost on the impacted storage unit.

We sincerely apologize for the inconvenience that this situation has caused. This type of incident is extremely rare in the web hosting industry.

In the event that you have a backup of your data, we suggest that you use it to recreate your server in a different datacenter.

To help you with this, we have provided you with a promo code that gives you one free month, so that you can create a new Simple Hosting instance in a different datacenter:

    XXX


Wow, for a company that boasts "no bullshit", only offering a month after destroying data and backups seems a little tone deaf

Edit: in fairness, I'm not sure how exactly you would quantify such a loss anyway...


It sounds like they didn’t have any backups at all but rather relied on an active-active replication link to secondary storage.

Edit: who knows, it may be related to the HPE issue.

https://www.bleepingcomputer.com/news/hardware/hp-warns-that...


In other words, RAID is not backup.


What baffles me is that there seems to be no way for either the customer or a data-recovery company to flash a new firmware onto the drive after it has failed. Someone there wanted to spare the few millicents of copper trace for a JTAG port?!


Probably to prevent supply chain firmware changes for hacking, espionage, etc.


Hmm... I wonder what the "incident" was. If it involved something akin to an "rm -rf," then of course their replication link didn't protect them.


Perhaps they were depending on snapshotting and were not prepared for some kind of hardware failure taking out the entire storage system.


Reputable hosting providers typically don't try to quantify such a loss, but rather outright offer a credit/compensation that is very obviously generous (say, a year or even two of free service).

Especially when only a small set of your customer base is affected, it won't cost you that much, and "overcompensating" like that means that virtually no one is going to criticize you for quantifying it wrong; instead, the public narrative will be centered around "well, shit happens, they did their best and generously compensated".


I could understand the incident (I would _at least_ start questioning the quality of the service I'm paying for), but IMHO this is not something that can be addressed with a casual e-mail containing a few lines of apology and a "promo code", as if it were everyday business. That's astonishing.

The only thing worse than a bad incident is bad handling of the situation that follows.


> This type of incident is extremely rare in the web hosting industry.

Why would they include that sentence? Are they trying to imply it is rare for them because it is rare for the industry? Are they saying they are not as good as the industry, so customers should move to other providers? Or are they trying to show they apply the same inattention to their customer communication as they apply to their data backup/recovery practices?

This kind of data loss should simply never happen. It’s one thing to say “it will take us up to 30 days to restore your data because our fast recovery options aren’t working and we have to bring up cold archives”, it’s entirely another to say “your data is gone, tough”.


I'm not sure why you've been downvoted for this. I thought the same.

I read it as: "This type of incident is extremely rare in the web hosting industry, because apparently the overwhelming majority of our competitors aren't capable of fucking up as badly as we just did."

Doesn't inspire confidence at all, IMO.


> Why would they include that sentence?

They're a French company; it may be a non-native speaker not catching the implication.

It's also possibly an editing error, e.g. they started writing something like, "these types of incidents are extremely rare and when they happen etc" and most of it was dropped without considering how that changed the implication.


I think they're referring to the "incident" that they experienced (on the storage unit in the datacenter), not the situation as a whole. The implication is meant to be that they prepared for many things, but not something as unlikely as this.


I think it was meant to say "nobody is infallible": these events are extremely rare, but they /will/ occur, even if you're a customer of the best and biggest players.


If you're not paying for backups... what archive?


They say you can back up by using their snapshotting tool, but they lost those snapshots too.


The bright side is that now if anyone asks me why we would ever need the 3-2-1 backup protocol, I have a beautifully worked example.
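
For reference, 3-2-1 means keeping three copies of your data, on two different media, with one copy offsite. A minimal sketch of what that might look like in practice, assuming hypothetical paths and an S3-compatible offsite bucket (all names here are illustrative, not anyone's real setup):

    # 3 copies: the live data plus the two backups below; 2 media; 1 offsite.
    import subprocess
    import boto3

    SRC = "/var/www/site/"          # live data (copy #1)
    LOCAL_MIRROR = "/mnt/backup/"   # second medium on the same host (copy #2)
    BUCKET = "example-offsite"      # offsite object storage (copy #3)

    # Copy #2: mirror to a separately mounted disk.
    subprocess.run(["rsync", "-a", "--delete", SRC, LOCAL_MIRROR], check=True)

    # Copy #3: push a tarball to object storage at a different provider/region.
    subprocess.run(["tar", "-czf", "/tmp/site.tar.gz", "-C", SRC, "."], check=True)
    boto3.client("s3").upload_file("/tmp/site.tar.gz", BUCKET, "site.tar.gz")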


oh damn


A promo code in exchange for your data loss. What a bargain!


“Please keep trusting us to host your data”


You really shouldn't trust anyone hosting your data. Always have backups!


Oftentimes the backup provider is the hosting provider, whom you have to trust. (This extends all the way from big clouds like AWS and GCE to small providers like Linode and DO.) Having an external backup can be unreasonably expensive due to ridiculous egress costs.
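
For a rough sense of scale, assuming a typical ~$0.09/GB egress rate at the big clouds, pulling a 1 TB backup out once a month is on the order of $90 in transfer fees alone, before you even pay for storage at the destination (rates vary by provider and tier).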


If your business can't afford external backups then you don't have a viable business in the first place. And of course egress costs have to be considered when choosing a hosting provider.


Not everything that’s hosted in the cloud is a business. In fact, the Internet wasn’t even created for the purpose of profit-generating business.


The Internet was created by the military, so yes it was.


You can still back up to a different data center of the same provider. Two data centers failing simultaneously is very unlikely.


Not always an option. For instance, I use Linode’s backup service and it can only back up to the same data center (although it is said to live on a separate system).


You can, and should, back up your irreplaceable data elsewhere using a custom solution. Unless it's some service that doesn't allow you to export the data at all, it may be inconvenient, but it is an option.


Speaking as a Linode employee, I can confirm this is true. Linode's backups live in the same data center as the server, but the systems are separated so that they don't directly affect one another.


Do they have separate power supplies? Have steps been taken to ensure that fire can’t spread from one room to the next? What would happen if there was an explosion?


In all seriousness, these are good points. I'm not a data center expert by any means, but here's what I know: the data center hardware has failsafes by design, but they aren't disaster-proof, given that they're in the same building.

To answer your questions: Yes, the backup storage box is in a separate chassis from the host machine that the Linode lives on; they have separate power supplies. The DCs themselves also have some sort of fire suppression. I don't know what would happen if there was an explosion.


Same data center is a single failure zone, if only because of:

1. Power delivery systems that bring power to the building - see the failures at 111 8th Ave during Sandy.

2. Power systems inside the data center. The blast radius there is rather nasty - see the infamous Internap blow-up around 2015(?).

3. Fire suppression/firefighting protocols.


They could mean using regular data transfer (i.e., using something like rsync instead of the provider's backup service). Maybe egress costs between servers from the same provider are reduced or waived.

From [1]:

> Traffic over the private network does not count against your monthly quota.

I wonder how private addresses are set up by Linode.

[1] https://www.linode.com/docs/platform/billing-and-support/net...
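
As a rough illustration of that approach, here is a cron-able sketch that pushes data to another server over its private address so the transfer should stay on the private network and off the quota; the IP, user, and paths are assumptions, not real values:

    import subprocess

    PRIVATE_IP = "192.168.200.10"   # assumed private address of the backup box

    # Mirror the data directory to the backup server over the private network.
    subprocess.run(
        ["rsync", "-az", "--delete", "/srv/data/",
         f"backup@{PRIVATE_IP}:/srv/mirror/"],
        check=True,
    )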


Each data center has an internal private network with a pool of private IPs available for assignment. If a private IP is assigned to a server, it then has access to the private network.

https://www.linode.com/docs/platform/manager/remote-access/#...


This becomes very difficult as your data grows. If you live in the AWS world, imagine periodic snapshotting of EBS, S3, RDS (and other data stores), EFS, etc. For most people, a different DC of the same cloud provider should be enough. If you have to put this into a different cloud provider, it is a big cost drain and difficult to manage, let alone if you also want your own physical backups.


AWS has tools around this (lifecycle manager) that you can easily leverage for simple site backups. Or you can roll your own; honestly, it is not that hard to take rolling snapshots (sketched below).

Obviously, hosting providers do not make it easy to extract your data, because that's their vendor lock-in.
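
For the "roll your own" route, a minimal sketch of rolling EBS snapshots with boto3; the volume ID, tag name, and 7-day retention here are illustrative assumptions:

    import boto3
    from datetime import datetime, timedelta, timezone

    ec2 = boto3.client("ec2")
    VOLUME_ID = "vol-0123456789abcdef0"   # assumed volume to protect
    RETENTION = timedelta(days=7)

    # Take today's snapshot, tagged so we only prune our own.
    ec2.create_snapshot(
        VolumeId=VOLUME_ID,
        Description="nightly rolling backup",
        TagSpecifications=[{"ResourceType": "snapshot",
                            "Tags": [{"Key": "rolling-backup", "Value": "true"}]}],
    )

    # Prune snapshots older than the retention window.
    cutoff = datetime.now(timezone.utc) - RETENTION
    snaps = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:rolling-backup", "Values": ["true"]}],
    )["Snapshots"]
    for snap in snaps:
        if snap["StartTime"] < cutoff:
            ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])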


Also, always make sure you're testing your backups by restoring to a non-production space, and ensuring that customer services are still available.

Gandi has never explicitly said they had no backups of their own, just that they don't offer backups as a service. It's entirely possible that they did have backups but couldn't recover/restore them.


"...marginally more than rolling your own or another cloud provider."

And to "trust marginally more" simply means:

    gandi_cost_per_month + P(gandi_fails_per_month) * cost_recovery
    < 
    alt_cost_per_month + P(alt_fails_per_month) * cost_recovery


> This type of incident is extremely rare in the web hosting industry.

I read this as "so maybe you should consider one of the other web hosting companies that doesn't have problems like this."


Interesting. The public status page says they’re still waiting for the recovery process to complete.


Is this a response from the company or are you putting it forth as an example response for how to handle this incident better? It’s unclear from your post.



