
Dear customer,

This email is a follow-up to the previous one we sent on this topic (on January 8th, 2020). As a reminder, yesterday we experienced an incident on a storage unit at our LU-BI1 datacenter in Luxembourg.

Despite the replication systems in place, and the combined efforts of our technical teams throughout the night, we were unable to recover the data that was lost on the impacted storage unit.

We sincerely apologize for the inconvenience that this situation has caused. This type of incident is extremely rare in the web hosting industry.

In the event that you have a backup of your data, we suggest that you use it to recreate your server in a different datacenter.

To help you with this, we have provided you with a promo code that gives you one free month, so that you can create a new Simple Hosting instance in a different datacenter:

    XXX


Wow, for a company that boasts "no bullshit", only offering a month after destroying data and backups seems a little tone deaf

Edit: in fairness, I'm not sure how exactly you would quantify such a loss anyway...


It sounds like they didn’t have any backups at all but rather relied on an active-active replication link to secondary storage.

Edit: who knows, it may be related to the HPE issue.

https://www.bleepingcomputer.com/news/hardware/hp-warns-that...


In other words, RAID is not backup.


What baffles me is that there seems to be no way for either the customer or a data-recovery company to flash a new firmware onto the drive after it has failed. Someone there wanted to spare the few millicents of copper trace for a JTAG port?!


Probably to prevent supply chain firmware changes for hacking, espionage, etc.


Hmm... I wonder what the "incident" was. If it involved something akin to an "rm -rf," then of course their replication link didn't protect them.


Perhaps they were depending on snapshotting and were not prepared for some kind of hardware failure taking out the entire storage system.


Reputable hosting providers typically don't try to quantify such a loss, but rather outright offer a credit/compensation that is very obviously generous (say, a year or even two of free service).

Especially when only a small set of your customer base is affected, it won't cost you that much, and "overcompensating" like that means that virtually no one is going to criticize you for quantifying it wrong; instead, the public narrative will be centered around "well, shit happens, they did their best and generously compensated".


I could understand the incident (I would _at least_ start questioning the quality of the service I'm paying for), but IMHO this is not something that can be addressed with a casual e-mail containing a few lines of apology and a "promo code", as if it were everyday business. That's astonishing.

The only thing worse than a bad incident is bad handling of the situation that follows.


> This type of incident is extremely rare in the web hosting industry.

Why would they include that sentence? Are they trying to imply it is rare for them because it is rare for the industry? Are they saying they are not as good as the industry, so customers should move to other providers? Or are they trying to show they apply the same inattention to their customer communication as they apply to their data backup/recovery practices?

This kind of data loss should simply never happen. It’s one thing to say “it will take us up to 30 days to restore your data because our fast recovery options aren’t working and we have to bring up cold archives”, it’s entirely another to say “your data is gone, tough”.


I'm not sure why you've been downvoted for this. I thought the same.

I read it as: "This type of incident is extremely rare in the web hosting industry, because apparently the overwhelming majority of our competitors aren't capable of fucking up as badly as we just did."

Doesn't inspire confidence at all, IMO.


> Why would they include that sentence?

They're a French company; it may be a non-native speaker not catching the implication.

It's also possibly an editing error, e.g. they started writing something like, "these types of incidents are extremely rare and when they happen etc" and most of it was dropped without considering how that changed the implication.


I think they're referring to the "incident" that they experienced (on the storage unit in the datacenter), not the situation as a whole. The implication is meant to be that they prepared for many things, but not something as unlikely as this.


I think it was meant to say "nobody is infallible": these events are extremely rare, but they /will/ occur, even if you're a customer of the best and biggest players.


If you're not paying for backups... what archive?


They say you can back up by using their snapshotting tool, but they lost those snapshots too.


The bright side is that now if anyone asks me why we would ever need the 3-2-1 backup protocol, I have a beautifully worked example.
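
For reference, 3-2-1 means keeping three copies of your data, on two different media, with one copy offsite. A minimal sketch of what that might look like in practice, assuming hypothetical paths and an S3-compatible offsite bucket (all names here are illustrative, not anyone's real setup):

    # 3 copies: the live data plus the two backups below; 2 media; 1 offsite.
    import subprocess
    import boto3

    SRC = "/var/www/site/"          # live data (copy #1)
    LOCAL_MIRROR = "/mnt/backup/"   # second medium on the same host (copy #2)
    BUCKET = "example-offsite"      # offsite object storage (copy #3)

    # Copy #2: mirror to a separately mounted disk.
    subprocess.run(["rsync", "-a", "--delete", SRC, LOCAL_MIRROR], check=True)

    # Copy #3: push a tarball to object storage at a different provider/region.
    subprocess.run(["tar", "-czf", "/tmp/site.tar.gz", "-C", SRC, "."], check=True)
    boto3.client("s3").upload_file("/tmp/site.tar.gz", BUCKET, "site.tar.gz")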


oh damn


A promo code in exchange for your data loss. What a bargain!


“Please keep trusting us to host your data”


You really shouldn't trust anyone hosting your data. Always have backups!


Oftentimes the backup provider is the hosting provider, whom you have to trust. (This extends all the way from big clouds like AWS and GCE to small providers like Linode and DO.) Having an external backup can be unreasonably expensive due to ridiculous egress costs.
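
For a rough sense of scale, assuming a typical ~$0.09/GB egress rate at the big clouds, pulling a 1 TB backup out once a month is on the order of $90 in transfer fees alone, before you even pay for storage at the destination (rates vary by provider and tier).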


If your business can't afford external backups then you don't have a viable business in the first place. And of course egress costs have to be considered when choosing a hosting provider.


Not everything that’s hosted in the cloud is a business. In fact, the Internet wasn’t even created for the purpose of profit-generating business.


The Internet was created by the military, so yes it was.


You can still back up to a different data center of the same provider. Two data centers failing simultaneously is very unlikely.


Not always an option. For instance, I use Linode’s backup service and it can only back up to the same data center (although it is said to live on a separate system).


You can, and should, back up your irreplaceable data elsewhere using a custom solution. Unless it's some service that doesn't allow you to export the data at all, it may be inconvenient, but it is an option.


Speaking as a Linode employee, I can confirm this is true. Linode's backups live in the same data center as the server, but the systems are separated so that they don't directly affect one another.


Do they have separate power supplies? Have steps been taken to ensure that fire can’t spread from one room to the next? What would happen if there was an explosion?


In all seriousness, these are good points. I'm not a data center expert by any means, but here's what I know: the data center hardware has failsafes by design, but they aren't disaster-proof, given that they're in the same building.

To answer your questions: Yes, the backup storage box is in a separate chassis from the host machine that the Linode lives on; they have separate power supplies. The DCs themselves also have some sort of fire suppression. I don't know what would happen if there was an explosion.


Same data center is a single failure zone, if only because of:

1. Power delivery systems that bring power to the building - see the failures at 111 8th Ave during Sandy.

2. Power systems inside the data center. The blast radius there is rather nasty - see the infamous Internap blow-up around 2015(?).

3. Fire suppression/firefighting protocols.


They could mean using regular data transfer (i.e., using something like rsync instead of the provider's backup service). Maybe egress costs between servers from the same provider are reduced or waived.

From [1]:

> Traffic over the private network does not count against your monthly quota.

I wonder how private addresses are set up by Linode.

[1] https://www.linode.com/docs/platform/billing-and-support/net...
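
As a rough illustration of that approach, here is a cron-able sketch that pushes data to another server over its private address so the transfer should stay on the private network and off the quota; the IP, user, and paths are assumptions, not real values:

    import subprocess

    PRIVATE_IP = "192.168.200.10"   # assumed private address of the backup box

    # Mirror the data directory to the backup server over the private network.
    subprocess.run(
        ["rsync", "-az", "--delete", "/srv/data/",
         f"backup@{PRIVATE_IP}:/srv/mirror/"],
        check=True,
    )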


Each data center has an internal private network with a pool of private IPs available for assignment. If a private IP is assigned to a server, it then has access to the private network.

https://www.linode.com/docs/platform/manager/remote-access/#...


This becomes very difficult as your data grows. If you live in the AWS world, imagine periodic snapshotting of EBS, S3, RDS (and other data stores), EFS, etc. For most people, a different DC of the same cloud provider should be enough. If you have to put this into a different cloud provider, it is a big cost drain and difficult to manage, let alone if you also want your own physical backups.


AWS has tools around this (lifecycle manager) that you can easily leverage for simple site backups. Or you can roll your own; honestly, it is not that hard to take rolling snapshots (sketched below).

Obviously, hosting providers do not make it easy to extract your data, because that's their vendor lock-in.
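
For the "roll your own" route, a minimal sketch of rolling EBS snapshots with boto3; the volume ID, tag name, and 7-day retention here are illustrative assumptions:

    import boto3
    from datetime import datetime, timedelta, timezone

    ec2 = boto3.client("ec2")
    VOLUME_ID = "vol-0123456789abcdef0"   # assumed volume to protect
    RETENTION = timedelta(days=7)

    # Take today's snapshot, tagged so we only prune our own.
    ec2.create_snapshot(
        VolumeId=VOLUME_ID,
        Description="nightly rolling backup",
        TagSpecifications=[{"ResourceType": "snapshot",
                            "Tags": [{"Key": "rolling-backup", "Value": "true"}]}],
    )

    # Prune snapshots older than the retention window.
    cutoff = datetime.now(timezone.utc) - RETENTION
    snaps = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:rolling-backup", "Values": ["true"]}],
    )["Snapshots"]
    for snap in snaps:
        if snap["StartTime"] < cutoff:
            ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])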


Also, always make sure you're testing your backups by restoring to a non-production space, and ensuring that customer services are still available.

Gandi has never explicitly said they had no backups of their own, just that they don't offer backups as a service. It's entirely possible that they did have backups but couldn't recover/restore them.


"...marginally more than rolling your own or another cloud provider."

And to "trust marginally more" simply means:

    gandi_cost_per_month + P(gandi_fails_per_month) * cost_recovery
    < 
    alt_cost_per_month + P(alt_fails_per_month) * cost_recovery


> This type of incident is extremely rare in the web hosting industry.

I read this as "so maybe you should consider one of the other web hosting companies that doesn't have problems like this."


Interesting. The public status page says they’re still waiting for the recovery process to complete.


Is this a response from the company or are you putting it forth as an example response for how to handle this incident better? It’s unclear from your post.



