Q1. I still don't get the use case for db storage on ephemeral storage.
Q2. If EBS is the problem why are you migrating to S3 backed EBS boot vols? The problem with this is still the time in between snapshots even though it will be shortened.
Some Comments:
It will only be a matter of time before S3 disks and hardware start dying like EBS...en masse
I talked with Ketralnis several years ago and know how many VMs you were running back then. Pretty sure you're not too far off from that count even today (even if 2x).
You can still virtualize on a good set of dedicated hardware to emulate your current 'network environment' and get up and running in the near term _asap_. Obviously you'd build out of that VM environment (with your load) as the days go by. Seriously look into a parallel switchover though.
If EBS is in fact a huge issue as has been shown, you really may need to start migrating off unless you want dedicated employees monitoring system health on AWS. Eventually if problems continue that is what will happen, with no time left to even develop automation... And why automate on a pile of instability?
Don't forget that every VM you add at this high failure rate increases soft management costs and will eventually eat into your development time...
I don't work for Rackspace (I think they're quite expensive), but you guys might benefit from this level of care to focus on the real issues.
> Q1. I still don't get the use case for db storage on ephemeral storage.
We're still not sure either, so we're investigating to see if it makes sense. One possible option will be to have the master on ephemeral disk with a hot backup on EBS so there is no data loss.
Another option is to use ephemeral for the master and all but one slave, so we get hot backups without a slowdown.
Still need to look into it more.
The one thing we are running on ephemeral storage right now is Cassandra, with continuous snapshots to EBS. Everything in there can be recalculated, and with an RF of 3, if we lose one node we can run a repair.
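To make the RF of 3 point concrete, here is a rough sketch of the replica arithmetic. It assumes quorum-style reads and writes, which is an assumption for illustration and not something stated above; the function names are made up for this example and are not part of any Cassandra API.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for a quorum: a majority of RF."""
    return rf // 2 + 1


def survives_node_loss(rf: int, nodes_lost: int) -> bool:
    """True if a quorum is still reachable after losing `nodes_lost` replicas.

    Worst case: assumes every lost node held a replica of the same data.
    """
    return rf - nodes_lost >= quorum(rf)


if __name__ == "__main__":
    rf = 3  # replication factor mentioned in the thread
    print("quorum size:", quorum(rf))
    print("survives losing 1 node:", survives_node_loss(rf, 1))
    print("survives losing 2 nodes:", survives_node_loss(rf, 2))
```

Under those assumptions, losing one ephemeral node still leaves a majority of replicas, and a repair can rebuild the replacement from the surviving copies. Losing two replicas of the same data at once is the case where the EBS snapshots would matter.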
> Q2. If EBS is the problem why are you migrating to S3 backed EBS boot vols? The problem with this is still the time in between snapshots even though it will be shortened.
They are just easier to use. The root volume is rarely accessed after it is booted, so the EBS slowdowns aren't really a problem in that case.
> Some Comments: It will only be a matter of time before S3 disks and hardware start dying like EBS...en masse
I don't think so. It is a totally different product built by a totally different team with a different philosophy. S3 was built for durability above all else.
In response to the rest of your comments, you are absolutely right, there are other options. We will certainly be investigating them.