We've taken a very similar approach when migrating data from one DB to another (MySQL to Redis in our case, but the principle should apply to any pair of databases). We split it into four phases:
* Off - Data written only to MySQL (starting state)
* Secondary - Data written to both MySQL and Redis, with MySQL as the source of truth.
* Primary - Data written to both MySQL and Redis, with Redis as the source of truth.
* Exclusive - Data written exclusively to Redis.
As mentioned in the article, the Secondary phase allowed time for the new database to be populated, and the distinction between the Primary and Secondary phases gave us a rollback option if something went wrong.
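For illustration, here's a minimal sketch of how such a phased dual-write could look. The phase names mirror the list above, but the `MigrationWriter` class, the client objects, and their `get`/`set` methods are hypothetical, not our actual code:

```python
from enum import Enum


class Phase(Enum):
    OFF = "off"              # write only to MySQL
    SECONDARY = "secondary"  # write to both, MySQL is source of truth
    PRIMARY = "primary"      # write to both, Redis is source of truth
    EXCLUSIVE = "exclusive"  # write only to Redis


class MigrationWriter:
    """Routes reads and writes according to the current migration phase.

    `mysql` and `redis` are hypothetical client objects exposing simple
    get/set methods.
    """

    def __init__(self, mysql, redis, phase: Phase):
        self.mysql = mysql
        self.redis = redis
        self.phase = phase

    def write(self, key, value):
        if self.phase != Phase.EXCLUSIVE:
            self.mysql.set(key, value)
        if self.phase != Phase.OFF:
            self.redis.set(key, value)

    def read(self, key):
        # The source of truth flips between SECONDARY and PRIMARY, which is
        # what makes rollback cheap: just flip the phase back.
        if self.phase in (Phase.OFF, Phase.SECONDARY):
            return self.mysql.get(key)
        return self.redis.get(key)
```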
The difficulty here is what happens when one of the databases is temporarily unavailable (network error, etc.). You cannot have a "transaction" cover writes to both systems, so you either have to manually undo one of the writes or risk the two systems getting out of sync.
In my experience this is typically solved with a reconciliation process that uses transaction GUIDs, plus backfill from the non-source-of-truth store whenever data isn't found in the source of truth. As long as a transaction made it into one of your data stores, consistency isn't lost (and if writes failed to both data stores, alarms should go off).
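A rough sketch of that reconciliation idea, assuming both stores can put/get a record by its transaction GUID; the store objects, the `alert` callback, and the function names are all hypothetical:

```python
import uuid


def _try_put(store, txn_id, record):
    """Attempt a write; report success instead of raising."""
    try:
        store.put(txn_id, record)
        return True
    except Exception:
        return False


def write_both(primary, secondary, record, alert):
    """Dual-write tagged with a transaction GUID; tolerate one store failing."""
    txn_id = str(uuid.uuid4())
    record = {**record, "txn_id": txn_id}
    ok_primary = _try_put(primary, txn_id, record)
    ok_secondary = _try_put(secondary, txn_id, record)
    if not (ok_primary or ok_secondary):
        alert(f"write lost for txn {txn_id}")  # both writes failed: page someone
    return txn_id


def reconcile(source_of_truth, other, txn_id):
    """Backfill the source of truth from the other store if the record is missing."""
    record = source_of_truth.get(txn_id)
    if record is None:
        record = other.get(txn_id)
        if record is not None:
            source_of_truth.put(txn_id, record)  # backfill from the other store
    return record
```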
Interesting write-up. I'd love to see stats on how the new database is performing. Have they reduced it from running on 45 4xlarge instances? Do backups still take a day? Was it a good financial decision?
Financially it was a good decision. The current cluster costs a lot less than the original one, and traffic and data have grown since. We currently maintain 2 clusters of 5 i3.4xlarge machines. That's a total of 10 machines and is a lot cheaper than what we had before.
The DB is performing great. It is flash-based, and 99.98% of the queries have <1ms latency. Each end of the XDR (cross-datacenter replication) link holds around 3.1B records, with a replication factor of 2.
Mid-term load is around 3 (very low), and we are doing around 190K reads/s plus 37K writes/s at peak load.
Or... they could just run this entire thing using ScyllaDB on a single mid-size VM with local SSDs with headroom to spare. Put 1 in each DC for active/active replication. No enterprise contract needed.
Yes, low-latency lookups are a requirement. That said, even double the latency we have now would be okay. More important than latency were actually throughput and high availability, and Aerospike demonstrated those well.
That is why I asked. I am only aware of IOPS being used in connection with disk I/O, and it seems a rather unusual measure for a database, where caches hopefully avoid hitting the disk too often, at least for reads. And writing 8 MiB/s, assuming 4k clusters, seems not really noteworthy, especially given three-fold redundancy. On the other hand, reading 700 MiB/s, which would not be affected by the redundancy, seems a comparatively big number, especially because caches should limit the disk traffic to a small fraction.
This is not true at all, bordering on utter nonsense. Databases try hard to keep the correct set of blocks in memory because that is essential for their performance. Heck, many of the fastest database systems advertise themselves as in-memory databases that avoid disks altogether.
Queries here are measured from the application side, and in our case those are simple key-value operations such as get(key), update(key, value), write(key, value) and delete(key). Aerospike is known to optimize these into as few block operations as it can.
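For context, "measured from the application side" means timing the client call itself rather than counting disk operations; a trivial sketch, assuming a hypothetical key-value client object with a `get` method:

```python
import time


def timed_get(client, key):
    """Measure application-side latency of a single get(key) call.

    `client` is a hypothetical key-value client; the timing wraps the full
    round trip as seen by the application, which is what the 99.98% <1ms
    figure above refers to.
    """
    start = time.perf_counter()
    value = client.get(key)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return value, elapsed_ms
```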
No N1QL was involved that needed to be ported/re-architected?
N1QL was our primary use case for using Couchbase at my last position and it worked wonderfully (though at less than half your scale).
For straight key operations, the overhead of all the JVM stuff in Couchbase seems like a lot, especially considering the number of alternative K/V stores these days.
Glad to hear Aerospike turned out well. Been keeping an eye on them over the past few years.
This is mostly marketing. You can scale to infinity only if your app is architected in a very particular way and you're properly using the underlying primitives (be prepared to shard, cache, and queue everything at some point; try to put thought into where your demarcation points will be, as it will make future decoupling less painful). Even Aurora PostgreSQL and MySQL only scale so far.