We've taken a very similar approach when migrating data from one DB to another (MySQL to Redis in our case, but the principle should apply to any pair of databases). We split it into four phases:
* Off - Data written only to MySQL (starting state)
* Secondary - Data written to both MySQL and Redis, with MySQL as the source of truth.
* Primary - Data written to both MySQL and Redis, with Redis as the source of truth.
* Exclusive - Data written exclusively to Redis.
As mentioned in the article, the Secondary phase allowed time for the new database to be populated, and the distinction between the Primary and Secondary phases gave us a rollback option if something went wrong.
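For illustration, here's a minimal sketch of how such a phased dual-write could look. The phase names mirror the list above, but the `MigrationWriter` class, the client objects, and their `get`/`set` methods are hypothetical, not our actual code:

```python
from enum import Enum


class Phase(Enum):
    OFF = "off"              # write only to MySQL
    SECONDARY = "secondary"  # write to both, MySQL is source of truth
    PRIMARY = "primary"      # write to both, Redis is source of truth
    EXCLUSIVE = "exclusive"  # write only to Redis


class MigrationWriter:
    """Routes reads and writes according to the current migration phase.

    `mysql` and `redis` are hypothetical client objects exposing simple
    get/set methods.
    """

    def __init__(self, mysql, redis, phase: Phase):
        self.mysql = mysql
        self.redis = redis
        self.phase = phase

    def write(self, key, value):
        if self.phase != Phase.EXCLUSIVE:
            self.mysql.set(key, value)
        if self.phase != Phase.OFF:
            self.redis.set(key, value)

    def read(self, key):
        # The source of truth flips between SECONDARY and PRIMARY, which is
        # what makes rollback cheap: just flip the phase back.
        if self.phase in (Phase.OFF, Phase.SECONDARY):
            return self.mysql.get(key)
        return self.redis.get(key)
```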
The difficulty here is what happens when one of the databases is temporarily unavailable (network error, etc.). You cannot have a "transaction" cover writes to both systems, so you either have to manually undo one of the writes or risk the two systems getting out of sync.
In my experience this is typically solved with a reconciliation process that uses transaction GUIDs, plus backfill from the non-source-of-truth store whenever data isn't found in the source of truth. As long as a transaction made it into one of your data stores, consistency isn't lost (and if writes failed to both data stores, alarms should go off).
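A rough sketch of that reconciliation idea, assuming both stores can put/get a record by its transaction GUID; the store objects, the `alert` callback, and the function names are all hypothetical:

```python
import uuid


def _try_put(store, txn_id, record):
    """Attempt a write; report success instead of raising."""
    try:
        store.put(txn_id, record)
        return True
    except Exception:
        return False


def write_both(primary, secondary, record, alert):
    """Dual-write tagged with a transaction GUID; tolerate one store failing."""
    txn_id = str(uuid.uuid4())
    record = {**record, "txn_id": txn_id}
    ok_primary = _try_put(primary, txn_id, record)
    ok_secondary = _try_put(secondary, txn_id, record)
    if not (ok_primary or ok_secondary):
        alert(f"write lost for txn {txn_id}")  # both writes failed: page someone
    return txn_id


def reconcile(source_of_truth, other, txn_id):
    """Backfill the source of truth from the other store if the record is missing."""
    record = source_of_truth.get(txn_id)
    if record is None:
        record = other.get(txn_id)
        if record is not None:
            source_of_truth.put(txn_id, record)  # backfill from the other store
    return record
```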
Interesting write-up. I'd love to see stats on how the new database is performing. Have they reduced it from running on 45 4xlarge instances? Do backups still take a day? Was it a good financial decision?
Financially it was a good decision. The current cluster costs a lot less than the original one, and traffic and data have grown since. We currently maintain 2 clusters of 5 i3.4xlarge machines. That's a total of 10 machines and is a lot cheaper than what we had before.
The DB is performing great. It is flash-based, and 99.98% of the queries have <1ms latency. Each end of the XDR (cross-datacenter replication) link holds around 3.1B records, with a replication factor of 2.
Mid-term load is around 3 (very low), and we are doing around 190K reads/s plus 37K writes/s at peak load.
Or... they could just run this entire thing using ScyllaDB on a single mid-size VM with local SSDs with headroom to spare. Put 1 in each DC for active/active replication. No enterprise contract needed.
Yes, low-latency lookups are a requirement. That said, even double the latency we have now would be okay. More important than latency were actually throughput and high availability, and Aerospike demonstrated those well.
That is why I asked. I am only aware of IOPS being used in connection with disk I/O, and it seems a rather unusual measure for a database, where caches hopefully avoid hitting the disk too often, at least for reads. And writing 8 MiB/s, assuming 4k clusters, seems not really noteworthy, especially given three-fold redundancy. On the other hand, reading 700 MiB/s, which would not be affected by the redundancy, seems a comparatively big number, especially because caches should limit the disk traffic to a small fraction.
This is not true at all, bordering on utter nonsense. Databases try hard to keep the correct set of blocks in memory because that is essential for their performance. Heck, many of the fastest database systems advertise themselves as in-memory databases that avoid disks altogether.
Queries here are measured from the application side, and in our case those are simple key-value operations such as get(key), update(key, value), write(key, value) and delete(key). Aerospike is known to optimize these into as few block operations as it can.
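For context, "measured from the application side" means timing the client call itself rather than counting disk operations; a trivial sketch, assuming a hypothetical key-value client object with a `get` method:

```python
import time


def timed_get(client, key):
    """Measure application-side latency of a single get(key) call.

    `client` is a hypothetical key-value client; the timing wraps the full
    round trip as seen by the application, which is what the 99.98% <1ms
    figure above refers to.
    """
    start = time.perf_counter()
    value = client.get(key)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return value, elapsed_ms
```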
No N1QL was involved that needed to be ported/re-architected?
N1QL was our primary use case for using Couchbase at my last position and it worked wonderfully (though at less than half your scale).
For straight key operations, the overhead of all the JVM stuff in Couchbase seems like a lot, especially considering the number of alternative K/V stores these days.
Glad to hear Aerospike turned out well. Been keeping an eye on them over the past few years.
This is mostly marketing. You can scale to infinity only if your app is architected in a very particular way and you're properly using the underlying primitives (be prepared to shard, cache, and queue everything at some point; try to put thought into where your demarcation points will be, as it will make future decoupling less painful). Even Aurora PostgreSQL and MySQL only scale so far.