A Primer on Database Replication (2017) (brianstorti.com)
187 points by lebek on Jan 6, 2019 | hide | past | favorite | 12 comments



I highly recommend reading "Designing Data-Intensive Applications" by Martin Kleppmann to get a thorough overview with lots of references. Reading this book is a timesaver compared to finding all this information across blog posts.

https://www.amazon.de/dp/1449373321/


I couldn't agree more. What DDIA gives you, most of all, is context: why does replication (or partitioning, or horizontal scaling, or ACID) matter? A must-read for any working programmer today.


This is a pretty much complete introduction to database replication techniques.

If you're more interested in the quorum-based tech that's come about (algorithms like Paxos and Raft) and the underlying distributed systems research, I've recently tried to compile the whole Paxos family of algorithms in a blog post[0]. There has been a lot of work recently (notably EPaxos, WPaxos, SDPaxos) on tuning and improving the algorithms, including consideration for long-distance WAN connections (that's what the W in WPaxos stands for).

[0]: https://vadosware.io/post/paxosmon-gotta-concensus-them-all
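The reason the whole Paxos/Raft family leans on (mostly majority) quorums is that any two majorities over the same node set must overlap, which is what carries a decided value from one round to the next. A tiny illustrative sketch of that property (not tied to any particular Paxos variant):

```python
# Why majority quorums work: any two majorities of the same
# cluster share at least one node, so a value accepted by one
# quorum is always visible to the next quorum contacted.
from itertools import combinations

def majority_quorums(nodes):
    """All subsets of nodes that form a strict majority."""
    q = len(nodes) // 2 + 1
    return [set(c) for c in combinations(nodes, q)]

nodes = ["a", "b", "c", "d", "e"]
quorums = majority_quorums(nodes)

# Every pair of quorums overlaps in at least one node.
assert all(q1 & q2 for q1 in quorums for q2 in quorums)
```

The newer variants mentioned above (EPaxos, WPaxos, etc.) mostly play with *which* quorums are used and where, not with this intersection property itself.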


> https://vadosware.io/post/paxosmon-gotta-concensus-them-all

This is a pretty good non-FUD post about distributed systems (unlike many posts related to consistency and consensus coming from distributed database startups and megacorps).

It's missing some important real-world considerations in its reasoning, though, namely latency in WAN setups (reducing the number of RTTs isn't really about that).


I'm not sure I agree -- reducing the number of round trips is definitely important for reducing latency in the WAN context. A quote from the SDPaxos paper:

> In the wide area, latency is dominated by network communication, which is decided by the number of round trips, and the distance to the replica to contact. The test for Multi-Paxos is omitted because its disadvantage is obvious: the client has to communicate with the remote leader, as long as it is not co-located with the leader. The replicas and clients for wide-area experiments are deployed in California (CA), Oregon (OR), Ohio (OH), Ireland (IRE) and Seoul (SEL). The sequencer of SDPaxos locates in CA. The round-trip times (ping latencies) between these regions are shown in Table 1.

I think this case is even worse than most real-world cases people are dealing with today. Google/Facebook might have data centers this spread out, but I suspect 90% of the people who read that post are worried about multiple regions in America.
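The point about round trips can be made concrete with a back-of-the-envelope sketch. The RTT figure below is made up (roughly a US coast-to-coast ping), and the round-trip counts are simplifications, not exact protocol costs:

```python
# In a WAN, commit latency is roughly
# (number of round trips) x (RTT to the leader/quorum).
rtt_ms = 80  # hypothetical client <-> remote leader RTT

def commit_latency(round_trips, rtt=rtt_ms):
    """Approximate end-to-end commit latency in milliseconds."""
    return round_trips * rtt

# Multi-Paxos via a remote leader: client -> leader, then
# leader -> quorum, so roughly two round trips.
two_rtt = commit_latency(2)   # 160 ms
# A one-round-trip protocol (what EPaxos/SDPaxos-style work
# aims for) halves that.
one_rtt = commit_latency(1)   # 80 ms
assert two_rtt == 2 * one_rtt
```

Shaving a round trip matters precisely because the per-trip cost is fixed by geography, which is the "physical limits" point made elsewhere in this thread.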


This article is pretty thorough on the subject of replication. It goes through pretty much all the options and trade-offs I've learned over my career in database operations.

The takeaway I hope people get from this article is that there is no single solution that works best in all situations. It is a design decision about what advantages your application needs and what sacrifices you are able to make. The more limited and strict you can be about how your application reads and writes data, the more options you have to make your database backend more robust. Applications that "try to have it all" usually just end up doing everything poorly.

The other takeaway is that the physical limits of the universe are the greatest barrier database systems are working against. You simply cannot get around them. When you end up working with a team that is unwilling to acknowledge this, run as fast as you can.


Very tiny nit: DynamoDB is not Dynamo. Leaderless replication was popularized by Amazon's original Dynamo paper, true, but DynamoDB is not the same system [1].

[1] https://www.allthingsdistributed.com/2012/01/amazon-dynamodb...
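For context, the leaderless style the original Dynamo paper popularized rests on overlapping read/write quorums: with N replicas, W write acknowledgements, and R read responses, choosing R + W > N makes every read set intersect the latest write set. A minimal sketch of that condition (an illustration of the paper's idea, not DynamoDB's actual internals):

```python
# Dynamo-style leaderless replication: N replicas, writes
# acknowledged by W nodes, reads served by R nodes. The
# condition R + W > N guarantees that any read quorum overlaps
# the most recent write quorum, so a read sees the latest write.
def quorums_overlap(n, r, w):
    """True if read and write quorums are guaranteed to intersect."""
    return r + w > n

assert quorums_overlap(n=3, r=2, w=2)      # classic N=3 setup
assert not quorums_overlap(n=3, r=1, w=1)  # fast, but may read stale data
```

Dropping below the threshold trades consistency for latency, which is exactly the knob these systems expose.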


There are solutions to this now:

https://cockroachlabs.com

https://foundationdb.org

Of course, there are still tradeoffs, but a distributed database with strong consistency guarantees is likely to be a good choice for many projects.


Also of interest... The Raft consensus algorithm is used as an alternative to Paxos to decide who is in charge of a particular piece of data in a distributed system:

https://raft.github.io


The tradeoffs those solutions make are exactly the ones most projects cannot really make.


This is a really great article and overview. It is a tad light on the most important subject, though: CRDTs (Conflict-free Replicated Data Types).

After having worked on these types of problems for a decade, and run an open-source database startup successfully for nearly half a decade, I'm convinced they are the only right way to go.

The article seems to suggest they aren't well adopted, and that severe limitations are holding back their adoption.

This was true with append-only or log-based CRDTs.

But now, state-based graph CRDTs (like we've implemented in https://github.com/amark/gun ) have solved that. The only thing you can't do with them is Global Strong Consistency (think banking), but their Strong Eventual Consistency guarantees make them the best solution for literally everything else.
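For readers unfamiliar with the state-based idea: each replica keeps its own full state, and replicas converge by exchanging and merging states with a commutative, associative, idempotent merge function. A minimal sketch of the concept using a grow-only counter (this is the textbook G-Counter, not GUN's actual graph CRDT):

```python
# State-based (convergent) CRDT sketch: a G-Counter.
# Each replica only increments its own slot; merge takes the
# element-wise max, so merges commute, associate, and are
# idempotent -- replicas converge no matter the sync order.
class GCounter:
    def __init__(self, replica_id):
        self.id = replica_id
        self.counts = {}            # replica_id -> count

    def increment(self):
        self.counts[self.id] = self.counts.get(self.id, 0) + 1

    def merge(self, other):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(); a.increment()
b.increment()
a.merge(b); b.merge(a)              # exchange states in any order
assert a.value() == b.value() == 3
```

That merge-any-time property is what gives Strong Eventual Consistency without coordination; what it cannot give you is the global ordering that, say, an account balance check requires.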

They also run in production at large sites, like the Internet Archive (a top-300 site globally), D.Tube (1M monthly uniques), notabug.io (P2P reddit), etc., all with GUN.

The article kind of makes this joke:

> (otherwise our lives would be too easy, right?)

But here is the kicker: both the Internet Archive and NAB integrated/built it in one week. Yes, it literally makes people's lives easier.

And you should too! Next time you build a non-banking app, consider using a state-based graph CRDT to cover 99.99% of your use cases!


You are getting downvoted because of your attitude. Even if your GUN database is really that awesome, here are some points for you to reflect on:

- What is "easy" for you might not be for others.

- Banking systems don't use strong consistency. They can just roll back/swallow the costs of invalid transactions after the fact. An online mom-and-pop shop... maybe they can too, or maybe it's better for the site to just have some sporadic downtime and avoid such hassles entirely.

- P2P is a nightmare for anything that requires audit trails (and there are legal requirements about that in many industries).

- There exists a whole ecosystem of databases dedicated to analytical processing workloads (OLAP). Graph databases are typically not the best at it.

- Conflicts happen. Your "conflict-free" data types embed the conflict resolution rules in the data structure. This is fine for some conflict resolution rulesets, but it can't be done for others (e.g., cases where the user decides what the resolution is by clicking a button).
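That last point is worth making concrete. In a last-writer-wins register, for example, the resolution rule ("highest timestamp wins") is fixed inside the type, so there is no hook for asking a human to choose. A small illustrative sketch (a generic LWW register, not any particular database's implementation):

```python
# Last-writer-wins register: the conflict resolution rule
# ("highest timestamp wins, ties broken by replica id") is baked
# into the data type -- the losing write is discarded silently,
# and no user can be asked to pick the winner.
class LWWRegister:
    def __init__(self):
        self.value = None
        self.stamp = (0, "")        # (timestamp, replica_id)

    def set(self, value, ts, replica_id):
        if (ts, replica_id) > self.stamp:
            self.value, self.stamp = value, (ts, replica_id)

    def merge(self, other):
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp

r1, r2 = LWWRegister(), LWWRegister()
r1.set("draft A", ts=1, replica_id="r1")
r2.set("draft B", ts=2, replica_id="r2")
r1.merge(r2)                 # "draft B" wins; "draft A" is gone
assert r1.value == "draft B"
```

If your application needs "show both versions and let the user pick", that workflow has to live outside the CRDT's automatic merge.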



