Thanks for answering questions, any comments or progress on aphyr's complaints a...

antirez · on March 5, 2014

Given that Apyhr's investigation in the blog post you cite does not involve anything implemented or planned for Redis, I'm not able to reply to the question. Maybe you fell in the mediatic trap of considering that discussion about Redis turning into a CP system using the proposed algorithm. If you re-read the post, Aphyr itself explains that this is not Redis Cluster.

If you are curious about what triggered the post, is my claim that WAIT per se as a primitive does not mean CP or any other kind of consistency, but that it is defined by the rest of the system. To prove my point, I orchestrated a toy and god-driven distributed system where WAIT turned into a strong consistent primitive. The point is that you can replace the "God" part with other stuff to get it real-world.

Apyhr modeled it and found that if you don't read the same way you write (via agreement) the system is inconsistent when you consider reads, but I think this was common knowledge. Every CP system in the base case needs to follow the same path for writes and reads. However most of the times you can invent optimizations to speedup reads.

p.s. Redis replication is asynchronous by default. The WAIT primitive in Redis unstable does just what it claims, that is, to return an acknowledgment when the specified number of replicas received some data. The practical effect is that you can make writes safer for a (big) latency cost.

lucian1900 · on March 5, 2014

You answered a different question. Are you using a verified consensus algorithm? If not, why are you making a distributed database?

You're not inspiring confidence in using anything but single node Redis.

antirez · on March 5, 2014

Actually I tried to reply to the question of the original poster. Now I'll try to reply to your question: consensus algorithms are only used to implement the strong consistency requirements of CP databases, like for example Zookeeper. Other databases that don't feature strong consistency like Riak and Cassandra don't rely on consensus algorithms. Redis also has a form of weak consistency that does not require consensus (the merging function is, in the special case of Redis, just picking the "history" that seems the most updated, plus other systems to try to bound divergence to a given fixed amount). You can read more about the exact implementation here: http://antirez.com/news/70

This is sometimes referred as "optimistic replication" in the literature AFAIK but I'm not 100% sure.

lucian1900 · on March 5, 2014

So you're going for AP, not CP? That is a valid choice, but it should be explicit.

However, from your description it would appear to not even be AP. It's neither, which means it doesn't guarantee uptime and or a causal connection between accesses. What exactly does it guarantee?

As I said previously, single node Redis is brilliant and I'm grateful for your work on it. However, I am not convinced you know what you're doing when it comes to distributed databases, so I won't be touching Redis Cluster or Sentinel.

seiji · on March 6, 2014

I am not convinced you know what you're doing

I feel you've read the aphyr writings, but didn't follow the story through to the end. He wasn't testing the systems entirely as they were designed to be used.

Sentinel is just a Redis-aware consensus driven failover service. Instead of writing your own "if master is unreachable, promote replica to master" script, Sentinel will do that for you and at the same time notify all your clients to switch to the new master. It's kinda nice that way.

Redis Cluster has very specific use cases. Not every distributed software doodad should be used in any situation we can imagine. Things have design goals. Used the right way, you get happy times. Used the wrong way, you get grumpy cat.

lucian1900 · on March 6, 2014

> Sentinel is just a Redis-aware consensus driven failover service

But its consensus algorithm is broken to the point where it will lose half of your recent data any time an election happens (much like MongoDB).

> Redis Cluster has very specific use cases.

What are those? It's neither CP nor AP apparently. What guarantees does it make?

antirez · on March 6, 2014

> But its consensus algorithm is broken to the point where it will lose half of your recent data any time an election happens (much like MongoDB).

lucian1900, it sounds like you are confused about consensus about failover, and consensus about data.

Redis Sentinel uses a form of consensus in order to have a consistent view of the configuration, however this does not mean the store it will failover turns into a CP system that features strong consistency.

So if you consider Sentinel + Redis as an unique system, Sentinel guarantees that every given partition of Sentinels agree about the configuration (as a general case when all the partitions heal, all Sentinels will agree). It also guarantees to failover only in the majority side.

However when there is a partition and there are clients with a master that is in the minority, Sentinel does not make Redis automatically consistent, since Redis is asynchronously replicated.

In order to put a bound to the desynchronization possible (and the amount of writes you can lose in this scenario), you can configure Redis so that if it is not connected to N slaves receiving acks for M milliseconds, it stops accepting writes.

So what you do is to asynchronously sense the split and bound the write loss.

Example: you have three nodes, each run a Sentinel and a Redis instance. If your master gets partitioned with a few clients, and you configured the option I was referring to, after a few milliseconds or seconds, depending on your config, it will stop accepting writes. On the other side (in the majority side) a slave will get elected a master, and the clients will be able to write. When the partition heals all will be able to write to the new master.

About CP VS AP, I'll reply to another comment asking the same.

redbad · on March 5, 2014

It seems that antirez believes he's allowing Redis to approach AP and/or CP semantics by providing primitives that users can compose in ad-hoc ways according to their requirements.

Unfortunately, that's not a valid methodology to achieve availability or (especially) consistency.

antirez · on March 5, 2014

In Redis Cluster there is nothing you can do to tune availability or consistency. The only thing you can tune is write "safety" using WAIT, but actually most users should never use it since the system is mostly designed to be low-latency.

Redis Cluster is not AP nor CP. It resembles more AP since it has no strong consistency, however it is not able to reach "A" of AP that is very strict. Like AP systems have some form of consistency, but not "C" of CAP. Redis does this for "A" as well, it has some form of availability, but not as strict as "A" of CAP.

Why this? You'll notice that there are not AP database systems with the powerful commands of Redis, and that there are not CP systems with the latency of Redis. So it is a very opinionated trade off to get both sacrificing other stuff (especially "A").

leobelle · on March 5, 2014

If it's not AP or CP, then what is the value? Cassandra is eventually consistent. Consistency is pretty important. It seems like Redis Cluster doesn't have consistency, nor availability, nor partition tolerance. Maybe I misunderstand it though.

antirez · on March 6, 2014

> Cassandra is eventually consistent. Consistency is pretty important.

Your above statement unfortunately has no match in distributed systems theory or practice. Basically eventually consistent means that the system is not consistent from the point of view of CAP, so it does not feature strong consistency.

As an alternative eventual consistency means: our contract with the user is, that, at least, when there are no partitions, the system will no show split-brain conditions, so a given information will converge to a given value for all the nodes.

This alone is a contract with the user, but does not tell a lot, it just tells you:

1) No strong consistency, sorry. 2) No split-brain conditions when the partitions heal.

So "It features eventual consistency, consistency is important" statement, I'm afraid, means that you don't have the tradeoffs about distributed systems clear.

Now, if EC alone does not mean a lot, how to classify an EC system? It depends on the exact implementation of what happens when there is a partition, and what happens when the partition heals.

There are EC systems that are configured by default to accept writes in the minority side (even in partitions when there is a single DB node if we want to honor "A" of CAP), and later will merge depending on "wall clock" timestamps.

This is a form of consistency (and is weak), but is not Consistency with capital-C of CAP.

Other systems will use vector clocks instead of timestamps, which is stronger but slower. Other systems will instead merge values together doing a "Set union" operation. Here we still have no strong consistency, but the safety of the system is improved.

Now that we understand what means strong consistency, and what means eventual consistency, we can talk about Redis Cluster from this point of view. In Redis Cluster, which is eventual consistent, when the partition heals all the nodes will agree by selecting an history between the histories available for a given "hash slot" (a set of keys). However we need to consider what happens during partitions to have a better picture of what happens. In the case of Redis Cluster the minority side of the partition will stop accepting writes after some time. This gives up "A" of CAP in order to put a bound to how much two histories can diverge.

Ok, now let's explore "A". CAP Availability is the ability to reply with a positive reply (so, basically, to be able to accept writes is required) to requests even in a minority partition with a single node.

This is a very strong idea of availability, but this is not the only possibility to be fault tolerant. For example CP systems don't feature "A", but they are able to work correctly even if N/2-1 nodes will fail. This is a pretty strong failure tolerance! So they are reliable systems for a lot of works even if they don't have "A".

As a counter example, Cassandra or Riak would still be very useful tools even without "A" availability, but just with N/2-1 nodes allowed to fail.

Ok, back to Redis Cluster. It does not provides "A" since minority partitions are not available. It does not provide "C" since it is eventually consistent. But it still provides a form of consistency (its contract with the user about what happens during partitions) and provides some degree of resilience to failures.

AP or CP are just the theoretical limits you can get if you want to exploit the max in this regard, however it is not like a system is useless if it is not AP or CP.

In the case of Redis Cluster to renounce to "C" was obvious since it is asynchronously replicated and is not a good fit for high latency interactions. To renounce to the strong availability of "A" was a tradeoff to feature Redis data structures that are limited in size (number of elements) only by available memory.

Suggested reading about EC: http://www.bailis.org/blog/safety-and-liveness-eventual-cons...

leobelle · on March 6, 2014

Thank you for your replies to my questions! For what it's worth I love the redis interface and appreciate your indepth information.