Oh, I didn't expect to see a patch release posted here. As I wrote in the announcement, this patch release is a bit special, as it adds a few new features, so at least there is something to explore. Btw, ask me anything about this release if something is not clear; I'll try to reply as best I can.
Given that Aphyr's investigation in the blog post you cite does not involve anything implemented or planned for Redis, I'm not able to reply to the question. Maybe you fell into the media trap of reading that discussion as Redis turning into a CP system using the proposed algorithm. If you re-read the post, Aphyr himself explains that this is not Redis Cluster.
If you are curious about what triggered the post, it was my claim that WAIT per se, as a primitive, does not imply CP or any other kind of consistency: the consistency is defined by the rest of the system. To prove my point, I orchestrated a toy, god-driven distributed system where WAIT turned into a strongly consistent primitive. The point is that you can replace the "God" part with other machinery to make it work in the real world.
Aphyr modeled it and found that if you don't read the same way you write (via agreement), the system is inconsistent once you consider reads, but I think this was common knowledge. Every CP system, in the base case, needs to follow the same path for writes and reads. However, most of the time you can invent optimizations to speed up reads.
P.S. Redis replication is asynchronous by default. The WAIT primitive in Redis unstable does just what it claims, that is, it returns an acknowledgment when the specified number of replicas has received some data. The practical effect is that you can make writes safer at a (big) latency cost.
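For instance, a minimal sketch with redis-py, assuming a build that already ships WAIT (host, port, replica count, and timeout are just example values):

    import redis

    r = redis.StrictRedis(host='localhost', port=6379)
    r.set('balance', 100)
    # WAIT numreplicas timeout-ms: block until at least 1 replica has
    # acknowledged the previous writes, or 100 ms have elapsed; returns
    # the number of replicas that actually acked.
    acked = r.execute_command('WAIT', 1, 100)
    if acked < 1:
        print('write may exist only on the master')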
Actually, I tried to reply to the question of the original poster. Now I'll try to reply to your question: consensus algorithms are only used to implement the strong consistency requirements of CP databases, for example Zookeeper. Databases that don't feature strong consistency, like Riak and Cassandra, don't rely on consensus algorithms. Redis also has a form of weak consistency that does not require consensus (the merging function, in the special case of Redis, is just picking the "history" that seems the most up to date, plus other mechanisms that try to bound divergence to a given fixed amount). You can read more about the exact implementation here: http://antirez.com/news/70
This is sometimes referred to as "optimistic replication" in the literature, AFAIK, but I'm not 100% sure.
So you're going for AP, not CP? That is a valid choice, but it should be explicit.
However, from your description it would appear to not even be AP. It's neither, which means it doesn't guarantee uptime or a causal connection between accesses. What exactly does it guarantee?
As I said previously, single node Redis is brilliant and I'm grateful for your work on it. However, I am not convinced you know what you're doing when it comes to distributed databases, so I won't be touching Redis Cluster or Sentinel.
I feel you've read the Aphyr writings, but didn't follow the story through to the end. He wasn't testing the systems entirely as they were designed to be used.
Sentinel is just a Redis-aware, consensus-driven failover service. Instead of writing your own "if the master is unreachable, promote a replica to master" script, Sentinel will do that for you and at the same time notify all your clients to switch to the new master. It's kinda nice that way.
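As a hedged sketch of the client side (the port and the master group name 'mymaster' are just examples), redis-py ships a Sentinel helper that asks the Sentinels who the current master is:

    from redis.sentinel import Sentinel

    # Ask any of the Sentinels which node is currently the master for the
    # monitored group named 'mymaster', then talk to that node directly.
    sentinel = Sentinel([('localhost', 26379)], socket_timeout=0.5)
    master = sentinel.master_for('mymaster', socket_timeout=0.5)
    master.set('foo', 'bar')  # after a failover, this routes to the new master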
Redis Cluster has very specific use cases. Not every distributed software doodad should be used in any situation we can imagine. Things have design goals. Used the right way, you get happy times. Used the wrong way, you get grumpy cat.
> But its consensus algorithm is broken to the point where it will lose half of your recent data any time an election happens (much like MongoDB).
lucian1900, it sounds like you are conflating consensus about failover with consensus about data.
Redis Sentinel uses a form of consensus in order to have a consistent view of the configuration; however, this does not mean the store it fails over turns into a CP system that features strong consistency.
So if you consider Sentinel + Redis as a single system, Sentinel guarantees that every given partition of Sentinels agrees about the configuration (and as a general rule, when all the partitions heal, all Sentinels will agree). It also guarantees to fail over only in the majority side.
However, when there is a partition and there are clients with a master in the minority side, Sentinel does not make Redis automatically consistent, since Redis is asynchronously replicated.
In order to bound the possible desynchronization (and the amount of writes you can lose in this scenario), you can configure Redis so that if a master is not connected to N slaves sending acks for M milliseconds, it stops accepting writes.
So what you do is asynchronously sense the split and bound the write loss.
Example: you have three nodes, each running a Sentinel and a Redis instance. If your master gets partitioned away with a few clients, and you configured the option I was referring to, after a few milliseconds or seconds, depending on your config, it will stop accepting writes. On the other side (the majority side) a slave will get elected master, and the clients there will be able to write. When the partition heals, everybody will be able to write to the new master.
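A sketch of that option via redis-py (the thresholds are example values; in Redis 2.8 the relevant settings are min-slaves-to-write and min-slaves-max-lag):

    import redis

    r = redis.StrictRedis(host='localhost', port=6379)
    # Stop accepting writes unless at least 1 slave is connected and its
    # last ack is no older than 10 seconds.
    r.config_set('min-slaves-to-write', 1)
    r.config_set('min-slaves-max-lag', 10)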
About CP vs. AP, I'll reply to another comment asking the same question.
It seems that antirez believes he's allowing Redis to approach AP and/or CP semantics by providing primitives that users can compose in ad-hoc ways according to their requirements.
Unfortunately, that's not a valid methodology to achieve availability or (especially) consistency.
In Redis Cluster there is nothing you can do to tune availability or consistency. The only thing you can tune is write "safety" using WAIT, but actually most users should never use it, since the system is mostly designed to be low-latency.
Redis Cluster is neither AP nor CP. It resembles AP more, since it has no strong consistency, but it is not able to reach the "A" of AP, which is very strict. Just as AP systems have some form of consistency, but not the "C" of CAP, Redis Cluster does the same for "A": it has some form of availability, but not as strict as the "A" of CAP.
Why? You'll notice that there are no AP database systems with the powerful commands of Redis, and no CP systems with the latency of Redis. So it is a very opinionated trade-off to get both, sacrificing other things (especially "A").
If it's not AP or CP, then what is the value? Cassandra is eventually consistent. Consistency is pretty important. It seems like Redis Cluster doesn't have consistency, nor availability, nor partition tolerance. Maybe I misunderstand it though.
> Cassandra is eventually consistent. Consistency is pretty important.
Your statement above unfortunately has no match in distributed systems theory or practice. Basically, eventually consistent means that the system is not consistent from the point of view of CAP, so it does not feature strong consistency.
Instead, eventual consistency means: our contract with the user is that, at least when there are no partitions, the system will not show split-brain conditions, so a given piece of information will converge to a single value for all the nodes.
This alone is a contract with the user, but it does not tell you a lot; it just tells you:
1) No strong consistency, sorry.
2) No split-brain conditions when the partitions heal.
So "It features eventual consistency, consistency is important" statement, I'm afraid, means that you don't have the tradeoffs about distributed systems clear.
Now, if EC alone does not mean a lot, how to classify an EC system? It depends on the exact implementation of what happens when there is a partition, and what happens when the partition heals.
There are EC systems that are configured by default to accept writes in the minority side (even in partitions when there is a single DB node if we want to honor "A" of CAP), and later will merge depending on "wall clock" timestamps.
This is a form of consistency (and is weak), but is not Consistency with capital-C of CAP.
Other systems will use vector clocks instead of timestamps, which is stronger but slower. Other systems will instead merge values together doing a "Set union" operation. Here we still have no strong consistency, but the safety of the system is improved.
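Here is a toy Python sketch (not Redis code) contrasting the two merge strategies just mentioned:

    # Last-write-wins: pick the value with the newest wall-clock timestamp;
    # the losing side's writes are silently discarded.
    def merge_lww(a, b):
        return a if a[1] >= b[1] else b  # (value, timestamp) pairs

    # Set union: keep everything both sides saw; no write is lost, but a
    # delete on one side can be undone by the merge.
    def merge_union(a, b):
        return a | b

    print(merge_lww(('x', 10), ('y', 12)))  # ('y', 12) -- 'x' is gone
    print(merge_union({'x'}, {'y'}))        # {'x', 'y'} -- both survive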
Now that we understand what strong consistency means and what eventual consistency means, we can talk about Redis Cluster from this point of view. Redis Cluster, which is eventually consistent, works like this: when the partition heals, all the nodes will agree by selecting one history among the histories available for a given "hash slot" (a set of keys). However, we need to consider what happens during partitions to get a better picture. In the case of Redis Cluster, the minority side of the partition will stop accepting writes after some time. This gives up the "A" of CAP in order to bound how much two histories can diverge.
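To make "hash slot" concrete, here is a sketch of how Redis Cluster maps a key to one of its 16384 slots (CRC16, XMODEM variant, modulo 16384):

    def crc16(data):
        # Bit-by-bit CRC16/XMODEM: polynomial 0x1021, initial value 0.
        crc = 0
        for byte in data:
            crc ^= byte << 8
            for _ in range(8):
                if crc & 0x8000:
                    crc = ((crc << 1) ^ 0x1021) & 0xFFFF
                else:
                    crc = (crc << 1) & 0xFFFF
        return crc

    def key_slot(key):
        # Real Redis Cluster also honors "{...}" hash tags; omitted here.
        return crc16(key) % 16384

    print(key_slot(b'user:1000'))  # every key deterministically maps to one slot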
Ok, now let's explore "A". CAP availability is the ability to give a positive reply to requests (so, basically, being able to accept writes is required) even in a minority partition with a single node.
This is a very strong notion of availability, but it is not the only way to be fault tolerant. For example, CP systems don't feature "A", but they are able to work correctly even if a minority of the nodes (up to (N-1)/2) fail. This is pretty strong failure tolerance! So they are reliable systems for a lot of workloads even if they don't have "A".
Likewise, Cassandra or Riak would still be very useful tools even without "A" availability, with just (N-1)/2 nodes allowed to fail.
Ok, back to Redis Cluster. It does not provide "A", since minority partitions are not available. It does not provide "C", since it is eventually consistent. But it still provides a form of consistency (its contract with the user about what happens during partitions) and some degree of resilience to failures.
AP and CP are just the theoretical limits you can reach if you want to exploit the maximum in one regard; it is not as if a system is useless when it is neither AP nor CP.
In the case of Redis Cluster, renouncing "C" was obvious, since it is asynchronously replicated and not a good fit for high-latency interactions. Renouncing the strong availability of "A" was a trade-off to support Redis data structures, which are limited in size (number of elements) only by available memory.
Since I switched to Redis, I've had a massive drop in resource requirements, saving me a lot of money. Thank you very much.
The keyspace notifications are great, but at the moment I use NodeJS to listen to the events. When will it be possible to run Lua to remove the current point of failure? On the topics page, it says "In the future". Are we looking at years or months?
Awesome that you improved your stack with Redis, thanks for sharing. I'm not sure which exact feature you are referring to; could you describe it? Thanks.
Currently, NodeJS subscribes to expired keys. When a key expires, it deletes the 'physical' data from the server (thumbnails, more in-depth info, etc). If NodeJS fails, even for a few seconds, I will forever have the 'physical' data stored on the server (as the expired event doesn't fire again).
Having Lua scripting do what NodeJS currently does would fix the problem.
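For context, a hedged Python sketch of the same pattern (database 0 and the cleanup logic are placeholders); as described above, the subscriber only sees events fired while it is connected:

    import redis

    r = redis.StrictRedis(host='localhost', port=6379)
    # "Ex" enables keyevent notifications (E) for expired events (x).
    r.config_set('notify-keyspace-events', 'Ex')

    p = r.pubsub()
    p.psubscribe('__keyevent@0__:expired')  # expirations in database 0
    for message in p.listen():
        if message['type'] == 'pmessage':
            key = message['data']
            # Placeholder: delete the 'physical' data tied to this key here.
            print('expired:', key)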
In order to simplify the development process I have this strange "rule" that new things are admissible in stable-branch patch releases as long as they are very, very unlikely to interfere with everything else.
For example, in this case, the testing environment and the other Sentinel changes actually improve reliability. As for BITPOS, it is a new command that, if not used, should not modify the behaviour/stability of the rest of the system in any way.
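For the curious, a quick sketch of what BITPOS does, via redis-py (which exposes it as bitpos(); the key name is an example):

    import redis

    r = redis.StrictRedis(host='localhost', port=6379)
    r.set('mykey', b'\x00\x0f')   # bits: 00000000 00001111
    print(r.bitpos('mykey', 1))   # 12: position of the first set bit
    print(r.bitpos('mykey', 0))   # 0: position of the first clear bit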
However, compare this process with creating a new release. With 2.8.7 I just need to write a changelog, tag the release on GitHub, and update a few numbers on the web site. With a new release the process is very different: we have a number of release candidates before switching to stable, and so forth.
For me, the beauty of Redis lies in its being a "data structure server". All the kinds of lists/sets/hashes Redis has are very powerful. But in the last few releases Redis has not introduced many new types. I would like to see more focus on that in future releases, please. Also improvements in memory consumption. Thanks for all the work behind Redis. You guys rock, take a bow!!
I totally think you are right. Redis 3.2 was already announced as the "back to the API" release, where more data structures and APIs will be added.
Oh well, in the next few weeks we'll get native support for bloom filters; I hope you'll like this! And even without introducing a new data structure, they'll use strings, like all the current bit operations do.
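To give an idea, here is a toy bloom filter on top of Redis strings using SETBIT/GETBIT (the sizes, hashing scheme, and key names are made up; the upcoming native support may look quite different):

    import hashlib
    import redis

    r = redis.StrictRedis(host='localhost', port=6379)
    M = 2 ** 20  # bits in the filter
    K = 4        # hash functions per item

    def _offsets(item):
        for i in range(K):
            h = hashlib.sha1(b'%d:%s' % (i, item)).hexdigest()
            yield int(h, 16) % M

    def bf_add(key, item):
        for off in _offsets(item):
            r.setbit(key, off, 1)

    def bf_maybe_contains(key, item):
        # False positives are possible, false negatives are not.
        return all(r.getbit(key, off) for off in _offsets(item))

    bf_add('bf:seen', b'alice')
    print(bf_maybe_contains('bf:seen', b'alice'))  # True
    print(bf_maybe_contains('bf:seen', b'bob'))    # False (with high probability)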
I also like this post of yours on open source:
http://antirez.com/news/48
I often share it with people who ask why they should write or contribute to open source.
I think it's bad manners to act as if the author owed you something. Redis is a monumental personal effort, and a lot of people try to help in many different ways because they are grateful such a tool exists. If you want to improve Redis, you are welcome to discuss any issue and even provide a patch, but you have to be respectful. A comment like yours is very demoralizing for open source contributors.
Hi mrmondo, I respect your opinion, but I think this is really a matter of individual design taste.
I think that persisting the mutable state in the configuration file is a good thing, since in the Sentinel case the initial configuration is the statement of a given state. As the state evolves, it is reflected in the configuration.
The version of Sentinel provided with 2.8.7 also offers a full API that allows you to change the configuration while the system is running, so you can automate everything without having to touch the configuration file at all, if you don't like that approach.
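A hedged sketch of that API through redis-py (the port, addresses, and the 'mymaster' name are examples), using the SENTINEL subcommands:

    import redis

    s = redis.StrictRedis(host='localhost', port=26379)  # a Sentinel, not Redis
    # Start monitoring a new master at runtime, with a quorum of 2 Sentinels.
    s.execute_command('SENTINEL', 'MONITOR', 'mymaster', '127.0.0.1', 6379, 2)
    # Tune a parameter of an already monitored master.
    s.execute_command('SENTINEL', 'SET', 'mymaster', 'down-after-milliseconds', 5000)
    # Inspect the resulting state.
    print(s.execute_command('SENTINEL', 'MASTERS'))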
I started using Redis for storing basic/low-level session data for my Python web stack and have been amazed by just about every feature available. It's a breeze to work with and speedy fast. Thank you for the continued work, Redis folks!
EDIT: sorry, I replied from a mobile device. Basically, we store the digests at GitHub while the tarballs are on a different server running at Linode. The idea is that it is unlikely that an attacker can gain access to both the GitHub repository and the server where the tarballs are stored.
However, security-conscious users need to download the tar.gz and verify the digest manually or via a script (the format is computer parseable).
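A sketch of such a script, assuming (as an illustration) that the hashes file lists one whitespace-separated record per line in the form "hash <filename> <algorithm> <digest> <url>":

    import hashlib

    def verify(tarball, hashes_file='redis-hashes'):
        # Digest of the tarball downloaded from the release server.
        with open(tarball, 'rb') as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        # Compare against the record published in the separate hashes file.
        with open(hashes_file) as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 5 and fields[1] == tarball and fields[2] == 'sha1':
                    return fields[3] == digest
        return False

    print(verify('redis-2.8.7.tar.gz'))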
Hey, at this point I think Sentinel is considered stable by the Redis community. It is relatively new code, but the second incarnation uses better algorithms and has been used in a number of production environments.
What we recommend currently is just to test it in your specific environment and to understand the way it works. The weak point currently, to be honest, is the documentation, but I'm working on it.
Hey, thanks for your comment and all the work you put in Redis!
I talked to my coworkers, and we agreed to put some more time into researching and giving Sentinel a shot. You are right about the documentation though... it isn't very clear, and that is probably one of the reasons why we didn't go for it right away.