"almost always" atomicity means "not atomic". (if i guess correctly, there is no stage/commit phase for data mutations in the aof persistence, so i believe incremental changes are written to the AOF as the script runs so you can have partially-applied updates on restore if your redis process dies in the middle of executing a watch/multi/exec or lua script.) also, instances die all the time.
Hello, I'm not sure of the setup Redis Labs is using, but vanilla Redis AOF does not allow partially-applied updates. Not for Redis transactions (MULTI/EXEC), nor with scripting. The same holds for the replication channel. Redis goes a very long way to avoid partial application of Lua scripts; more info is on the EVAL page and in the description of the -BUSY error returned while a script that has already made writes has not yet returned.
Ah, thank you. I should have checked the docs! It is nice to know that you are writing the script and MULTI/EXEC semantics into the AOF log (at least in the default config) and that my guess is wrong.
I still wonder what the details are around the "almost" in "almost always" and stand by the conclusion that "almost atomic" is not the same as "atomic".
My best guess is that the author is referring to the fact that there are no rollbacks in Redis transactions, but I'm not sure. I'll try to ask internally. Thanks!
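If that's what it is, the behavior is easy to demonstrate (a minimal redis-py sketch; key names are invented): a runtime error inside MULTI/EXEC does not undo the commands that already ran.

    import redis

    r = redis.Redis()
    r.set("some_string", "hello")            # a plain string, so INCR on it will fail

    pipe = r.pipeline(transaction=True)      # wraps the queued commands in MULTI/EXEC
    pipe.set("k1", "v1")
    pipe.incr("some_string")                 # runtime error inside the transaction
    try:
        pipe.execute()
    except redis.ResponseError:
        pass
    print(r.get("k1"))                       # b'v1' - the earlier SET was not rolled back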
This is where the durability argument has a problem as well.
At a first glance, this looks like an un-replicated system, which means that the loss of an instance is an availability nightmare.
The worrying quote from the article, for me, is this one:
>> With Redis Enterprise, it is easy to create a master-only cluster and have all shards run on the same node
Another node has to be brought in and attached to the same network disk to restore access to that key range?
500k+ ops/sec is nothing to laugh at on a single node with a 1:1 read-write ratio; however, the fragility of this system is concerning.
Half a decade ago, I was working with row-level atomic ops in ZBase/Membase (set-with-cas[1]), which got away with using replication, instead of an SSD plus an fsync, to back the durability of operations. The 99th-percentile latencies were at 3-4ms, but scalability and availability were baked in.
I'm not sure how the append-only mode works, but I tried to run a script (with the appendfsync directive set to always) that sets a new key in an endless loop, and killed the server in the middle of the execution.
No changes were written to the AOF, so I guess the operation is atomic.
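Roughly what I ran, for reference (a sketch; key names are illustrative, and redis-server gets killed from another shell while the EVAL is still running):

    import redis

    r = redis.Redis()
    # endless write loop inside a single EVAL; kill redis-server from another
    # shell mid-run, then restart and check whether any k:* keys are in the AOF
    script = """
    local i = 0
    while true do
      i = i + 1
      redis.call('SET', 'k:' .. i, i)
    end
    """
    r.eval(script, 0)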
Re the first question - enterprise Redis includes, besides the management utilities you can expect from such a product, a different cluster architecture (that actually predates Redis Cluster) and cluster manager; it doesn't require a cluster-aware client as it uses a proxy tier (that also does implicit pipelining and other stuff to increase throughput). And the next version of it will include some extra features like flash support, extra modules, etc.
Re the second and third questions - I'm afraid I cannot answer. I hope the author of the post will reply tomorrow morning (it's the middle of the night here).
To add to what dvirsky said [and I'm also from Redis Labs]
- Redis Enterprise adds some enhancements to the Redis storage layer (details are in the blog).
- This benchmark only tested Redis Enterprise. The idea was to show how fast Redis (Enterprise) can run on a single node with ACID transactions while still keeping sub-millisecond latency. Note the hidden point - at the moment you cannot achieve sub-millisecond latency over cloud storage (any cloud), meaning persistent storage that is attached to an instance rather than local storage, which is ephemeral by design. So in order to see how far we could go, we decided to test over Dell-EMC's VMAX, which doesn't have these limitations.
- In theory, adding CPUs/cores can of course help, as you can add more shards to the Redis cluster and increase the parallelism when accessing the storage. That said, we haven't tested it on an AMD Threadripper.
To add to what dvirsky and Yiftach said, this video [1] from RedisConf17 gave me the best understanding of the storage layer of Enterprise Redis. Before that, it was hard for me to break through the marketing fluff. Amazing what it can do with NVMe and/or Optane.
I'm curious about all of these as well. Also, is this a first-party (i.e. antirez) piece of software? If not, has he blessed calling something "Redis Enterprise"?
Yes. Full disclosure: I work for Redis Labs, but prior to that I designed the back-end of a server-heavy mobile app (the EverythingMe launcher) with Redis as its "front-end database". Meaning it wasn't the only source of truth, but it wasn't a cache either - we served data to users from Redis only, while some of it was migrated (live or in batch) from MySQL, and some was the output of machine learning jobs that filled up data in Redis.
Some of it was done with raw Redis commands, and for some we wrote a data-store engine with automatic indexing and optional mapping to models - https://github.com/EverythingMe/meduza
We also used Redis for geo tagging, queuing, caching, and probably other stuff I've forgotten. It is very flexible, but requires some effort when not used as just a cache.
I use it for a specialized time-series storage / messaging layer. We are receiving stock market data directly, normalizing it into JSON and also PUBLISHing these objects via Redis to consumers (generally connected through a custom WebSocket gateway). We basically turn the whole US stock market into an in-memory sea of JSON, optimized for browser-based visualization.
Redis is great because of its multiple data structures.
Depending on their "kind", these JSON objects are either `APPEND`ed onto Redis Strings (e.g. for time & sales or order history), `HSET` (e.g. opening/closing trade), or `ZSET` (e.g. open order book).
Sometimes an object transitions from a SortedSet to a String. We used to handle this with `MULTI` but now we use custom modules to do this with much better performance (e.g. one command to `ZREM`, `APPEND`, `PUBLISH`).
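For comparison, the old `MULTI`-based version looked roughly like this (a redis-py sketch; key and channel names are made up):

    import json
    import redis

    r = redis.Redis()
    payload = json.dumps({"sym": "AAPL", "px": 189.5})

    # move the object out of the sorted set, append it to the string-based
    # history, and notify subscribers - one MULTI/EXEC round trip
    pipe = r.pipeline(transaction=True)
    pipe.zrem("book:AAPL", payload)
    pipe.append("hist:AAPL", payload + "\n")
    pipe.publish("ch:AAPL", payload)
    pipe.execute()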
We run these Redis/feed-processor pairs in containers pinned to cores and sharing NUMA nodes using kernel bypass technology (OpenOnload) so they talk over shared-memory queues. This setup can sustain very high throughput (>100k of these multi-ops per second) with low, consistent latency. [If you search HN, you'll see that I've approached 1M insert ops/sec using this kind of setup.]
We have a hybrid between this high-performance ingestion and long-term storage. To reduce memory pressure (and since we don't have 20 TB of memory), we harvest these Redis Strings into object storage (both NAS and S3 endpoints), with Postgres storing the metadata to facilitate querying.
We also do mundane things like auto-complete, ticker database, caching, etc.
I love this tech! It's extremely easy to hack Redis itself and now with modules you don't even need to do that anymore.
I'm using Redis as the main data store for my semi-serious chat app project. For better or worse, all tables, including the user DB, are stored in Redis.
In general, an extra layer on top of Redis is a must to make it even a simple searchable database. Indices especially can't be implemented ad hoc with separate Redis commands; only transactions or Lua snippets that update both the data and the related indices atomically can avoid data corruption. For my own use, I wrote https://github.com/ilkkao/rigidDB - it supports indices and a strict schema for the data.
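The core trick is small; a minimal sketch of the idea (illustrative names, not rigidDB's actual API):

    import redis

    r = redis.Redis()
    # write the record and its secondary index in one atomic Lua script, so
    # nothing can interleave between the data write and the index write
    script = """
    redis.call('HSET', KEYS[1], 'email', ARGV[2])
    redis.call('HSET', KEYS[1], 'name', ARGV[3])
    redis.call('HSET', KEYS[2], ARGV[2], ARGV[1])
    return 1
    """
    r.eval(script, 2, "user:42", "index:user:email",
           "42", "ada@example.com", "Ada")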
I think somebody has described Redis at some point as a database-SDK. Makes sense to me.
I used Redis' Lua scripting to implement a per-user/account cache; I wanted to provide a memcached instance per account, but also enforce a limit on cache size so a single account couldn't cache GBs of data and degrade the service for everyone. I used hashes to track items and their sizes per account, and simply calculated the total size of all live objects on insert.
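Roughly like this (a reconstruction of the idea with invented key names and a made-up quota, not the production code):

    import redis

    r = redis.Redis()
    # per-account item sizes live in a hash; refuse the insert when the
    # account would exceed its quota - all inside one atomic script
    script = """
    local used = 0
    for _, s in ipairs(redis.call('HVALS', KEYS[1])) do
      used = used + tonumber(s)
    end
    if used + #ARGV[2] > tonumber(ARGV[3]) then
      return 0  -- over quota; caller decides what to evict
    end
    redis.call('HSET', KEYS[1], ARGV[1], #ARGV[2])
    redis.call('SET', KEYS[2], ARGV[2])
    return 1
    """
    ok = r.eval(script, 2, "acct:1:sizes", "acct:1:item:k1",
                "k1", "some cached blob", 1024 * 1024)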
Hyperdex is no longer maintained [1]. While the technology is impressive, the author seems to have lost interest, and is now working on something called Consus [2].
Hyperdex's problem all along was that the author - a very talented developer from what I can tell - seems more invested in his projects from the perspective of academic research (he's at Cornell) than in delivering a practical, living open source project. He tried to form a company around Hyperdex (the transactional "Warp" add-on was commercial) even though nobody seemed to be using it, and he was the sole developer. Unfortunately, as interesting as Consus is, history seems to be repeating itself there.
But yeah, Hyperdex seemed to have real potential at one point. It was the only NoSQL K/V store (at the time) that had transactions.
Good point, but at least you could try out Hyperdex and consider whether you wanted transactions. But this is pretty moot at this point, unless someone picks up Hyperdex development again.
I guess I don't understand who requires less than 1ms latency on ACID writes in a system accessible only through socket interfaces. Even so - isn't this benchmark simply pushing the requirement of fast disk syncs onto a fast flash drive designed for DMA? I mean, okay... I guess... did customers actually think the networking wasn't the latency culprit?
FWIW, I just get a blank page in FF with uBlock running. I believe it's the "naked-social-share" element, which various third-party lists (Fanboy's) block.
Redis won't function until it has loaded all data from disk into memory. So if your webapp doesn't have too much data, then possibly.
Also, since Redis is memory-backed, you need more RAM than data. This can get very costly.
Another annoyance is that Redis is single-process and single-threaded, so you really have to avoid long-running queries unless you do extensive manual sharding.
(Disclaimer: it's been a few years since I had to think about these constraints so maybe some are removed in more recent versions of Redis)
Sort of - the model it supports is not really multi-threaded. Modules can spawn threads and acquire a "GIL" when they want to touch actual Redis data - thus only one thread at a time actually "owns" the entire Redis instance.
This allows long-running queries to do primitive cooperative multi-tasking, releasing the GIL and letting other queries have a chance; but there is no real parallel data access. You will only gain real parallelism if you have actual work to do that does not touch the data directly while a thread is not holding the GIL. There aren't many cases where this applies - usually copying the data aside to work on it is not worth the gain in parallelism.
[disclosure I'm from Redis Labs] - around 50% of our 7K+ paying customers use Redis as a database.
In general, you can setup an environment in which Redis is HA (replication + auto-failover) and persistent.
Redis Enterprise does it by default + backup and DR + some enhancements to the Redis storage layer (as mentioned in this blog). These enhancements, together with the high-end storage device by Dell-EMC, allowed us to reach this throughput & latency. BTW, this test ran on a single node; you can scale it by just adding node(s) to the cluster.
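On the open-source side, the replication + auto-failover piece mentioned above is typically done with Redis Sentinel; a minimal redis-py sketch (host names, ports, and the master name are illustrative):

    from redis.sentinel import Sentinel

    # connect through Sentinel so a failover is transparent to the client
    sentinel = Sentinel([("sentinel-1", 26379), ("sentinel-2", 26379)],
                        socket_timeout=0.5)
    master = sentinel.master_for("mymaster", socket_timeout=0.5)
    master.set("k", "v")
    replica = sentinel.slave_for("mymaster", socket_timeout=0.5)
    print(replica.get("k"))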
As far as I know, AOF is still not the default setting. It's important to point this out, or otherwise Redis will suffer the same fate as early MongoDB, which had a similar default persistence model which many users didn't fully understand.
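Easy to check on a stock install (a quick redis-py probe; the values shown are the shipped defaults as far as I know):

    import redis

    r = redis.Redis()
    print(r.config_get("appendonly"))   # {'appendonly': 'no'}  - AOF is off by default
    print(r.config_get("appendfsync"))  # {'appendfsync': 'everysec'} once AOF is enabled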
As long as your data will fit in RAM in a cost-effective way, and you can scale your cluster when it exhausts its memory. AFAIK Netflix are using it as a "real database", but of course not for everything - https://medium.com/netflix-techblog/introducing-dynomite-mak...
All I see with uBlock is a black screen. I have to whitelist over a dozen ad trackers to see the content. Several ad scripts pull in even more scripts from external domains. Try it yourself.