Cloud Memorystore: A fully managed in-memory data store service for Redis (googleblog.com)
267 points by manigandham on May 9, 2018 | 91 comments



There's fine print that says:

> ¹Basic Tier instances experience a downtime and a full cache flush during scaling. Standard Tier instance experience very minimal downtime and loss of some unreplicated data during scaling operation. ²Applicable for GA release only.

Does that mean that right now there are no persistence options with this managed Redis cluster? I have never tried to use Redis, but I've been eyeing it for a while, and the most confusing part is how it handles persistence.

Apparently there are ways to have Redis configured to be pretty durable, as outlined here:

https://redis.io/topics/persistence
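
For reference, the standard knobs from that page look roughly like this in redis.conf (these are the stock defaults, not necessarily what any managed service exposes):

    # RDB snapshots: dump to disk if at least 1 change in 900s, 10 in 300s, 10000 in 60s
    save 900 1
    save 300 10
    save 60 10000
    # AOF: log every write command, fsync the log at most once per second
    appendonly yes
    appendfsync everysec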

But it doesn't look like Google supports either RDB or AOF with this managed service; they say RDB is coming, though.

Doesn't that heavily limit the use cases for this product and relegate it to a cooler memcached?

This isn't criticism, just curiosity about good use cases for this product.


Maybe this is just for the initial period. Redis has orchestration commands that make it possible to pause the clients on one side, migrate, shut down the first instance, and resume on the other, without the possibility of getting writes in the middle of the transition. Other providers do that to ensure safety of operations.
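
One way such a hand-rolled migration could be sequenced with redis-cli (hostnames are placeholders; pre-Redis-5 SLAVEOF syntax):

    # make the new instance a replica of the old one
    redis-cli -h <new-host> SLAVEOF <old-host> 6379
    # pause client commands on the old instance so no new writes land mid-migration
    redis-cli -h <old-host> CLIENT PAUSE 10000
    # on the replica, check INFO replication until it has caught up with the master offset
    redis-cli -h <new-host> INFO replication
    # promote the replica, repoint clients at it, then retire the old instance
    redis-cli -h <new-host> SLAVEOF NO ONE
    redis-cli -h <old-host> SHUTDOWN NOSAVE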


Thanks for chiming in. I'm a big fan of your work. Did Google consult you on this product before launch?


Nope, no vendor does (fortunately). AFAIK people get the "Redis way": grab the source and do whatever you want with such code, within the limits of the license / law. Also I get almost no contact from big corps using Redis at scale. This is a sane model, otherwise my work would be dealing with communications instead of writing code :-) However, when technical people from big companies ping me about bugs / issues / improvements, it's great feedback, but it's rare.


It sounds like they're scaling by adding larger instances, replicating, and then failing over. So they'll lose any data that hasn't been synced to the new instance when the failover occurs. I'd bet it's rare, but they're being explicit about it.

The Basic Tier probably only uses one instance, so they just replace it in place.


What would be the point of the downtime they mention, other than doing the sync?


From the docs:

   Basic Tier: Provides a standalone Redis instance. Use this tier for applications that require a simple Redis cache.
If you were only reallocating replicas you'd expect no downtime. But turning on replication isn't seamless.


Their post isn't very clear on persistence; here's how I read it:

1) Yes, they're using Redis persistence (RDB, AOF, or a Google-written persistence layer, it's not specified).

2) Basic Tier scales by tearing down the smaller Redis instance and migrating the data (???) to a larger instance. This causes downtime, but no data loss.

3) Standard Tier migrates using replication during scaling, so there's little downtime, but data in flight at cutover could be lost.

4) And "coming soon" you can migrate data in and out of Cloud Memorystore by importing and exporting RDB files.


Full disclosure: I'm the product manager for Cloud Memorystore.

1) We are not using persistence at all in beta; this is something we are working towards.

2) Scaling of Basic Tier will result in a full cache flush and will cause downtime during the scaling process.

3) That's correct

4) Yes, we will be adding import/export of RDB very soon.


> 1) We are not using persistence at all in beta, this is something we are working towards

Persistence is a key feature of Redis; it isn't just a fancy memcached. This severely limits the applicability of this offering.


So does the beta label. :) Both will change in time, from what the PM said.


It’s like releasing a beta for DNS that doesn’t support IPv6 or CNAMEs.


Exactly. It will still be useful to some people; it lets them get something working end-to-end and then iterate on it to provide the features for more people.


Not providing persistence is setting people up to lose data. It is the wrong move.


This is a great looking product, thanks for following up here.

Can you speak to what kind of durability you expect to achieve once persistence is in place? Is it reasonable to treat this as a durable datastore and simply have a disaster recovery plan in place? Or should applications be prepared to handle occasional minor data loss?


> We are not using persistence at all in beta

Maybe don’t advertise it as a “data store” then.


Transient caches are still a useful type of data store for better performance... That said, they're noting it as a goal and a beta limitation, so they're not satisfied with pure transience either.


That’s fine, but it should be made way more obvious what this is and what it isn’t. This goes against expectations.


Its name is Cloud Memorystore, not Cloud Diskstore. That should set expectations right there.


They use the term “data store” multiple times, not to mention:

> and features like persistence, replication and pub-sub.

(Customer testimonial) > We have used Redis on everything from storing asynchronous task queues for tens of thousands of CPUs to a centralized persisted key-value pair store for the feature vectors output by our ML models

Additionally, “persistence” isn’t mentioned at all in the “Coming soon to Cloud Memorystore” section.

I’m not trying to be a pedant, I just think it’s dangerous that it’s not made abundantly clear that there is no persistence.


Why do you think they are using persistence at all, and why do you think Basic Tier doesn't have data loss on scaling?

I read it differently. When they say "full cache flush", doesn't that imply a full database flush? I was thinking all data gets wiped on a scaling operation.


Agreed, I don't think it is persistent. "A full cache flush" means total data loss to me.


Could be I read this one wrong.

There's a Redis command called FLUSHALL that deletes all the data.

https://redis.io/commands/flushall


Wouldn't that also imply full data loss on server restarts?


Redis persistence is not built or intended to be a write guarantee, but more of a safeguard against a cold cache after a restart or failure.

It looks as if the Basic Tier has persistence turned off by default and the Standard Tier uses snapshotting/RDB to replicate.


That's exactly what it's built for: to save data. It's just configurable in terms of how and when it's persisted.


Wouldn't AOF persistence with 'everysec' imply a write guarantee of at most 1 second of data loss on restart?


No, this is not a write guarantee.

You can still lose an acknowledged write: with 'everysec' the fsync happens up to a second after the write is acknowledged to the client, so a power loss or OS crash in that window drops writes the client believed were saved.


On the surface it looks like this isn't a true-blue Redis server (or even a fork of Redis) but support for the Redis protocol bolted onto an in-house memory-based caching product.


> Does that mean that right now there are no persistance options with this Managed Redis cluster?

It's a memory store.


That's not really an answer to the question. You may not agree with it, but there are many orgs out there using Redis as a persistent datastore. If Google wants their money, they're going to need persistence options.


It doesn’t guarantee it’s Redis. It guarantees it’s Redis-compatible. The use case they’re (rather obviously) optimizing for is an in-memory cache.


The three big reasons I use Redis are:

1) Data structures like lists, hashmaps and sets

2) Persistence

3) Blindingly fast

I wanted to see if Google's implementation sacrifices any speed.

snip

EDIT: I deleted the benchmark results from this post because they're meaningless. I used the "intrinsic latency" tool [0] that Redis provides, but it must be run on the server.

So my results only reflected the intrinsic latency of the client VM, not the new Cloud Memorystore.

My apologies.
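
For what it's worth, the measurements that would make sense from a client VM are round-trip ones, e.g.:

    # throughput and latency of real commands over the network
    redis-benchmark -h 10.0.0.3 -p 6379 -t set,get -n 100000 -c 50 -q
    # continuous round-trip latency sampling from the client
    redis-cli -h 10.0.0.3 --latency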

[0] https://redis.io/topics/latency


Your benchmarks:

Google's new Cloud Memorystore, 1.0 GB:

    redis-cli -h 10.0.0.3 --intrinsic-latency 100
    677411416 total runs in 100 seconds
    avg latency: 0.1476 microseconds / 147.62 nanoseconds per run
Open-source Redis on a Google Cloud Platform micro instance, 0.6 GB:

    redis-cli -h localhost --intrinsic-latency 100
    353427208 total runs in 100 seconds
    avg latency: 0.2829 microseconds / 282.94 nanoseconds per run
Open-source Redis on an AWS EC2 micro instance, 1.0 GB:

    redis-cli -h localhost --intrinsic-latency 100
    21681751 total runs in 100 seconds
    avg latency: 4.6122 microseconds / 4612.17 nanoseconds per run
My benchmark:

Open-source Redis on a Hetzner Cloud VPS, CX11 (92% cheaper than Google's Cloud Memorystore):

    redisbench-client:~# redis-cli -h 88.99.124.195 --intrinsic-latency 100
    Max latency so far: 1 microseconds.
    Max latency so far: 77 microseconds.
    Max latency so far: 113 microseconds.
    Max latency so far: 130 microseconds.
    Max latency so far: 2562 microseconds.
    Max latency so far: 2835 microseconds.
    Max latency so far: 4165 microseconds.
    Max latency so far: 5497 microseconds.

    757281326 total runs (avg latency: 0.1321 microseconds / 132.05 nanoseconds per run).
    Worst run took 41628x longer than the average latency.


Unfortunately intrinsic latency does not measure the latency of the Redis instance, but that of the whole host, that is, the kernel scheduler's max latency.


Micro instances are shared CPU cores, not very useful for measuring latency in a single-threaded application like Redis.


A more interesting comparison might be an n1-standard-1, since that's roughly the same price as the 1 GB Memorystore (but has more memory).


It's not fair to compare this with a micro instance.


It would be fairer to compare with an AWS ElastiCache Redis instance [0].

[0] https://aws.amazon.com/elasticache/redis/


Why is it that GCP can provide an internal IP address for Cloud Memorystore but not for Google Cloud SQL? It would be beneficial for these teams to work together; no internal IP in Cloud SQL makes it much less appealing than AWS RDS.


Private IP support for Cloud SQL is on the roadmap. The reality of development is that some things are easier to do on a new product. It's not a matter of communication.


> authorized networks ensure that the Redis instance is accessible only when connected to the authorized VPC network

It's good that this is handled smoothly, as it's the way Redis wants auth to work.

I've always seen the security story as one of the big weaknesses of Redis: no TLS, an opt-in password with no usernames and only one shared password, no per-database privileges, etc.

Does anyone know the status of TLS in Redis? I heard somewhere that Amazon has a patch to add that directly, rather than having to use stunnel.
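
In the meantime the usual workaround is an stunnel sidecar next to the client; a minimal client-side config sketch (hostnames and cert paths are placeholders):

    ; app connects to plaintext localhost:6379, stunnel wraps it in TLS to the server
    [redis-tls]
    client = yes
    accept = 127.0.0.1:6379
    connect = redis.example.com:6380
    CAfile = /etc/stunnel/ca.pem
    verify = 2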


It's being worked on in this pull request: https://github.com/antirez/redis/pull/4855

Also, antirez recently met with the patch author: https://news.ycombinator.com/item?id=16943289


Cool! I saw that thread last week, but didn't check back in to see that antirez responded.


This is exciting, but I really hope that Terraform will add support for it in reasonable time. These days I'd prefer to not manually manage cloud resources if I don't have to.


Hey jchw, Dana from GCP here. No promises on the timeline, but I can tell you that adding a request in the issue tracker (https://github.com/terraform-providers/terraform-provider-go...) will at least put it on our radar so we start working on it sooner.



Hashicorp doesn't seem to like Google Cloud too much. I've been moving to Google Cloud Builder and its Jinja and Python templates, which I happen to be a little familiar with...


I’m not sure why you’d say that Hashicorp doesn’t like Google Cloud. I can think of recent partnerships between Hashicorp and Google on Vault and Terraform that are both substantial integrations. Google Cloud and Hashicorp both have people that work on the GCP Terraform Provider, and they do an excellent job keeping it up to date. By all external measures this looks like a great relationship.


Google Cloud Deployment Manager, you mean? Yeah, that is linked tightly with Google's API system which means it has the fastest support for new things.

Google does (last I checked) put some paid time into systems like Terraform so that they support GCP well, but there's always going to be more of a lag, especially before GA.

I don't know Terraform well enough to know how big the lag is for new AWS features, but some lag probably exists even there despite AWS's dominant market share.


There are a bunch of undocumented GCB features, by the way. Pop into their Slack and they'll often give you tips that haven't made their way into the docs yet. We've been mostly happy with it, but still use Jenkins as the kickoff point.


Plz do a write-up.

EOM


https://github.com/terraform-providers/terraform-provider-go...

"This provider plugin is maintained by:

    The Google Cloud Graphite Team at Google
    The Terraform team at HashiCorp
"

So Google can also help in this regard.


AFAIK the policy is that it will be added once it's GA?


If postgres on cloud sql is any indication, then it will only be another year and a half! :) They seem to take the Beta/GA process very seriously and don't rush things to GA.


Sooner would be helpful to people who want to test the service while it's in beta in anticipation of relying on it in production after GA.

Do you know if they document their Terraform timeline policy for GCP anywhere?


What are the benefits of using this compared to self-hosted Redis on a Compute Engine instance?


It's fully managed, so you don't need to worry about OS security patches, rotating logs and all the other administration that comes with running your own server. Basically the same benefits as using Cloud SQL/Aurora RDS vs maintaining your own MySQL server.

Disclaimer: I work at GCP, but not on Memorystore.


Not having to deal with failover and scaling is also handy.


Is it managed Redis or something Google made themselves that has the same API? I wonder if they will keep up with all the new stuff in Redis, like streams.


It's the open-source version of Redis, v3.2.11.


Thanks!


I was pretty excited when Memorystore came out, as there is very little I enjoy less than managing Redis servers. However, I've had nothing but trouble since switching. Applications that ran smoothly on Redis instances deployed on Kubernetes Engine are giving me all sorts of problems now.

"NOREPLICAS Not enough good slaves to write" errors have been extremely common, sometimes followed by problems connecting. Then operations on the memorystore instance start pending (I mean "repairing"), taking 20+ minutes. During that time I obviously get connection refused errors. This usually comes with huge spikes in network in/out that are completely unexplained by the app pointing to the service.

I thought the pitch for Memorystore was that it was a managed service; y'know, less time on devops and all of that. I've found the exact opposite to be the case: pay more to Google, spend more time on devops, get a Redis service that doesn't work.


The biggest concern I have about all these hosted Redis offerings is latency.

I use Redis on a highly controlled internal network, and also locally, for its speed... That goes away when moving to a hosted solution.


If you're already accessing it from within GCP, their network is excellent: not much higher latency than a high-quality internal physical network (within an order of magnitude), and highly flexible and configurable. In that use case it should be comparable to any solution you can host within GCP, except hosting on the same instance as each client and using loopback access.

For accessing from outside GCP, yeah the broader internet latency would figure into this as any other external solution.


It depends on what you mean by "not much higher latency", though...

At Redis speeds the network latency can quite often be the bottleneck.


Double digits of microseconds of difference as of some time last year, if I remember correctly - it may have gotten even better since then, since I know Google wanted to narrow that gap.

(Disclaimer: While I have in the past worked for the GCP team, nothing in this comment relates to my time at Google or to info I learned then.)


If Redis Cluster support is coming soon, how are they currently handling cross-region replication and failover? Is Redis Sentinel (painful and requires client support) deprecated now?


> how are they currently handling cross-region replication and failover?

We do not, unfortunately. The Redis service is regional, and only failover from one zone to another within a region is provided.

--

I am an engineer on Memorystore Redis team.


It seems like they might be running Redis Sentinels (for handling failover and replication), but exposing them externally as a single IP instead of giving clients a list of sentinels to connect to.

Applications might use that endpoint like a standalone Redis instance. Requests to that endpoint will be routed to the current master.

Effectively abstracting the details about the Sentinels and their configuration from the clients.

Not too sure, just a guess.
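
For contrast, a client talking to Sentinel directly would normally discover the master itself, roughly like this (conventional 'mymaster' name, default Sentinel port, made-up addresses):

    # ask any sentinel which node is currently the master, then connect to that address
    redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
    1) "10.0.0.5"
    2) "6379"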


I was hoping to see somewhat more flexible pricing, like per 100 MB provisioned, based on the far-fetched assumption that this is not just a managed Redis(-like?) offering but a fully abstracted managed Redis service where you are just charged by GB/s used, as with lots of other services.

This just boils down to the same cost as self-hosting, minus the self-hosting hassle. So not too bad, as it's just easier to use + no vendor lock-in, since it is Redis-compatible at a similar price point.


How does this compare with Redis Labs?


Redis Labs (the official sponsor of Redis now) uses a custom proxy layer on top of Redis, giving you the latest 4.0 version with support for modules, no downtime and features like storing the values of keys on SSDs. They also support cross-region multi-master using CRDTs.

If you need VPC/internal access only, or are fine with the lower level of features, or just want to consolidate on GCP services, then Cloud Memorystore can work, but otherwise I'd recommend Redis Labs.


Unfortunately Redis Labs isn't available in all regions of Google Cloud. I was effectively begging them to add support in Europe a couple of months ago.


Same here for the India data centers of AWS and Google Cloud.


It is in your GCP account, and thus your VPC. It has a private IP address.


How can an instance with 1 GB cost $35 per month... Also, I would've hoped that Google wouldn't add tiers, but rather "limits", so that if I only host 1 MB on my instance I would only pay for 1 MB... and beyond that I could scale to X GB (depending on my limit)...


I totally get the benefits of a managed service, but...$35/month for 1GB is just silly.


Are you sure you fully understand the benefit?

$35 is, what, 30 minutes of an engineer's time from a company's perspective? If the management of the service would take any more than that every month (and it easily could!), you're ahead.


Not everyone works in the Bay making boat-loads of cash. My comment was from the perspective of a small business in Europe.


Even then, I think it's unlikely that the time cost of self-managing wouldn't make it worthwhile. People generally underestimate how much time gets soaked up by managing a computer service.


$35 seems to be about 3x the cost of AWS's smallest tier (no further comparison of features/value beyond price), so it does seem expensive for an introductory project, considering a single ~$35 DigitalOcean VPS might be all some companies need their first year. On the other hand, the number of times I've seen an engineer suggest spending $500-$2000 in labor/hours (when already behind on mission-critical work) as an alternative to spending $5/month on some SaaS is just staggering.


Can someone explain the use cases for using remote Redis? Isn't network delay a big downside?


In a datacenter, a network request to a server that has the information in DRAM can be faster than pulling it locally off of an SSD. It goes back and forth with perf gains on either side, but they're on the same order with different tradeoffs.


Especially if that "local SSD" is really a virtualized block device over a network anyway, which is how most people are deploying their cloud stuff for convenience.


Is this still the case? We don’t use much cloud from the big boys but I thought even they were moving towards local storage. All of our OVH instances are local SSD backed.


Both are available and AWS and GCP are pretty up-front about what you should use depending on the use case. The virtualized devices do have advantages in terms of being able to be moved around and not tied to the life of a single instance.


What about a local RocksDB database, like the state stores used in Kafka Streams? Something like that would be the best of both worlds.


It depends on the use case.

E.g. if you have multiple web servers and a lot to keep cached, it’s more economical to have one machine with a lot of memory than to put that memory in each web server.

Another would be keeping the cache consistent; a single instance of Redis offers atomic operations and strong consistency, e.g. ensuring rate limiting is correctly applied across a service’s instances.

Another would be decoupling state from apps which is especially important for serverless/FaaS where app memory is frequently cleared.

If latency is an issue, it can mostly be resolved by using a two-tier cache: an in-memory cache backed by a Redis cache. Redis pub/sub can then be used to keep the in-memory caches in sync. Stack Overflow is a good example of this architecture.
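
A minimal sketch of that invalidation pattern (channel and key names are made up): whoever writes a key publishes its name, and every app instance runs a subscriber that evicts that key from its local in-process cache.

    # writer: update the shared value, then announce which key changed
    redis-cli SET user:42:profile '{"name":"..."}'
    redis-cli PUBLISH cache-invalidate user:42:profile
    # each app instance keeps a subscriber like this and drops the named key locally
    redis-cli SUBSCRIBE cache-invalidate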


It is intended to be accessed by resources running in the same Google Cloud zone/region, so the network latency is minimal.


Great concept, I'm having a serious look.



