Glad to see that they are adding features, but I wish the pace were faster. We have been using Cloud SQL for Postgres and overall it is good, but there are a number of glaring and frustrating feature holes. The two top of mind for me are:
1) No way to force SSL connections without enforcing two-way SSL (which is a huge pain and not supported by all the clients we use). This is literally just a Postgres config option but they don’t expose it (see the sketch after this list). RDS has had this feature since 2016.
2) No in-place upgrade. This is again a feature built into Postgres, and RDS has had it for years. Instead, the upgrade story for Cloud SQL is cumbersome and involves setting up a new instance, creating and loading backups, etc.
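For reference, forcing SSL without client certificates is a two-line change on self-managed Postgres (and the in-place upgrade path is likewise just pg_upgrade on the data directory). A minimal sketch of the vanilla Postgres config involved, nothing Cloud SQL actually exposes:

    # postgresql.conf: enable server-side SSL
    ssl = on

    # pg_hba.conf: "hostssl" rules only match SSL connections, so with no
    # plain "host" rule present, non-SSL connections are rejected outright.
    # Password auth only -- no client certificate, i.e. no two-way SSL.
    hostssl  all  all  0.0.0.0/0  md5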
We switched to Cloud SQL from running our own Postgres and it is a huge improvement, but the feature set is disappointing compared to RDS.
Also, if you use Cloud SQL in HA mode, it may still go down randomly for up to 90s with no warning or entry in the operation log, and this is considered expected behaviour.
Here is a direct quote from Google support when we contacted them about our database going down outside of our scheduled maintenance window:
> As I mentioned previously remember that the maintenance window is preferred but there are time-sensitive maintenance events that are considered quite important such as this one which is a Live migration. Most maintenance events should be reflected on the operations logs but there are a few other maintenance events such as this one that are more on the infrastructure side that appear transparent to clients because of the nature of the changes made to the Google managed Compute Engine that host the instances, this is a necessary step for maintaining the managed infrastructure. For this reason this maintenance does not appear visible in your logs or on the platform.
Here "transparent to clients" means that the database is completely inaccessible for up to 90s. Furthermore, because there's no entry in the operation log, there's no way to detect if the database is down because of "expected maintenance", or because of some other issue without talking to a human at google support: so really great if you're woken up in the middle of the night because your database is down, and you're trying to figure out what happened...
That’s troubling. In fairness, when I last used RDS (2018) we had 9 databases running and we averaged about one database failover per month, with about 2-3 minutes of downtime per incident. I never got a satisfactory answer from support other than that this was a thing that sometimes happened.
To be clear: there is no failover happening, even though we do have a failover instance. If there was, we could at least detect that something had happened after the fact!
The Cloud SQL failover only occurs in certain circumstances, and in all our time using Cloud SQL the failover has not once kicked in automatically (despite many outages).
In fact, one of our earliest support issues was that the "manual failover" button was disabled when any sort of operation was occurring on the database, making it almost completely useless! Luckily this issue at least was fixed.
It sounds like the machine that the DB VM was running on was being taken offline or restarted for maintenance. The default behaviour for GCE is to live-migrate the VM to another machine, so I guess Cloud SQL uses the default here. (It may well be the best option, as failover isn't instant either.) Live migration is usually much faster than 90s as well, but if you are making heavy RAM updates that can definitely slow it down.
Either way, I agree that some full-stack integration is needed on GCP's part to at least get that into the maintenance log. It would also be nice to make most of these happen during the maintenance window, but IIUC they don't always have 24h notice of a machine reboot.
Live migration is when they move you to another system.
They have to do that when hardware fails (if you're lucky), but that it's happening so often suggests it's part of regular software maintenance or something like that. Which is pretty unacceptable to me.
How significant is (1) if you are using the Cloud SQL proxy? My understanding is that the proxy tunnels traffic over an encrypted connection, so there is no benefit to adding an extra layer.
The Cloud SQL proxy is secure, and it is what all our developers use for local access to the database.
However, we have some third-party data analysis tools (such as Tableau) that also connect to one of our databases. They are hosted in their own clouds, have to connect over the database's public IP address, and can't use cloud_sql_proxy. I of course manually confirmed that these connections use SSL, but I would feel much more comfortable if I could enforce it from our end.
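If it helps, you can at least audit this from the database side: pg_stat_ssl reports per-connection SSL state. A minimal sketch with psycopg2 (host and credentials are placeholders):

    import psycopg2

    # Connect over SSL ourselves; sslmode="require" makes psycopg2 refuse
    # to fall back to a plaintext connection.
    conn = psycopg2.connect(
        host="203.0.113.10",   # placeholder: the instance's public IP
        dbname="analytics",
        user="auditor",
        password="...",
        sslmode="require",
    )

    with conn, conn.cursor() as cur:
        # pg_stat_ssl has one row per backend, keyed by pid; list any
        # client connections that are NOT using SSL.
        cur.execute("""
            SELECT a.usename, a.client_addr, s.ssl, s.version
            FROM pg_stat_activity a
            JOIN pg_stat_ssl s USING (pid)
            WHERE NOT s.ssl
        """)
        for row in cur.fetchall():
            print("non-SSL connection:", row)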
Also, it enables a really cool pattern of change data capture, which allows you to capture "normal" changes to your Postgres database as events that can be fed to e.g. Kafka and power an event-driven/CQRS system. https://www.confluent.io/blog/bottled-water-real-time-integr... is a 2015 post describing the pattern well; the modern tool that replaces Bottled Water is https://debezium.io/ . For instance, if you have a "last_updated_by" column in your tables that's respected by all your applications, this becomes a more-or-less-free audit log, or at the very least something that you can use to spot-check that your audit logging system is capturing everything it should be!
When you're building and debugging systems that combine trusted human inputs, untrusted human inputs, results from machine learning, and results from external databases, all related to the same entity in your business logic (and who isn't doing all of these things, these days!), having this kind of replayable event capture is invaluable. If you value observability of how your distributed system evolves within the context of a single request, tracking a datum as it evolves over time is the logical (heh) evolution of that need.
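To make the spot-check idea concrete, here is roughly what consuming Debezium's change events looks like. A sketch assuming the default JSON envelope (schemas enabled), a hypothetical app.public.orders topic, and the confluent-kafka client:

    import json
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "kafka:9092",  # placeholder broker address
        "group.id": "audit-spot-check",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["app.public.orders"])  # hypothetical Debezium topic

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error() or msg.value() is None:
            continue  # timeout, error, or tombstone
        # Debezium envelopes carry op (c/u/d) plus before and after images.
        change = json.loads(msg.value())["payload"]
        if change["op"] in ("c", "u"):
            after = change["after"]
            print(after["id"], "changed by", after.get("last_updated_by"))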
Where I work we use Debezium to enable “transactional publishing” to Kinesis.
Not only does it allow publishing events as part of an ordinary database transaction, it also provides a nice buffer if Kinesis ever goes down. During the Nov. 2020 Kinesis outage, our services using this mechanism kept chugging along with no immediate issues.
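For anyone unfamiliar, "transactional publishing" here is the outbox pattern: the event row commits (or rolls back) in the same transaction as the business write, and Debezium relays it to the stream afterwards. A rough sketch with psycopg2; the table names are made up:

    import json
    import psycopg2

    conn = psycopg2.connect("dbname=app")  # placeholder DSN

    with conn, conn.cursor() as cur:
        # The business write and the event write commit together, so the
        # event stream can never diverge from the database state.
        cur.execute(
            "UPDATE accounts SET balance = balance - %s WHERE id = %s",
            (100, 42),
        )
        cur.execute(
            "INSERT INTO outbox (aggregate_id, event_type, payload) "
            "VALUES (%s, %s, %s)",
            (42, "account_debited", json.dumps({"amount": 100})),
        )
    # Debezium tails the outbox table via logical replication and publishes
    # each row downstream (Kafka, or Kinesis via a sink).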
Similarly, Supabase uses Elixir to listen to Postgres changes via logical replication. Pretty neat pattern, and Elixir/Erlang is especially good at this sort of thing.
We get data into BigQuery from PG using a custom Airflow plugin, but we recently started doing something similar on AWS, using DMS to get data from Postgres -> Redshift.
DMS is awesome. No code is the best code. BigQuery is great, but not having a “snap-your-fingers and the data is there” connector makes it a PITA for me to maintain.
I tend to utilize BigQuery's external connections and scheduled queries to periodically clone my "hot" PG data to a BQ table for long-term storage and analysis.
It seems so much easier to go PG=>BQ than the other way around.
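Concretely, the clone is one federated query. A sketch of what I mean, run via the Python client here so it's self-contained (in practice it lives in a scheduled query); the connection and table names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()

    # EXTERNAL_QUERY pushes the inner SQL down to Cloud SQL over the
    # external connection, then the outer statement materializes the
    # result as a BQ table for long-term storage.
    sql = """
    CREATE OR REPLACE TABLE archive.events_snapshot AS
    SELECT * FROM EXTERNAL_QUERY(
        'my-project.us.my-pg-conn',
        'SELECT id, payload, updated_at FROM events'
    )
    """
    client.query(sql).result()  # blocks until the clone finishes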
I also use the external data connector; the problem is when you have two projects in different locations. Then I go for psycopg2 and upload it to BigQuery using the BigQuery notebook magics.
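Roughly like this, though sketched with the client library's load method rather than the notebook magics so it stands alone; every name below is a placeholder:

    import pandas as pd
    import psycopg2
    from google.cloud import bigquery

    # Pull the rows out of Postgres on the client side...
    pg = psycopg2.connect("dbname=app host=203.0.113.10")  # placeholder DSN
    with pg.cursor() as cur:
        cur.execute("SELECT id, payload, updated_at FROM events")
        df = pd.DataFrame(cur.fetchall(),
                          columns=["id", "payload", "updated_at"])

    # ...and load them into the other location's dataset. Because the data
    # travels through the client, the cross-location restriction on
    # external connections doesn't apply.
    client = bigquery.Client()
    client.load_table_from_dataframe(df, "other-project.archive.events").result()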
Why the Parquet step? You should be able to do straight Debezium -> Kafka -> BQ, using the BQ sink connector for Kafka Connect (https://github.com/confluentinc/kafka-connect-bigquery); we have users using this with the Debezium MySQL connector, I'd expect this to work equally for Postgres.
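Registering the sink is a single POST to the Connect REST API. A sketch with placeholder names; the BigQuery-specific keys ("project", "defaultDataset", "keyfile") are from memory and vary across connector versions, so check the connector's docs:

    import requests

    connector = {
        "name": "bq-sink",
        "config": {
            "connector.class":
                "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
            "tasks.max": "1",
            "topics": "app.public.orders",  # the Debezium topic to sink
            "project": "my-project",        # assumed key: target GCP project
            "defaultDataset": "cdc",        # assumed key: target BQ dataset
            "keyfile": "/secrets/bq.json",  # assumed key: service account
        },
    }
    resp = requests.post("http://connect:8083/connectors", json=connector)
    resp.raise_for_status()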
Google just launched the Datastream product in beta, which looks very cool. I find it curious that it doesn't support Postgres at all, even as they're launching the logical replication feature for Cloud SQL.
In fact, the Datastream documentation has a diagram showing Postgres as a source, as well as custom sources — but, disappointingly, neither is supported. Only Oracle and MySQL are supported.