> It even assists in PostgreSQL’s implementation of Multiversion Concurrency Control (MVCC) - the WAL keeps a version history of data changes,
That's not really correct - postgres' MVCC implementation doesn't read from the WAL. Sure, row changes are WAL logged, but that's not really related to MVCC.
That's an excellent point. My statement was unnecessarily confusing. I've changed it to simply reference another benefit of the WAL - optimizing I/O operations. That would make a good blog post in and of itself :D
This is a guide to logical replication in Postgres where I break down some of the internal components of the database to explain CDC. If you've ever wondered how WAL buffers work, or what happens when a transaction is executed, check it out!
Sortof unrelated, I've been looking at moving some data from SQL Server to Postgres, and one of the big reasons is replication. SQL Server replication has been super brittle for me - it's always silently choking, getting desynchronized, or exhibiting weird locking behavior with no indication of the issue until you notice something downstream is broken. It's been tough to test Postgres replication though, since a lot these issues only occur at huge volumes of data. Anyone have any experience with the two they can pass on?
I don't have any first-hand experience with Postgres replication to share, just, when moving DB from MSSQL, Babelfish extensions for Postgres (https://babelfishpg.org/) may be of interest.
Is it possible to run some user-defined functions (e.g., to perform some transformations) on the subscriber side? It'll be super useful when the external data source is not identical to the source.
Should be relatively affordable for modest deployments - I would categorize Neon and Airbyte as "bang-for-your-buck" products (Airbyte vs Fivetran, + Neon's solid free tier).
One caveat about LISTEN/NOTIFY is that channels are not first-class objects, so there's no authorization associated with them, thus anyone who can login can also NOTIFY any payload to any channel.
Supabase realtime (especially if you want a managed backend) or other streaming CDC setups (like Decodable, which is Flink/Debezium under the hood) are also great choices for logical replication. Streaming tech will continue to get more cost-effective and simpler to implement in the coming year(s).
I should note: I haven't used Decodable in production yet, I'm just a fan of Flink :)
Recently added the concepts of Destinations - where you can just configure it to send database change events to an Elixir module, webhook or EventRelay (the later two don't require Elixir know-how).
Youre right, real-time is generally achieved by streaming CDC data with logical replication via a connector server to Kafka (Debezium does this) or a different queue to be picked up by consumers downstream.
Supabase does a great job of abstracting that part away with their real-time feature set
- Chap 11 Backup and Recovery
- Chap 12 Replication and Upgrades
Highly recommended!
https://learning.oreilly.com/library/view/-/9781835460580/ or https://www.packtpub.com/product/postgresql-16-administratio...