A Technical Dive into PostgreSQL's replication mechanisms

smartbit · on Jan 11, 2024

I learned Postgres replication by studying the PostgreSQL 16 Administration Cookbook by Simon Riggs et al.

- Chap 11 Backup and Recovery

- Chap 12 Replication and Upgrades

Highly recommended!

https://learning.oreilly.com/library/view/-/9781835460580/ or https://www.packtpub.com/product/postgresql-16-administratio...

apnew · on Jan 11, 2024

Thanks for the book recommendations!

anarazel · on Jan 11, 2024

> It even assists in PostgreSQL’s implementation of Multiversion Concurrency Control (MVCC) - the WAL keeps a version history of data changes,

That's not really correct - postgres' MVCC implementation doesn't read from the WAL. Sure, row changes are WAL logged, but that's not really related to MVCC.
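To make that concrete, here is a toy sketch of how visibility can be decided purely from row versions in the table itself, with no WAL involved. It is a deliberately simplified model, not Postgres's actual code: real tuples do carry `xmin`/`xmax` transaction ids, but the `TupleVersion`, `Snapshot`, and `is_visible` names are made up for illustration, and details like in-progress transactions, hint bits, and a transaction seeing its own writes are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class TupleVersion:
    value: str
    xmin: int                   # id of the transaction that created this version
    xmax: Optional[int] = None  # id of the transaction that deleted/replaced it, if any


@dataclass
class Snapshot:
    committed: Set[int]         # transaction ids this snapshot treats as committed


def is_visible(t: TupleVersion, snap: Snapshot) -> bool:
    """Simplified rule: the creator must be committed, the deleter must not be."""
    if t.xmin not in snap.committed:
        return False
    return t.xmax is None or t.xmax not in snap.committed


def read(heap: List[TupleVersion], snap: Snapshot) -> List[str]:
    """Scan the (toy) heap and return the values this snapshot can see."""
    return [t.value for t in heap if is_visible(t, snap)]


# Transaction 20 updated the row: the old version gets xmax=20 and a new
# version with xmin=20 is written next to it. Both versions live in the heap.
heap = [
    TupleVersion("balance=100", xmin=10, xmax=20),
    TupleVersion("balance=80", xmin=20),
]

old_snapshot = Snapshot(committed={10})      # taken before 20 committed
new_snapshot = Snapshot(committed={10, 20})  # taken after 20 committed

print(read(heap, old_snapshot))
print(read(heap, new_snapshot))
```

Both snapshots read the same heap and simply disagree about which version is visible; the version history lives in the table, not in the WAL.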

jacobprall · on Jan 11, 2024

That's an excellent point. My statement was unnecessarily confusing. I've changed it to simply reference another benefit of the WAL - optimizing I/O operations. That would make a good blog post in and of itself :D

cpursley · on Jan 11, 2024

Good stuff, I'm a little obsessed with Postgres replication and the WAL (write ahead log).

If you're an Elixir user, you might find my library for subscribing to Postgres WAL events useful: https://github.com/cpursley/walex

It's a lot easier to operate than the typical debezium setup (which is what I think Airbyte uses behind the scenes).

I need to write a guide on how to use WalEx with Neon.

jacobprall · on Jan 11, 2024

Looks cool. I appreciate the support for all replica identity settings!

jacobprall · on Jan 11, 2024

This is a guide to logical replication in Postgres where I break down some of the internal components of the database to explain CDC. If you've ever wondered how WAL buffers work, or what happens when a transaction is executed, check it out!

cowthulhu · on Jan 11, 2024

Sort of unrelated, I've been looking at moving some data from SQL Server to Postgres, and one of the big reasons is replication. SQL Server replication has been super brittle for me - it's always silently choking, getting desynchronized, or exhibiting weird locking behavior with no indication of the issue until you notice something downstream is broken. It's been tough to test Postgres replication though, since a lot of these issues only occur at huge volumes of data. Anyone have any experience with the two they can pass on?

staticlibs · on Jan 11, 2024

> moving some data from SQL Server to Postgres

I don't have any first-hand experience with Postgres replication to share, but when moving a DB from MSSQL, the Babelfish extensions for Postgres (https://babelfishpg.org/) may be of interest.

cryptonector · on Jan 11, 2024

PG logical replication is rock solid. One annoyance is that you can't subscribe to a publication while using a different schema name on the subscriber, say.

egnehots · on Jan 11, 2024

There was recently a very interesting overview of the different distributed PostgreSQL architectures:

https://www.crunchydata.com/blog/an-overview-of-distributed-...

qianli_cs · on Jan 11, 2024

Is it possible to run some user-defined functions (e.g., to perform some transformations) on the subscriber side? It'd be super useful when the destination is not identical to the source.

robertlagrant · on Jan 11, 2024

It is a cool idea to use Neon and Airbyte together, as running a database and pushing its data to analytics is a classic expensive-only use case.

I don't know how expensive this would get, of course.

jacobprall · on Jan 11, 2024

Should be relatively affordable for modest deployments - I would categorize Neon and Airbyte as "bang-for-your-buck" products (Airbyte vs Fivetran, + Neon's solid free tier).

alfor · on Jan 11, 2024

Unrelated to the post:

Are there realtime features in Postgres?

It seems like a kludge to have to add Redis, MQTT, or Kafka to our application to get things as they change.

devbug · on Jan 11, 2024

You can LISTEN/NOTIFY. Or you can use logical replication and a custom subscriber.[1] Supabase uses the latter.[2]

[1]: https://www.postgresql.org/docs/current/logical-replication....

[2]: https://github.com/supabase/realtime
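For the LISTEN/NOTIFY route, a rough sketch of the consumer side in Python might look like the following. It assumes a psycopg2-style connection, where `poll()` reads pending messages and `notifies` is a list of objects with `channel` and `payload` attributes. The `drain_notifications` helper, the `orders` channel, and the `FakeConnection` stand-in are all made up here so the example runs without a database.

```python
from collections import namedtuple

# Stand-in for the driver's notification object (assumption: same attribute names).
Notify = namedtuple("Notify", "pid channel payload")


def drain_notifications(conn):
    """Collect pending notifications from a psycopg2-style connection."""
    conn.poll()  # let the driver read whatever the server has sent
    events = []
    while conn.notifies:
        n = conn.notifies.pop(0)
        events.append((n.channel, n.payload))
    return events


class FakeConnection:
    """Hypothetical stand-in that mimics poll()/notifies for illustration."""

    def __init__(self, incoming):
        self._incoming = list(incoming)
        self.notifies = []

    def poll(self):
        # A real driver would read from the socket here.
        self.notifies.extend(self._incoming)
        self._incoming.clear()


conn = FakeConnection([Notify(4242, "orders", '{"id": 1}')])
print(drain_notifications(conn))
```

With a real connection you would first run `LISTEN orders` in autocommit mode and wait on the connection's socket (e.g. with `select`) before draining, while another session sends `NOTIFY orders, '...'`.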

cryptonector · on Jan 11, 2024

One caveat about LISTEN/NOTIFY is that channels are not first-class objects, so there's no authorization associated with them; thus anyone who can log in can also NOTIFY any payload to any channel.

mritchie712 · on Jan 11, 2024

we use this feature in Supabase, works great.

dventimi · on Jan 11, 2024

Supabase arguably has a better alternative, which uses logical replication and can be used outside of Supabase.

https://github.com/supabase/realtime

jacobprall · on Jan 11, 2024

Supabase realtime (especially if you want a managed backend) or other streaming CDC setups (like Decodable, which is Flink/Debezium under the hood) are also great choices for logical replication. Streaming tech will continue to get more cost-effective and simpler to implement in the coming year(s).

I should note: I haven't used Decodable in production yet, I'm just a fan of Flink :)

defaultcompany · on Jan 11, 2024

There is NOTIFY [1] that might do what you are thinking of.

[1] https://www.postgresql.org/docs/current/sql-notify.html

cpursley · on Jan 11, 2024

Yes, WalEx.

Recently added the concept of Destinations - where you can just configure it to send database change events to an Elixir module, webhook or EventRelay (the latter two don't require Elixir know-how).

https://github.com/cpursley/walex?tab=readme-ov-file#destina...

jacobprall · on Jan 12, 2024

You're right, real-time is generally achieved by streaming CDC data with logical replication via a connector server to Kafka (Debezium does this) or a different queue to be picked up by consumers downstream.

Supabase does a great job of abstracting that part away with their real-time feature set.
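For anyone curious what the Postgres side of such a pipeline roughly needs, here is a small sketch that just builds the SQL for a publication and a logical replication slot. The `cdc_setup_statements` helper and the `app_pub` / `app_slot` / table names are made up for illustration; it assumes `wal_level = logical` is already set and that the consumer speaks the built-in `pgoutput` plugin. Some connectors create these objects themselves, so treat it as a sketch rather than a required step.

```python
import re
from typing import List


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def cdc_setup_statements(publication: str, slot: str, tables: List[str]) -> List[str]:
    """Return the SQL a CDC consumer would typically need run on the source."""
    # Slot names may only contain lower-case letters, digits, and underscores.
    if not re.fullmatch(r"[a-z0-9_]+", slot):
        raise ValueError(f"invalid replication slot name: {slot!r}")
    table_list = ", ".join(quote_ident(t) for t in tables)
    return [
        # Declares which tables' changes are published.
        f"CREATE PUBLICATION {quote_ident(publication)} FOR TABLE {table_list};",
        # Makes the server retain WAL until this consumer has read it.
        f"SELECT pg_create_logical_replication_slot('{slot}', 'pgoutput');",
    ]


for stmt in cdc_setup_statements("app_pub", "app_slot", ["orders", "users"]):
    print(stmt)
```

One thing to keep in mind with slots: if the consumer stops reading, the server keeps holding WAL for it, so unused slots are worth dropping.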

klysm · on Jan 11, 2024

Logical replication is an immensely powerful tool for integrating backend data.

the_doctah · on Jan 12, 2024

What does that even mean?