Hacker News new | past | comments | ask | show | jobs | submit login
A Technical Dive into PostgreSQL's replication mechanisms (airbyte.com)
210 points by jacobprall on Jan 11, 2024 | hide | past | favorite | 25 comments



I learned Postgres replication by studying PostgreSQL 16 Administration Cookbook by Simon Riggs et al

- Chap 11 Backup and Recovery

- Chap 12 Replication and Upgrades

Highly recommended!

https://learning.oreilly.com/library/view/-/9781835460580/ or https://www.packtpub.com/product/postgresql-16-administratio...


Thanks for the book recommendations!


> It even assists in PostgreSQL’s implementation of Multiversion Concurrency Control (MVCC) - the WAL keeps a version history of data changes,

That's not really correct - postgres' MVCC implementation doesn't read from the WAL. Sure, row changes are WAL logged, but that's not really related to MVCC.


That's an excellent point. My statement was unnecessarily confusing. I've changed it to simply reference another benefit of the WAL - optimizing I/O operations. That would make a good blog post in and of itself :D


Good stuff, I'm a little obsessed with Postgres replication and the WAL (write ahead log).

If you're an Elixir user, you might find my library for subscribing to Postgres WAL events useful: https://github.com/cpursley/walex

It's a lot easier to operate than the typical debezium setup (which is what I think Airbyte uses behind the scenes).

I need to write a guide on how to use WalEx with Neon.


Looks cool. I appreciate the support for all replica identity settings!


This is a guide to logical replication in Postgres where I break down some of the internal components of the database to explain CDC. If you've ever wondered how WAL buffers work, or what happens when a transaction is executed, check it out!


Sortof unrelated, I've been looking at moving some data from SQL Server to Postgres, and one of the big reasons is replication. SQL Server replication has been super brittle for me - it's always silently choking, getting desynchronized, or exhibiting weird locking behavior with no indication of the issue until you notice something downstream is broken. It's been tough to test Postgres replication though, since a lot these issues only occur at huge volumes of data. Anyone have any experience with the two they can pass on?


> moving some data from SQL Server to Postgres

I don't have any first-hand experience with Postgres replication to share, just, when moving DB from MSSQL, Babelfish extensions for Postgres (https://babelfishpg.org/) may be of interest.


PG logical replication is rock solid. One annoyance is that you can't subscribe to a publication but using a different schema name, say.


There was recently a very interesting overview of the different distributed PostgreSQL architectures:

https://www.crunchydata.com/blog/an-overview-of-distributed-...


Is it possible to run some user-defined functions (e.g., to perform some transformations) on the subscriber side? It'll be super useful when the external data source is not identical to the source.


It is a cool idea to use Neon and Airbyte together, as database and pushing to analytics is a classic expensive-only use case.

I don't know how expensive this would get, of course.


Should be relatively affordable for modest deployments - I would categorize Neon and Airbyte as "bang-for-your-buck" products (Airbyte vs Fivetran, + Neon's solid free tier).


Unrelated to the post:

Is there realtime features with postgres?

It seem like a kludge to have to add redis, mqtt, or kafka to our application to get things as they change.


You can LISTEN/NOTIFY. Or you can use logical replication and a custom subscriber.[1] Supabase uses the latter.[2]

[1]: https://www.postgresql.org/docs/current/logical-replication....

[2]: https://github.com/supabase/realtime


One caveat about LISTEN/NOTIFY is that channels are not first-class objects, so there's no authorization associated with them, thus anyone who can login can also NOTIFY any payload to any channel.


we use this feature in Supabase, works great.


Supabase has arguably a better alternative, which uses logical replication and can be used outside of Supabase.

https://github.com/supabase/realtime


Supabase realtime (especially if you want a managed backend) or other streaming CDC setups (like Decodable, which is Flink/Debezium under the hood) are also great choices for logical replication. Streaming tech will continue to get more cost-effective and simpler to implement in the coming year(s).

I should note: I haven't used Decodable in production yet, I'm just a fan of Flink :)


There is NOTIFY [1] that might do what you are thinking of.

[1] https://www.postgresql.org/docs/current/sql-notify.html


Yes, WalEx.

Recently added the concepts of Destinations - where you can just configure it to send database change events to an Elixir module, webhook or EventRelay (the later two don't require Elixir know-how).

https://github.com/cpursley/walex?tab=readme-ov-file#destina...


Youre right, real-time is generally achieved by streaming CDC data with logical replication via a connector server to Kafka (Debezium does this) or a different queue to be picked up by consumers downstream.

Supabase does a great job of abstracting that part away with their real-time feature set


Logical replication is an immensely powerful for integrating backend data.


What does that even mean?




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: