An orthogonal migration issue which I'm hitting right now: we need to migrate fr...

bastawhiz · on Sept 30, 2021

I did that exact migration. Unfortunately, to my knowledge, there's no way to do it with zero downtime. You need to make your app read only until the RDS instance has ingested your data, then you can cut over. For me, that was roughly one gigabyte of data and took about forty seconds.

My best advice is to automate the whole thing. You can automate it with the Heroku and AWS CLIs. Test on your staging site until you can run through the whole process end to end a few times with no interruptions.

oauea · on Sept 30, 2021

Yep, absolutely garbage that these clouds (Azure is another one) don't allow you to replicate with external systems. Pretty much devalues their entire hosted postgresql offering if you ask me, since it's just designed to keep you locked in (duh).

If you have any significant amount of data where you're worried about a migration, stay far away from hosted postgres offerings. You'll never get your data out without significant downtime.

barrkel · on Sept 30, 2021

There are other ways to handle this at the application level, to be clear, using dual read & write and backfill. More relevant when you have TB+++ of data.

tomhallett · on Sept 30, 2021

Interesting. I've done dual-writes at the application level to migrate the datastore for a smaller feature (branch by abstraction), but never for an entire application. And the code path was quite simple, so it was easy to think about all of the edge cases at one time in your head.

Do you have any resources which talk through the read/write/backfill approach?

Here's what I found so far: * https://medium.com/google-cloud/online-database-migration-by... * https://aws.amazon.com/blogs/architecture/middleware-assiste...

bastawhiz · on Oct 1, 2021

Jumping in again... Your post reminded me that I actually typed my migration up!

https://mattbasta.medium.com/migrating-from-heroku-to-aws-6d...

Hopefully it's somewhat helpful!

tomhallett · on Sept 30, 2021

Thank you for this - extremely helpful in validating the current approach and de-risking the developer time.

laurent92 · on Sept 30, 2021

So, basically, Postgres would have a replication port which can be used for both replication/clustering and transfer across cloud providers. And sharding. </dreaming>

bastawhiz · on Oct 1, 2021

I mean, it essentially does. Heroku's managed postgres has it disabled.

craigkerstiens · on Sept 30, 2021

We've moved a number of customers from Heroku over to Crunchy Bridge with essentially no down time, am currently helping one customer with 7TB through that process. It's not over to RDS, but would be happy to talk through process if helpful. And we do support logical replication and have many people using wal2json/logical replication with us.

zozbot234 · on Sept 30, 2021

> Why the migration is required? Heroku postgres doesn't support logical replication

You could possibly hack together some form of higher-layer logical replication via postgres_fdw and database triggers. A comment ITT references this as a known technique.

sidmitra · on Sept 30, 2021

One possible solution for the ETL stuff might be to use Heroku Kafka for the Change Data Capture and then from that Kafka you can move it someplace else.

See https://blog.heroku.com/streaming-data-connectors-beta Heroku's own Kafka seems to have slightly more native support than if you use a 3rd party like Confluence.

We've not yet tried any of this, but it's been bookmarked as a possible solution to explore.

tomhallett · on Sept 30, 2021

Interesting that you bring this up. I looked into heroku's streaming connectors to facilitate an integration with materialize.com, but Heroku's support team wasn't confident we could sync all 187 postgres tables under 1 connection.

I thought about using Debezium and Kafka to roll my own micro-batch ETL solution, but listening to this podcast made me walk away slowly: https://www.dataengineeringpodcast.com/datacoral-change-data...

gunnarmorling · on Sept 30, 2021

Interesting, what was it from that podcast that made you reconsider? Always eager to learn about opportunities for improving the experience of using Debezium.

Disclaimer: I work on Debezium

tomhallett · on Sept 30, 2021

Oh wow, by "work on" you mean "the core maintainer of". Thank you for replying. :)

The main part I reconsidered based on was the level of effort taking the data from kafka and landing into snowflake, especially around handle postgres schema changes safely. I also have no experience with kafka, so I'd be out of my depth's pretty quickly for a critical part of the architecture. He also expressed the need for building quality checks into the kafka to snowflake code, but those details were a bit sparse (if i recall correctly).

Note: all of the above are probably outside the scope of debezium. :)

Note 2: your article [1] on using cdc to build audit logs w/ a "transactions" table blew my mind. Once I listened to your data engineering podcast interview [2], I knew there was some implementation of "event sourcing lite w/ a crud app" possible, so I was excited to see you had already laid it out.

1) https://debezium.io/blog/2019/10/01/audit-logs-with-change-d...

2) https://www.dataengineeringpodcast.com/debezium-change-data-...

gunnarmorling · on Sept 30, 2021

Gotcha, yeah, there's many things to consider indeed when setting up end-to-end pipelines. Thanks for the nice feedback, so happy to hear those resources are useful for folks. As far as event sourcing is concerned, we got another post [1] which might be interesting to you, discussing how "true ES" compares to CDC, pros/cons of either approach, etc.

[1] https://debezium.io/blog/2020/02/10/event-sourcing-vs-cdc/

another · on Oct 1, 2021

We faced this migration, too. My sympathies.

Adding to your list of options that still require _some_ downtime: we used Bucardo [0] in lieu of logical replication. It was a bit of a pain, since Bucardo has some rough edges, but we made it work. Database was ~2 TiB.

[0] https://bucardo.org/

tomc1985 · on Sept 30, 2021

When you subscribe to managed services instead of running the software yourself, these are the kinds of trade-offs that get made

smileysteve · on Sept 30, 2021

logical replication, but this is one of the walls that heroku creates.

tomhallett · on Sept 30, 2021

Coming from the outside, with zero understanding of the internal details, my hunch is the same: lack of support for logical replication is more of a business decision than a technical decision. (But again, this a hunch -- partially based on how good heroku is from a technical perspective)

oauea · on Sept 30, 2021

It's absolutely an evil business decision, and all the clouds are playing this game. Don't ever use a hosted database solution if you're thinking about storing any significant amount of data. You will not be able to get it out without downtime.

coded · on Oct 2, 2021

It looks like gcp supports logical replication now: https://cloud.google.com/blog/products/databases/you-can-now...

oauea · on Oct 2, 2021

The big question is: Can you enter arbitrary IP addresses, or do you have to replicate to another GCP instance? Azure does the latter.

coded · on Oct 3, 2021

Yes you can enter an arbitrary IP: https://cloud.google.com/sql/docs/postgres/replication/confi...