We use Hikari [1], an in-process connection pooler. We didn't opt for pgbouncer et al. because we didn't want to add the extra infra yet.
> since what they did in code can be natively done with PgBouncer, PgCat, et al.
Can you point me to a reference I could look at, about doing a major version upgrade with PgBouncer et al? My understanding is that we would still need to write a script to switch masters, similar to what we wrote.
> The big bottleneck is all the active connections
The active connections we were referring to were websocket connections; we haven't had problems with PG connections.
Right now the algorithm we use to find affected queries and notify websockets starts to falter when the number of active websocket connections on one machine gets too high. We're working on improving it in the coming weeks.
I updated the footnote to clarify that it was about websocket connections.
> I did feel for them here:
Thank you! That part was definitely the most frustrating.
I’m not sure about a reference, other than their docs [0]. Basically, you’d modify the config to point to the new servers, issue PAUSE to PgBouncer to gracefully drain connections, then RELOAD to pick up the new config, then RESUME to accept new traffic.
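Driven from a script it's only a few admin-console commands. Purely as a sketch (connection details here are made up), something like:

```python
import psycopg2

# Connect to PgBouncer's admin console (the special "pgbouncer" database).
# The console doesn't support transactions, so autocommit is required.
admin = psycopg2.connect(host="127.0.0.1", port=6432,
                         dbname="pgbouncer", user="pgbouncer", password="...")
admin.autocommit = True
cur = admin.cursor()

# Drain: wait for server connections to be released
# (session closed / transaction finished, depending on pool_mode).
cur.execute("PAUSE")

# Re-read pgbouncer.ini, which has already been edited to point at the new primary.
cur.execute("RELOAD")

# Let client connections through to the new server.
cur.execute("RESUME")
```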
This would result in client errors while paused, though, so perhaps not quite the same. To me, a few seconds of downtime is fine, but everyone has their own opinions. EDIT: you could of course also modify your client code (if it doesn’t already) to gracefully retry connections, which would effectively make this zero downtime.
ProxySQL (which I think now supports Postgres) has a global delay option where you can effectively make clients think that the query is just taking a long time; meanwhile, you can do the same sequence as outlined.
If you had HA Bouncers (which hopefully you would), you could cheat a little as you alluded to in the post, and have one still allow read queries to hit the old DB while cutting over writes on the other one, so the impact wouldn’t be as large.
> you’d modify the config to point to the new servers, issue PAUSE to PgBouncer to gracefully drain connections, then RELOAD to pick up the new config, then RESUME to accept new traffic.
The function we wrote effectively executes these steps [1]. I think it would look similar if we had used PgBouncer. I could see it being an option, though, if we couldn't scale down to "one big machine".
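If it helps make the comparison concrete, the rough shape is something like the sketch below. This is not our actual function: `pool` and its `pause`/`set_dsn`/`resume` methods are stand-ins, and the catch-up check assumes a logical-replication slot on the old primary.

```python
import time
import psycopg2

def wait_for_catchup(old_primary_dsn, slot_name):
    # Block until the replication slot on the old primary has confirmed
    # everything written so far (no new writes can arrive while paused).
    conn = psycopg2.connect(old_primary_dsn)
    conn.autocommit = True
    cur = conn.cursor()
    try:
        while True:
            cur.execute(
                "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)"
                "  FROM pg_replication_slots WHERE slot_name = %s",
                (slot_name,))
            row = cur.fetchone()
            if row and row[0] is not None and row[0] <= 0:
                return
            time.sleep(0.1)
    finally:
        conn.close()

def switch_primary(pool, old_primary_dsn, new_primary_dsn, slot_name):
    # Illustrative only; `pool` is a stand-in for whatever hands out connections.
    pool.pause()                                  # analogue of PgBouncer's PAUSE
    wait_for_catchup(old_primary_dsn, slot_name)  # old primary is now fully replicated
    pool.set_dsn(new_primary_dsn)                 # analogue of RELOAD: repoint the pool
    pool.resume()                                 # analogue of RESUME
```

The important property is the same as with PAUSE: nothing can write to the old primary between the pause and the repoint, so the new primary can't miss a write.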
> This would result in client errors while paused, though, so perhaps not quite the same.
What? Docs say:
> New client connections to a paused database will wait until RESUME is called.
Which fits what I remember from when I was testing pgbouncer as part of automatic failover ages ago: if the connection from pgbouncer to the database dropped, it would block until it reconnected without the app erroring.
I stand corrected! It may also depend on the application itself, timeouts, etc. I’ve seen errors before when doing this, but now that I think about it, it was on the order of a handful of connections out of thousands, so it was probably poor client handling, or something else.
I think he means already established connections, but I'm not sure.
Edit: not true, actually. PAUSE will wait for the connections to be released (disconnected in session pooling, transaction ended in transaction pooling...)
[1] https://github.com/brettwooldridge/HikariCP