Since there seems to be some confusion in the comments about why pg_query chose Protobufs in the first place, let me add some context as the original author of pg_query (but not involved with PgDog, though Lev has shared this work by email beforehand).
The initial motivation for developing pg_query was for pganalyze, where we use it to parse queries extracted from Postgres, to find the referenced tables, and these days also rewrite and format queries. That use case runs in the background, and as such is much less performance critical.
pg_query actually initially used a JSON format for the parse output (AST), but we changed that to Protobuf a few major releases ago, because Protobuf makes it easy to have typed bindings in the different languages we support (Ruby, Go, Rust, Python, etc). Alternatives (e.g. using FFI directly) make sense for Rust, but would require a lot of maintained glue code for other languages.
All that said, I'm supportive of Lev's effort here, and we'll add some additional functions (see [0]) in the libpg_query library to make using it directly (i.e. via FFI) easier. But I don't see Protobuf going away, because in non-performance critical cases, it is more ergonomic across the different bindings.
[0]: https://github.com/pganalyze/libpg_query/pull/321
1) When do pages get removed? (file on disk gets smaller)
Regular vacuum can truncate the tail of a table if the pages at the end are fully empty. That may or may not happen in a typical workload, since Postgres makes no particular effort to place new entries in earlier pages. Otherwise you do need a VACUUM FULL or pg_squeeze.
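A minimal way to check this, assuming a hypothetical table named my_table:

    -- compare the on-disk size before and after a plain VACUUM
    SELECT pg_size_pretty(pg_relation_size('my_table'));
    VACUUM my_table;  -- can only return space to the OS if the trailing pages are empty
    SELECT pg_size_pretty(pg_relation_size('my_table'));
    -- otherwise a full rewrite is needed, e.g.:
    --   VACUUM FULL my_table;   -- takes an ACCESS EXCLUSIVE lock for the duration
    --   (or pg_squeeze for an online rewrite)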
2) Does a regular VACUUM rearrange a single page when it works on it? (i.e. remove empty pockets of data within an 8kb page, which I think the author calls compacting)
I think the answer to that is yes. For example, in the Postgres docs on page layout [0] the following sentence stands out: "Because an item identifier is never moved until it is freed, its index can be used on a long-term basis to reference an item, even when the item itself is moved around on the page to compact free space". That means things like HOT pruning (which modifies the versions of a tuple on the same page but keeps the item identifier in the same place) can occur without breaking index references, but (I think) during VACUUM, even breaking index references is allowed when dead item identifiers get cleaned up.
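If you want to see those item identifiers directly, the pageinspect extension exposes them; a sketch, assuming a hypothetical table named my_table:

    CREATE EXTENSION IF NOT EXISTS pageinspect;
    -- lp is the item identifier slot; lp_off is where the tuple currently sits
    -- within the 8kb page and can change when the page is compacted, while lp
    -- itself stays put so index entries keep working
    SELECT lp, lp_off, lp_len, t_ctid
    FROM heap_page_items(get_raw_page('my_table', 0));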
The article has a section that estimates index bloat by comparing the index's reltuples count multiplied by 40 bytes (?) to the size of the file on disk.
This is problematic, first of all because I don't think the math is right (see [0] for a more comprehensive query that takes into account column sizes), and second because it ignores the effects of B-Tree index deduplication in Postgres 13+: [1]
In my experience, fast bloat estimation queries can work okay for table bloat, but for index bloat I'd recommend instead looking at the change in page density over time (i.e. track relpages divided by reltuples), or just go straight to running pgstatindex outside business hours.
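As a sketch of what I mean (pgstatindex is part of the pgstattuple extension, and the index name here is just an example):

    -- cheap: track this ratio from the catalog over time; if it keeps growing
    -- while reltuples stays roughly flat, the index is getting less dense
    SELECT relname, relpages, reltuples::bigint,
           relpages / NULLIF(reltuples, 0) AS pages_per_tuple
    FROM pg_class
    WHERE relkind = 'i'
    ORDER BY relpages DESC;

    -- accurate, but reads the whole index:
    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT avg_leaf_density, leaf_fragmentation FROM pgstatindex('my_index');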
Every time Postgres advice says to “schedule [important maintenance] during low traffic period” (OP) or “outside business hours”, it reinforces my sense that it’s not suitable for a performance-sensitive data path on a 24/7/365 service and I’m not sure it really aims to be. (To be fair, running it like that for several years and desperately trying to make it work also gave me that feeling. But I’m kind of aghast that necessary operational maintenance still carries these caveats.)
> Every time Postgres advice says to “schedule [important maintenance] during low traffic period” (OP) or “outside business hours”, it reinforces my sense that it’s not suitable for a performance-sensitive data path on a 24/7/365 service and I’m not sure it really aims to be.
It's a question of resource margins. If you have regular and predictable windows of low resource utilization, you can afford to run closer to the sun during busy periods, deferring (and amortizing, to some degree) maintenance costs till later. If you have a 24/7/365 service, you need considerably higher safety margins.
Also, there's a lot of terrible advice on the internet, if you haven't noticed.
> (To be fair, running it like that for several years and desperately trying to make it work also gave me that feeling. But I’m kind of aghast that necessary operational maintenance still carries these caveats.)
To be fair, I find Oxide's continual low-info griping against postgres a bit tedious. There are plenty of weaknesses in postgres, but criticizing postgres based on 10+ year old experiences of running an, at the time, outdated postgres, on an outdated OS is just ... not useful? Like, would it be useful to criticize Oxide's lack of production hardware availability in 2021 or so?
What you describe is true and very important (more margin lets you weather more disruption), but it's not the whole story. The problem we had was queueing delays mainly due to I/O contention. The disks had the extra IOPS for the maintenance operation, but the resulting latency for all operations was higher. This meant overall throughput decreased when the maintenance was going on. The customer, finally accepting the problem, thought: "we'll just build enough extra shards to account for the degradation". But it just doesn't work like that. If the degradation is 30%, and you reduce the steady-state load on the database by 30%, that doesn't change the fact that when the maintenance is ongoing, even if the disks have the IOPS for the extra load, latency goes up. Throughput will still degrade. What they wanted was predictability but we just couldn't give that to them.
> To be fair, I find Oxide's continual low-info griping against postgres a bit tedious. There are plenty of weaknesses in postgres, but criticizing postgres based on 10+ year old experiences of running an, at the time, outdated postgres, on an outdated OS is just ... not useful?
First, although I work at Oxide, please don't think I speak for Oxide. None of this happened at Oxide. It informed some of the choices we made at Oxide and we've talked about that publicly. I try to remember to include the caveat that this information is very dated (and I made that edit immediately after my initial comment above).
I admit that some of this has been hard for me personally to let go. These issues dominated my professional life for three very stressful years. For most of that time (and several years earlier), the community members we reached out to were very dismissive, saying either these weren't problems, or they were known problems and we were wrong for not avoiding them, etc. And we certainly did make mistakes! But many of those problems were later acknowledged by the community. And many have been improved -- which is great! What remains is me feeling triggered when it feels like users' pain is being casually dismissed.
I'm sorry I let my crankiness slip into the comment above. I try to leave out the emotional baggage. Nonetheless, I do feel like it's a problem that, intentionally or otherwise, a lot of the user base has absorbed the idea that it's okay for necessary database maintenance to significantly degrade performance because folks will have some downtime in which to run it.
> First, although I work at Oxide, please don't think I speak for Oxide. None of this happened at Oxide. It informed some of the choices we made at Oxide and we've talked about that publicly. I try to remember to include the caveat that this information is very dated (and I made that edit immediately after my initial comment above).
I said Oxide, because it's come up so frequently and at such length on the Oxide podcast... Without that I probably wouldn't have commented here. It's one thing to comment on bad experiences, but at this point it feels more like bashing. And I feel like an open source focused company should treat other folks working on open source with a bit more, idk, respect (not quite the right word, but I can't come up with a better one right now).
I probably shouldn't have commented on this here. But I read the message after just having spent a Sunday morning looking into a problem, and I guess that made me more thin-skinned than usual.
> For most of that time (and several years earlier), the community members we reached out to were very dismissive, saying either these weren't problems, or they were known problems and we were wrong for not avoiding them, etc.
I agree that the wider community sometimes has/had the issue of excusing away postgres problems. While I try to avoid doing that, I certainly have fallen prey to that myself.
Leaving fandom-like stuff aside, there's an aspect of having been told over and over that we're doing xyz wrong and that things would never work that way, and succeeding (to some degree) regardless. While ignoring some common wisdom has been advantageous, I think there's also plenty where we've just been high on our own supply.
> What remains is me feeling triggered when it feels like users' pain is being casually dismissed.
I don't agree that we have been "bashing" Postgres. As far as I can tell, Postgres has come up a very small number of times over the years: certainly on the CockroachDB episode[0] (where our experience with Postgres is germane, as it was very much guiding our process for finding a database for Oxide) and then again this year when we talked about our use of statemaps on a Rust async issue[1] (where our experience with Postgres was again relevant because it in part motivated the work that we had used to develop the tooling that we again used on the Rust issue).
I (we?) think Postgres is incredibly important, and I think we have properly contextualized our use of it. Moreover, I think it is unfair to simply deny us our significant experience with Postgres because it was not unequivocally positive -- or to dismiss us recounting some really difficult times with the system as "bashing" it. Part of being a consequential system is that people will have experience with it; if one views recounting that experience as showing insufficient "respect" to its developers, it will have the effect of discouraging transparency rather than learning from it.
I'm certainly very biased (having worked on postgres for way too long), so it's entirely plausible that I've over-observed and over-analyzed the criticism, leading to my description.
> I (we?) think Postgres is incredibly important, and I think we have properly contextualized our use of it. Moreover, I think it is unfair to simply deny us our significant experience with Postgres because it was not unequivocally positive -- or to dismiss us recounting some really difficult times with the system as "bashing" it. Part of being a consequential system is that people will have experience with it; if one views recounting that experience as showing insufficient "respect" to its developers, it will have the effect of discouraging transparency rather than learning from it.
I agree that criticism is important and worthwhile! It's helpful though if it's at least somewhat actionable. We can't travel back in time to fix the problems you had in the early 2010s... My experience of the criticism from the "oxide corner" in recent years was that it sometimes felt somewhat unrelated to the context and to today's postgres.
> if one views recounting that experience as showing insufficient "respect" to its developers
I should really have come up with a better word, but I'm still blanking on choosing a really apt word, even though I know it exists. I could try to blame ESL for it, but I can't come up with a good German word for it either... Maybe "goodwill". Basically believing that the other party is trying to do the right thing.
>> What remains is me feeling triggered when it feels like users' pain is being casually dismissed.
> Was that done in this thread?
Well, I raised a general problem around 24/7/365 use cases (rooted in my operational experience, reinforced by the more-current words that I was replying to and the OP) and you called it "tedious", "low-info griping". Yes, that seems pretty dismissive.
(Is it fair? Though I thought the podcast episodes were fairly specific, they probably glossed over details. They weren't intended to be about those issues per se. I did write a pretty detailed post though:
https://www.davepacheco.net/blog/2024/challenges-deploying-p...
(Note the prominent caveat at the top about the experience being dated.))
You also wrote:
> running an, at the time, outdated postgres, on an outdated OS
Yes, pointing to the fact that the software is old and the OS is unusual (it was never outdated; it was just not Linux) is a common way to quickly dismiss users' problems. If the problems had been fixed in newer versions, that'd be one thing. Many (if not all) of them hadn't been. But also: the reason we were running an old version was precisely that it was a 24/7/365 service and there was no way to update databases without downtime, especially replicated ones, nor a great way to mitigate risk (e.g., a mode for running the new software without updating the on-disk format so that you can go back if it's a disaster). This should be seen as a signal of the problem, not a reason to dismiss it (as I feel like you're doing here). As for the OS, I can only think of one major issue we hit that was OS-specific. (We did make a major misconfiguration related to the filesystem that certainly made many of our issues much worse.)
I get that it sucks to keep hearing about problems from years ago. All of this was on 9.2 - 9.6 -- certainly ancient today. When this comes up, I try to balance sharing my operational experience with the fact that it's dated by just explaining that it's dated. After all, all experience is dated. Readers can ignore it if they want, do some research, or folks in the PostgreSQL world can update me when specific things are no longer a problem. That's how I learned that the single-threaded WAL receiver had been updated, apparently in part because of our work: https://x.com/MengTangmu/status/1828665449850294518 (full thread: https://x.com/MengTangmu/status/1828665439234474350). I'll happily share these updates wherever I would otherwise share my gripes!
Regarding pgstattuple specifically: If this were a 24/7/365 service and you were concerned about the I/O impact of loading the full table or index at any time, you could run this on a replica too. For tables there is pgstattuple_approx, which is much better at managing its impact, but there is no equivalent for indexes today.
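For reference, both live in the pgstattuple extension (table and index names here are placeholders):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    -- approximate and cheaper, because it skips pages the visibility map
    -- says are all-visible; tables only
    SELECT approx_free_percent, dead_tuple_percent
    FROM pgstattuple_approx('my_table');
    -- exact, but reads every page; works on tables and indexes
    SELECT free_percent, dead_tuple_percent FROM pgstattuple('my_index');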
The REINDEX CONCURRENTLY mentioned in the OP could also be run at other times of the day - the main issue is again I/O impact (with potentially some locking concerns at the very end of REINDEX CONCURRENTLY, when the old index is swapped out).
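For completeness, the statement itself (index name is hypothetical):

    -- builds a replacement index alongside the old one, then swaps it in;
    -- a plain REINDEX would instead lock out writes for the whole rebuild
    REINDEX INDEX CONCURRENTLY my_index;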
There are no magic solutions here - other databases have to deal with the same practical limitations, though Postgres is sometimes a bit slow to adopt operational best practices in core (e.g. the pg_squeeze mentioned in the OP may finally get an in-core "REPACK CONCURRENTLY" equivalent in Postgres 19, but it's been a long time to get there).
Yep, I find cloud storage performance to be quite frustrating, but it's the reality for many production database deployments I've seen.
It's worth noting that even on really fast local NVMe drives the new asynchronous I/O work delivers performance benefits, since it's so much more efficient at issuing I/Os and reducing syscall overhead (for io_uring).
Andres Freund (one of the principal authors of the new functionality) did a lot of benchmarking on local NVMe drives during development. Here is one mailing list thread I could find that shows a 2x and better benefit with the patch set at the time: https://www.postgresql.org/message-id/flat/uvrtrknj4kdytuboi...
It depends on the I/O method - as described in the article, "io_uring" is only available on Linux (and requires building with liburing, as well as io_uring being enabled in the kernel), but the default (as of beta1) is actually "worker", which works on any operating system.
The "worker" method uses a dedicated pool of I/O worker processes that run in the background, and whilst not as performant as io_uring in our benchmark, it did clearly outperform the "sync" method (which behaves the same as Postgres 17 and older).
Yes, but what's your point? It's not like we can take this new Postgres version and just compile it on a Windows machine with IoRing support, can we? It requires some effort and time (often from volunteers), and given that most Postgres deployments run under Linux, it's understandable that the developers focus on that platform first.
Yeah, as one of the main authors of libpg_query, I think the primary thing that makes this easier is that Postgres has good abstractions internally, and the parser works independently from other parts (e.g. the community discourages adding settings that affect parser behavior).
Over the years we've only had to maintain a small set of patches on top of the Postgres source [0], together with some mocks and our libclang-based extraction logic [1]. Of course it would be nice if Postgres just packaged this directly like it packages the client libraries, but there is non-trivial effort involved to do that. From what I recall, the main issue is that error handling and memory allocations work differently in the client-side libraries (and so that would have to either also be moved out of the backend source, or use some other abstraction).
Unrelated to your comment but a big fan of pganalyze. Makes Postgres infinitely more grokable for developers who aren’t experts at running and tuning a database. Keep up the good work!
Thanks, glad to hear! I like to think that one of the reasons pganalyze is a good product (though there are always parts I'd like to improve, and feedback is always welcome) is because we like to use it ourselves to optimize our own database, where we can :)
At pganalyze, we're redefining how developers optimize one of the world's most popular databases, PostgreSQL. Our software gives companies like Atlassian, Robinhood, and Notion the tools to solve their most complex Postgres performance challenges, ensuring their mission-critical applications run smoothly at scale.
For our Solutions Engineering role you will directly work with our VP of Sales, as well as our Founder & CEO (me!) on strategic deals, and develop the playbook for integrating pganalyze in complex customer environments. You will also actively contribute to both our documentation and the product itself. Our team is cross-functional, and our goal is to improve the pganalyze product where we can, to allow repeated delivery of a great customer experience.
We are looking for candidates with a strong background in working on Linux VMs in the cloud or on-premise, and with Kubernetes or containers. Go experience is a big plus, since our agent (the pganalyze collector) is written in Go, and this role may involve making the collector work in different environments, and improving cloud provider API integrations.
For our Product Engineering role you will be responsible for driving feature development from concept to writing code to deployment. Sometimes you might have to dig deep into Postgres internals to explain information correctly in a visual way. Recent examples of work done by our product engineers include the launch of our Query Tuning Workbooks feature ([0]), our Buffer Cache Monitoring functionality, or our integration with Plan Statistics provided by some Postgres-as-a-service providers.
In my understanding it was a timing issue with the UUIDv7 RFC not being finalized before the Postgres 17 feature freeze in early April. Shouldn't be an issue to get this in for Postgres 18, I think.
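If it does make it in, usage would presumably look something like this (the function name is an assumption based on the in-progress patch, not a final API):

    -- generate a time-ordered UUID
    SELECT uuidv7();
    -- e.g. as a primary key default:
    CREATE TABLE events (
      id   uuid DEFAULT uuidv7() PRIMARY KEY,
      body text
    );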