Working around a case where the Postgres planner is “not very smart” (heap.io)
65 points by kmdupree on Aug 2, 2021 | 59 comments


I confess that 'type' sounds to me like the sort of thing that's going to appear in a WHERE clause often enough that I might as well make an enum for it and materialise that out on insert whether by trigger or otherwise.

Neat trick for the meantime though, I'm sure making a schema change like the one I'm thinking of would be a colossal pain in the buttocks if decided late in the run up to shipping a feature.
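For illustration, a minimal sketch of that kind of materialisation, assuming a hypothetical events table with a JSONB data column; a generated column (Postgres 12+) is one of the "otherwise" options besides a trigger:

    -- Pull the frequently-filtered field out of the JSONB blob into a real,
    -- typed, indexable column that is computed automatically on insert/update.
    ALTER TABLE events
        ADD COLUMN event_type text
        GENERATED ALWAYS AS (data->>'type') STORED;

    CREATE INDEX events_event_type_idx ON events (event_type);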


I'm a huge, huge fan of JSONB columns in Postgres, but I'd agree here. Anything that is consistent and important enough should be pulled out of JSON and put into a real column. Postgres has some impressive functionality to query JSON, but it's not the same as with real columns and you run into performance issues much, much quicker.


2c: in most cases, a Postgres index is (performance-wise) identical to a "real column", just without the name. Indeed, you have more control over the data structure for an index, and indexes can include lossy compression (e.g. partial indexes), unlike a real column...

(obviously, the challenge is getting pg to use your index for a given query... for that, I like to use VIEWs to help query authors write queries that use the indexes)
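A rough sketch of that pattern, with made-up table and column names: a partial index that only covers the rows a hot query filters on, plus a view that bakes in the matching predicate so query authors hit the index without having to remember it:

    -- Partial index: only rows with status = 'pending' are indexed, keeping the
    -- index small and making it a natural fit for queries about pending work.
    CREATE INDEX orders_pending_created_idx
        ON orders (created_at)
        WHERE status = 'pending';

    -- View with the same predicate, so queries written against it can use the
    -- partial index without repeating (or mistyping) the WHERE clause.
    CREATE VIEW pending_orders AS
        SELECT * FROM orders
        WHERE status = 'pending';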


Yeah. This wasn't the case a while back--PostgreSQL was missing support for what I think people call covered indexes, and so would only use the index to filter a set of candidate rows and then still had to go digging around in the real tuple to verify it matched--but I am pretty sure they changed that a while back.


You're exactly right that our schema could be better and that it's non-trivial to execute on a schema change, especially because we actually run a distributed Postgres cluster via Citus AND we use a special sharding method that we manage manually.

We actually just started working towards how we might do a schema change for the 1+ million shards in our cluster. Hopefully, we'll be able to write up some learnings on the schema change after it's done. :)


Right, your current solution is what I'd classify as an "expedient field hack" if I were the one implementing it, and I do mean that positively :D

I look forward to comparing notes when that post comes out.


A long time ago, our team tried to implement basically a versioned time-series key-value store (i.e., you have many keys with daily values, and in addition to key and value it also stored an "as-of" date, so that you could override or correct a value for a specific value date by specifying a newer "as-of" date).

To query a value for a given date, you'd specify key and value date, and it would then give you the value with the latest "as-of" date. Almost always, there'd only be one "as-of" date, and rarely, when a correction had happened, there'd be two, extremely rarely three.

Now, the SQL query was some sort of GROUP BY and join where as_of = max(as_of) over the group. Conceptually, just find the key and date, then pick the last as_of (if there is more than one). But the stupid engine did this massive self-join with a hash join, and it was terribly slow. The conceptually simple idea turned out to be a disaster.
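(Roughly the shape of that query, with an invented schema for concreteness:)

    -- Hypothetical schema: ts_values(key, value_date, as_of, value).
    -- Pick, per (key, value_date), the row with the latest as_of. The join
    -- against the grouped subquery is the part the engine executed as a large,
    -- slow self-hash-join.
    SELECT t.key, t.value_date, t.value
    FROM ts_values t
    JOIN (
        SELECT key, value_date, max(as_of) AS max_as_of
        FROM ts_values
        GROUP BY key, value_date
    ) latest
      ON latest.key = t.key
     AND latest.value_date = t.value_date
     AND latest.max_as_of = t.as_of;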

Made me realise

a) how little I know about DBs.

b) that you can get a correct solution in a language by understanding the language a bit; but to obtain an efficient solution, you need to understand the language rather well.

(Note: the SO question below is just about this topic, and I assume (hope) there are performant solutions to this nowadays. (I worked on this back when Microsoft SQL Server had just introduced an XML datatype for structured unstructured columns...)

https://stackoverflow.com/questions/121387/fetch-the-row-whi... )


Standard SQL:

    select key, value, asof
    from (
      select *, row_number() over (partition by key order by asof desc) as rn
      from <table>
    ) t
    where rn = 1
PostgreSQL:

    select distinct on (key)
    key, value, asof
    from <table>
    order by key, asof desc
The DISTINCT ON clause takes the first row the query returns for each key, and the ORDER BY makes sure that's the row with the latest as_of value.

Disclaimer: I've used both methods, but haven't tested the perf against each other or the self-join.


TimescaleDB adds a custom scan that speeds up DISTINCT ON out of the box. Just install the extension; no extra work is needed.

https://blog.timescale.com/blog/how-we-made-distinct-queries...

I tried it and it really is that fast.


Fork adding it to postgresql if anyone's interested: https://github.com/jesperpedersen/postgres/tree/indexskipsca...

This is cool, thanks! Bit of a limitation that it only works on a single DISTINCT ON column, hopefully they'll be able to extend it to more in the future. And hope it makes it in for everyone that's on stock postgresql!


Yeah, indeed there's that limitation of a single distinct column. But it works for 80% of cases, and it does not require any changes, so it's good. Looks like the earliest a proper implementation could hit mainline is about 1.5 years away.


We tried the same thing a few years ago in MySQL, and it was randomly non-performant: 4000x slower. Still waiting for that to be fixed.


Window functions can help here. You can try a self-join with MySQL (it won't do a hash join because it doesn't have one for this: it didn't have hash joins at all before 8.0.18, it still doesn't have hash outer joins, and what you want in order to eliminate the older rows is an anti-join, i.e. asserting NULL on a left join), but honestly you're better off sorting by timestamp and using a manual window when processing the results.
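A hedged sketch of that anti-join phrasing, reusing the invented schema from above: keep only the rows for which no newer as_of exists for the same key and value date.

    -- Anti-join via LEFT JOIN ... IS NULL: "newer" matches any row with a later
    -- as_of for the same key/date; rows surviving the NULL check are the latest.
    SELECT t.key, t.value_date, t.value, t.as_of
    FROM ts_values t
    LEFT JOIN ts_values newer
      ON newer.key = t.key
     AND newer.value_date = t.value_date
     AND newer.as_of > t.as_of
    WHERE newer.key IS NULL;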


In articles like these, I wish they would go a little deeper and say why the system does what it does. Perhaps the default behavior is geared towards low-memory servers, or avoids disk writes, or the data is usually expected to be of a certain type.

Perhaps the system "isn't very smart" like it says, or it was originally intended for other purposes.


I definitely agree with you, but in this case the Postgres team itself frames the suboptimal behavior as such:

> However, PostgreSQL's planner is currently not very smart about such cases.

(From https://www.postgresql.org/docs/10/indexes-index-only-scans....)

I suspect this is just a rough edge. Many functions may not be deterministic, and therefore would need to be re-executed against the original data. Perhaps some of the planner functionality here predates function volatility markers like `IMMUTABLE`.


I don't know the internals here, but I think I can make a plausible guess. The index-only scan is generally triggered only when every column in your query is contained in the relevant index. In this case the index is a functional index, so it's not only plain columns in there. The function needs the data column, but the functional index already stores the result of the function. To the query planner it looks like the query needs the data column, while it actually doesn't.

This is a bit of an uncommon case that would need special handling in the query planner. So it's not that odd that it is not implemented yet, though I would suspect that it'll be added at some point.
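For concreteness, a sketch of the situation being described, with invented table and column names:

    -- Expression (functional) index: it stores the computed value of
    -- data->>'type' alongside the plain time column.
    CREATE INDEX events_type_time_idx ON events ((data->>'type'), time);

    -- Every value this query needs is physically present in the index, but the
    -- planner attributes the expression to the underlying data column and so
    -- may decline an index-only scan.
    SELECT data->>'type', time
    FROM events
    WHERE data->>'type' = 'click';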


I have experience with MS SQL Server, so I can only comment on that, and regarding performance, mainly on the old 2008 R2 version.

The query planner isn't very clever at times. "At times" actually makes it worse, because performance bugs surface sometimes.

Anyways, the result is a difference of multiple orders of magnitude. What takes 0.3-2ms suddenly takes hundreds of milliseconds or even seconds to complete. Multiply that by millions of executions and there is a problem.

Sometimes SQL Server WILL NOT choose a covering index (with many INCLUDE columns), because it also evaluates index size. And if SQL Server thinks that a seek on the clustered index is specific enough for parameter value A, then it is a disaster with parameter B.

Good thing SQL Server features plan guides, where you can tell the server which index to use or provide other hints. Saves the world when dealing with 3rd-party applications.
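For reference, a hedged T-SQL sketch (object names invented) of the inline form of an index hint; a plan guide created with sp_create_plan_guide attaches the same kind of hint to a matching statement without touching the third-party application's SQL.

    -- Force the optimizer to use a specific index for this statement.
    SELECT o.id, o.total
    FROM dbo.Orders AS o WITH (INDEX (IX_Orders_Covering))
    WHERE o.customer_id = 42;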

The stuff you have to do to deal with your DB grows as the row count grows. If configured well, it can support billions of rows, as we see from this post.


> Good thing SQL server features plan guides

If only Postgres maintainers got this message.


I remember working on a Sybase 11.5 database that basically required forcing of indexes for many tables because the query planner would always go off the rails. We had to run a script in production that would print out the query plan of the currently active queries that were taking forever. Without being able to force indexes, I have no idea how we'd have fixed the problem.


I like Postgres but this aspect is one of its shortcomings. In Postgres you usually have to rework your query in weird ways and hope that it's possible to get the optimizer to make the right choice. Even then, due to its usage of statistics, it could switch plans at any time as your data grows/changes.


FYI, one method is to use CTEs (WITH) as an optimization barrier by forcing Postgres to materialize them.

https://www.postgresql.org/docs/current/queries-with.html
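A minimal sketch (table and column names invented); note that since Postgres 12 a plain CTE can be inlined, so MATERIALIZED has to be spelled out to get the optimization barrier this relies on:

    -- Force the CTE to be computed and materialized first, then filter it,
    -- instead of letting the planner merge the two queries.
    WITH recent AS MATERIALIZED (
        SELECT * FROM events
        WHERE time > now() - interval '1 day'
    )
    SELECT count(*) FROM recent
    WHERE data->>'type' = 'click';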


Looks like we could use a "hint" ;-)


The worst part is that database performance goes from great to sucky with no in-between. Once a query's plan makes it fall off a cliff, you really need the hints (forced indexes) that SQL Server provides to get the plan back to acceptable. Even updating the statistics tends not to fix the problem.


I'm not a DBA, and I admit to having years of experience with SQL Server and only a few recent months with Postgres, and I know it's not a popular opinion, but so far, generally speaking, including admin stuff, I've found SQL Server to be superior. It has its problems, of course, but with SQL Server I found better tooling and out-of-the-box experience, easier maintenance, and more features that are easier to use.

E.g. does something like sp_help or sp_helptext exist in Postgres? Real and simple case-insensitive search, even just on a single comparison in a query? Real collations? Page and row data compression (in SQL Server you can get a good perf improvement by enabling compression, thanks to fewer IO ops)? Column encryption? Query Store, with the possibility to choose a plan? Graphical execution plans? A true SQL language with, e.g., variable declaration in a statement (DO in Postgres is a hack, not a solution) outside of a function or stored procedure? Manual VACUUM, really? Never heard of it in SQL Server (shrink is different; the VACUUM equivalent is an automatic background job). SQL Server is quite predictable, or maybe it's just that I know it much better than Postgres.

However, there are many Postgres features really missing in SQL Server, or added only in recent versions (consider that many clients are still using SQL Server 2012, a few even 2008 R2!): idempotent DDL is much easier in Postgres than in SQL Server older than 2016, JSON support is waaaay better, there are many more useful data types...

I know nothing about replicas, clusters, failover, etc.; those have always been managed by proper DBAs.


And yet Postgres still refuses to add hints.

I'd rather deal with a nonexistent query optimizer, like the one in Clickhouse, than with an insufficiently smart one that I can't control.


Just about when I was leaving [financial-service company] our team was considering that the Postgres query planner was a significant source of risk, and contemplating mitigations and alternatives.

We'd optimize things so that on our busy day each month, query A would take 2 minutes, and query-set B would take 30 minutes or so in total, and we'd have nice graphs to track trends over time. But every so often, the query planner would change its mind about how to do some part of the operation. You'd be looking at query A suddenly taking 2+ hours, while query-set B hadn't even started yet. In the worst cases, someone would have to get on the phone with our banking partner and ask for an extended deadline that night.

Business-critical? The job in question was the business, literally the operation customers were paying for (in combination with a quick template-fill and SFTP, anyway).

It was particularly hard to nail down because the plans would depend on the specific customers in question, and the transactions they were doing that day.


It's so odd because Postgres generally does everything else so well and is generally very customizable and modular otherwise. Yet the maintainers totally scoff at the idea of query hints even in the face of so many examples of planning failures.

Their arguments are asinine too (from https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion):

* Poor application code maintainability: hints in queries require massive refactoring.

* Interference with upgrades: today's helpful hints become anti-performance after an upgrade.

* Encouraging bad DBA habits: slap a hint on instead of figuring out the real issue.

* Does not scale with data size: the hint that's right when a table is small is likely to be wrong when it gets larger.

* Failure to actually improve query performance: most of the time, the optimizer is actually right.

* Interfering with improving the query planner: people who use hints seldom report the query problem to the project.

Literally all of these are expressions of condescension and distrust of their own users. Anyone who would use hinting would be using it to solve a real issue caused by the shortcomings of the planner -- that much should be self-evident.


Ironically, the underlying cause of the query planner failures is a defect in the statistics collection architecture -- it doesn't scale to large databases in some cases. An argument of "does not scale with data size" applies to the query planner.


Is it possible to at least lock-in a query plan with Postgres?

I understand the philosophy behind wanting to keep the queries themselves as declarative as possible, but there should be some way to prevent a really important query from randomly going rogue and using a stupid index, just because something completely unrelated changed somewhere else in the database.


Not AFAIK. I know MSSQL has a way, but cursory googling suggests that Postgres does not.


I agree - it's probably not nice/elegant/whatever to use hints, but my opinion is that a query planner probably cannot cover 100% of all possible combinations of layouts/queries/data => not having the possibility to use hints is, for me, just a huge risk.

Additionally, in PROD environments you have the "time" factor when fixing something; often you cannot afford a week of deep analysis trying to understand why the planner does not work as expected, or what changed about the data that made it think in a different way, etc.

Having said this, I just installed PostgreSQL last week, hehe. I did it only after somebody on HN mentioned the extension "pg_hint_plan" ( https://pghintplan.osdn.jp/pg_hint_plan.html + https://pghintplan.osdn.jp/hint_list.html ) => downloaded & built & installed it on Debian 10 for PostgreSQL 13 => it seems to work (but so far I've done only a few initial tests). But it's still a bit of a risk, as it's an external dependency.
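To give a flavour of what pg_hint_plan usage looks like (table and index names invented), the hint lives in a special comment at the start of the statement:

    /*+ IndexScan(contracts contracts_auth_status_idx) */
    SELECT id FROM contracts WHERE auth_status = 'U';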


Yes, the Postgres query planner can break in ways that are operationally catastrophic in large systems. Many companies have experienced this.

The root cause is that the statistics collection architecture of Postgres that feeds the query optimizer has some deep design flaws. Under some conditions it will produce statistical models of the data that are badly skewed in random ways, which causes the query optimizer which uses those statistics to make erratic and incorrect decisions.

Fixing the statistics collection would be a massive undertaking, the issue is architectural in nature. Adding hints would provide a reasonable workaround and is probably easier in terms of development complexity than fixing the statistics collection.

It is possible to reduce the probability of these bugs by reorganizing the data. The idea of changing your data model to work around a database bug is pretty horrific but companies do it.


Presumably some commercial databases like MS SQL Server and Oracle are better at this? I'd love to know how, and in what way the Postgres architecture is inferior.


This issue is specific to Postgres. Like I stated, it is a design flaw in the statistics collection architecture. I would be surprised if many other databases had a similar defect. Most experienced database engine designers would immediately identify the approach as defective; there is even a comment in the Postgres source code that suggests it.

All old database kernels have architectural limitations that emerge because there is no realistic way to anticipate distant future workloads and hardware, and it is nearly impossible to materially modify the architecture in practice.


No, I understand that. I'm asking about how, specifically, Postgres' statistics architecture differs from those other databases.


The statistical models in Postgres are stored as transactional rows in an ordinary table, which puts some pretty hard limits on what you can put into those models because Postgres doesn’t like large rows. There is no theoretical reason you’d want to design things this way, it is just pretty expedient because you don’t have to implement a separate statistics store and you can leverage all of the code for normal tables. IIRC, the fail safe code path puts a hard limit on the amount of statistical data that can be collected no matter how much data you should collect, so that it plays nicely with the Postgres storage model. This threshold is uncomfortably low for proper statistics collection purposes when your database becomes large.

The saving grace is that for some common data distributions, the degraded statistical models are still pretty close to representative of the data.

However, some data distributions badly expose the fact that the statistical models were improperly constructed to fit resource limits. In these cases, the statistical model is unpredictable and erratic, and will change every time the statistics collection is run even if the data does not change.

We are not talking about an extraordinary amount of statistics data. I would expect databases to have a special structure optimized for this purpose rather than using an OLTP row. The Postgres approach to statistics collection was reasonable a couple decades ago, but modern workloads expose it more frequently now.

FWIW, I still use Postgres a fair amount because it is very solid within its limits. It is the reason I am familiar with its sharp edges.


Not really. I had Oracle refuse to use the index written exactly for the query in question (select id from contracts where auth_status = 'U', with index on auth_status) because the planner decided that the index was useless (the statistics were gathered at night when the table contained no unauthorized records). But in Oracle you can use hints to force the planner to use the index, while in Postgres you can't.
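For comparison, that Oracle hint would look roughly like this (index name invented):

    -- Oracle: the INDEX hint forces the planner to use the named index even
    -- when stale statistics make it look useless.
    SELECT /*+ INDEX(contracts contracts_auth_status_idx) */ id
    FROM contracts
    WHERE auth_status = 'U';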


I've never heard about it, thanks for writing about this issue. Are there some other resources that cover this problem in more detail?


we're running 'ANALYZE table' with default_statistics_target set in the connection to the max reasonable value for the dataset size every 4 hours to avoid just that. also, the default checklist item for sudden slowness of postgres is a manual ANALYZE with a bumped statistics target.
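Roughly the routine described above (table name and target value are illustrative): raise the statistics target for the session, then re-ANALYZE so the planner's histograms are rebuilt in more detail.

    SET default_statistics_target = 1000;  -- default is 100, hard max is 10000
    ANALYZE big_table;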


I haven't done much database work in recent years, but this was always one of my frustrations. I'd take predictability over max performance any day.


A hint won't make the query planner use an index column it doesn't know it can use, though.


It does tell the query planner it can use it though.


"Doesn't know it can use" in this context means the system probably doesn't know how to use it, even if a hint suggests it should.


Many times it's not that, or specifically, that type of behavior doesn't cause the weird "something randomly blows up" type of problem. It's that it is using index size vs. on-disk size or something for specific query values and deciding to do something kind of dumb, like a full range scan, because it isn't taking into account that the table data is on... well, disk, and the index is already loaded into memory because of other queries.

Semi-locking it to known-good indexes, or at least overriding its attempt to be smart, usually helps in cases like this, as it cuts out the long-tail behavior and makes the whole system more predictable.


Yes.

But in this case it isn't that.


Right, hints can only tell the database to use features it already has, they don't add new features on the fly.

I think the hints discussion is interesting, but not relevant to this particular article.


That's an interesting limitation I didn't know about. Though I haven't really relied on index-only scans much so far, the limitations regarding the visibility map seemed a bit too hard for me to judge. I assume that this works usually, but it's always a bit intimidating when a feature talks about limitations in DB-internal data structures you didn't even know existed.

What surprises me here a bit is that this only provides a factor of two improvement. I would have expected the index to be much smaller than the table, though they include a bunch of columns in the index. At that ratio I'd be a bit worried about consuming a serious amount of space for that speedup, but that's impossible to judge without knowing the details. And if the performance is important enough here of course this is still worth it even if the index is large.


Thanks for reading!

I hear you on visibility maps being intimidating. In practice, I haven't seen any cases where visibility map issues have prevented an index-only scan. But we did initially think that the visibility map was to blame for what we were seeing!

RE space of the index: it cost us about 1% of the free disk space on our workers. It was worth it for this particular feature.


The other thing I'm currently struggling with is the slow parameter sniffing in SQL Server. We would really like to switch to the official mssql-jdbc driver, but the performance is abysmal:

https://github.com/microsoft/mssql-jdbc/issues/1196#issuecom...


I had a similar situation a few years back. Postgres has constructs that are optimization barriers. Once I understood this and reviewed the Postgres source code to determine the semantics of the relevant constructs, it was possible to get a 10x improvement for a complex set of queries that had already been optimized for several years.


SQLite has a nice solution to this problem: you can pin queries to use specific plans. By strategically controlling when statistics are redone, you can optimize a database for a specific set of queries. This does have the downside that it doesn't react to changes dynamically, but that is a good tradeoff in many circumstances.


They added an index to a table getting a very active flow of writes. It would be very interesting to see the write-performance change too, so the picture would be more balanced.


You should be able to dictate the exact query plan to every database. SQL is a bad and obsolete API.


happened to us but for an even weirder case. postgres wasn’t doing a simple index-only scan because, as i learned from googling, the row of the table wasn’t that much wider than the index. creating a huge extra column helped

i wish there was no fancy query planner at all, and definitely no obligatory ones. it’s 2021, we either learn how to plan computation, or can afford not to care


There are so many cases where this is the wrong opinion that it's mind-boggling to see.

There's so many queries NOT blogged about that just do the thing you want pretty much all the time, for no extra work on your part.


query planner is a very complex, unpredictable piece of software that makes decisions I end up paying for. yes, I want it dismantled, and the same problem addressed somehow otherwise. maybe via dumber "query planners" and better "hints", even though the word "hint" doesn't go far enough in the direction of control that I need. I won't be sorry about this opinion even if you're used to this state of affairs

(i edited my comment very slightly to avoid getting more responses like yours)


Using SQL workarounds/hacks just makes the code harder to read and harder for the query optimizer to detect the original intention and optimize in the future.


This is true, but the workaround proposed in the article is not a change to the query at all. It's adding another index, which is a pretty classic case of "proper" usage of query analysis (as opposed to confusing/hacky mutations of the query itself).


Unless I'm misreading, they didn't add another index. They actually just reduced the original index by removing an indexed expression and excluding rows that didn't match that expression instead.
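A hedged before/after sketch of that kind of change (names invented): the expression moves out of the index's column list and into a partial-index predicate, leaving only plain columns that can satisfy an index-only scan.

    -- Before: the expression is an index column, which gets in the way of an
    -- index-only scan (the planner still thinks the data column is needed).
    CREATE INDEX events_type_time_idx ON events (user_id, (data->>'type'), time);

    -- After: a smaller partial index over plain columns; rows not matching the
    -- expression are simply excluded from the index.
    DROP INDEX events_type_time_idx;
    CREATE INDEX events_click_time_idx ON events (user_id, time)
        WHERE data->>'type' = 'click';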



