Postgres feature you're not using – CTEs a.k.a. WITH clauses

saurik · on Aug 21, 2024

> CTEs won’t always be quite as performant as optimizing your SQL to be as concise as possible. In most cases I have seen performance differences smaller than a 2X difference, this tradeoff for readability is a nobrainer as far as I’m concerned. And with time the Postgres optimizer should continue to get better about such performance.

So, my knowledge here might be out of date, but this behavior isn't "the optimizer isn't smart enough": using a CTE in PostgreSQL causes an explicit boundary in the optimizer which prevents some optimizations from being performed. In some cases you really need/want this behavior, and it might increase the performance of your query; in other cases, this is the last thing you would want. People keep asking for optimizer hints, which PostgreSQL refuses to add, and yet they leave this quirk in explicitly as an escape hatch.

hyperman1 · on Aug 21, 2024

This changed in pg 12.

https://www.postgresql.org/docs/release/12.0/ : Automatic (but overridable) inlining of common table expressions (CTEs)

camgunz · on Aug 21, 2024

Yep, it's a huge improvement for complicated pipelines.

One thing to be aware of is "NOT MATERIALIZED". Here's what the docs say:

A useful property of WITH queries is that they are normally evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is not able to push restrictions from the parent query down into a multiply-referenced WITH query, since that might affect all uses of the WITH query's output when it should affect only one. The multiply-referenced WITH query will be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)

However, if a WITH query is non-recursive and side-effect-free (that is, it is a SELECT containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references the WITH query just once, but not if it references the WITH query more than once. You can override that decision by specifying MATERIALIZED to force separate calculation of the WITH query, or by specifying NOT MATERIALIZED to force it to be merged into the parent query. The latter choice risks duplicate computation of the WITH query, but it can still give a net savings if each usage of the WITH query needs only a small part of the WITH query's full output.

silon42 · on Aug 21, 2024

Any issues with recursive and NOT MATERIALIZED? Recursive is pretty much the reason to use CTEs for me and NOT MATERIALIZED is essential there in most cases.

camgunz · on Aug 21, 2024

I haven't had any, and yeah it's almost always the right thing IME.

muhrizqiardi · on Aug 21, 2024

I'm not using? I'm using that feature almost all the time!

mosselman · on Aug 21, 2024

Postgres feature you aren’t using: SELECT

teaearlgraycold · on Aug 21, 2024

This post was written for juniors

rkachowski · on Aug 21, 2024

in 2013

dveeden2 · on Aug 21, 2024

The good thing here is that other databases like MySQL, TiDB, SQLite etc also support this. so this. And even recursive CTEs are well supported.

https://modern-sql.com/caniuse/with_(non-recursive,_top_leve...

larodi · on Aug 21, 2024

I sure do use them for many years now, my girlfriend now uses them and loves them, my topcoder friend eventually started using them 10 years ago.

So, no, we are using them , but y’all know what ? Unless recursive they are really nothing so special…

yas_hmaheshwari · on Aug 21, 2024

You are right about recursive CTE - that's a functionality that's not possible without CTE

However, I will argue that an equally great benefit is code readability. That allows you to give good names to pieces of SQL block is like having small functions in normal programming

larodi · on Aug 22, 2024

Indeed it is, although as cascade of create temp view statements within transaction works almost the same way for readability. besides, not everyone can count the open/closing braces for the CTE parts.

CTEs also cannot have indexes on the intermediate results, while views/mviews can and this benefits great cascaded analysis in SQL.

speaking from experience, where we had to optimize a particular network analysis for urban planning, which was initially impossible in qgis/argis, then first version ran for 3-10mins, and initially with intermediate indices the whole thing went down to 45sec-1.5min. this is not possible with CTE to my knowledge, even though I've searched for ways to tell it to build intermediate indices.

hansvm · on Aug 21, 2024

Used judiciously, a macro system in front of your SQL is often a good approach here. You gain the "linguistically common subtable" capabilities of a CTE, the ability to name constants, and the ability to name/parameterize common sub-expressions, even in databases not supporting CTEs. With LSP support being what it is, you can even whip up editor integration for your new language in a day or less.

Compared to CTEs (assuming you're not using recursion and don't _want_ the materialization), that extra syntactic sugar helps with readability, and the "macro" nature of the thing ensures you don't have any slowdowns from running the thing due to a different query plan.

Downsides include the ability for juniors to turn your SQL into an awful mess, the fact that you'll still occasionally _want_ materialized CTEs for performance and have to write them anyway, plus if you implement it wrong you'll have runtime overhead and very few introspection capabilities (the simplest version I've seen that's decent to work with is creating SQL files as artifacts from the templates using the build system, obviously depending on how much of a rube goldberg your particular builds are).

felixyz · on Aug 21, 2024

Any particular macro system you have used or created? If so, did you use an existing pre-processor, or just coded something up from scratch?

stareatgoats · on Aug 21, 2024

(2013)

It was also discussed around that time, 78 comments: https://news.ycombinator.com/item?id=7023907

lovasoa · on Aug 21, 2024

It's strange to call CTEs a "postgres feature". It's standard SQL, it's in all databases.

sherburt3 · on Aug 21, 2024

Not in MySQL 5.7, which is officially my least favorite database to maintain.

solatic · on Aug 21, 2024

> CTEs won’t always be quite as performant as optimizing your SQL to be as concise as possible. In most cases I have seen performance differences smaller than a 2X difference, this tradeoff for readability is a nobrainer as far as I’m concerned.

This is the key trade-off you need to keep in the back of your head. Pre-mature performance optimization is the root of all evil; prefer readability first. But the fact remains that you should be setting timeouts and other time budgets, tracing calls from API through to the database, seeing what's taking the most time. If you get to the point where you need to optimize a CTE-based query, be prepared to rewrite it.

cesarb · on Aug 21, 2024

WITH clauses are a somewhat recent feature, at least on PostgreSQL. If you learned SQL before they became commonly available, it's unsurprising if you tend to think of other SQL techniques first.

zaxomi · on Aug 21, 2024

A quick search shows it in the documentation for version 8.4, released July 1, 2009.

Maybe 15 years is "somewhat recent feature". :)

cesarb · on Aug 21, 2024

Well, it wasn't there when I first learned SQL by reading the PostgreSQL manual. I recall that this was back when the size of TEXT columns was limited, so it was before version 7.1 (which introduced TOAST tables), released 13 April, 2001.

So yeah, if you first learned SQL around 25 years ago, something added only 15 years ago is "somewhat recent". And according to a link posted in another comment, other databases introduced this feature (non-recursive WITH clauses) even later.

croes · on Aug 21, 2024

>The general idea is that it allows you to create something somewhat equivilant to a view that only exists during that transaction.

I can't talk for Postgres but on MS SQL it's definitely not like a view.

I had a CTE with a filter condition in the where close but got a conversion error in the later use for a value I already filtered out in the CTE.

So it may look like some kind of temporary view definition but the actual code the optimizer generates may behave unexpected.

ibejoeb · on Aug 21, 2024

I'm all but certain that a CTE and an unindexed view are treated identically in the planning phase.

You're describing predicate pushdown, which can get hung up under certain conditions, like when using window functions or top. If I had to take a shot in the dark, you were dealing with something like that.

croes · on Aug 21, 2024

Thanks for the explanation

marcus_holmes · on Aug 21, 2024

I use CTEs all the time, find them way more readable than subqueries. I've refactored a few complex queries to use them and had the reaction from the original author of "wtf is this crap?". It does take some getting used to ;)

beAbU · on Aug 21, 2024

(I'm not a dev)

This feels to me like a "in other news, water is wet" kind of story. Maybe it's just me, but based on the other comments here CTEs seem to be common knowledge. I went on a SQL trial by fire this year after landing in a role that required massive amounts of weird querying for ad-hoc reporting purposes. Before this my experience and knowledge stopped at different join types.

I naturally discovered CTEs over time after seeking solutions to my problems on the usual online resources. For me the use is two-fold. 1/ I use it to create ad-hoc lookups on data, where the lookups are not available in the DB in the first place. And 2/ In complex queries with lots and lots of conditional joins, I can use a CTE to basically build up my temporary base table, getting all my data in one place, then my main `SELECT` becomes a lot cleaner and easier to read.

I've not looked at recursive CTEs yet, supposedly they can be quite handy. Not sure for what though.

We recently migrated from MySQL to Postgres, and the data reporting went from direct-db querying to Snowflake. And CTEs work on for me on all 3 platforms, so this does not even seem like a Postgres only thing.

Dissori · on Aug 21, 2024

This was posted 4 days ago but now says 7 hours ago, even the comments from back then now show as 30 minutes old?

simonw · on Aug 21, 2024

That happens when a Hacker News moderator decides to give a post a “second chance” at making it to the homepage. https://github.com/minimaxir/hacker-news-undocumented?tab=re...