What’s wrong with CTEs though? I have never thought of them as busywork and star...

tmoertel · on Aug 24, 2024

CTEs are not inherently busywork. I rather like them. What is busywork is having to chop a linear flow of operations into chunks and then wrap those chunks in CTEs that you must wire together. All this, simply because the SQL syntax doesn't let you express that flow directly.

magicalhippo · on Aug 24, 2024

> What’s wrong with CTEs though?

Depends on the DB engine, I suppose. I've run into cases where certain operations were not allowed in CTEs, and they can be an optimization barrier.

Also, if your query is dynamically modified at runtime, then CTEs can be a no-go. For example, we have a grid component which first does a count and then only selects the visible rows. This is great if you have expensive subselects as columns in a large table. However, to do the counting it turns the main query into a subquery, and it doesn't handle CTEs.
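A minimal sketch of that count-then-page wrapping, using Python's built-in sqlite3. The helper names `count_query` and `page_query` and the `Items` table are hypothetical, invented for illustration; this is not the actual grid component. The last step only builds the wrapped string to show where a leading WITH clause ends up, which, as far as I understand, is a position not every engine accepts.

```python
import sqlite3

def count_query(sql: str) -> str:
    """Hypothetical helper: wrap a SELECT as a derived table and count its rows."""
    return f"SELECT COUNT(*) FROM ({sql}) AS q"

def page_query(sql: str, order_by: str, limit: int, offset: int) -> str:
    """Hypothetical helper: fetch only the rows that are visible in the grid."""
    return (
        f"SELECT * FROM ({sql}) AS q "
        f"ORDER BY {order_by} LIMIT {limit} OFFSET {offset}"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO Items (name) VALUES (?)",
    [(f"item{i}",) for i in range(250)],
)

# A plain SELECT wraps cleanly: count first, then fetch one page.
plain = "SELECT id, name FROM Items"
total = conn.execute(count_query(plain)).fetchone()[0]
visible = conn.execute(page_query(plain, "id", 20, 40)).fetchall()

# With a CTE, the naive wrapper pushes the WITH clause inside the parentheses.
with_cte = (
    "WITH Recent AS (SELECT id, name FROM Items WHERE id > 200) "
    "SELECT id, name FROM Recent"
)
wrapped = count_query(with_cte)
print(wrapped)
```

A wrapper that wanted to support CTEs would have to parse the query and hoist the WITH clause to the front of the outer statement, which is more than string concatenation.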

geertj · on Aug 26, 2024

Understood. I should have asked my question a bit more specifically: what's wrong with CTEs that wouldn't be an issue with this new pipe syntax? I briefly scanned the paper and it appears there aren't any specific benefits to the pipe syntax that would make optimization easier. So we can expect that if a SQL engine doesn't optimize CTEs well it would likely have the same limitations for the pipe syntax.

Section 2.1.4 of the paper lists the benefits of the pipe syntax over CTEs, and they are all based on ergonomics. As someone who has never had issues with the ergonomics of CTEs I must say I am not convinced that the proposed syntax is better. It may be that I've been doing SQL for so long that I don't see its warts. Overall SQL feels like a very well designed and consistent language to me. The new pipe syntax appears to bolt on an imperative construct to an otherwise purely functional language.

tmoertel · on Aug 28, 2024

> The new pipe syntax appears to bolt on an imperative construct to an otherwise purely functional language.

It's not imperative. The pipe symbol is a relational operator that takes one table as input and produces one as output. It's still purely functional, but it has the advantage of making the execution order obvious. That is, the order is a linear top-down flow, not the inside-out flow implicit in vanilla SQL. Further, when your wanted flow doesn't match vanilla SQL's implicit ordering, you don't have to invent CTEs to wire up your flow. You just express it directly.

As for ergonomics, consider a simple task: Report some statistics over the top 100 items in a table. Since LIMIT/ORDER processing is last in vanilla SQL's implied ordering, you can't directly compute the stats over the top items. You must create a CTE to hold the top items and then wire it into a second SELECT statement to compute the stats. That's busywork. With pipe syntax, there's no need to invent that intermediate CTE.

geertj · on Aug 29, 2024

> It's not imperative. The pipe symbol is a relational operator that takes one table as input and produces one as output.

Maybe I used the wrong term. In my mental model, the query planner decides the order in which the query is evaluated based on what table stats predict is the most efficient query plan, and I actually don't really want to think about the order too much. For example, if I create a CTE, I don't necessarily want it to be executed in that order. Maybe a condition on the later query can be pushed back into the earlier CTE so that less data can be scanned.

I will admit that technically there should be no difference in how a query planner handles either. But to me the pipe syntax does not hint as much at these non-linear optimizations as CTEs do. I called the CTE syntax more functional as it implies less to me.
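To make the pushdown idea concrete, here is a small sketch using Python's sqlite3 with a made-up `Items` table and a hypothetical `Priced` CTE. It only shows that filtering after the CTE and filtering inside it return the same rows, which is what leaves a planner free to pick either; whether a given engine actually performs the rewrite is engine-dependent and not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (category TEXT, sales_volume INTEGER)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?)",
    [("books", 10), ("books", 30), ("toys", 20), ("toys", 40), ("games", 50)],
)

# As written: the CTE covers every row, and the outer query filters afterwards.
filter_outside = """
WITH Priced AS (
  SELECT category, sales_volume * 2 AS doubled
  FROM Items
)
SELECT category, doubled
FROM Priced
WHERE category = 'toys'
ORDER BY doubled
"""

# After a hand-applied pushdown: the same condition moved into the CTE.
filter_inside = """
WITH Priced AS (
  SELECT category, sales_volume * 2 AS doubled
  FROM Items
  WHERE category = 'toys'
)
SELECT category, doubled
FROM Priced
ORDER BY doubled
"""

rows_outside = conn.execute(filter_outside).fetchall()
rows_inside = conn.execute(filter_inside).fetchall()
print(rows_outside == rows_inside)
```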

> but it has the advantage of making the execution order obvious.

So we're back to ergonomics which I just never had an issue with...

> As for ergonomics, consider a simple task: Report some statistics over the top 100 items in a table. Since LIMIT/ORDER processing is last in vanilla SQL's implied ordering, you can't directly compute the stats over the top items.

Could I not compute the stats over all values, then order and limit them, and depend on the query planner to not do the stat calculation for items outside the limit? If the order/limit does not depend on a computed statistic that should be possible? Or does that not happen in practice?

tmoertel · on Aug 29, 2024

No, the wanted stats are a function of the top 100 items.

As a concrete example, consider computing the average sales volume by category for the top 100 items. Here's the vanilla SQL for it:

    WITH
      TopItems AS (
        SELECT category, sales_volume
        FROM Items
        ORDER BY sales_volume DESC
        LIMIT 100
      )
    SELECT category, AVG(sales_volume) AS avg_sales_volume
    FROM TopItems
    GROUP BY category;

Because ORDER/LIMIT processing is implicitly last in vanilla SQL, if you need to do anything after that processing, you must do it in a new SELECT statement. Thus you must capture the ORDER/LIMIT results (e.g., as a CTE or, heaven forbid, as a nested SELECT statement) and then wire those results into that new SELECT statement via its FROM clause.
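A quick way to check this is to run the variants side by side. The sketch below uses Python's sqlite3 with invented sample data and a top 3 instead of a top 100 so the numbers stay readable. It compares the CTE form, the nested-SELECT form, and the tempting single-statement version that aggregates first and limits afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (category TEXT, sales_volume INTEGER)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?)",
    [("A", 100), ("A", 90), ("A", 10), ("B", 80), ("B", 20), ("C", 5)],
)

# Top items captured in a CTE, then aggregated in a second SELECT.
via_cte = conn.execute("""
WITH TopItems AS (
  SELECT category, sales_volume
  FROM Items
  ORDER BY sales_volume DESC
  LIMIT 3
)
SELECT category, AVG(sales_volume) AS avg_sales_volume
FROM TopItems
GROUP BY category
ORDER BY category
""").fetchall()

# The same wiring expressed as a nested SELECT in the FROM clause.
via_nested = conn.execute("""
SELECT category, AVG(sales_volume) AS avg_sales_volume
FROM (
  SELECT category, sales_volume
  FROM Items
  ORDER BY sales_volume DESC
  LIMIT 3
)
GROUP BY category
ORDER BY category
""").fetchall()

# Single statement: the aggregate runs over ALL items and only then is the
# list of categories ordered and limited. This answers a different question.
aggregate_then_limit = conn.execute("""
SELECT category, AVG(sales_volume) AS avg_sales_volume
FROM Items
GROUP BY category
ORDER BY avg_sales_volume DESC
LIMIT 3
""").fetchall()

print(via_cte)
print(aggregate_then_limit)
```

With this data the first two queries agree (only categories A and B appear, averaged over the three top rows), while the single-statement version reports three categories averaged over every row.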

In contrast, with SQL pipes you can express any ordering you want, so you can feed the ORDER/LIMIT results directly into the statistical computations:

    FROM Items
    |> ORDER BY sales_volume DESC
    |> LIMIT 100
    |> AGGREGATE AVG(sales_volume) AS avg_sales_volume
       GROUP BY category

That's way simpler and the data flows just as it reads: from top to bottom.
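For readers without a pipe-capable engine at hand, the same top-to-bottom flow can be mimicked in plain Python, one step per pipe operator. This is only an analogy over a made-up `items` list (top 3 rather than top 100), not how an engine would execute the query.

```python
from collections import defaultdict

# Stand-in for the Items table (invented data).
items = [
    {"category": "A", "sales_volume": 100},
    {"category": "A", "sales_volume": 90},
    {"category": "A", "sales_volume": 10},
    {"category": "B", "sales_volume": 80},
    {"category": "B", "sales_volume": 20},
    {"category": "C", "sales_volume": 5},
]

# FROM Items |> ORDER BY sales_volume DESC
ordered = sorted(items, key=lambda row: row["sales_volume"], reverse=True)

# |> LIMIT 3
top = ordered[:3]

# |> AGGREGATE AVG(sales_volume) AS avg_sales_volume GROUP BY category
volumes_by_category = defaultdict(list)
for row in top:
    volumes_by_category[row["category"]].append(row["sales_volume"])

avg_sales_volume = {
    category: sum(volumes) / len(volumes)
    for category, volumes in volumes_by_category.items()
}
print(avg_sales_volume)
```

Each step takes the previous step's rows as input and hands its output to the next, with no named intermediate that has to be wired into a later statement.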

geertj · on Aug 29, 2024

Okay, thanks for that example. The ability of the pipe syntax to re-order the standard SQL pipeline order does indeed provide for more compact queries in this case.

RaftPeople · on Aug 25, 2024

> What’s wrong with CTEs though?

At least in SQL Server, CTEs are syntax-level, so each use of a CTE in a query gets expanded in place. Multiple uses typically increase the complexity of the query and can cause issues with the optimizer and performance.