with Clause: Performance Impacts


The non-recursive with clause, views, and derived tables (from clause subqueries) provide very similar functionality. The main difference is their visibility. Consequently, the principle of least astonishment suggests that refactoring a derived table into a with query should not have any side effects on performance. Databases generally obey this principle, although PostgreSQL represents a big exception.

The following sections outline the most important optimizations that interfere with the with clause and present an overview of which databases apply them to with queries, views, and derived tables.

Predicate Pushdown

Predicate pushdown is an optimization that applies conditions (predicates) as early as possible—ideally preventing any loading of unneeded rows.

The following example uses a derived table to demonstrate this:

SELECT c1, c2, ...
  FROM (SELECT c1, c2, ...
          FROM base_table
       ) derived_table
 WHERE c1 = ?

If the subquery is executed in isolation, it returns all rows of base_table. Even if there is an index on c1, the isolated subquery doesn't know about the restriction on c1 and so it cannot use the index.

Predicate pushdown passes the condition to the subquery so that the query effectively looks like this:

SELECT c1, c2, ...
  FROM (SELECT c1, c2, ...
          FROM base_table
         WHERE c1 = ?
       ) derived_table

In this example, one could ague that the where clause should be written in the subquery anyway. However, this argument doesn't apply when querying a view instead of a derived table.

Predicate pushdown is subject to some logical constraints. For example, join predicates cannot be pushed below the first join they affect. On the other hand, predicates can be pushed through group by and window functions if they are among the grouping- or partitioning keys.

Predicate pushdown is particularly important to make better use of indexes.

Projection Pushdown

Projection pushdown or column elimination aim to eliminate unneeded columns as early as possible—ideally not fetching them at all.

Consider the following example, which uses a view for demonstration:

CREATE VIEW view_name AS
SELECT *
  FROM base_table
 WHERE deleted = 0
SELECT count(*)
  FROM view_name

Note that count(*) does not need any table columns.0 If the database can push this information down, i.e. the columns that are actually needed, the database can avoid fetching unneded columns.

In the above example, projection pushdown is particularly important if there is an index on the deleted column: in that case that database can satisfy the query from the index alone (the so-called Index-Only Scan) and does not need to fetch the other columns from the table itself. This can improve query speed by an order of magnitude. Learn more about the Index-Only Scan on Use The Index, Luke!.

Sort Elimination

Sort elimination removes redundant sort operations. Redundant sort operations can arise out of multiple order by clauses:

WITH cte
  AS (SELECT id
        FROM feature_with
       ORDER BY id
     )
SELECT id
  FROM cte
 ORDER BY id

There is no reason to sort the result twice. Even if the order by clauses were different, only the outermost order by clause determines the row order in the result. With some exceptions,1 any order by clause except the outermost one can be eliminated.

Redundant sort operations are less obvious when they arise out of various order by, group by (potentially sort-based), over (partition by… order by…), and join (sort-merge) operations. Additionally, indexes might yield rows in a useful order and might thus make a sort operation obsolete.

In the following example, a database might sort the rows on (c1, c2) to resolve the row_number() window function.2 In that case, the order produced by the with query already satisfies the outer order by clause so that the database doesn't need to run another sort operation.

WITH cte
  AS (SELECT c1
           , c2
           , ROW_NUMBER()
             OVER(PARTITION BY c1 ORDER BY c2) rn
        FROM table_name
     )
SELECT c1, rn
  FROM cte
 ORDER BY c1, c2

Note that the outer order by clause is still required because the over clause only defines the relevant order for the window function. If the database executes the window function differently, e.g., distributing the partitions to parallel threads, then the database cannot eliminate the last sort operation. Omitting the outer order by clause would then return the rows in undefined order.

Comparison of Products

Besides PostgreSQL, all tested databases optimize with clauses in the same way that they optimize views or derived tables.

The PostgreSQL query planner considers each with query and the main statement separately. This makes the with clause an optimization fence (as of 9.63): meta data, such as the order in which a with query returns rows, is not available during optimization of the outer query.

Before 5.7, MySQL did not optimize derived tables as well as it optimizes views.4

About the Author

Photo of Markus Winand

Markus Winand teaches efficient SQL—inhouse and online. He minimizes the development time using modern SQL and optimizes the runtime with smart indexing. His book entitled SQL Performance Explained has become standard reading.

Buy his Book on Amazon

Cover of “SQL Performance Explained”: Squirrel running on grass

The essence of SQL tuning in 200 pages

Buy on Amazon
(paperback only)

Paperback and PDF also available at Markus’ store.

Hire Markus

…to answer your current SQL questions.

The quick and easy way to benefit from his extensive knowledge and experience.
Learn more »

Footnotes

  1. SQL:2016, Part 2, §10.9, General Rule 5: the result of count(*) is “hardcoded” as the number of rows—regardless of any null values.

  2. E.g., if there is a fetch first clause; if the order by is in a over or within group clause and the like.

  3. A common way to executed window functions is to first sort on the combined list of partition by and order by expressions. The evaluation of the window function is than taking place on basis of the sorted set.

  4. See “Allow the CTE (Common Table Expression) optimization fence to be optionally disabled” in the PostgreSQL Todo list.

  5. See “Derived Tables in MySQL 5.7”.

“modern SQL” by Markus Winand is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.
Legal | Contact | NO WARRANTY | Trademarks | Privacy | CC-BY-NC-ND 3.0 license