with clause, views, and derived tables (
from clause subqueries) provide very similar functionality. The main difference is their visibility. Consequently, the principle of least astonishment suggests that refactoring a derived table into a
with query should not have any side effects on performance. Databases generally obey this principle, although PostgreSQL represents a big exception.
The following sections outline the most important optimizations that interfere with the
with clause and present an overview of which databases apply them to
with queries, views, and derived tables.
Predicate pushdown is an optimization that applies conditions (predicates) as early as possible—ideally preventing any loading of unneeded rows.
The following example uses a derived table to demonstrate this:
SELECT c1, c2, ... FROM (SELECT c1, c2, ... FROM base_table ) derived_table WHERE c1 = ?
If the subquery is executed in isolation, it returns all rows of
base_table. Even if there is an index on
c1, the isolated subquery doesn't know about the restriction on
c1 and so it cannot use the index.
Predicate pushdown passes the condition to the subquery so that the query effectively looks like this:
SELECT c1, c2, ... FROM (SELECT c1, c2, ... FROM base_table WHERE c1 = ? ) derived_table
In this example, one could ague that the
where clause should be written in the subquery anyway. However, this argument doesn't apply when querying a view instead of a derived table.
Predicate pushdown is subject to some logical constraints. For example, join predicates cannot be pushed below the first join they affect. On the other hand, predicates can be pushed through
group by and window functions if they are among the grouping- or partitioning keys.
Predicate pushdown is particularly important to make better use of indexes.
Projection pushdown or column elimination aim to eliminate unneeded columns as early as possible—ideally not fetching them at all.
Consider the following example, which uses a view for demonstration:
CREATE VIEW view_name AS SELECT * FROM base_table WHERE deleted = 0
SELECT count(*) FROM view_name
count(*) does not need any table columns.0 If the database can push this information down, i.e. the columns that are actually needed, the database can avoid fetching unneded columns.
In the above example, projection pushdown is particularly important if there is an index on the
deleted column: in that case that database can satisfy the query from the index alone (the so-called Index-Only Scan) and does not need to fetch the other columns from the table itself. This can improve query speed by an order of magnitude. Learn more about the Index-Only Scan on Use The Index, Luke!.
Sort elimination removes redundant sort operations. Redundant sort operations can arise out of multiple
order by clauses:
WITH cte AS (SELECT id FROM feature_with ORDER BY id ) SELECT id FROM cte ORDER BY id
There is no reason to sort the result twice. Even if the
order by clauses were different, only the outermost
order by clause determines the row order in the result. With some exceptions,1 any
order by clause except the outermost one can be eliminated.
Redundant sort operations are less obvious when they arise out of various
group by (potentially sort-based),
over (partition by… order by…), and join (sort-merge) operations. Additionally, indexes might yield rows in a useful order and might thus make a sort operation obsolete.
In the following example, a database might sort the rows on
(c1, c2) to resolve the
row_number() window function.2 In that case, the order produced by the
with query already satisfies the outer
order by clause so that the database doesn't need to run another sort operation.
WITH cte AS (SELECT c1 , c2 , ROW_NUMBER() OVER(PARTITION BY c1 ORDER BY c2) rn FROM table_name ) SELECT c1, rn FROM cte ORDER BY c1, c2
Note that the outer
order by clause is still required because the
over clause only defines the relevant order for the window function. If the database executes the window function differently, e.g., distributing the partitions to parallel threads, then the database cannot eliminate the last sort operation. Omitting the outer
order by clause would then return the rows in undefined order.
Comparison of Products
Besides PostgreSQL, all tested databases optimize
with clauses in the same way that they optimize views or derived tables.
The PostgreSQL query planner considers each
with query and the main statement separately. This makes the
with clause an optimization fence (as of 9.63): meta data, such as the order in which a
with query returns rows, is not available during optimization of the outer query.
Before 5.7, MySQL did not optimize derived tables as well as it optimizes views.4