Literate programming is an approach to programming introduced by Donald Knuth that, among other ideas, aims to arrange source code in order of human logic so that it can be understood by reading it like a novel: from the beginning to the end.
SQL works in the opposite way: due to the nesting of sub-queries we must find and understand the innermost query and then start working from inside out. Not even well-chosen names can improve the recursive reading process notably because names follow the sub-queries in SQL.
Even the order of clauses in the simplest query has a forward
dependency: the meaning of the
select clause depends on
from clause. That actually poses an unsolvable
problem for development tools: when typing
select, it is
not yet know which tables are accessed and it is therefore also unknown
which columns are visible. Without
from no sensible code
SQL:1999 introduced an alternative to the nesting at least: the
with clause. In context of
literate SQL, the
with clause has two important
properties: (1) names come first; (2) sub-queries can be unnested.
SELECT ... FROM (SELECT ... FROM ... ) intention_revealing_name ...
The longer the inner query gets, the more useful a good name becomes, and the further it is pushed out of sight. How tragic.
clause puts the name before the code—very much like function
declarations work in other programming languages:
WITH intention_revealing_name AS ( SELECT ... FROM ... ) SELECT ... FROM intention_revealing_name irn ...
The topping is that it is still possible to assign a short alias in the
from clause (like
irn in the above
Order of human logic
The second aspect of
with is its ability to turn
nesting into chaining. This is the way to make
complex queries top-down readable.
can define many queries. Each of them can refer to other queries
previously defined in the same
with clause. This is exactly what is needed for
The following example drafts a literate query. The first two query definitions are not even shown—their name alone reveals their purpose:
WITH frequently_bought_together (ordered, suggested, times) AS ( SELECT ... ) , frequently_bought_shortly_after (ordered, suggested, times) AS ( SELECT ... ) , product_recommendation (ordered, suggested, times) AS ( SELECT ordered, suggested, SUM(times) FROM (SELECT * FROM frequently_bought_together UNION ALL SELECT * FROM frequently_bought_shortly_after ) frequently_needed_together GROUP BY ordered, suggested ORDER BY SUM(times) DESC FETCH FIRST 1 ROW ONLY ) SELECT * FROM product_recommendation
The ease of reading is not caused by the top-down structure alone.
With meaningless names like
z the query would still be hard to understand. It is the
combination of good names and good structure that brings the literate
As often, it is possible to do too much of it. Small sub-queries
frequenlty_needed_together in the example might not
hinder readability. It is important to divide the code into manageable,
logically connected units.
Then again it sometimes makes sense to use
clause inside a
with query. You are not limited to a
with clause: syntactically,
with is a prefix for the
keyword. Use it where it helps improving readability.
By now it should be clear why this technique is called “literate SQL”: writing easily readable SQL is an art, not science.
For some database—most notable PostgreSQL— the
with clause has a performance impact. See “Performance