Pivoting data is a rather common problem that comes in many different flavors. At its heart, the requirement is to transpose data from multiple rows into columns of a single row.
This requirement is particularity common in a reporting context. The following explanation is therefore based on a query that reports monthly revenues:
SELECT EXTRACT(YEAR FROM invoice_date) year , EXTRACT(MONTH FROM invoice_date) month , SUM(revenue) revenue FROM invoices GROUP BY EXTRACT(YEAR FROM invoice_date) , EXTRACT(MONTH FROM invoice_date)
The query returns the result in a purely vertical form—one value per row. Quite often, the data is required in another form: for example, just one row per year and a separate column for each month. In other words, the rows of a year should be turned into columns.
The first step in implementing this requirement is to remove the month from the
group by and
select clauses to get one row per year:
SELECT EXTRACT(YEAR FROM invoice_date) year , SUM(revenue) total_revenue FROM invoices GROUP BY EXTRACT(YEAR FROM invoice_date)
Obviously, the result not longer provides a monthly breakdown, nevertheless, this step is required to condense the result to a single row per year.
The next step is to define twelve columns where each column summarizes the revenues of one month only. To get the revenue for January, for example, the expression
sum(revenue) must take only invoices from January into account. This can be easily accomplished with the
sum(revenue) FILTER (WHERE EXTRACT(MONTH FROM invoice_date) = 1)
filter clause limits the rows aggregated to those satisfying the condition in parenthesis. In this example, only the invoices from January. The revenues of the other months can be obtained in the same way.
To make the query more literate, the
extract expression can be moved to a central location. That could be a generated column or a view so that other queries could reuse this expressions. For this example, it is sufficient to centralize the
extract expression within the query—either using the
with clause, or as an inline view:
SELECT year , SUM(revenue) FILTER (WHERE month = 1) jan_revenue , SUM(revenue) FILTER (WHERE month = 2) feb_revenue ... , SUM(revenue) FILTER (WHERE month = 12) dec_revenue FROM (SELECT invoices.* , EXTRACT(YEAR FROM invoice_date) year , EXTRACT(MONTH FROM invoice_date) month FROM invoices ) invoices GROUP BY year
Even though the
filter clause was introduced with SQL:2003, it is hardly supported today. Luckily, this is not a big problem because
case can be used for the very same purpose. The trick is to map values that do not satisfy the filter criteria to neutral values, which do not change the result of the aggregation.
Null is a very good choice for this because it does not change the result of any aggregate function—not even
else null is the default clause for
case expressions without explicit
else clause anyway—it is sufficient to skip the
else clause altogether.
SELECT year , SUM(CASE WHEN month = 1 THEN revenue END) jan_revenue , SUM(CASE WHEN month = 2 THEN revenue END) feb_revenue ... , SUM(CASE WHEN month = 12 THEN revenue END) dec_revenue FROM (SELECT invoices.* , EXTRACT(YEAR FROM invoice_date) year , EXTRACT(MONTH FROM invoice_date) month FROM invoices ) invoices GROUP BY year
CASE WHEN month = 1 THEN revenue END evaluates to the revenue for invoices from January. For other invoices, the implied
else null returns the
null value, which does not change the result of
sum. See also “Conforming Alternatives to
The Special Case of EAV
The greatest challenge with the pivot problem is to recognize it when you run into it. This is particularity true when dealing with the so-called entity-attribute-value (EAV) model: it does not look like a pivot problem, but it can nevertheless be solved in the very same way.
The EAV model takes normalization to the extreme and no longer uses columns in the traditional way. Instead, every single value is stored in its own row. Besides the value, the row also has a column to specify which attribute the value represents and a third column to identify what entity the values belongs to. Ultimately, a three column table can hold any data without ever needing to change the table definition. The EAV model is thus often used to store dynamic attributes.
The EAV model does not come without drawbacks: It is almost impossible to use constraints for data validation, for example. However, the most puzzling issue with the EAV model is that the transformation into a one-column-per-attribute notation is almost always done using joins—quite often one outer join per attribute. This is not only cumbersome, it also results in very poor performance—a true anti-pattern.
However, turning rows into columns is the pivot problem in its purest form. Therefore, these steps should be followed again: (1) use
group by to reduce the rows to one row per entity, (2) use
case to pick the right attribute for each column.
SELECT submission_id , MAX(CASE WHEN attribute='name' THEN value END) name , MAX(CASE WHEN attribute='email' THEN value END) email , MAX(CASE WHEN attribute='website' THEN value END) website FROM form_submissions GROUP BY submission_id
Note the use of the
max function: it is required to reduce the rows of the group (all attributes) into a single value. This is a purely syntactic requirement that is applicable regardless of the actual number of rows that are grouped.
To obtain the original value for each attribute—even though we have to use an aggregate function—the respective filter logic (
filter) must not return more than one not-
null value. In the example above, it is crucial that each of the named attributes (
website) exists only once per
submission_id. If duplicates exist, the query returns only one of them.
The prerequisite that each attribute must not appear more than once is best enforced by a unique constraint.0 Alternatively, the query can count the aggregated rows using
count(*) and the respective
case expressions (or
filter clauses). The results can be validated in the application—if selected as additional columns—or in the
having count(*) filter (...) <= 1.
If the prerequisite is satisfied and the aggregation is always done on a single not-
null value, every aggregate function just returns the input value. However,
max have the advantage that they also work for character strings (
SQL is a statically typed language: the query must list the expected attributes upfront1.
unpivot (SQL Server, Oracle)
model clause (Oracle)
model clause, which was introduced with Oracle Version 10g, can also be used to solve the pivot problem (documentation).
crosstab table function (PostgreSQL)
The PostgreSQL database is delivered with a set of
crosstab table functions to pivot data (documentation).