- ⚡Order-equivalent
overclause can produce different row orders - Apparently, the framing must be identical too to make
overclauses order-equivalent
At the heart of this article, there is a simple question: Will the following query always give the same numbering in both columns?
SELECT ROW_NUMBER() OVER (ORDER BY x)
, ROW_NUMBER() OVER (ORDER BY x)
FROM ...I did not hide any relevant information. The two column definitions are absolutely identical. Yet I can extend the query to make the problem more apparent.
SELECT ROW_NUMBER() OVER (ORDER BY x)
, ROW_NUMBER() OVER (ORDER BY x)
, t.*
FROM (VALUES (1, 10)
, (1, 20)
) t(x, y)The columns x of the order by clause has always the same value. And that brings us to the core of the question: Is the order of peer rows (equally ranked rows) the same across different over clauses? The answer is, luckily: Yes, the SQL standard requires that. There are nonetheless systems that do not provide that guarantee, as you can see in the chart above.0
The SQL standard defines a term for that purpose: order-equivalent.1 Two over clauses2 are order-equivalent if the partition by (if present) and the Order By clauses refer to the same columns and the effective modifiers in the order by clause ([asc|desc], nulls [first|last], collate) are the same. In the case, that standard requires the row order produced by two over clauses to be the same—even among peer rows.3
If you like this page, you might also like …
… to subscribe my mailing lists, get free stickers, buy my book or get training.
Besides the requirement of the standard there are two more reasons to treat peer rows equally among over clauses whenever possible: Maintaining the principle of least astonishment and reducing the response time. Even if not required by the standard in any way, system vendors should generally seek to minimize the number of required sort operations. After all, it would be wasted time to sort the rows twice in the example above.
Order-equivalent over clauses are very common in practice. Thus, I’m happy the standard takes care of it.4 It is less fortunate that there are similar cases that do not fall under the order-equivalence as defined by the standard. Yet equal treatment of peers would be very desirable. Take the next query as an example.
SELECT ROW_NUMBER() OVER (ORDER BY x)
, ROW_NUMBER() OVER (ORDER BY x DESC)
FROM (VALUES (1, 10)
, (1, 20)
) t(x, y)The query adds two columns: one with ascending numbering, the other with descending numbering. This query might return surprising results as the peers might be treated differently by each over clause. The following is a valid result for this query.
| … OVER (ORDER BY x) | … OVER (ORDER BY x DESC) |
|---|---|
| 1 | 1 |
| 2 | 2 |
The reasonable expectation that the numbering always goes in opposite direction is not fulfilled. More formally, one might expect that the total of these columns is same for all rows of the result. When the one column goes up, the other needs to go down by the same amount. As these over clauses are not order-equivalent—they differ in the order by modifiers—, the standard does not require SQL engines to treat peers the same. Yet one of the tested systems has always produced the desirable result of exactly opposite numbering.
Frankly speaking, I think this desirable behavior is rather motivated by performance optimization than by the aim to avoid surprising results. Why should the DBMS re-sort the result if all that needs to be done is to process it in opposite direction? Which coincidentally takes care of equal treatment of peers. This performance aspect is my biggest hope that behavior becomes more common in the not-too-distant future.
Think About It
Do you see other cases where equal treatment of peers would be desirable even though the over clauses are not order-equivalent?5
Till then the question arises how to get a correct numbering in opposite directions—even in presence of peers. The simplest answer is, of course, to avoid peers by extending the order by clause so that now two rows can have the same values in the order keys. Practically it means to include the required columns from primary keys to the order by clause.
Think About It
As opposed to primary key columns it is not generally sufficient to include the unique key columns. Why not?6
When extending the order by clause is not possible or desired, we must use a query that uses order-equivalent over clauses only. The next query does that. Even though the over clauses are not identical, they are still order-equivalent as the different framing (rows between) is irrelevant for order-equivalence as defined by the SQL standard. The example uses a forward-looking frame to count the remaining rows—which gives the desired, opposite descending numbering.
SELECT ROW_NUMBER() OVER (ORDER BY x)
, COUNT(*) OVER (ORDER BY x
ROWS BETWEEN
CURRENT ROW
AND UNBOUNDED FOLLOWING
)
FROM This solution requires proper implementation of the order-equivalence requirements of the standard—which is not always the case✓✗. A solution that does not require conforming treatment of peers in order-equivalent over clauses is also possibly. Just use mathematics.
SELECT rn, cnt - rn + 1
FROM (SELECT ROW_NUMBER() OVER (ORDER BY x) AS rn
, COUNT(*) OVER ( ) AS cnt
FROM
) tThe key to success is to use only one window function (here row_number) that treats peers differently. The count function is used to get the total of all rows, which does not require ordering at all. The outer query can build the opposite numbering by subtracting the ascending numbering from the total row count—and taking care of the off-by-one error ;)
Think About It
Why is it wrong to put the same order by clause in the over clause of the count function?7
This article discusses just one example for problems that can arise out of non-determinism. It is generally a good habit to avoid non-determinism whenever possible. Often by including primary key columns into order by clauses, by avoiding row frames, … the standard has a list for that.8 Yet I believe system vendors could take [even] more care to improve usability by avoiding surprising results. Not just in the extend mandated by the SQL standard. In particular, when it would give even faster responses.
Of course, these topics are also covered in my analysis and aggregation training.

