Order-Equivalent `Over` Clauses

⚡Order-equivalent over clause can produce different row orders
Apparently, the framing must be identical too to make over clauses order-equivalent

At the heart of this article, there is a simple question: Will the following query always give the same numbering in both columns?

SELECT ROW_NUMBER() OVER (ORDER BY x)
     , ROW_NUMBER() OVER (ORDER BY x)
  FROM ...

I did not hide any relevant information. The two column definitions are absolutely identical. The problem becomes more obvious if I add example data that highlights the crucial point.

SELECT ROW_NUMBER() OVER (ORDER BY x)
     , ROW_NUMBER() OVER (ORDER BY x)
     , t.*
  FROM (VALUES (1, 10)
             , (1, 20)
       ) t(x, y)

Two rows have the same value in the column x, which is used in order by. And that brings us to the core of the question: Is the order of peer rows (equally ranked rows) the same across different over clauses? For example, if you sort chronologically in two over clauses, will the rows that happened at the same time have the same order in both over clauses? The answer is, luckily: Yes, the SQL standard requires that. There are nonetheless systems that do not provide that guarantee, as you can see in the chart above.0

The SQL standard defines a term for that purpose: order-equivalent.1 Two over clauses2 are order-equivalent if the partition by (if present) and the Order By clauses refer to the same columns and the effective modifiers in the order by clause ([asc|desc], nulls [first|last], collate) are the same. In the case, that standard requires the row order produced by two over clauses to be the same—even among peer rows.3

If you like this page, you might also like …

… to subscribe my mailing lists, get free stickers, buy my book or get training.

Besides the requirement of the standard there are two more reasons to treat peer rows equally among over clauses whenever possible: Maintaining the principle of least astonishment and reducing the response time. Even if not required by the standard in any way, system vendors should generally seek to minimize the number of required sort operations. After all, it would be wasted time to sort the rows twice in the example above.

Order-equivalent over clauses are very common in practice. Thus, I’m happy the standard takes care of it.4 It is less fortunate that there are similar cases that do not fall under the order-equivalence as defined by the standard. Yet equal treatment of peers would be very desirable. Take the next query as an example.

SELECT ROW_NUMBER() OVER (ORDER BY x)
     , ROW_NUMBER() OVER (ORDER BY x DESC)
  FROM (VALUES (1, 10)
             , (1, 20)
       ) t(x, y)

The query adds two columns: one with ascending numbering, the other with descending numbering. This query might return surprising results as the peers might be treated differently by each over clause. The following is a valid result for this query.

… OVER (ORDER BY x)	… OVER (ORDER BY x DESC)
1	1
2	2

The reasonable expectation that the numbering always goes in opposite direction is not fulfilled. More formally, one might expect that the total of these columns is same for all rows of the result. When the one column goes up, the other needs to go down by the same amount. As these over clauses are not order-equivalent—they differ in the order by modifiers—, the standard does not require SQL engines to treat peers the same. Yet one of the tested systems has always produced the desirable result of exactly opposite numbering.

Frankly speaking, I think this desirable behavior is rather motivated by performance optimization than by the aim to avoid surprising results. Why should the DBMS re-sort the result if all that needs to be done is to process it in opposite direction? Which coincidentally takes care of equal treatment of peers. This performance aspect is my biggest hope that behavior becomes more common in the not-too-distant future.

Think About It

Do you see other cases where equal treatment of peers would be desirable even though the over clauses are not order-equivalent?5

Till then the question arises how to get a correct numbering in opposite directions—even in presence of peers. The simplest answer is, of course, to avoid peers by extending the order by clause so that now two rows can have the same values in the order keys. Practically it means to include the required columns from primary keys to the order by clause.

Think About It

As opposed to primary key columns it is not generally sufficient to include the unique key columns. Why not?6

When extending the order by clause is not possible or desired, we must use a query that uses order-equivalent over clauses only. The next query does that. Even though the over clauses are not identical, they are still order-equivalent as the different framing (rows between) is irrelevant for order-equivalence as defined by the SQL standard. The example uses a forward-looking frame to count the remaining rows—which gives the desired, opposite descending numbering.

SELECT ROW_NUMBER() OVER (ORDER BY x)
     , COUNT(*)     OVER (ORDER BY x
                          ROWS BETWEEN
                               CURRENT ROW
                           AND UNBOUNDED FOLLOWING
                         )
  FROM

This solution requires proper implementation of the order-equivalence requirements of the standard—which is not always the case✓✗. A solution that does not require conforming treatment of peers in order-equivalent over clauses is also possible. Just use mathematics.

SELECT rn, cnt - rn + 1
  FROM (SELECT ROW_NUMBER() OVER (ORDER BY x) AS rn
             , COUNT(*)     OVER (          ) AS cnt
          FROM 
) t

The key to success is to use only one window function (here row_number) that treats peers differently. The count function is used to get the total of all rows, which does not require ordering at all. The outer query can build the opposite numbering by subtracting the ascending numbering from the total row count—and taking care of the off-by-one error ;)

Think About It

Why is it wrong to put the same order by clause in the over clause of the count function?7

This article discusses just one example for problems that can arise out of non-determinism. It is generally a good habit to avoid non-determinism whenever possible. Often by including primary key columns into order by clauses, by avoiding row frames, … the standard has a list for that.8 Yet I believe system vendors could take [even] more care to improve usability by avoiding surprising results. Not just in the extend mandated by the SQL standard. In particular, when it would give even faster responses.

Of course, these topics are also covered in my analysis and aggregation training.

You can’t catch up on 20 years of SQL evolution in one day. Subscribe the newsletter via E-Mail, Bluesky or RSS to gradually catch up and to keep modern-⁠sql.com on your radar.

About the Author

Markus Winand provides insights into SQL and shows how different systems support it at modern-sql.com. Previously he made use-the-index-luke.com, which is still actively maintained. Markus can be hired as trainer, speaker and consultant via winand.at.

Footnotes

The tests backing the chart are a little more tricky than the example discussed here. Keep in mind: Just because you don’t get a non-conforming result in one case, does not mean you’ll never get one. A failure (⚡) in the chart above means that I’m able to trick that system into producing a non-conforming result. Conversely, success (✓) in the chart just means that I did not get a non-conforming result—but that does not rule out you could get one.
ISO/IEC 9075-2:2023 §7.15 SR 17
For simplicity, I refer to over clauses only and do not mention the window clause all over again.
ISO/IEC 9075-2:2023 §7.15 GR 4
From the beginning, I think. At least, the edition of 2008 has that requirement already.
E.g. over(order by a, b) and over(partition by a order by b).
Because unique constraints do not require not null columns like primary keys do. As most unique constraints have nulls distinct semantic, there could be multiple null values which become peers in the order by clause.
Noteworthy deviation: SQLite allows null in primary key columns
Because order by in over generally implies the range between current row and unbounded following.
ISO/IEC 9075-2:2023 §9.16

Order-Equivalent `Over` Clauses

If you like this page, you might also like …

Think About It

Think About It

Think About It

About the Author

Footnotes

Can I use…

News

The Standard

Concepts

Features

Use Cases

Connect with Markus Winand

Order-Equivalent Over Clauses

If you like this page, you might also like …

Think About It

Think About It

Think About It

About the Author

Buy the Book

Hire Markus

Footnotes

Can I use…

News

The Standard

Concepts

Features

Use Cases

Connect with Markus Winand

Order-Equivalent `Over` Clauses