The `ANY_VALUE(…)` Aggregate Function

New in SQL:2023 — and a better feature from 1999

⚡Might return null even if there are non-null values
⚡Not an aggregate function
Without filter clause
Not as window function (over clause)

The aggregate function any_value takes a non-null value from a group of rows.

SELECT product_id
     , ANY_VALUE(order_id) AS example_order_id
  FROM order_lines
 GROUP BY product_id

This query lists ordered product_ids, along with an example order_id for each product.

As with every aggregate function it accepts the filter and over clauses,0 if generally supported by the system.

⚡Might return null even if there are non-null values
⚡Might return null even if there are non-null values • ⚡Not an aggregate function

Note that any_value is non-deterministic. It might pick another value every time—i. e. the query results are not reproducible. This is also true, and most surprising, in context of window functions:

SELECT ANY_VALUE(x) OVER()
  FROM …

The example uses any_value as window function “over everything”. In this case, every other aggregate function delivers the same result for every row, deterministically. But the SQL standard does not require this for any_value. Any_value might take different values for every row, even the over clause always covers the same rows. Having that said, none of my tests ever observed such a behavior yet.

Min and max happen to fulfill all requirements the standard puts upon any_value. Wherever you could use any_value, you can also use min or max. The main difference is that any_value could be implemented more efficiently as it doesn’t need to search for a specific value. It may just take the first non-null value that comes along.

Finally, I’d like to address another proclaimed use-case of any_value: Coping with the missing implementation of feature T301, “Functional dependencies”. Consider this example:

CREATE TABLE customers (
 id   BIGINT PRIMARY KEY,
 name VARCHAR,
 […]
)

CREATE TABLE orders (
 customer BIGINT REFERENCES customers(id),
 placed   TIMESTAMP,
 […]
)

SELECT customers.*, MAX(placed)
  FROM customers
  LEFT JOIN orders
         ON customers.id = orders.customer
 GROUP BY customers.id

The query lists all customers and when they placed their last order. Note that it selects all columns from the customers table—thus also the name column. However, the group by has the customers.id column only. Is that even valid?

If you like this page, you might also like …

… to subscribe my mailing lists, get free stickers, buy my book or join a training.

It is. Logically, as well as according to the SQL standard. The crucial point is that customer names are functionally dependent on the customers primary key (id). For every group of rows for which the customer.id is the same, customer.name must be the same as well. The SQL standard takes great care to document known functional dependencies—but leaves it up to the implementation to consider them or not. T301, “Functional dependencies” is an optional feature.

⚡Doesn’t fail if there is no functional dependency (per default)

As you can see above, this aspect of SQL is widely neglected by the vendors. Some systems raise an error, which means that it don’t implement this optional feature. Some other systems (⚡) just ignore the problem and allow selecting any column—even if there is no functional dependency. Their users cannot notice that there might be something wrong with the query. These systems effectively imply any_value.

While the query above it totally fine, it might fail (lack of T301) or accidentally work correctly (implying any_value). Only two out of nine tested systems give the desired answer for the right reason. These are not exactly the odds I’d like to take.

SQL-92 offered two options to rewrite this query so that it always works correctly: (1) use min or max on every functionally dependent columns that are not mentioned in group by; (2) extend the group by clause by these columns. The new any_value function of SQL:2023 is just a third variant of option 1). Frankly speaking, I find none of them very user friendly. Even worse: They do not protect us from mistakes such as accidentally putting a column from the orders table into the group by clause.1

In 1999, the real solution was added to the SQL standard: the (optional) feature T301, “Functional dependencies”. Thus, I’d like to close this post with an appeal to the vendors: Please go the extra mile and implement T301 rather than T626 (any_value). This is what users expect: no effort, maximum security. Even the performance impact should be low considering it mostly affect the prepare phase. Big kudos to the MySQL team for implementing it in 5.7!

The only downer is that functional dependencies require proper constraints. Systems that lack them, such as data warehouses, cannot benefit from them. Here it might be desirable to have a function that validates that there is only one distinct non-null value. Maybe unique_value(… [null|error] on error)? Or not enforced trusted constraints like in Db2?

You can’t catch up on 20 years of SQL evolution in one day. Subscribe the newsletter via E-Mail, Bluesky or RSS to gradually catch up and to keep modern-⁠sql.com on your radar.

About the Author

Markus Winand provides insights into SQL and shows how different systems support it at modern-sql.com. Previously he made use-the-index-luke.com, which is still actively maintained. Markus can be hired as trainer, speaker and consultant via winand.at.

Footnotes

As well as distinct and all, but that is really pointless.
Or if a customers-column is migrated to the orders table later.

The `ANY_VALUE(…)` Aggregate Function

If you like this page, you might also like …

About the Author

Footnotes

Can I use…

News

The Standard

Concepts

Features

Use Cases

Connect with Markus Winand

The ANY_VALUE(…) Aggregate Function

If you like this page, you might also like …

About the Author

Buy the Book

Hire Markus

Footnotes

Can I use…

News

The Standard

Concepts

Features

Use Cases

Connect with Markus Winand

The `ANY_VALUE(…)` Aggregate Function