null
even if there are non-null
valuesfilter
clauseover
clause)The aggregate function any_value
takes a non-null
value from a group of rows.
SELECT product_id
, ANY_VALUE(order_id) AS example_order_id
FROM order_lines
GROUP BY product_id
This query lists ordered product_id
s, along with an example order_id
for each product.
As with every aggregate function it accepts the filter
and over
clauses,0 if generally supported by the system.
null
even if there are non-null
valuesnull
even if there are non-null
values • ⚡Not an aggregate functionNote that any_value
is non-deterministic. It might pick another value every time—i. e. the query results are not reproducible. This is also true, and most surprising, in context of window functions:
SELECT ANY_VALUE(x) OVER()
FROM …
The example uses any_value
as window function “over everything”. In this case, every other aggregate function delivers the same result for every row, deterministically. But the SQL standard does not require this for any_value
. Any_value
might take different values for every row, even the over
clause always covers the same rows. Having that said, none of my tests ever observed such a behavior yet.
Min
and max
happen to fulfill all requirements the standard puts upon any_value
. Wherever you could use any_value
, you can also use min
or max
. The main difference is that any_value
could be implemented more efficiently as it doesn’t need to search for a specific value. It may just take the first non-null
value that comes along.
Finally, I’d like to address another proclaimed use-case of any_value
: Coping with the missing implementation of feature T301, “Functional dependencies”. Consider this example:
CREATE TABLE customers (
id BIGINT PRIMARY KEY,
name VARCHAR,
[…]
)
CREATE TABLE orders (
customer BIGINT REFERENCES customers(id),
placed TIMESTAMP,
[…]
)
SELECT customers.*, MAX(placed)
FROM customers
LEFT JOIN orders
ON customers.id = orders.customer
GROUP BY customers.id
The query lists all customers and when they placed their last order. Note that it selects all columns from the customers
table—thus also the name
column. However, the group by
has the customers.id
column only. Is that even valid?
… to subscribe my mailing lists, get free stickers, buy my book or join a training.
It is. Logically, as well as according to the SQL standard. The crucial point is that customer names are functionally dependent on the customers primary key (id
). For every group of rows for which the customer.id
is the same, customer.name
must be the same as well. The SQL standard takes great care to document known functional dependencies—but leaves it up to the implementation to consider them or not. T301, “Functional dependencies” is an optional feature.
As you can see above, this aspect of SQL is widely neglected by the vendors. Some systems raise an error, which means that it don’t implement this optional feature. Some other systems (⚡) just ignore the problem and allow selecting any column—even if there is no functional dependency. Their users cannot notice that there might be something wrong with the query. These systems effectively imply any_value
.
While the query above it totally fine, it might fail (lack of T301) or accidentally work correctly (implying any_value
). Only two out of nine tested systems give the desired answer for the right reason. These are not exactly the odds I’d like to take.
SQL-92 offered two options to rewrite this query so that it always works correctly: (1) use min
or max
on every functionally dependent columns that are not mentioned in group by
; (2) extend the group by
clause by these columns. The new any_value
function of SQL:2023 is just a third variant of option 1). Frankly speaking, I find none of them very user friendly. Even worse: They do not protect us from mistakes such as accidentally putting a column from the orders
table into the group by
clause.1
In 1999, the real solution was added to the SQL standard: the (optional) feature T301, “Functional dependencies”. Thus, I’d like to close this post with an appeal to the vendors: Please go the extra mile and implement T301 rather than T626 (any_value
). This is what users expect: no effort, maximum security. Even the performance impact should be low considering it mostly affect the prepare phase. Big kudos to the MySQL team for implementing it in 5.7!
The only downer is that functional dependencies require proper constraints. Systems that lack them, such as data warehouses, cannot benefit from them. Here it might be desirable to have a function that validates that there is only one distinct non-null
value. Maybe unique_value(… [null|error] on error)
? Or not enforced trusted
constraints like in Db2?
You can’t catch up on 20 years of SQL evolution in one day. Subscribe the newsletter via E-Mail, Bluesky or RSS to gradually catch up and to keep modern-sql.com on your radar.
The essence of SQL tuning in 200 pages
Buy now!
(paperback and/or PDF)
Paperback also available at Amazon.com.
Markus offers SQL training and consulting for developers working at companies of all sizes.
Learn more »
As well as distinct
and all
, but that is really pointless.
Or if a customers
-column is migrated to the orders
table later.