SQL’s merge
performs insert
, update
and delete
operations based on when…then
rules. It is useful for synchronizing the contents of a table with newer data. Instead of running a delete
, and an update
and an insert
statement,0 a single merge
statement takes care of everything…well, almost everything. Unfortunately, there is a scenario that standard SQL’s merge
does not cover, so in the end you might still need two statements. Let’s start from the beginning…
Contents:
When … Then
RulesWhen … And
ConditionsNot Matched by Source
: Deleting Extra RowsEvery merge
statement must provide the target table (into
clause), a source (using
clause) and a join condition (on
clause).
MERGE
INTO <target> [[AS] <new target name>]
USING <source> [[AS] <new source name>]
ON <join-condition>
<when…then rules>
While the target is typically the name of a table,1 the source can be anything that returns a table—even subqueries or table functions such as json_table
. The source and the target can be renamed just like in the from
clause.✓✗
json_table
, but with similar functionality provided by that systemWhen … Then
RulesThe base syntax is followed by the when…then
rules. More precisely, by the when [not] matched…then…
rules. Each rule applies either in the case that—for one source row—the on
clause does identify corresponding rows in the target (when matched
) or that it does not (when not matched
). If there are corresponding rows in the target—no matter how many—they can be be updated or deleted. Otherwise, if there are no corresponding rows in the target, an insert
can be done.
on
clausethen update … delete [where]
The following example demonstrates the typical use case of merge
: synchronizing the contents of a table with newer information provided from an external system. In the example, the external system is a web shop’s front end, which provides a user’s wish list. The database keeps the contents of all wish lists in the wish_lists
table (user_id
, product_id
, qty
).2 When a user edits a wish list, the front end provides the edited wish list to merge
, which identifies and performs the required operations. For the time being we assume that the front end puts the edited wish list in the table my_wish_list
(product_id
, qty
). We will directly process a JSON document later.
MERGE
INTO wish_lists
USING my_wish_list
ON wish_lists.user_id = ?
AND wish_lists.product_id = my_wish_list.product_id
WHEN NOT MATCHED THEN INSERT (user_id, product_id, qty)
VALUES ( ?, product_id, qty)
WHEN MATCHED THEN UPDATE SET qty = my_wish_list.qty
The merge
statement names the target table wish_lists
and the source table my_wish_list
. The on
clause only takes those rows in the target table that belong to a specific user into consideration—whereas the user_id
value is provided via a bind parameter (?
). Furthermore, the on
clause uses the product_id
to assign the remaining rows of the wish_lists
table to those in my_wish_list
.
For example, if there is a row in my_wish_list
for a specific product X, the on
clause essentially checks the target table for corresponding rows. If there is no row in wish_lists
for which user_id
and product_id
have the right values, the when not matched
rule applies and the respective row is inserted. This happens if the user has put a new product on the wish list. Otherwise, if there is a corresponding row in the target table, the when matched
clause applies and updates the qty
column of the matched row(s).3 This happens if the user already had this product on the wish list.
The commands after the keyword then
basically follow the syntax known from their stand-alone relatives—albeit in an abbreviated way. First of all, the then
commands don’t specify a target—the merge into
clause sets the target for all operations. Moreover, the commands are executed in the context of one source row and the corresponding target rows. Therefore, update
and delete
do not accept a where
clause✓✗—they know which rows to work on anyway. Finally, the insert
syntax is limited to a single-row values
clause. You cannot use an insert…select
nor can you use a values
clause that produces more than a single row.
I offer SQL training, tuning and consulting. Buying my book “SQL Performance Explained” (from €9.95) also supports my work on this website.
Note that standard SQL’s merge
is driven by the source table. Each row in the source is tested against the when…then
rules. Rows in the target that have no corresponding source row are ignored. In the example, the result of the on
clause is never true for other users, so the merge
command will not affect those rows. That’s fine for our use case where we only want to update the wish list of one specific user. On the other hand, if that specific user removes a product from the list, we should also remove the respective entry from the wish_lists
table. That is the idea behind synchronization. Unfortunately, standard SQL has no nifty syntax to act on extra rows in the target. We will look at this again later.
Before we continue with more advanced syntax, here’s one last fundamental thing to note. A merge
statement modifies each target row at most once.4 If a single target row has multiple corresponding source rows, the merge
statement raises an error✓✗ and aborts.5 That prevents the unfortunate situation that the order in which the source rows are processed affects the effect of merge
. Look at the example above and ask yourself this: What happens if my_wish_list
has several entries for a single product?6 If the qty
values are not the same, then the order of execution makes a big difference.
When … And
ConditionsThe when
conditions are not limited to the [not] matched
keywords. After an and
you can put an additional, arbitrarily complex condition. This allows us to add another when matched
rule that deletes a row if the qty
is 0
.
…
WHEN MATCHED AND my_wish_list.qty = 0 THEN DELETE
WHEN MATCHED THEN UPDATE SET qty = my_wish_list.qty
Due to the additional condition, the first when matched
rule only applies if the qty
has been set to zero. The corresponding target row is deleted in this case. The remaining “matched
” rows cause an update
like before. This update
could be limited by an additional condition as well: wish_lists.qty <> my_wish_list.qty
. This prevents an update
if the quantity is unchanged. That is semantically cleaner—in particular if triggers or change data capture (CDC) is involved—and can generally bring considerable performance improvements in some systems.
In case multiple when
conditions are true, only the first of them takes effect—very much like how case
works. The order of the rules is therefore significant.7 Source rows that don’t fulfill any when
condition don’t cause any action.
where
clause. See “Notable Extensions”Note that there are systems that accept and
conditions but are still limited to a single rule for each matched
/not matched
case.
Sometimes merge
causes confusion by giving surprising error messages. That results from the fact that standard SQL’s merge
first identifies all changes that must be done and then, in a second step, applies those changes. In other words, all when
conditions are evaluated before any changes are made. For the last example, it means that rows deleted by the new rule do not activate any when not matched
rule—not even if it appears later in the merge
command. By the time insert
, update
, or delete
starts, all decisions have already been made.8
Now consider this scenario:9 A merge
statement evaluates the when not matched
rules and identifies a row to be inserted. Meanwhile, before merge
actually inserts the row, another transaction inserts a row with the same primary key values and commits the transaction. When merge
eventually performs the operations, the insert
fails with a constraint violation. This might come as a surprise because the when not matched then insert
rule should not apply if the row exists. Merge
first decides what to do, then it does it. In the meantime, the situation may have changed. Constraint and transaction isolation are preserved—even if they cause surprising errors.
The below mentioned “upsert” statements provided by some systems might behave less surprisingly in such scenarios.
I’ve already explained some syntax differences that apply to insert
, update
and delete
when used in merge
as compared to their stand-alone cousins. However, there is a more subtle aspect to consider: Stand-alone insert
, update
, and delete
statements have only one table in scope, but inside merge
they have two tables in scope. The next example highlights four places where it is unclear which table’s column x
is being referred to if both tables have that column.
MERGE
INTO target
USING source
ON x = x ⓵
WHEN MATCHED AND x > 0 ⓶ THEN UPDATE SET x = x + 1 ⓷
WHEN NOT MATCHED THEN INSERT (x) VALUES (x) ⓸
Of course we can resolve these ambiguities by qualifying the column with the respective table name, e.g. source.x
, but we would also expect that the system reports ambiguities if there are any—and that it doesn’t report them if there are none. It turns out that these totally reasonable expectations are not always met.
update set <target>.<col-name> = …
It is particularly worrisome that some systems don’t report the ambiguity labeled as ⓷
. After all, it makes a big difference whether the update increments the value currently stored in the target (x = target.x + 1
) or sets it to the value x = source.x + 1
. These are two completely unrelated things, and some systems just pick one at their own discretion.
Not Matched by Source
: Deleting Extra RowsStandard SQL has no syntax to process rows in the target that have no corresponding row in the source, but that’s just standard SQL. In fact there are systems that offer such a syntax—beyond what is written in the standard.10
For demonstration, let’s continue the wish list example. The merge
statement should also cover the case that an item that was on the wish list before is removed. Thus, a row in wish_lists
should be removed if there is no corresponding entry in the source. The non-standard syntax for this is when not matched by source
.
…
WHEN NOT MATCHED THEN INSERT (user_id, product_id, qty)
VALUES ( ?, product_id, qty)
WHEN NOT MATCHED BY SOURCE THEN DELETE
…
The new rule applies to those target rows that have no corresponding source row. The then
clause deletes them. All of them. That is, in addition to the wish list items that the particular user removed, all wish list items of all other users are also removed! Look at the on
clause: The condition on user_id
was meant to limit the impact of the merge
statement to one particular user. That works if only source rows can trigger actions. With when not matched by source
, sole target rows can trigger actions too. As items of other users never have a corresponding source row, they are all deleted. The requirement not to affect other users has been criminally neglected.
Luckily, this requirement can be added straight into the when
condition.
WHEN NOT MATCHED BY SOURCE AND user_id = ? THEN DELETE
Due to the and
, the rule only applies to entries of the respective user.
In addition to when not matched by source
, there is also a “by target
” variant which corresponds to the standard SQL condition when not matched
without any by
qualifier.
Standard SQLs merge
can also get the job done as you can use arbitrarily complex sub-queries as the source. That allows you to add the extra rows that need to be deleted to the source—using a full outer join
or a union
. From the perspective of merge
, these rows will be “matched” rows so they can activate a when matched then delete
rule. However, this approach loses almost all the beauty of a single merge
statement. Using a separate delete
statement might be a better option. I’ll leave this as an exercise for the readers and conclude by showing a better example for what you can do in the merge
source.
You might have been wondering, and for a good reason, what’s the point of the wish list example if the front end has to insert
into the my_with_list
table first?11 After all, it might be easier to issue the required delete
, update
and insert
statements—in particular if they are generated by an ORM tool. The point of this example is to demonstrate how combining different SQL features opens up interesting possibilities.
Realistically, a web front end would provide the edited content of the wish list as a JSON document—maybe like this:
[
{"product_id":42 ,"qty":1}
,{"product_id":123,"qty":2}
]
Now, the json_table
function can easily transform such a JSON document into a tabular form, suitable as a source of merge
. Just put it into the using
clause.
…
USING JSON_TABLE( ?
, '$[*]'
COLUMNS ( product_id INT PATH '$.product_id'
, qty INT PATH '$.qty'
)
) my_wish_list
…
If the caller provides the JSON document as the value for the bind parameter (?
), SQL will take care of everything else.
While we’re covering composability, let’s add a short consideration of a similar yet substantially different use case: placing an order. This use case is similar because it would use a similar JSON document. Still, there are three reasons why it is also substantially different: (1) The ordered products just need to be inserted—it is not a case for merge
; (2) A second table needs to be filled—the table for the order itself, not the ordered products. Thus, we need two insert
statements on different targets.12 Again, merge
is no help here, but SQL feature T495, “Combined data change and retrieval”, is helpful. It allows you to embed insert
, update
, delete
and merge
statements via subqueries into select
statements. Thus, you can arbitrarily combine them in a single SQL statement that takes care of all database operations required to place an order. That’s something for another article.
For now, we need to complete the list of three substantial differences between updating a wish list and placing an order: (3) The tabular (relational) storage of orders and ordered items is definitively beneficial for further processing. That is not necessarily the case for a wish list. We could also save the wish list JSON as it is. In case the contents of the wish lists ever needs to be analyzed, the respective SQL functions (like json_table
) can still access the contents of the JSON document. After all, SQL does not dictate how you store the data; in fact, the contrary is true: SQL supports various approaches.13 We can—actually we must—choose the best approach for each use case.
merge
• Only if top-level statement is select
: insert into … select … final from (delete from …)
merge
insert … on conflict do …
(PostgreSQL, SQLite)
insert … log errors …
(Oracle DB)
update … from
(BigQuery, PostgreSQL, SQL Server, SQLite)
Data-Modifying Statements in With
(PostgreSQL)
If you care about maximum compatibility, keep in mind that not all products support all syntax variants.
The merge
statement is defined in ISO/IEC 9075-2:2016 as optional features F312, “MERGE statement”, F313, “Enhanced MERGE statement” and F314, “MERGE statement with DELETE branch”.
The essence of SQL tuning in 200 pages
Buy now!
(paperback and/or PDF)
Paperback also available at Amazon.com.
Markus offers SQL training and consulting for developers working at companies of all sizes.
Learn more »
Sometimes the best approach is to just delete
or truncate
everything and copy the new data over. However, sometimes the functionality provided by merge
is the better option. In particular, side effects like triggers or change data capture (CDC) work best with merge
.
Standard SQL explicitly allows views to be the target of merge
to the same extent it allows views to be the target of insert
, update
, and delete
.
The underlined columns user_id
and product_id
form the primary key.
It also happens if the qty
is unchanged.
Exception: the then update … delete
syntax of the Oracle Database (not standard).
9075-2:2016 14.12 GR 6 a) i 1 B II: cardinality violation (21000).
Assuming there is no constraint on my_wish_list
that prevents it.
Strictly speaking only the relative order of when matched
rules among themselves and when not matched
rules among themselves is significant.
This approach also prevents paradoxical situations when updating columns used in the on
clause.
Assuming that the on
clause uses the primary key and that the system uses multi-version concurrency control (aka snapshot isolation).
And that’s totally fine. The standard just defines the required behavior for the standard syntax. Implementations are allowed to accept non-standard syntax and do whatever they want.
A use case for a global temporary table, by the way.
The multi-table insert
of the Oracle Database is of limited help for this use case.
JSON support was added to the SQL standard in 2016 and was substantially extended in the 2023 release. SQL also supports XML and nested tables, just to name a few more options.