SQL Server Merge Statement
-
Posted on July 18, 2009 by Derek Dieter
-
0
Before getting into the syntax and examples for the Merge statement introduced in SQL Server 2008, I’d like to discuss it’s pro’s, cons, and uses. If you’re like me, the minute you get to develop a new application using 2008 you decide to use every new feature introduced. Well, while you learn everything up front, you also make more mistakes up front.
USES FOR MERGE
When I first wrote a rules processing engine using SQL 2008, I wrote all upserts (updates and inserts) using MERGE. What I came to find out is that they are very difficult to troubleshoot. When writing relatively complex procedures there is always a middle ground between performance and maintenance. You can choose to break up code into chunks for maintainability and troubleshooting, or you can create one set-based operation that may perform marginally faster and will perform within a single batch (one transaction). The merge statement falls into the latter category.
Good Uses
- Synchronization of transactional tables
- Update / Insert operations on highly transactional tables
- Tables that employ effective dating or require opening / closing of records
Not so good uses
- Merging two very large tables (even if merging with a sub select)
- Complex code that may require many operational changes
I would use a merge statement if the update / insert / delete operation was being performed using highly transactional tables. Both tables involved in the transaction will receive benefit. If the data in one table needs to be synchronized with the other, then you can do so without having to take the target table offline or performing two separate batches and wrapping them in a transaction. The source table will also benefit because it will receive less locking because both insert and update operations are in one batch.
Regarding the not-so-good uses of merging large tables, I’ve ran into bad scenarios of merging tables with a million plus rows where the merge statement would not perform an index seek. Come to find, it would not perform the seek because it did not know how many rows were going to be involved in the update vs the insert. So it sometimes decides to scan all rows despite an appropriate index existing.
Another inappropriate use of merge is on code that is mission critical that may need to be troubleshot very quickly. Unfortunately merge does not provide very good insight into what rows have been updated versus inserted or deleted. (I did however provide a technique to do this later in the article).
THE MERGE STATEMENT EXAMPLES AND SYNTAX
So now that we have the uses out of the way, let’s review the merge statement itself. The merge statement does pretty much what the name implies. It takes a data set, compares it with a table, and within one batch performs different actions for both the commonalities and the differences. As discussed before, the benefit to this is that it potentially eliminates 3 separate batches (UPDATE, INSERT, AND DELETE) and puts them into one batch helping to minimize locking and increase efficiency.
When merging a dataset to a table, you specify one set of criteria in which you can perform actions based on:
- The rows that match
- The rows that do not match in the source dataset
- The rows that do not match in the target table
Based on these three conditions you can perform inserts, updates and deletes, all within the same statement. Here is the basic syntax:
[cc lang=”sql”]
MERGE dbo.TargetTable tgt — Target Table
USING dbo.SourceTable src — Source Table
ON tgt.ID = src.ID — Main comparison criteria
WHEN MATCHED — When ID’s exist in both tables
— DO SOMETHING
WHEN NOT MATCHED — When ID’s from Source do not exist in Target
— DO SOMETHING
WHEN NOT MATCHED BY SOURCE — When ID’s from Target do not exist in Source
— DO SOMETHING
[/cc]
It should be noted here that after the main comparison criteria, only one “WHEN” branch is required. So if we only wanted to perform an action when say, the rows do not exist in the target table, we can do so without having to perform any other actions:
[cc lang=”sql”]
MERGE dbo.TargetTable tgt — Target Table
USING dbo.SourceTable src — Source Table
ON tgt.ID = src.ID — Main comparison criteria
WHEN NOT MATCHED — When ID’s exist in both tables
— DO SOMETHING
[/cc]
Now, let’s add additional comparison criteria and we will perform an insert. For this example we want to insert all records that we received in our source table since the last 24 hours and we will insert them where they do not exist in our target table.
[cc lang=”sql”]
MERGE dbo.TargetTable tgt — Target Table
USING dbo.SourceTable src — Source Table
ON tgt.ID = src.ID — Main Comparison
AND src.DateCreated > GETDATE()-1
WHEN NOT MATCHED — ID’s from Source do not exist in Target
THEN INSERT — Insert records from source
(
ID,
Name
)
VALUES
(
ID,
Name
);
[/cc]
Now this example, while very simple, does not provide us anything that we could not have done without the merge statement. We could have also performed the same action using INSERT..SELECT
[cc lang=”sql”]
INSERT INTO dbo.TargetTable
(
ID,
Name
)
SELECT
ID,
Name
FROM dbo.SourceTable src
WHERE NOT EXISTS(SELECT 1
FROM dbo.TargetTable
WHERE ID = src.ID)
AND src.DateCreated > GETDATE()-1
[/cc]
So even though we can perform one action, that is not the benefit of merge. The benefit is in being able to perform two actions at once performing one transaction.
Let’s take advantage of it properly by Inserting the rows that do not exist in the target table, and update the rows that do exist in the target table where the name is different. Again, we are going to limit our source to records created within the last 24 hours.
[cc lang=”sql”]
MERGE dbo.TargetTable tgt — Target Table
USING dbo.SourceTable src — Source Table
ON tgt.ID = src.ID — Main Comparison
AND src.DateCreated > GETDATE()-1
WHEN NOT MATCHED — ID’s from Source do not exist in Target
THEN INSERT — Insert records from source
(
ID,
Name
)
VALUES
(
ID,
Name
)
WHEN MATCHED
AND tgt.Name != src.Name
THEN UPDATE
SET
tgt.Name = src.Name;
[/cc]
This example above is probably one of the most common uses of merge. We are taking advantage of it’s minimum footprint by performing an insert and update in one batch.
We’ll take it one step further. In this example we know that our source table is more comprehensive and accurate than our target table. So we need our target table to mirror our source table. There are only two columns in both these tables, ID and Name. We cannot simply drop and create the table from the source because users are actively querying the table.
Being that the we want to compare the entire source table, we are going to remove the 24 hour limitation. We are also going to delete the records in the target that do not exist in the source.
[cc lang=”sql”]
MERGE dbo.TargetTable tgt — Target Table
USING dbo.SourceTable src — Source Table
ON tgt.ID = src.ID — Main Comparison
WHEN NOT MATCHED BY TARGET — ID’s from Source do not exist in Target
THEN INSERT — Insert records from source
(
ID,
Name
)
VALUES
(
ID,
Name
)
WHEN MATCHED — Update the records where the names do not match
AND tgt.Name != src.Name
THEN UPDATE
SET
tgt.Name = src.Name
WHEN NOT MATCHED BY SOURCE — Delete records in target that do not exist in source
THEN DELETE;
[/cc]
TROUBLESHOOTING AND RECORD COUNTS
The times I’ve had difficulty troubleshooting MERGE have been in determining whether one of the records that had been affected fell under the condition of the insert or the update section of merge. If you are dead set on using merge for a particular process, I might suggest you also create a flag field in your table that will show you whether the row has been modified by an update or insert.
Another difficulty in troubleshooting is not only finding out which records have been updated or inserted, but how many records were updated versus how many were inserted. There is a way to do this, however it’s not pretty. The solution is to use the OUTPUT clause of the merge statement to trap the $action of the row and insert that into a temp table. After being inserted you can count the results:
[cc lang=”sql”]
CREATE TABLE #ActionCount
(
[action] VARCHAR(50)
)
INSERT INTO #ActionCount
(
[action]
)
SELECT [action]
FROM (
MERGE dbo.TargetTable tgt
USING dbo.SourceTable src
ON tgt.ID = src.ID
WHEN NOT MATCHED BY TARGET
THEN INSERT
(
ID,
Name
)
VALUES
(
ID,
Name
)
WHEN MATCHED
AND tgt.Name != src.Name
THEN UPDATE
SET
tgt.Name = src.Name
OUTPUT
$action
) t
(
[action]
)
SELECT
RowsUpdated = COUNT(CASE [action]
WHEN ‘UPDATE’ THEN 1 END)
,RowsInserted = COUNT(CASE [action]
WHEN ‘INSERT’ THEN 1 END)
FROM #actioncount
[/cc]
- Comments (RSS)
- Trackback
- Permalink
[…] Not only is this able to run concurrently, but it’s also cleaner and uses half the scans (meaning less disk io). From here, you could make the argument that MERGE would be a better option. For now, I try to avoid merge for various reasons. You can find them here: The MERGE Statement […]
Spell check.
I am new to this. Since I am coming from a Microsoft SQL Server eovnrinment, and to some other it may seem like the two versions should be very similar, and they are, to a certain degree, but they are also very different in some important and basic ways. Good read, thanks!
when is the merge statement going to have error logging
DML like Oracle… they really missed this one.
Yes, that would be useful. There is a way to do this using SSIS, however having it inline would be nice too.