Using the Merge Statement to populate a historical table with Effective Dating

One of my favorite uses for the MERGE statement introduced in SQL Server 2008 is the updating of a historical table. With versions prior to 2008 this operation had to be performed in two separate statements. Merge helps us to streamline the process.

The advantage to the database engine when using a Merge statement is in it’s minimal locking. Prior to merge, we had to update the record if it existed. And if it did not exist, we inserted the record. Long story short, it was double the amount of locking before the optimizer knew which rows to exclusively lock.

To understand what’s going on in the statement below, you have to read starting from the MERGE statement. This is where the initial updating is taking place first. As you scroll down, you will see the OUTPUT clause. This actually outputs the results of the MERGE statment. You can think of this output as a virtual table named “merged”. The $action variable is an intrinsic column (if you will), that contains the action that row played during the MERGE. Lastly, if you scroll to the top, you will find the insert statement that finally inserts the new “current” record for the corresponding “expired” record we just retired. This was the same row we used for the criteria to UPDATE the expired row, however we did not use any of it’s values.

DECLARE @Now datetime = GETDATE()
DECLARE @EffToDate datetime = '2079-06-06T00:00:00.000'
DECLARE @JobID INT = 1 -- This would be dynamic

INSERT INTO Employee_History
(
     EmployeeID
    ,FirstName
    ,LastName
    ,EmploymentStatus
    ,EffFromDate
    ,EffToDate
    ,IsCurrentRecord
    ,JobID
)
SELECT
     EmployeeID     = merged.EmployeeID
    ,FirstName          = merged.FirstName
    ,LastName           = merged.LastName
    ,EmploymentStatus   = merged.EmploymentStatus
    ,EffFromDate        = merged.EffToDate
    ,EffToDate          = @EffToDate
    ,IsCurrentRecord    = 1
    ,JobID              = merged.JobID
FROM(
  MERGE Employee_History eh
  USING (
        SELECT
            EmployeeID
            ,FirstName
            ,LastName
            ,EmploymentStatus
            ,ChkSum = CHECKSUM(
                 FirstName
                ,LastName
                ,EmploymentStatus
            )
        FROM Employees e
    ) e
      ON eh.EmployeeID = pfs.EmployeeID
      AND eh.IsCurrentRecord = 1
  WHEN MATCHED AND
    eh.ChkSum != e.ChkSum -- Employee exists but a value has changed
    THEN -- Expire this record we have a new one
    UPDATE
        SET  eh.IsCurrentRecord = 0
            ,eh.EFFToDate       = @Now
  WHEN NOT MATCHED
    THEN -- Insert brand new employee
    INSERT
    (
        EmployeeID
        ,FirstName
        ,LastName
        ,EmploymentStatus
        ,EffFromDate
        ,EffToDate
        ,IsCurrentRecord
        ,JobID
    )
    VALUES
    (
        EmployeeID
        ,FirstName
        ,LastName
        ,EmploymentStatus
        ,@Now
        ,@EffToDate
        ,1 --This record is current (its the first)
        ,@JobID
    )
    OUTPUT
            $action
            ,INSERTED.EmployeeID
            ,INSERTED.FirstName
            ,INSERTED.LastName
            ,INSERTED.EmploymentStatus
            ,INSERTED.JobID
            ,INSERTED.EffToDate
) AS merged (
            output_action
            ,EmployeeID
            ,FirstName
            ,LastName
            ,EmploymentStatus
            ,JobID
            ,EffToDate)
WHERE merged.output_action = 'UPDATE'

There are however a couple pitfalls regarding the merge statement. First, it’s not easy to troubleshoot. Many times I will scratch my head wondering why a record didn’t get updated. It always ends up being an issue with what I have written, however because the statement is so complex, it is difficult at times to keep a grasp at what’s going on. The other big issue I have with the merge statement, is that I am unable to tell both how many records it has updated AND how many it’s inserted. Previously with the two statement method, I could select the @@ROWCOUNT afterwards. However now, I find I have to set special flags, the go back and count the flags to determine the updates vs the inserts.

2 comments
David 19 Mar 2012 at 10:55 am

Very nice use of the OUTPUT clause! I found this post when searching for how to do this exact thing: update a history table by closing out the old row and inserted the new row in one step. Thanks!

SK 27 Apr 2011 at 11:29 pm

hi ben,I have loaded data for 1 to 20 of the month…in target DB using Merge statement. and now a restatement of file on day 10 should occure.How to insert/update the new vaules and should we delte the records on that day10 ?how we should implement REstatement of Data using Merge statemnt?

Featured Articles

 Site Author