Transferring Large Amounts of Data using Batch Inserts

Below is a technique for transferring a large number of records from one table to another. It scales well for a couple of reasons. First, it will not fill up the entire log before committing the transaction; instead, it populates the table in chunks of 10,000 records. Second, it is generally much quicker. You will have to experiment with the batch size: sometimes 10,000 is most efficient, sometimes 500,000, depending on the system.

If you do not need to insert into an existing table and just need a copy of the table, it is better to use SELECT ... INTO. However, for this example, we are inserting into an existing table.
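
As a rough sketch, the SELECT ... INTO alternative looks like this (dbo.DestinationCopy is a hypothetical name; the statement creates the table, so it must not already exist):

SELECT
    s.PersonID
    ,s.FirstName
    ,s.LastName
    ,s.EmailAddress
    ,s.PhoneNumber
INTO dbo.DestinationCopy    -- hypothetical name; created by this statement
FROM [dbo].[SOURCE] s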

Another trick is to change the recovery model of the database to SIMPLE. This way, there will be much less logging in the transaction log.
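
As a quick sketch, assuming a database named YourDatabase (a placeholder), remember to switch back to FULL afterward and take a full backup to restart the log backup chain:

ALTER DATABASE YourDatabase SET RECOVERY SIMPLE
-- ... perform the transfer ...
ALTER DATABASE YourDatabase SET RECOVERY FULL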

The WITH (TABLOCK) hint below only enables minimal logging in SQL Server 2008 and later.

DECLARE @BatchSize INT = 10000

WHILE 1 = 1
BEGIN

    -- Copy the next batch of rows that are not yet in the destination.
    INSERT INTO [dbo].[Destination] --WITH (TABLOCK)  -- Uncomment for SQL Server 2008+
    (
        PersonID
        ,FirstName
        ,LastName
        ,EmailAddress
        ,PhoneNumber
    )
    SELECT TOP(@BatchSize)
        s.PersonID
        ,s.FirstName
        ,s.LastName
        ,s.EmailAddress
        ,s.PhoneNumber
    FROM [dbo].[SOURCE] s
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.Destination d
        WHERE d.PersonID = s.PersonID
    )

    -- When fewer rows than the batch size were inserted, the transfer is complete.
    IF @@ROWCOUNT < @BatchSize BREAK

END

With the above example, it is important to have at least a nonclustered index on PersonID in both tables.
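
If those indexes are not already in place, they can be created with something like the following (the index names are hypothetical):

CREATE NONCLUSTERED INDEX IX_Source_PersonID ON [dbo].[SOURCE] (PersonID)
CREATE NONCLUSTERED INDEX IX_Destination_PersonID ON [dbo].[Destination] (PersonID)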

Another way to transfer records is to use multiple threads, with each thread handling a distinct range of records, as shown below:

INSERT INTO [dbo].[Destination]
    (
        PersonID
        ,FirstName
        ,LastName
        ,EmailAddress
        ,PhoneNumber
    )
    SELECT
        s.PersonID
        ,s.FirstName
        ,s.LastName
        ,s.EmailAddress
        ,s.PhoneNumber
    FROM [dbo].[SOURCE] s
    WHERE s.PersonID BETWEEN 1 AND 5000
GO
INSERT INTO [dbo].[Destination]
    (
        PersonID
        ,FirstName
        ,LastName
        ,EmailAddress
        ,PhoneNumber
    )
    SELECT
        s.PersonID
        ,s.FirstName
        ,s.LastName
        ,s.EmailAddress
        ,s.PhoneNumber
    FROM [dbo].[SOURCE] s
    WHERE s.PersonID BETWEEN 5001 AND 10000

For the fastest performance, however, I'd recommend using SSIS, especially in SQL Server 2008. We recently transferred 17 million records in 5 minutes with an SSIS package executed on the same server as the two databases it transferred between.

SQL Server 2008
SQL Server 2008 made changes to its logging mechanism when inserting records. Previously, to perform a minimally logged insert, you had to use SELECT ... INTO. Now you can get a minimally logged INSERT ... SELECT if you can take a table lock on the table you are inserting into; the first example above demonstrates this with the commented-out WITH (TABLOCK) hint. The exception to this rule is when the table has a clustered index AND is not empty. If the table is empty, has a clustered index, and you acquire a table lock, the insert will be minimally logged; if the table already contains data, the insert will be fully logged. If you have a nonclustered index on a heap and you acquire a table lock, only the nonclustered index will be fully logged. It is generally better to drop indexes prior to inserting a large number of records.
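
For instance, a minimally logged insert into an empty destination might look like the sketch below (assuming the same example tables and a SIMPLE or BULK_LOGGED recovery model):

INSERT INTO [dbo].[Destination] WITH (TABLOCK)
(
    PersonID
    ,FirstName
    ,LastName
    ,EmailAddress
    ,PhoneNumber
)
SELECT
    s.PersonID
    ,s.FirstName
    ,s.LastName
    ,s.EmailAddress
    ,s.PhoneNumber
FROM [dbo].[SOURCE] s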

To determine the amount of logging, you can use the following statement:

SELECT * FROM sys.fn_dblog(NULL, NULL)
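
For example, to compare how much was logged with and without the table lock, you can count and sum the log records for the destination table (the column names below are those returned by fn_dblog):

SELECT COUNT(*) AS LogRecordCount
    ,SUM([Log Record Length]) AS LogBytes
FROM sys.fn_dblog(NULL, NULL)
WHERE AllocUnitName LIKE 'dbo.Destination%'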
