Transferring Large Amounts of Data using Batch Inserts
Posted on May 31, 2009 by Derek Dieter
Below is a technique used to transfer a large number of records from one table to another. This scales pretty well, for a couple of reasons. First, it will not fill up the entire transaction log before committing; instead, it populates the table in chunks of 10,000 records. Second, it’s generally much quicker. You will have to play around with the batch size: sometimes it’s more efficient at 10,000, sometimes at 500,000, depending on the system.
If you do not need to insert into an existing table and just need a copy of the table, it is better to do a SELECT INTO. However, for this example, we are inserting into an existing table.
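For reference, a SELECT INTO version is just a couple of lines. This is only a sketch; the [dbo].[DestinationCopy] name is hypothetical, and the statement creates that table for you:
[cc lang="sql"]
-- Creates the destination table on the fly; minimally logged under
-- the SIMPLE or BULK_LOGGED recovery model.
SELECT *
INTO [dbo].[DestinationCopy]
FROM [dbo].[Source];
[/cc]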
Another trick is to change the recovery model of the database to simple. This way, there will be much less logging in the transaction log.
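The switch itself is a one-liner; the database name below is a placeholder, so adjust it for your environment:
[cc lang="sql"]
-- Placeholder database name. Switch back to FULL (and take a full backup)
-- afterwards if you need point-in-time recovery.
ALTER DATABASE [YourDatabase] SET RECOVERY SIMPLE;
[/cc]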
The WITH (TABLOCK) hint shown commented out below only works in SQL Server 2008.
[cc lang="sql"]
DECLARE @BatchSize int = 10000

WHILE 1 = 1
BEGIN

	INSERT INTO [dbo].[Destination] --WITH (TABLOCK)  -- Uncomment for 2008
	(
		FirstName
		,LastName
		,EmailAddress
		,PhoneNumber
	)
	SELECT TOP(@BatchSize)
		s.FirstName
		,s.LastName
		,s.EmailAddress
		,s.PhoneNumber
	FROM [dbo].[Source] s
	WHERE NOT EXISTS
	(
		SELECT 1
		FROM dbo.Destination
		WHERE PersonID = s.PersonID
	)

	IF @@ROWCOUNT < @BatchSize BREAK

END
[/cc]
With the above example, it is important to have at least a nonclustered index on PersonID in both tables; otherwise the NOT EXISTS check has to scan the destination table on every batch.
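As a rough sketch, the supporting indexes could look like this (the index names are made up; adjust to your own naming convention):
[cc lang="sql"]
-- The Destination index is the one the NOT EXISTS check relies on;
-- the Source index helps the SELECT side of each batch.
CREATE NONCLUSTERED INDEX IX_Destination_PersonID ON [dbo].[Destination] (PersonID);
CREATE NONCLUSTERED INDEX IX_Source_PersonID ON [dbo].[Source] (PersonID);
[/cc]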
Another way to transfer records is to use multiple threads, each handling its own range of records, as shown below:
[cc lang="sql"]
INSERT INTO [dbo].[Destination]
(
	FirstName
	,LastName
	,EmailAddress
	,PhoneNumber
)
SELECT
	s.FirstName
	,s.LastName
	,s.EmailAddress
	,s.PhoneNumber
FROM [dbo].[Source] s
WHERE s.PersonID BETWEEN 1 AND 5000
GO

INSERT INTO [dbo].[Destination]
(
	FirstName
	,LastName
	,EmailAddress
	,PhoneNumber
)
SELECT
	s.FirstName
	,s.LastName
	,s.EmailAddress
	,s.PhoneNumber
FROM [dbo].[Source] s
WHERE s.PersonID BETWEEN 5001 AND 10000
[/cc]
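If the PersonID values are not evenly distributed, one way to work out the boundaries for each session is to bucket the source first. This is just a sketch, assuming PersonID is a unique integer key and four parallel sessions:
[cc lang="sql"]
-- Returns one row per bucket with the PersonID boundaries to plug into
-- each session's BETWEEN clause; change NTILE(4) to the number of sessions.
;WITH Buckets AS
(
	SELECT PersonID, NTILE(4) OVER (ORDER BY PersonID) AS BucketNo
	FROM [dbo].[Source]
)
SELECT
	BucketNo
	,MIN(PersonID) AS RangeStart
	,MAX(PersonID) AS RangeEnd
FROM Buckets
GROUP BY BucketNo
ORDER BY BucketNo;
[/cc]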
For the fastest performance, however, I'd recommend using SSIS, especially in SQL Server 2008. We recently transferred 17 million records in 5 minutes with an SSIS package executed on the same server as the two databases it transferred between.
SQL Server 2008
SQL Server 2008 has made changes with regard to its logging mechanism when inserting records. Previously, to do an insert that was minimally logged, you would have to perform a SELECT ... INTO. Now, you can perform a minimally logged insert if you can lock the table you are inserting into (the commented-out TABLOCK hint in the first example shows this). The exception to this rule is if you have a clustered index on the table AND the table is not empty. If the table is empty, you acquire a table lock, and you have a clustered index, the insert will be minimally logged; however, if the table already has data, the insert will be fully logged. If you have a nonclustered index on a heap and you acquire a table lock, then only the nonclustered index will be logged. It is always better to drop indexes prior to inserting records.
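As a standalone sketch, a table-locked insert on SQL Server 2008 looks like this (same hypothetical tables as the first example):
[cc lang="sql"]
-- Minimally logged on SQL Server 2008, subject to the index and
-- empty-table rules described above.
INSERT INTO [dbo].[Destination] WITH (TABLOCK)
(
	FirstName
	,LastName
	,EmailAddress
	,PhoneNumber
)
SELECT
	s.FirstName
	,s.LastName
	,s.EmailAddress
	,s.PhoneNumber
FROM [dbo].[Source] s;
[/cc]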
To determine the amount of logging, you can use the following statement:
[cc lang="sql"]
SELECT * FROM ::fn_dblog(null, null)
[/cc]
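To make the output easier to compare between runs, you can also aggregate it. This is a sketch assuming the standard Operation and [Log Record Length] columns that fn_dblog returns:
[cc lang="sql"]
-- Fewer log records and fewer logged bytes for the insert operations
-- indicate the statement is being minimally logged.
SELECT
	Operation
	,COUNT(*) AS LogRecords
	,SUM([Log Record Length]) AS LogBytes
FROM ::fn_dblog(null, null)
GROUP BY Operation
ORDER BY LogBytes DESC;
[/cc]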
I used your code to transfer about 500 million rows of data, but I ended up with null values in some of the fields on the new table that aren't in the source table. Is there a way to avoid this? Thanks
Thank you for the tip with BREAK on @@ROWCOUNT! Until now I was using only the WHILE condition to catch the exit point. It's safer, but not as expressive as using BREAK.
And what do you think about this approach?
DECLARE @BatchSize INT = 10000;
WHILE @BatchSize > 0 BEGIN
(…)
IF @@ROWCOUNT < @BatchSize BREAK;
END
When you test @BatchSize against zero, you get the same logical result as an infinite loop condition, BUT you get a bonus: the condition also checks that the batch size is a nonzero "constant". "1 = 1" has no purpose other than to always say yes.
My test is a potential source of failure if @BatchSize is not a constant like it is in these examples. In that case the trivial condition ("1 = 1") is the safer approach.
Guys, here is another quick way to copy huge data from one table to another in small chunks.
declare @rowCount int=1
while @rowCount != 0
begin
insert into destinationTable
select top (100) * from sourceTable st
where not exists(select 1 from destinationTable dt where st.Key=dt.key)
set @rowCount=@@rowcount
end
SSIS is great, but not when selecting millions of rows prior to inserting. I like the first example, but I have a question concerning the line:
IF @@ROWCOUNT < @BatchSize BREAK
This will end up with some records below 10,000 not inserted into the destination table. Am I right?
If so can I replace this line by the following:
IF @@ROWCOUNT = 0 BREAK
Actually, I think using SSIS for large dataset migrations makes the most sense. It's good at breaking down millions of rows into manageable batch sizes, which can be tuned accordingly. It also makes cross-server migrations much easier.
Even right-clicking on the database and selecting Import/Export Data (which invokes an SSIS component) makes it even simpler. (Then you can save the package for later re-use.)
I thought the same thing at first, but think of the loop like this:
@BatchSize = 10,000
Dataset to load = 15,000
First iteration: inserts 10,000; @@ROWCOUNT = 10,000
Second iteration: inserts 5,000; @@ROWCOUNT = 5,000, which means there is nothing left to load
@@ROWCOUNT < 10,000, so it breaks