One common requirement in SQL when inserting data between tables is to match the new identity column with the old identity column. The most common solution to this problem is to perform a cursor and do the inserts one at a time. While this may work, it is not very efficient because set based operations are the bread and butter of SQL. Anytime looping comes into play, excessive resource consumption also occurs. There are a couple solutions to this, both use the OUTPUT clause introduced in SQL 2005. The first solution uses an intermediary reference table to hold the old and new values. This is done by outputting the inserted identity columns while sorting the source identity columns. You then join back on the source identity columns to get both old and newly created identities. This solution was created by my good SQL developer friend Mike Newell. Method 1 [cc lang=”sql”] — Drop temp tables if they already exist IF OBJECT_ID(‘TempDB..#source’, ‘U’) IS NOT NULL DROP TABLE #source; IF OBJECT_ID(‘TempDB..#target’, ‘U’) IS NOT NULL DROP TABLE #target; IF OBJECT_ID(‘TempDB..#xref’, ‘U’) IS NOT NULL DROP TABLE #xref; GO — Create the temp tables CREATE TABLE #source (id INT PRIMARY KEY IDENTITY(1, 1), data INT); CREATE TABLE #target (id INT PRIMARY KEY IDENTITY(1, 1), data INT); CREATE TABLE #xref (row INT PRIMARY KEY IDENTITY(1, 1), source_id INT, target_id INT); GO — If xref table is being reused, make sure to clear data and reseed by truncating. TRUNCATE TABLE #xref; — Fill source […]
Continue reading ...
The OUTPUT clause in SQL Server 2008 was probably one of the most functional T-SQL enhancements added. I personally don’t use it enough because I often forget about it, however I have used it to overcome some serious deadlock incidents. It basically works in conjunction with INSERT, UPDATE, or DELETE. In my opinion it will probably be most utilized in an update statement, however I’m sure there are many scenarios I may be forgetting. The way it works is while performing one of these statements, you have the option to output any data within the rows that are being INSERTED UPDATED or DELETED — into another table. This is important because it allows only a single pass through on the table. Whereas before when performing an update to a table, in order to get in-row data you would have to perform a separate select. This could especially become a problem if the update and select needed to be within a transaction. But the output clause is transactional by nature because it occurs within a single statement. Let’s look at an example: [cc lang=”sql”] DECLARE @customer_id int = 1234; DECLARE @customer table ( first_name varchar(50), last_name varchar(50), phone_number varchar(50), visit_count int; ); UPDATE c SET customer_visit_count += 1 OUTPUT DELETED.first_name, DELETED.last_name, DELETED.phone_number, INSERTED.customer_visit_count INTO @customer FROM [customer] c WHERE c.customer_id = @customer_id; SELECT first_name, last_name, phone_number, visit_count FROM @customer; [/cc] This allows a single query (and one transaction) to be used in order to update and select from a table. The […]
Continue reading ...
There is definitely a lack of awareness in the SQL world regarding the use of user defined functions and the potential performance hit they can have when using within your queries. Don’t get me wrong, I would love nothing more than to be able to centralize my commonly used code into functions for reuse. In a lot of cases this is possible, however there are specific cases where this can cause a huge performance impact. The Problem The one thing we need to be aware of with SQL is that its efficiency lies in the fact that it deals with data in SETS. Meaning that its power does not come in performing row-by-row operations, rather it wants to retrieve chunks of data and manipulate them as recordsets. Keeping this in mind, you can look out for scenarios where certain operations will cause more of a row-by-row operation and therefore impact performance. The most common no no, is the use of scalar functions within a set based operation. It seems (but I can’t prove) that SQL 2008 has actually made some great strides in being able to deal with these situations, however there will always be a negative impact. First, let’s look at a common scenario. The Test First, let’s deploy this scalar user defined function which calculates the End of month for a given date: [cc lang=”sql”] CREATE FUNCTION [dbo].[ufn_GetLastDayOfMonth] ( @pInputDate DATETIME ) RETURNS DATETIME BEGIN DECLARE @vOutputDate DATETIME SET @vOutputDate = CAST(YEAR(@pInputDate) AS VARCHAR(4)) + ‘/’ + CAST(MONTH(@pInputDate) […]
Continue reading ...
Anytime we are dividing we need to think of the divide by zero scenario. Even if you think you will never encounter this with your result set, it’s advisable to guard against it because when divide by zero is encountered, an error is thrown. The best method I’ve found to overcome this is by using the NULLIF function. This function takes two parameters and if they are equal, a NULL value is returned. Lets take a look at an example that throws a divide by zero error. [cc lang=”sql”] DECLARE @iter float; DECLARE @num float SET @num = 10; SET @iter = 5; WHILE @iter > -5 BEGIN SELECT @num / @iter SET @iter = @iter – 1 END [/cc] Running the following query, we see that once the variable @iter becomes zero, we receive an error. So the most elegant way to overcome this is to use NULLIF function and compare @iter to zero. When it does equal zero, it will instead change it to a null. And when dividing anything by NULL will equal a NULL. [cc lang=”sql”] DECLARE @iter float; DECLARE @num float; SET @num = 10; SET @iter = 5; WHILE @iter > -5 BEGIN SELECT @num / NULLIF(@iter,0); SET @iter = @iter – 1; END [/cc] This executes without error, however we still receive a null as a result. If you need otherwise, then you may want to wrap the equation in an ISNULL, to return a different value. [cc lang=”sql”] DECLARE @iter float; DECLARE […]
Continue reading ...
This is an invaluable trick that can be sometimes be used instead of the BETWEEN operator. A common place this can be used is in an historical table that contains EffectiveFrom & EffectiveTo dates. In these kinds of tables, many historical versions of a record can be stored. Each record in this scenario will contain a Start & End Date which signifies the time span when a record is active. Because two records cannot be active at the same time, we can be sure that the Start & End dates will never overlap. While this describes one particular scenario for this example, there are many other situations this method can be used. Let’s look at the typical query we would use to find the record which is active for an employee at a specific point in time. [cc lang=”sql”] SELECT * FROM [HumanResources].[EmployeeDepartmentHistory] WHERE EmployeeID = 274 AND ‘2001-04-28’ BETWEEN StartDate AND EndDate [/cc] This query will return to us the single record that is active as of ‘2001-04-28’. However if you think about the constraints our table contains, we actually only need to query one column (the StartDate) in order to isolate this record. This is because the StartDate will not overlap any other StartDate & EndDate for this employee. Here’s an example: [cc lang=”sql”] SELECT TOP 1 * FROM [HumanResources].[EmployeeDepartmentHistory] WHERE EmployeeID = 274 AND StartDate >= ‘2001-04-28’ ORDER BY StartDate [/cc] Because only one record can be active for an employee at a single point in time, […]
Continue reading ...