Comparing EXISTS vs LEFT JOIN WHERE NOT NULL

Two of the most common ways to check for the existence of records is to use a WHERE EXISTS or to perform a LEFT JOIN WHERE values are NOT NULL. My personal choice is usually to use the EXISTS method where possible. Call me a prude, but I think just by virtue of EXISTS being the keyword for this method, it’s a safer and more optimization friendly method. If someone asked me how to check for existence I would tell them to use exists. However I do think there are many more considerations that need to be outlined before making a final decision. Let’s look at examples of the two methods to illustrate: [cc lang=”sql”] — using EXISTS to check for existence SELECT P.ProductID FROM Production.Product p WHERE EXISTS ( SELECT 1 FROM Sales.SalesOrderDetail sod WHERE sod.ProductID = p.ProductID ) — Using LEFT JOIN to check for existence GO SELECT DISTINCT P.ProductID FROM Production.Product p LEFT JOIN Sales.SalesOrderDetail sod ON sod.ProductID = p.ProductID WHERE sod.SalesOrderDetailID IS NOT NULL [/cc] Since these examples are straightforward, the optimizer realizes the end result we are looking for and gives us the same execution plan for both. However we don’t always want to rely on the optimizer’s brains. This is because as the execution plan gets more complex (by adding additional tables), it’s less likely to make good suggestions. This brings up a separate point. Anatomy of LEFT JOIN WHERE NOT NULL While there is a chance both methods can take the same execution […] Continue reading ...

Yet Another Temp Tables Vs Table Variables Article

The debate whether to use temp tables or table variables is an old debate that goes back since they were first introduced. The real answer to knowing the difference lies in what is going on under the hood and correlating those specifics to your situation. In this article we’ll touch on (hopefully all) the differences between the two. #Temp tables are just regular SQL tables that are defined and stored in TempDB. The only difference between them and a permanent table is they are not allowed to have foreign keys. You can view the temp tables currently defined via SSMS by going to TempDB and Expanding Temp Tables. Let’s look at a matrix of specific differences then we’ll touch on a few of the key differences below. Item #Temp Tables @Table Variables Can participate in a transaction Writes to Log File Writes only to memory (not disk) Can Qualify for Parallelism Allows creation of statistics Does not affect recompilation Allows nonclustered indexes Allows clustered indexes Can perform SELECT INTO Can access in nested stored procedures Can define globally Can use in user defined functions Can insert from EXEC Allows TRUNCATE Allows ALTER TABLE Scope #Temp tables can be either #locally defined (within the scope of the session) or ##globally defined (within a database). When they are locally defined, the table can be used by whatever is executed within that session (or SPID). In other words, if you define a #temp table in procedure A, then procedure A calls procedure B, […]

The Difference Between UNION and UNION ALL

If you’re like me, after using UNION for many years you may have stumbled upon the UNION ALL and thought, what the heck is the difference? Well it’s pretty simple. UNION performs a distinct and gives you the unique results for all UNION’ed queries, whereas UNION ALL does not perform a distinct and gives you all rows regardless of whether they are duplicate. The performance difference is that UNION ALL can be much faster since distinct is an expensive operation. Here is a simple example detailing the difference. We’ll populate a single temp table with one record, then we’ll do both a UNION, and a UNION ALL to see the results. [cc lang=”sql”] — create the table and populate sample data SELECT name = ‘derek’ ,detail = ‘this is record 1’ INTO #faba — perform the UNION on the same table SELECT * FROM #faba UNION SELECT * FROM #faba — perform the UNION ALL on the same table SELECT * FROM #faba UNION ALL SELECT * FROM #faba [/cc] From the results we see below, the UNION only returns one record. The execution plans are identical except for the distinct operation being performed as the last operation before output. From this example we see how much utilization distinct is consuming relational to the other operations. (It’s a lot, however this is a rather simple operation) Long story short, if you know you’re two separate result sets contain distinct values, it’s best to use UNION ALL. If for some reason […] Continue reading ...

Using WITH (NOLOCK)

The WITH (nolock) hint is an explicit command directed at a specific table or view used to set the transaction isolation level against the table or tables within a view for a query. Once issued, locks will not be used against the data within the table. The advantage to this is there is no chance a deadlock will occur against any other queries running against the table. The other indirect advantage is that less memory will be used in order to hold locks against that data. Example: [cc lang=”sql”] SELECT first_name, last_name, FROM dbo.person p WITH (NOLOCK) JOIN dbo.employee e WITH (NOLOCK) ON e.person_id = p.person_id WHERE p.person_id = 1; [/cc] The nolock setting above is explicit for the table it is being set against. To set the value globally for the scope of the connection, see SET TRANSACTION ISOLATION LEVEL Advantages: Deadlocks will not occur against other queries running against the same data Less memory is utilized due to the lack of row, page, or range level locking Typically allows for much higher concurrency due to lower footprint Disadvantages: Uncommitted data can be read leading to dirty reads Explicit hints against a table are generally bad practice Usage In most places I have worked, with (nolock) has been a generally accepted practice in the specific areas of the system that are not sensitive to data being slightly out of sync. It is important to know where things could go wrong though. The biggest red flag I can think of […]

Detecting Row Level Changes Using HASHBYTES

A common situation in a data warehouse is the requirement to detect changes in data in order to track what rows need to be imported. The traditional method of comparing the values of each field is performance intensive. Luckily there are other methods to quickly track the changes that involve creating hashes (or a fingerprint) of a particular data row. In using this method, if we want to synchronize two separate tables, we can simply join on the primary key and only compare this fingerprint column in order to determine what has changed. There are two major methods I’ve used to create a row valued hash key. The first is by using the CHECKSUM function. The other is to use the HASHBYTES function. Both of these function return back a single value representing a hash, however their parameters differ. With CHECKSUM you can pass in a list of columns to evaluate and it returns an integer value. Whereas HASHBYTES requires a single parameter to be passed in and returns back a 16-bit binary value. The trick to forcing HASHBYTES into accepting multiple column values is to use the FOR XML function which will generate a single value to pass in. The obvious difference between the two functions is the size and datatype of the hash being returned. To make a long story short, there are rare occasions (that I have witnessed more than once) where passing in different column values into CHECKSUM will return back the exact same value. Granted […] Continue reading ...

Featured Articles

 Site Author

  • Thanks for visiting!
css.php