Comparing EXISTS vs LEFT JOIN WHERE NOT NULL
-
Posted on December 26, 2012 by Derek Dieter
-
1
Two of the most common ways to check for the existence of records is to use a WHERE EXISTS or to perform a LEFT JOIN WHERE values are NOT NULL. My personal choice is usually to use the EXISTS method where possible. Call me a prude, but I think just by virtue of EXISTS being the keyword for this method, it’s a safer and more optimization friendly method. If someone asked me how to check for existence I would tell them to use exists. However I do think there are many more considerations that need to be outlined before making a final decision.
Let’s look at examples of the two methods to illustrate:
[cc lang=”sql”]
— using EXISTS to check for existence
SELECT P.ProductID
FROM Production.Product p
WHERE EXISTS
(
SELECT 1
FROM Sales.SalesOrderDetail sod
WHERE sod.ProductID = p.ProductID
)
— Using LEFT JOIN to check for existence
GO
SELECT DISTINCT P.ProductID
FROM Production.Product p
LEFT JOIN Sales.SalesOrderDetail sod
ON sod.ProductID = p.ProductID
WHERE sod.SalesOrderDetailID IS NOT NULL
[/cc]
Since these examples are straightforward, the optimizer realizes the end result we are looking for and gives us the same execution plan for both.
However we don’t always want to rely on the optimizer’s brains. This is because as the execution plan gets more complex (by adding additional tables), it’s less likely to make good suggestions. This brings up a separate point.
Anatomy of LEFT JOIN WHERE NOT NULL
While there is a chance both methods can take the same execution path, it’s not always likely. When the optimizer sees tables being joined together (as in LEFT JOIN) it anticipates that columns will also need to be returned from these tables. To help facilitate this, it will often create a worktable to help with the join. This worktable is an intermediate step to help with further joins. This worktable may be a good choice in some cases we have to keep in mind it contributes to increased IO because it’s writing to TempDB. This is where EXISTS may present a better option.
Anatomy of Exists
When the optimizer sees an EXISTS, it does not anticipate a join of two different tables – so it will often not create a worktable. The reason for this is because the very nature of EXISTS is to return a true or false, therefore, it only needs to use the EXISTS table as a filter. Now this does not mean that EXISTS will always outperform the LEFT JOIN method. The times I’ve seen the LEFT JOIN NOT NULL method beat EXISTS have usually been when too many exists were used in a single statement. This makes sense from a logical standpoint because at a certain threshold a worktable provides the benefit of an intermediary step, rather than having a single complex filter.
The other benefit of EXISTS is that it may not need to scan the entire table in order to return it’s result. Let’s say we are looking for the existence of 1 product in our SalesOrderDetail table. In the best case scenario, this product could be in the first row. If it is, we only need to find the single case where it does exist in order to be finished with our query. Even if the product is listed thousands of time in our table, we only need the first instance. This is not true with the LEFT JOIN WHERE NOT NULL. In this case, we are telling the optimizer to return us back all the joined records. Thankfully it is sometimes smart enough to know that’s not what we really want.
Checking for No Existence (Anti Join)
This gets a bit trickier. If we do a NOT EXISTS, then do you think it’s still true that we need to find the first instance of a product to be finished with our query? Nope, we literally need to touch EVERY row in order to find the rows to return. This is sometimes where I find LEFT JOIN WHERE NULL a better option. This can often be a more efficient query so you’ll have to test it out on your own.
If both operations need to touch all the rows anyway, why use LEFT JOIN WHERE NULL instead of NOT EXISTS? Well, I’m going to give two reasons and you’ll have to take my word for one of them if you don’t already agree.
- As the query grows in complexity, NOT EXISTS can end up confusing the optimizer. You’ll have to take my word for that because I don’t have an example.
- Since you are checking every row in this table, we may as well join it because we might need a column from it later.
If you need a column with an EXISTS table you’ll end up needing to rewrite the query or add the same table being referenced in the exists as a JOIN against the outer table. One negative to note, is that if the two tables are not 1-1 then you will end up having to do a DISTINCT. That does suck, but if you can keep your result set small, it still may be worth it.
Here’s an example:
[cc lang=”sql”]
SELECT p.ProductID
FROM Production.Product p
WHERE NOT EXISTS
(
SELECT 1
FROM Sales.SalesOrderDetail sod
WHERE sod.ProductID = p.ProductID
)
GO
SELECT DISTINCT p.ProductID
FROM Production.Product p
LEFT JOIN Sales.SalesOrderDetail sod
ON sod.ProductID = p.ProductID
WHERE sod.SalesOrderDetailID IS NULL
[/cc]
So here we are hitting plenty of data pages regardless of the method. Only when the plans get more complex have I seen issues occur.
Alternate Methods
There are alternate methods of performing these operations. The first is to use IN or NOT IN. In my experience, these are exactly the same (execution wise) as the EXISTS. The difference is using NOT IN with NULL values can yield some unexpected results. The other option is to use the SET based EXCEPT. This can be tricky however because it functions as an ANTI UNION. Both the result sets in each operation need to match, which is not always ideal.
Conclusion
The conclusion really is not black or white. You will have to test for yourself which is quicker for your own scenario. Since there are many factors including table size, relationship, carnality, result set size, I would suggest you investigate the pros and cons of each. It is generally true however that when checking for mere existence, exists is faster. And when performing an anti-join, the LEFT JOIN method can be safer.
- Comments (RSS)
- Trackback
- Permalink
There is one other thing which should be mentioned in the Exists VS. Left Join and that is that if there are multiple matches in the table on the right side of the join, it will return more records than Exists, because left join will return a match for each row and exists stops at the first match.