Using WHERE EXISTS

As you may know, the WHERE EXISTS clause is used to constrain records from an outer query with records from an inner query. What does that mean? It’s just another way to say “it limits the result set”. And in SQL, especially in optimization purposes, limiting the result set is the name of the game.

USAGE
There are basically 6 ways to constrain data. The relationship between the tables and the performance impact will determine which method you need.

  • EXISTS
  • IN
  • JOIN
  • CROSS APPLY
  • LEFT JOIN with WHERE
  • OUTER APPLY with WHERE

When you need to limit a result set between two tables that have a 1-1 relationship, it makes little difference which method you use. However if you have a 1-many or many-many relationship you are forced to use either EXISTS or IN because a join will match all records that match forcing you to do the dreaded DISTINCT. Overall, the exists is probably the most performant because it is handles limiting result sets for one-to-many and many-to-many tables most efficiently. This is because it performs a boolean operation where it finds the first instance of existence and returns saying “YES THIS DOES EXIST”.

Now, that’s not to say that EXISTS is always more efficient. There are instances I’ve found where a LEFT JOIN WHERE IS NULL is more efficient than using NOT EXISTS but that’s a different article. Also, if there is a chance down the road you will need columns from the table you are constraining against and it’s a 1-1 relationship, I suggest you just join the table. This makes it easier so you won’t have to rewrite your query down the road.

With all that said, let’s look at an example of EXISTS:

SELECT COUNT(1)
FROM sales.SalesOrderDetail sod
WHERE EXISTS
(
    SELECT 1
    FROM production.Product
    WHERE ProductID = sod.ProductID  -- first correlate inner and outer query
    AND name LIKE '%mountain%'
)

The equivelent method using join is:

SELECT COUNT(1)
FROM sales.SalesOrderDetail sod
JOIN production.Product p
    ON p.ProductID = sod.ProductID
WHERE p.name LIKE '%mountain%'

Overall, I try to avoid the ‘IN’ clause because it bombs out when null values come into play.

PERFORMANCE
As of SQL 2005 and later, the optimizer basically treats EXISTS & JOIN the same way for a 1-1 table relationship. This is not the case however with 1-many or many-many. With SQL 2000 though I do remember seeing a difference between the execution plans. A JOIN would cause a worktable to be created in the background, whereas exists would not.

2 comments
Anuj Desai 28 Jan 2013 at 9:33 am

How can we replace the query containing the multiple joins and tables with the WHERE EXISTS clause?
Plz explain by adding one more table and a join condition in the above example ans the alternative EXISTS query

Anuj Desai 28 Jan 2013 at 9:28 am

This was a very useful information. Thanks buddy..

Featured Articles

 Site Author

  • Thanks for visiting!