Sometimes slow queries can be rectified by changing the query around a bit. One of these examples can be illustrated when multiple values are compared within a WHERE clause using an OR or IN statement. Often times, OR can cause a scan against an index or table which may not be the preferable execution plan in terms of IO consumption, or overall query speed.
A lot of variables come into play when the query optimizer creates an execution plan. These variables include a multitude of hardware specs, instance settings, database settings, statistics (table, index, auto-generated), and also the way the query is written. The one we are changing here is the way the query is written. As unsuspecting as it may seem, even though two different queries can return the exact same results, the path at which they take can be entirely different just based upon the format of the query.
UNION vs OR
In most of my experience with SQL Server, the OR is generally less efficient than a UNION. What tends to happen with an OR is it causes a scan more often. Now this sometimes may actually be a better path to take in some cases and I’ll leave that to a separate article but overall I’ve found that touching more records is the biggest cause of slowness. So let’s start our comparison.
Here is our OR statement:
WHERE ProductID = 750 OR ProductID = 953
From this execution plan we see that we are performing a scan against 121,000 rows. (You can’t see the row count, but it is).
Now let’s perform the same query, but written using a UNION instead of OR:
WHERE ProductID = 750
SELECT [SalesOrderID], *
WHERE ProductID = 953
Here we see two branches of operations. One branch is touching 358 rows, while the other branch is 346 rows. Both branches meet to perform the concatenation operation bringing both result sets together. We have two separate seeks, but we also have a key lookup in order to retrieve the SELECT list we need. This wasn’t necessary for the scan operation because we were touching all rows in the scan operation anyway so the data was retrieved at the time of the scan, rather than afterwards. This has to do with the index and the rows we need rather than UNION vs OR. I will say however that the select list is also a factor in the choice of a seek vs scan but we’ll ignore that in this post.
Why UNION causes more seeks instead of scans is because each operation needs to meet a certain selectivity requirement in order to qualify for a seek. (Selectivity is the uniqueness of the particular column being filtered). OR’s happen in a single operation so when the selectivity for each column is combined and it goes over a certain percentage then a scan is deemed more efficient.
Since a UNION by default performs a separate operation for each statement, the selectivity of each column is not combined giving it a greater chance of performing a seek. Now since the UNION performs two operations, they need to match their result sets using a concatenation operation above. Generally this is not an expensive operation.
It should also be noted that the OR clause operates just like an IN statement.
Hopefully this tip helps out. I find it to be pretty valuable when working with systems that require high concurrency.