SQL Server Join Algorithms

Posted on February 20, 2011 by Derek Dieter
6
4

The object name ‘x’ contains more than the maximum number of prefixes. The maximum is 2. »

If you read execution plans enough, you’ve probably realized that when SQL Server joins tables together, it uses different internal algorithms. The three algorithms are:

Loop Join
Merge Join
Hash Join

These alogorithms that are used are based upon factors of the underlying data.

Merge Join

For the most part, this is the most efficient method of joining tables. As the name implies, both tables are essentially merged together, much like a zipper being zipped. It typically occurs when both tables that are being joined, are joined on keys that are presorted and contain all the same keys in both tables (for example joining a primary key with a foreign key). When one table contains keys that the other table does not have, the chance of the merge join being used is less likely.

The physical profile of a merge join is very little CPU usage, and very little reads compared to other types of joins.

Forcing a MERGE join hint is not recommended as it can error out if the requirements are not met

Loop Join

The loop join is more CPU intensive than a merge join. This join typically occurs when one worktable is quite a bit smaller than the other. As the word loop implies, the smaller table being joined is looped until it finds the matching key in the outer (larger) table. This join is most efficient when the resulting output is smaller than 5000 rows. When larger than that, the CPU and reads make the join less efficient.

Hash Join

A hash join is the least efficient of all joins, however that does not mean it is not necessary. When large tables are being joined and not all the common keys being joined exist in both sides and/or they are not sorted, this join type may be chosen. As the word Hash implies, a hash is created against all rows for the common key in each table, essentially breaking the table into approximately 7 buckets. After the table has been broken into these separate buckets, the buckets themselves are joined together resulting in a smaller comparison of a result set.

Within the Hash Join, there are 3 separate types of hash joins (In memory, Grace, and recursive). These hash joins are segragated according to how much memory is required to create them. Essentially the hash joins follow the same methodology as lock escalation. When a hash join is taking too much memory, then the next highest type of hash join is escalated to and pushes the buckets to disk.

Using Query hints for Join types

Increasingly, I do not recommend using query hints for SQL Server. With SQL Server 2000 I was more apt to use query hints. However, as the optimizer gets smarter, and the more you upgrade, you never know what the query optimizer is going to choose for the query or why it is choosing it. That being said, yes you can specify query hints for join types. To do so, see the following queries:

[cc lang=”sql”]
SELECT *
FROM Sales.SalesOrderDetail sod
INNER MERGE JOIN Sales.SalesOrderHeader soh
ON sod.SalesOrderID = soh.SalesOrderID
[/cc]

[cc lang=”sql”]
SELECT *
FROM Sales.SalesOrderDetail sod
INNER LOOP JOIN Sales.SalesOrderHeader soh
ON sod.SalesOrderID = soh.SalesOrderID
[/cc]

[cc lang=”sql”]
SELECT *
FROM Sales.SalesOrderDetail sod
INNER HASH JOIN Sales.SalesOrderHeader soh
ON sod.SalesOrderID = soh.SalesOrderID
[/cc]

Post a comment

6 comments

Hoang 22 Jul 2014 at 12:41 am

Hi,
Good article – short and precise
one point though: for loop join, isn’t that the small table is the one that will be used for outer loop and the bigger table the inner table?

Reply to this comment

Garland 17 Jan 2014 at 4:41 am

Your means of explaining the whole thing in this piece of writing is actually good, all be capable of easily be aware of it, Thanks a lot.

Reply to this comment

Martin R 05 Mar 2013 at 6:19 pm

I loved your article, very concise!. But let me add something, and please correct me if I am wrong. MERGE JOIN requires at least one equijoin predicate and sorted inputs from both sides, this means that the merge join should be supported by indexes on both tables involved in the join operation. This is a very important limitation. Cheers

Reply to this comment

Derek Dieter 05 Mar 2013 at 6:39 pm

Hey Martin,

That is the way I understand it. I think we said the same thing in different ways. For the record, I never force MERGE join hints because they can easily error out if those requirements are not met. If however you can setup the two inputs with a likelihood they will utilize a merge join, that makes for a good scenario.

Reply to this comment

T C Lonyem 19 Nov 2012 at 6:23 pm

Hi Derek – This post is nothing short of amazing. I’ve been trying to get a query return in less than 5 seconds which normally executes for 3 mins 45 seconds. Having tried all options I could possibly think of, what the ‘inner merge join’ query hint you explained above did pull the trick for me. The query now executes in exactly one second. I’m flabbergasted. Thanks so much.

Reply to this comment

Derek Dieter 19 Nov 2012 at 9:50 pm

Great T C,

Glad that helped. These join types in my experience have the greatest influence on the overall execution plan generated. I’ve found lately that the queries that benefit from hinting are typically also join a lot of different tables and are complicated to the optimizer. I’m trying more to create short queries that may dump results to table vars, then join the table vars.

Reply to this comment

Site Author

Thanks for visiting!

SQL Server Join Algorithms

Merge Join

Loop Join

Hash Join

Using Query hints for Join types

Featured Articles

Site Author

Resources