SQL Server Hints Explained

Posted on August 6, 2015 by Derek Dieter
1
-1

« What to do when your SQL CPU is at 100%

Coronavirus (Covid19) Victims Explain First Symptoms »

The news is out. The consensus on using hints is “Do not use them”. They will ruin you. But much like the 1950’s people keep their dirty laundry to themselves and outwardly show their best “non hint” selves, this applies as well. With that said, I will go ahead and use the disclaimer that you SHOULD NOT use hints. But, if you ARE like me, and touch a lot of SQL code where there is a huge difference in site performance between a 1 second execution and 20ms execution, then sometimes you don’t have much of an option.

With that said, I will talk about the optimizer. It’s getting smarter and smarter at dealing with the majority of queries and keeping them within an even keel baseline. But that comes at a cost. It can’t take as many chances so it plays it safe. And this is where you come in. But before you go around adding hints everywhere in your system (which is a horrible idea). You need to know how to write code so that it will not NEED hints. To do this is very simple. The answer is KEEP IT SIMPLE. In other words, keep the queries simple. Do not create huge SQL Statements. It’s much better to break the queries up so that you do not give the optimizer many different routes to take. Breaking them up typically entails materializing the data into temp tables (write as little data as possible) so you can break up large queries into multiple small queries.

With that said, let’s get onto the hints. I’m not going to go over all of them, just the ones I use the most.

LOOP JOIN

This hint works in two ways. Often times when you are joining two tables, one table will be much smaller and IF the optimizer chose to join using this table first, you would have a very fast query. The inner loop join, changes the join order forcing these two tables to FIRST limit the result set by joining on each other first, AND it does so using a SEEK. The important thing to note about this technique, is that both of the join columns need to be indexed. If you try a loop join on two non indexed columns you will be in trouble.

To do an inner loop join, do the following:

[cc lang=”sql”]
SELECT *
FROM table_a a
INNER LOOP JOIN table_b b ON b.id = a.id;
[/cc]

Test the outcome before you do it. I usually only do this as an emergency measure, or a way to squeeze another couple hundred ms out of a highly called proc. I have never seen INNER LOOP JOIN error out. The optimizer has always been able to create an execution plan.

INDEX HINT

This actually is probably the most common. For whatever reason, the optimizer decides to use the wrong index. Most of the time it’s because stats changed or possibly a parameter sniffing issue on the last compile. This may be fine a lot of the time, but if it’s a critical process, it won’t fly, and you don’t want to be woken up at 2am.

[cc lang=”sql”]
SELECT *
FROM table_a a WITH (INDEX(ix_someindex))
WHERE a.id > 50;
[/cc]

INNER HASH JOIN

This join is the most efficient if you do not have indexes on the join columns. It saves time because not only does the optimizer not have to consider which type of join to use, but since you know the join columns have no indexes (or they are small tables) there is no reason to do a loop join. A hash join operates by create buckets of each side of the join columns, matching the buckets, then sorting each bucket to join the columns. That’s at least how I understand it. I have not seen the optimizer fail by using a hash join hint.

[cc lang=”sql”]
SELECT *
FROM table_a a
INNER HASH JOIN table_b b ON b.id = a.id;
[/cc]

INNER MERGE JOIN

This is a hint I never touch. The reason is because the optimizer CAN error out when trying to create a plan when this hint is applied. This is because each column must be sorted before the join. However, a merge join is the most efficient in both CPU and IO. A really cool trick I learned from an Adam Machanic post is below (he advocated not to do this as well). In theory this probably would not error out, but I would still not suggest doing it in production. However this would join things VERY quickly in a lot of cases. Again don’t do this

[cc lang=”sql”]
SELECT *
FROM table_a a
INNER MERGE JOIN
(
SELECT TOP (1000000000) — really large unreasonable number
FROM table_b b
ORDER BY b.id
) b ON b.id = a.id;
[/cc]

MAXDOP

This is one of the most common ones I use. The reason is because sometimes there will be a process that is invoked from the website, that really does not benefit from parallelism, or the parallelism ends up eating a bunch of CPU. Or on the flip side, a process that consumes a lot of data, needs a bit of a boost. A setting of 1, turns parallelism off. A setting higher than that, basically assigns the same number of spids to the number you assign.

[cc lang=”sql”]
— turn parallelism off
SELECT *
FROM table_a a
INNER JOIN table_b b ON b.id = a.id
OPTION (MAXDOP 1);

— turn it on to use 4 processors
SELECT *
FROM table_a a
INNER JOIN table_b b ON b.id = a.id
OPTION (MAXDOP 4);
[/cc]

FORCEORDER

I do not advocate using this one. What it’s supposed to do, never really works in my opinion. Forceorder should process the order of events from top to bottom. Utilizing the driver table (first table you specify) as being the lead table to limit the result set. For me, this simply does not work. The last time it worked for me was in SQL 2000. A better way to force the order of your query, is to force the order of events using virtual tables.

[cc lang=”sql”]
— this will in most cases, force the processing of table_a first
SELECT *
FROM (
SELECT a.id
FROM
(
SELECT a.id
FROM table_a
WHERE a.id = (1,4)
) t1
) t2
INNER JOIN table_b b ON b.id = t2.id;
[/cc]

OPTION (RECOMPILE)

This is another very common one. It usually gets placed on the big fat query you can’t touch (or don’t want to break up). It gets a bad hop on an execution plan one too many times and you say fock it, I’m going to throw this on so the bad hops don’t happen. Hey.. I’m just being real. But having to use this one too many times typically means your queries are overly complicated. You can probably benefit from breaking them up.

[cc lang=”sql”]
SELECT *
FROM table_a a
INNER JOIN table_b b ON b.id = a.id
INNER JOIN fat_table_c ON c.id = b.id
INNER JOIN fatter_table_d ON d.id = c.id
INNER JOIN fat_who_created_this_e e ON e.id = d.id
OPTION (RECOMPILE);
[/cc]
That’s all folks.

Post a comment

One comment

Avi Jaman 24 Jun 2016 at 6:08 pm

Thanks for this, and your site in general! It’s rare to find informational sources in SQL Server, or development in general for that matter, that speak in plain English. Kudos!

Reply to this comment

Site Author

Thanks for visiting!