How to Write Optimized Stored Procedures

Posted on April 15, 2010 by Derek Dieter
10
9

There are many factors to speeding up stored procedures. Knowing these in advance can save you and your company many hours of headaches. Using these practices when writing procedures, you can turn your SQL Server into a fine-tuned machine. This tutorial will cover the specifics to help you tune your procedures optimally.

Know your tables

The biggest way you can help SQL Server speed up your queries, is by helping it determine how to limit the result set as soon as possible. One of the biggest factors in this is in knowing the tables you are joining together.

This translates to:

How many records are in your tables?
What are the best ways to join your tables?
What are the existing indexes on the tables?

Knowing the number of records in a table is very important. When joining tables using inner joins, you always want to join the smallest table first. Why you ask? Say you do not use a WHERE clause and you simply join three tables together. If you start your FROM clause with the largest table, then the first table SQL sees is a very large table, so it sets up an execution plan for a very large result set. Then let’s say the last of the 3 tables only has 10 records, by the time SQL visits that table it’s too late, the execution plan has been set to output 100,000 records, whereas if you specified the first table with 10 records after the FROM clause, the execution plan may be entirely different. This first table is known as the driver table.

There is a disclaimer to this. The SQL optimizer maintains statistics for all the tables, and despite the order of your joins, SQL can and will reverse these to achieve the best plan. If this is the case you say, then why join smallest first? Because the optimzer can only be so smart. You need to take the guess work away from the optimizer. I’ve seen many instances where the query got so complex that the optimizer simply utilized the join order.

From now on, you should consider writing queries using a top-down approach. Eliminate the most records by joining the smallest tables first. Also do the same in the where clause. Use the most limiting where clause statements as soon as you can.

Utilize indexes

One of the simplest ways to become a hero in your organization is to add indexes. While this is very simple, it is often overlooked. This index tutorial will help.

Find common queries

If you have a large procedure, it may be likely you also have the same query being executed multiple times. If this is the case, then instead of performing the same query multiple times, dump the results into a temp table and reference the temp table in the rest of your queries. This is critical for speed. There have been many queries I have optimized in the past that run the same query a few times in the stored procedure.

Always go SET based

This means, never use while loops or cursors. This should actually be #1 on the list, however I’m hoping you already know this. There should be no reason to use loops or cursors. It has been proven time and time again that everything can be solved using SET based methods. SET based just means a simple select statement.

Do Not Use Scaler UDF’s

Scaler user defined functions make a query run as if they were in a loop. This defeats the entire purpose. SQL is not object oriented, do not try to make it so. It is the habit of a front-end developer to modularize and object orient procedures. Do not try this with SQL. Many will pay dearly. Granted, SQL may seem as if it is behind the times, however you cannot possibly make anything execute faster. No front-end process will compare to any result set processed by SQL. That said, you can use a scaler UDF when setting a single variable, just do not use it in a SELECT statement that returns a result set. There is a workaround here on my site if you need to encapsulate.

Use Exists instead of JOIN

If you are doing an Inner Join to a table just to eliminate a result set, and the table you are joining is not contributing any columns to the select list, then you are creating a Worktable in the background and using extra IO resources. Use exists instead. There is a caveat to this however. If you are doing a NOT EXISTS (anti-join) then the entire table needs to be scanned anyway. In a lot of cases it is better to do a LEFT JOIN WHERE NULL. The reason sometimes is that it moves the elimination of records higher up in the order of operations. Keep in mind if there is a many-to-1 relationship, you’d have to use DISTINCT otherwise your result set will return duplicates.
[cc lang=”sql”]
SELECT [DISTINCT] t1.column
FROM TABLE1 t1
LEFT JOIN TABLE2 T2 ON T2.table2_id = t1.table1_id
WHERE t2.table2_id IS NULL
[/cc]
VS
[cc lang=”sql”]
SELECT t1.column
FROM TABLE1 t1
WHERE NOT EXISTS (SELECT 1 FROM TABLE2 WHERE table2_id = t1.table1_id)
[/cc]

Avoid Distinct

The DISTINCT clause is basically a big group by. Well, it is slightly more optimal, however you get the point. Do not use this unless you can’t avoid it. If you do find yourself using it, then you might have a table design issue somewhere (or an opportunity).

Limit the Select list

Returning too many columns can have a drastic effect on your query. Not only will it increase you chances for bookmark lookups (or key lookups), but the network and disk latency add to the query. Not to mention you will be squeezing more data into your buffer cache.

Use the Least Amount of Tables to Compile Your Select List

An example would be let’s say you need to join on 2 tables in order to get your result set. If one of the tables contains all the fields needed for the select list, but you are also able to get the same field(s) from the other table, always go with only returning the values from the one table. Doing so will limit the number of IO operations necessary to give you your result.

Index large reporting temp tables

Temp tables are treated just like permanent tables according to SQL. They can have indexes & statistics. The only downfall is that they often cause recompiles for the statement when the result sets differ. To counter this read reducing temp table recompiles or use table variables if you have to.

Learn execution plans

Bad times in execution plan-ville include:

large number of output rows – dictated by thick arrows
table & clustered index scans
parallelism
Hash-Joins
Sorts
Key Lookups (bookmark lookups)
Table spool – eager spool (halloween spools)

Avoid poor performing techniques

CTE’s – Common table expressions
Instead of using CTE’s use temp tables where applicable. (Yeah I said it). They can perform badly. Although it is an apples and oranges comparison, there are times when they can be interchanged. CTE’s can create a convoluted execution plan whereas a temp tables separates the execution plans between statements. This is important to have so you do not get “bad hops” in execution compiles. Better to keep plans simple. CTE’s themselves provide no optimization advantage, they only make the code prettier. They do have a good use, and that is recursion.

Table Variables
Yeah I said this too. I know some developers will give me flack for this, however I have rarely seen a table variable perform well. Yes, if you have under 1000 rows, consider it. Otherwise, I suggest #temp tables. That doesn’t mean go crazy with it, always write as little as possible to tempdb and break the queries up only when necessary to squeeze big performance gains.

Post a comment

10 comments

Manjunatha 28 Apr 2016 at 2:39 pm

Thanks Derek Dieter for nice article… Thanks a lot

Reply to this comment

Juan Cardona 06 Sep 2015 at 7:57 pm

Good Article, Simple and well written, thank you for sharing.

Reply to this comment

Freidimar 29 Aug 2015 at 8:49 am

I have the list? Scabies Receivers nger (5000 +) from my music business. I need to bulk e-mails with one click Schaltfl? Che send, and I m? RIGHTS php.Savez sure of each e-mail mass maiinlg scripts make PHP? Thank you!

Reply to this comment

Andrew 21 Jan 2014 at 5:56 pm

Good article! Although Microsoft disagrees with you about the Table Variables at this link. http://support.microsoft.com/kb/305977/en-us

Reply to this comment

Derek Dieter 27 Jan 2014 at 7:04 pm

Thanks! I don’t see anything at that link that I didn’t already know or base my decision off of, and I don’t see any disagreement.

Reply to this comment

Subrat 21 Oct 2016 at 10:25 am

I am agree with that table variable should be used when number of records is less. For large number of records using of temp table decreases the execution time by 50%.
I have practically experienced that.

Reply to this comment

Darek 04 Jan 2013 at 5:27 pm

Well, I cannot really agree with the ‘use temp tables instead of CTEs’. CTE are the same as derived tables but can be referenced in many places in a query at the same time. Their usage is encouraged as they introduce and promote code modularization. By writing queries that use CTEs one achieves clarity (cannot stress that too much) and, nicely, there is no impact on the query execution time. It’s exactly as we’d use derived tables. For a good introduction to CTEs and explanation of why they are GOOD I’d recommend “Inside Microsoft SQL Server 2008: T-SQL Querying” by Itzik Ben-gan, Lubor Kollar, Dejan Sarka and Steve Kass.

Reply to this comment

Derek Dieter 05 Jan 2013 at 12:03 am

How does a temp table not provide modularization and reuse? Besides recursion, I do not see what a CTE provides better than a temp table. A temp table also has the advantage of being used more than once. CTE’s can only be used in the immediately following transaction.

On more occasions than not I have seen temp tables perform better because they are simply a table. They can contain statistics and indexes. A CTE does not. It also requires the optimizer to correlate the execution plan of the current query with the execution plan of the CTE making it more of a crap shoot. For me, keeping execution plans simple is the key to optimization.

Reply to this comment

anil kumar pawar 25 Jul 2011 at 7:06 pm

i want example theory is good but i want example

Reply to this comment

Anonymous 12 Jul 2011 at 6:37 am

Good Post but need more details with example for
Folowing point

Table Variables

Yeah I said this too. I know some developers will give me flack for this, however I have rarely seen a table variable perform well. Yes, if you have under 1000 rows, consider it. Otherwise, I suggest #temp tables.

Reply to this comment

Site Author

Thanks for visiting!

How to Write Optimized Stored Procedures

Know your tables

Utilize indexes

Find common queries

Always go SET based

Do Not Use Scaler UDF’s

Use Exists instead of JOIN

Avoid Distinct

Limit the Select list

Use the Least Amount of Tables to Compile Your Select List

Index large reporting temp tables

Learn execution plans

Avoid poor performing techniques

Featured Articles

Site Author

Resources