How to Write Optimized Stored Procedures
-
Posted on April 15, 2010 by Derek Dieter
-
9
There are many factors to speeding up stored procedures. Knowing these in advance can save you and your company many hours of headaches. Using these practices when writing procedures, you can turn your SQL Server into a fine-tuned machine. This tutorial will cover the specifics to help you tune your procedures optimally.
Know your tables
The biggest way you can help SQL Server speed up your queries, is by helping it determine how to limit the result set as soon as possible. One of the biggest factors in this is in knowing the tables you are joining together.
This translates to:
- How many records are in your tables?
- What are the best ways to join your tables?
- What are the existing indexes on the tables?
Knowing the number of records in a table is very important. When joining tables using inner joins, you always want to join the smallest table first. Why you ask? Say you do not use a WHERE clause and you simply join three tables together. If you start your FROM clause with the largest table, then the first table SQL sees is a very large table, so it sets up an execution plan for a very large result set. Then let’s say the last of the 3 tables only has 10 records, by the time SQL visits that table it’s too late, the execution plan has been set to output 100,000 records, whereas if you specified the first table with 10 records after the FROM clause, the execution plan may be entirely different. This first table is known as the driver table.
There is a disclaimer to this. The SQL optimizer maintains statistics for all the tables, and despite the order of your joins, SQL can and will reverse these to achieve the best plan. If this is the case you say, then why join smallest first? Because the optimzer can only be so smart. You need to take the guess work away from the optimizer. I’ve seen many instances where the query got so complex that the optimizer simply utilized the join order.
From now on, you should consider writing queries using a top-down approach. Eliminate the most records by joining the smallest tables first. Also do the same in the where clause. Use the most limiting where clause statements as soon as you can.
Utilize indexes
One of the simplest ways to become a hero in your organization is to add indexes. While this is very simple, it is often overlooked. This index tutorial will help.
Find common queries
If you have a large procedure, it may be likely you also have the same query being executed multiple times. If this is the case, then instead of performing the same query multiple times, dump the results into a temp table and reference the temp table in the rest of your queries. This is critical for speed. There have been many queries I have optimized in the past that run the same query a few times in the stored procedure.
Always go SET based
This means, never use while loops or cursors. This should actually be #1 on the list, however I’m hoping you already know this. There should be no reason to use loops or cursors. It has been proven time and time again that everything can be solved using SET based methods. SET based just means a simple select statement.
Do Not Use Scaler UDF’s
Scaler user defined functions make a query run as if they were in a loop. This defeats the entire purpose. SQL is not object oriented, do not try to make it so. It is the habit of a front-end developer to modularize and object orient procedures. Do not try this with SQL. Many will pay dearly. Granted, SQL may seem as if it is behind the times, however you cannot possibly make anything execute faster. No front-end process will compare to any result set processed by SQL. That said, you can use a scaler UDF when setting a single variable, just do not use it in a SELECT statement that returns a result set. There is a workaround here on my site if you need to encapsulate.
Use Exists instead of JOIN
If you are doing an Inner Join to a table just to eliminate a result set, and the table you are joining is not contributing any columns to the select list, then you are creating a Worktable in the background and using extra IO resources. Use exists instead. There is a caveat to this however. If you are doing a NOT EXISTS (anti-join) then the entire table needs to be scanned anyway. In a lot of cases it is better to do a LEFT JOIN WHERE NULL. The reason sometimes is that it moves the elimination of records higher up in the order of operations. Keep in mind if there is a many-to-1 relationship, you’d have to use DISTINCT otherwise your result set will return duplicates.
[cc lang=”sql”]
SELECT [DISTINCT] t1.column
FROM TABLE1 t1
LEFT JOIN TABLE2 T2 ON T2.table2_id = t1.table1_id
WHERE t2.table2_id IS NULL
[/cc]
VS
[cc lang=”sql”]
SELECT t1.column
FROM TABLE1 t1
WHERE NOT EXISTS (SELECT 1 FROM TABLE2 WHERE table2_id = t1.table1_id)
[/cc]
Avoid Distinct
The DISTINCT clause is basically a big group by. Well, it is slightly more optimal, however you get the point. Do not use this unless you can’t avoid it. If you do find yourself using it, then you might have a table design issue somewhere (or an opportunity).
Limit the Select list
Returning too many columns can have a drastic effect on your query. Not only will it increase you chances for bookmark lookups (or key lookups), but the network and disk latency add to the query. Not to mention you will be squeezing more data into your buffer cache.
Use the Least Amount of Tables to Compile Your Select List
An example would be let’s say you need to join on 2 tables in order to get your result set. If one of the tables contains all the fields needed for the select list, but you are also able to get the same field(s) from the other table, always go with only returning the values from the one table. Doing so will limit the number of IO operations necessary to give you your result.
Index large reporting temp tables
Temp tables are treated just like permanent tables according to SQL. They can have indexes & statistics. The only downfall is that they often cause recompiles for the statement when the result sets differ. To counter this read reducing temp table recompiles or use table variables if you have to.
Learn execution plans
Bad times in execution plan-ville include:
- large number of output rows – dictated by thick arrows
- table & clustered index scans
- parallelism
- Hash-Joins
- Sorts
- Key Lookups (bookmark lookups)
- Table spool – eager spool (halloween spools)
Avoid poor performing techniques
CTE’s – Common table expressions
Instead of using CTE’s use temp tables where applicable. (Yeah I said it). They can perform badly. Although it is an apples and oranges comparison, there are times when they can be interchanged. CTE’s can create a convoluted execution plan whereas a temp tables separates the execution plans between statements. This is important to have so you do not get “bad hops” in execution compiles. Better to keep plans simple. CTE’s themselves provide no optimization advantage, they only make the code prettier. They do have a good use, and that is recursion.
Table Variables
Yeah I said this too. I know some developers will give me flack for this, however I have rarely seen a table variable perform well. Yes, if you have under 1000 rows, consider it. Otherwise, I suggest #temp tables. That doesn’t mean go crazy with it, always write as little as possible to tempdb and break the queries up only when necessary to squeeze big performance gains.
- Comments (RSS)
- Trackback
- Permalink
I have the list? Scabies Receivers nger (5000 +) from my music business. I need to bulk e-mails with one click Schaltfl? Che send, and I m? RIGHTS php.Savez sure of each e-mail mass maiinlg scripts make PHP? Thank you!
Good article! Although Microsoft disagrees with you about the Table Variables at this link. http://support.microsoft.com/kb/305977/en-us
Well, I cannot really agree with the ‘use temp tables instead of CTEs’. CTE are the same as derived tables but can be referenced in many places in a query at the same time. Their usage is encouraged as they introduce and promote code modularization. By writing queries that use CTEs one achieves clarity (cannot stress that too much) and, nicely, there is no impact on the query execution time. It’s exactly as we’d use derived tables. For a good introduction to CTEs and explanation of why they are GOOD I’d recommend “Inside Microsoft SQL Server 2008: T-SQL Querying” by Itzik Ben-gan, Lubor Kollar, Dejan Sarka and Steve Kass.
How does a temp table not provide modularization and reuse? Besides recursion, I do not see what a CTE provides better than a temp table. A temp table also has the advantage of being used more than once. CTE’s can only be used in the immediately following transaction.
On more occasions than not I have seen temp tables perform better because they are simply a table. They can contain statistics and indexes. A CTE does not. It also requires the optimizer to correlate the execution plan of the current query with the execution plan of the CTE making it more of a crap shoot. For me, keeping execution plans simple is the key to optimization.
Good Post but need more details with example for
Folowing point
Table Variables
Yeah I said this too. I know some developers will give me flack for this, however I have rarely seen a table variable perform well. Yes, if you have under 1000 rows, consider it. Otherwise, I suggest #temp tables.