How to Optimize a Stored Procedure using the Execution Plan

Posted on September 10, 2010 by Derek Dieter
3
2

Finding the Slowest Query using the Execution Plan »

Find the most costly statements
Determine why the statement is costly
Get an accurate baseline for the procedure
Optimize

Find the Most Costly Statments

There are two main ways to determine costly queries. One is to execute the procedure including an Actual Execution Plan. The other is to get a time difference of a before and after for each statement. The are also other ways (including IO usage and CPU usage but we won’t cover those here).

I like to start with the execution plan method. Though this is not the most precise method, it is the easiest.. so let’s start there. Here is how to find the slowest query in a procedure using the execution plan.

Determine why the statement is costly

This can be them most difficult task. With the use of the execution plan, we can help deduce the most common issues.

Now that we’ve found the statement, we need to drill down even further to find out what is causing the statement to be slow. This is done by looking at the cost for each operation in the execution plan. Under each operation, you will see a “cost: x%”. You want to find the operation with the highest percentage and focus on that.

When you do find it, many times it will be one of the following operations:

Interpreting the Execution Plan
Symptom	Cause(s)	example
Table scan, index scan, clustered index scan	Missing or improper Indexes, cardinality < 5%
thick lines (arrows) from one operation to another	Bad Joins, missing filter operation
RID Lookups, Key Lookups	Data is not at the leaf level, can use include clause for indexes
Paralellism	Bad join, improper order of operations, maxdop or parallel threshold set too low on server
Compute Scalar	Joining on differing data types, implicit conversion required

Scans

Scan operations are not bad when the cardinality of the column being searched is less than approx 6%, or when the amount of records being scanned is below around 500 records. Otherwise, if you have a scan that has a high cost associated with it, you will want to find the appropriate indexes to fix this issue.

Large Work Tables / Results

This is the issue above that is depicted by thick arrows. When this happens a lot of times it is a bad order of operations. What that means is that the optimizer did not choose the limit the overall result set by joining the smallest tables first. Instead, it is trying to join and filter from the largest tables, and then join the smaller tables. While this may not be the only scenario this happens in, it does happen often.

To fix this, you may consider breaking the query up and dump results into a temp table prior joining on the rest of the tables. Make sure to index your temp table!

RID Lookups / Key Lookups

These are also not always bad. In fact you cannot get around this issue all the time. Why? Because you can only have one clustered index, and you don’t want to include every column in the table in all your non clustered indexes. However if you run into this issue having a high cost, you will want to consider changing your clustered index, or adding the columns being looked up in the Key Lookup using the INCLUDE statement for indexes.

Parallelism

Parallelism can have a significant impact on queries and your server. What this means is that the amount of data being usurped by your query is very large, so in order to speed it up, SQL Server thinks it would be best to create more SPIDs to handle the operation. When this happens look for your page life expectency to dip low and your disk utilization to go high. Consider employing the same technique as above of breaking up your query. Also look at the order of operations and change the query around. If that doesn’t help, add the query hint OPTION (MAXDOP 1). Like this:

[cc lang=”sql”]
SELECT *
FROM dbo.mytable mt
JOIN dbo.yada y
ON mt.ID = y.ID
OPTION (MAXDOP 1)
[/cc]

Compute Scalar
This guy may fool you when it comes to the cost factor, however it can be a big hindrance to performance when this operation is between table joins. The “scalar” part should worry you, because it does perform somewhat of a row-by-row operation. When this is between table joins, this operation is there to do an implicit conversion of a datatype in order to join the columns correctly. Since it cannot join the two tables directly it will often force too many records to be touched in order to be converted. The conversion may then be dumped into a worktable (in the background) then that worktable will be used for joins after the conversion. Just make sure to join like data types.

Get an Accurate Baseline

Next before you make changes, get an accurate baseline prior to making changes so when you do make changes, you can be sure that they sped things up.

When you execute a procedure, the time it takes to execute is in the lower right-hand corner as shown below.

Run Duration

From the above, we can see the problem with only using the client side timer. It does not show milliseconds, and milliseconds do matter. The other problem that we cannot see above with the client-side timer is that it also includes the round trip time to the server and back to your computer. While this may seem like a more accurate depiction, it makes optimizing harder because you get less consistent results. In my opinion, it’s better to get the server-side timings.

To get the server-side timings, we turn on the time statistics, using set statistics time command.

We enable it using this:

[cc lang=”sql”]
SET STATISTICS TIME ON
[/cc]

Once executed, it stays on for the entire session (or window) until you close the window or turn it off (by replacing ON with OFF). So you only need to execute this one time, then you can remove the statement.

The next step needed to get consistent results is to clear the caches. One cache holds the compiled version of the SQL Procedure (the execution plan), the other cache holds the data pages that are retrieved from disk.

To clear the compiled execution plan, we use:

[cc lang=”sql”]
DBCC FREEPROCCACHE
[/cc]

To clear the data pages, we use:

[cc lang=”sql”]
DBCC DROPCLEANBUFFERS
[/cc]

Optimize

Now optimize and apply the changes discovered above! I know, easier said than done sometimes. Even though we did not cover all situations for query slowness, hopefully this will provide a good launching pad. We will delve further into fixes in a later article.

Post a comment

3 comments

SQL Server – 如何使用Execution Plan 优化存储过程？ | EntLib.net 技术分享平台 07 May 2012 at 9:14 am

[…] 英文链接：How to Optimize a Stored Procedure using the Execution Plan […]

Reply to this comment

Khaliah 29 Aug 2015 at 6:11 am

2KEYWORDS:34Wayland is a nano display severr, relying on drm modesetting, gem5batchbuffer submission and hw initialization generally in the kernel.6Wayland puts the compositing manager and display severr in the same7process. Window management is largely pushed to the clients, they8draw their own decorations and move and resize themselves, typically9implemented in a toolkit library. More of the core desktop could be10pushed into wayland, for example, stock desktop components such as the11panel or the desktop background.1213The actual compositor will define a fair bit of desktop policy and it14is expected that different use cases (desktop environments, devices,15appliances) will provide their own custom compositor.1617It is still designed with a windowed type of desktop in mind, as18opposed to fullscreen-all-the-time type of interface, but should be19useful wherever several processes contribute content to be composited.2021Current trends goes towards less and less rendering in X severr, more22hardware setup and management in kernel and shared libraries allow23code sharing without putting it all in a severr. freetype,24fontconfig, cairo all point in this direction, as does direct25rendering mesa.2627Client allocates DRM buffers, draws decorations, and full window28contents and posts entire thing to severr along with dimensions.2930Everything is direct rendered and composited. No cliprects, no31drawing api/protocl between severr and client. No32pixmaps/windows/drawables, only surfaces (essentially pixmaps). No33gcs/fonts, no nested windows. OpenGL is already direct rendered,34pixman may be direct rendered which adds the cairo API, or cairo35may gain a GL backend.3637Could be a shell for launching gdm X severr, user session severrs,38safe mode xseverrs, graphics text console. From gdm, we could also39launch a rdp session, solid ice sessions.4041All surface commands (copy, attach, map=set quads) are buffered until42the client sends a commit command, which executes everything43atomically. The commit command includes a cookie, which will be44returned in an event generated by the severr once the commit has been45executed. This allows clients to throttle themselves against the46severr and implement smooth animations.474849ISSUES:5051Include panel and desktop background in wayland?5253How does clients move their surfaces? set a full tri-mesh every time?5455How does the severr apply transformations to a surface behind the56clients back? (wobbly, minimize, zoom) Maybe wobble is client side?5758How do apps share the glyph cache?5960Input handling keyboard focus, multiple input devices, multiple61pointers, multi touch.6263Drawing cursors, moving them, cursor themes, attaching surfaces to64cursors. How do you change cursors when you mouse over a text65field if you don’t have subwindows?6667synaptics, 3-button emulation, xkb, scim6869changing screen resolution, adding monitors.7071What to do when protocol out buffer fills up? Just block on write72would work I guess. Clients are supposed to throttle using the bread73crumb events, so we shouldn’t get into this situation.7475When a surface is the size of the screen and on top, we can set the76scanout buffer to that surface directly. Like compiz unredirect77top-level window feature. Except it won’t have any protocol state78side-effects and the client that owns the surface won’t know. We lose79control of updates. Should work well for X severr root window under80wayland.8182Throttling/scheduling there is currently no mechanism for scheduling83clients to prevent greedy clients from spamming the severr and84starving other clients. On the other hand, now that recompositing is85done in the idle handler (and eventually at vertical retrace time),86there’s nothing a client can do to hog the severr. Unless we include87a copyregion type request, to let a client update it’s surface88contents by asking the severr to atomically copy a region from some89other buffer to the surface buffer.9091Atomicity we have the map and the attach requests which sometimes92will have to be executed atomically. Moving the window is done using93the map request and will not involve an attach requet. Updating the94window contents will use an attach request but no map. Resizing,95however, will use both and in that case must be executed atomically.96One way to do this is to have the severr always batch up requests and97then introduce a kind of commit request, which will push the batched98changes into effect. This is easier than it sounds, since we only99have to remember the most recent map and most recent attach. The100commit request will generate an corresponding commit event once the101committed changes become visible on screen. The client can provide a102bread-crumb id in the commit request, which will be sent back in the103commit event.104105- is batching+commit per client or per surface? Much more convenient106if per-client, since a client can batch up a bunch of stuff and get107atomic updates to multiple windows. Also nice to only get one108commit event for changes to a bunch of windows. Is a little more109tricky severr-side, since we now have to keep a list of windows110with pending changes in the wl_client struct.111112- batching+commit also lets a client reuse parts of the surface113buffer without allocating a new full-size back buffer. For114scrolling, for example, the client can render just the newly115exposed part of the page to a smaller temporary buffer, then issue116a copy request to copy the preserved part of the page up, and the117new part of the page into the exposed area.118119- This does let a client batch up an uncontrolled amount of copy120requests that the severr has to execute when it gets the commit121request. This could potentially lock up the severr for a while,122leading to lost frames. Should never cause tearing though, we’re123changing the surface contents, not the severr back buffer which is124what is scheduled for blitting at vsync time.125126127RMI128129The wayland protocol is a async object oriented protocol. All130requests are method invocations on some object. The request include131an object id that uniquely identifies an object on the severr. Each132object implements an interface and the requests include an opcode that133identifies which method in the interface to invoke.134135The severr sends back events to the client, each event is emitted from136an object. Events can be error conditions. The event includes the137object id and the event opcode, from which the client can determine138the type of event. Events are generated both in repsonse to a request139(in which case the request and the event constitutes a round trip) or140spontanously when the severr state changes.141142the get_interface method is called on an object to get an object143handle that implements the specified interface.

Reply to this comment