Understanding SQL Server Deadlocks
-
Posted on August 30, 2011 by Derek Dieter
-
1
In order to fix deadlocks, it’s critical to understand why they occur. I’ve gotten some push back sometimes from readers when I write articles that do not give cookie cutter solutions, but the “teach a man to fish” adage couldn’t be more true for deadlocks.
All of it revolves around locking, which is a central piece to SQL Server’s default transaction isolation level of read-committed. Now what is read committed? Some people are critical of this isolation level because with a different isolation level like Snapshot, it is much less likely that deadlocks occur because each transaction utilizes it’s own version of what a table looks like at the very time the transaction begins. There are trade-offs with shapshot however. Snapshot isolation is much more disk intensive because these “snapshots” need to be persisted in a temporary data store. So if you are ever considering enabling snapshot isolation as your default isolation level for SQL Server you will want to make sure you have a beefy RAID 10 tempdb storage.
So with read-committed we have less tempdb data swapping, but more locking. There are many different kinds of locks and many different objects that can get locked. Each of these types of locks may or may not be compatible with other types of locks. Locks are acquired so that users have a consistent view of what the data looks like at the time they are querying it. Imagine user 1 begins updating a table to inactivate all the records while user 2 queries this table in the middle of the operation. Without locks, user 2 may see half of the records inactivated, while the other half is activated. With locks, User 2 would have to wait in line until the update is completed. Once completed user 2 will only ever see the result of a completed update rather than a half completed update.
A deadlock happens when two transactions each grab one resource, then the second resource each transaction needs, is the first resource that each of them grabbed. To understand this you can think of the game twister. Each person represents a transaction and each transaction needs a certain color circle to finish their goal. But if the only color each of them can reach is a spot that each of them already occupy, then they are deadlocked.
Troubleshooting
Four things come to mind when deadlocks occur.
- Footprint
- Coincidence
- Transaction Order
- Transaction Isolation
In that order. By far the biggest offender I’ve seen on most systems is the footprint issue. By footprint I mean the amount of locks and duration those locks are held by a query. Some queries can acquire a lot of locks yet be done in a short period of time. Most slow queries have a large footprint by means of both locks and duration.
Coincidence is the frequency in which the colliding queries are run. This is also crucial. You can have colliding queries that have a relatively low footprint, however because they run so frequently, the chances of them colliding are much greater. In these cases your best bet is to try and decrease the footprint. If you can’t do that then you may need to look at a design change. Maybe the the procedure can run as a scheduled job rather than being user-invoked.
Transaction order applies if you utilize explicit transactions (BEGIN TRAN.. END TRAN). I try to avoid explicit transactions as much as possible for this reason. Let’s say you have two separate transactions that both update and select records from the same table. If transaction 1 updates before it selects, and transaction 2 selects before it updates, then the chances of these two transactions colliding is greatly increased. You always want to perform the same operations against a table in the same order.
Transaction Isolation is important because it is set at a connection level. It is possible to have rogue application connections come into the database and request a very restrictive isolation level (say serializable). Once an isolation level like this is set, no two types of locks will be compatible, greatly increasing the chances for a deadlock.
What to Do
Now that we know that footprint is the biggest cause of deadlocks, how do we fix it? By minimizing the number of locks and the duration each of these locks is held. To do that, first we need to identify the queries involved in the deadlocks. The best way to do this is to turn on trace flags. When the deadlock trace flag is turned on, then a server-side trace is run which constantly monitors for deadlocks. When a deadlock is encountered, then the specifics of the deadlock are output to the SQL Server Error Log. Depending on which trace flag you implement, it will show you different things. My favorite trace flag is 1222, which was introduced in SQL 2005. It shows the two queries that were involved in the deadlock. The other trace flag (1204) will not show as much information however it is still useful. The following article will show the differences between the two trace flags.
The most typical things you need from the output is:
- The queries involved
- The time they happened
- The objects the queries held
We can get into all the specifics regarding the output of the trace flag, however let me tell you this. 95% of the time it comes down to footprint and coincidence. And about 90% of that time this footprint and coincidence comes down to optimizing the queries shown in the deadlock trace. Many times, you can simply add a nolock to SELECT statements that are causing excessive blocking. The majority of other times, you can typically add an index. But it can get much deeper than that unfortunately. For these cases you will want to run the query and get the actual execution plan. NOT the estimated. The estimated is not a real representation, it’s only what the optimizer thinks it will do based on the statistics.
With this execution plan, the biggest thing you are looking for is large thick arrows for output.
These thick arrows represent a lot of rows being locked. Do not be fooled by index seeks. I have seen index seeks that output hundreds of thousands of rows. Always look for the thick arrows. This will cover you in the footprint category. Now for the coincidence category, if you can decrease the number of rows being locked, then you will likely also decrease the coincidence.
Deadlocks at a certain time of day
I’ve also seen deadlocks on queries that run all day long just fine, however at a certain time of day, they always deadlock. In my experience this is because intense processes were running at this time. When that happened, all the data was flushed from the buffer cache and SQL had to go to the disk in order to retrieve the data. When this happened, the footprint was increased because the time it took to read all the data. The solution is to make sure the page life expectancy does not fall low by either optimizing the queries run at that time, or increasing the RAM (but don’t tell your DBA I said that :))
An Outlier
I did recently fix a strange deadlock occurring on a SQL 2008 R2 instance. What was strange is that INSERTs were deadlocking on each other. And the INSERTs were not the INSERT..SELECT type, they were the INSERT..VALUES type. In other words, they had a very small footprint. At the time I could only think of two scenarios that was causing this, lock escalation or that these inserts were waiting behind something blocking, then when freed up they collided. The deadlock graph confused things even more. The partition it showed being held by an X lock was partition: 0, however the table did have a clustered index and was not a heap. Further investigation showed that the partition it was referencing was actually part of a new type of partition introduced in 2005 used for a feature called “lock partitioning“. The problem turned out to be that one of the indexes did not have the options of “use row locks” or “use page locks” selected. Because of this, the insert (which also needs to update the indexes) had to utilize a table lock. When it did, the X lock (from two separate inserts) against the lock partitions collided because they seemingly did not lock the partitions in the same order every time. (Which I wonder if this is a bug). The fix for the deadlock was simply to enable the page & row locks on the index.
Additional Resources
Need Additional Guidance? SQL Server 2008 Query Performance Tuning Distilled by Sajal Dam & Grant Fritchey has a chapter dedicated to deadlock causes and resolutions.
- Comments (RSS)
- Trackback
- Permalink