1) FIND THE CULPRITS
Like any other software, MS SQL Server is a computer program, albeit a complex one, and Microsoft wrote it. So if there is a problem with SQL Server, we need to understand why this program is not behaving as we expect.
We need to push data to and pull data from SQL Server as fast and as accurately as possible. If we face issues, the reasons may be:
- SQL Server (a complex program) needs certain hardware and installation settings, and we are not providing them properly.
- Given the way SQL Server is implemented and the way it understands T-SQL, we are not writing proper T-SQL code for it.
Even though MS SQL Server is proprietary software, Microsoft provides many ways to understand the server and what is going on inside it, so that we can use it efficiently.
If SQL Server is running without errors, the first step is to look at the wait statistics of its threads. SQL Server uses a thread for every user request, and a thread is just another program inside the complex program that is SQL Server. (These are not the threads of the operating system SQL Server is installed on; they are SQLOS threads, SQLOS being a pseudo operating system inside SQL Server.)
We can find wait statistics using the “sys.dm_os_wait_stats” DMV. There are many scripts online to query this view as per your needs. I like Paul Randal’s script (http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts):
WITH [Waits] AS
    (SELECT
        [wait_type],
        [wait_time_ms] / 1000.0 AS [WaitS],
        ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
        [signal_wait_time_ms] / 1000.0 AS [SignalS],
        [waiting_tasks_count] AS [WaitCount],
        100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
        ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
    FROM sys.dm_os_wait_stats
    WHERE [wait_type] NOT IN (
        -- benign wait types; see Paul Randal's post above for his full, regularly updated list
        N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR', N'BROKER_TASK_STOP',
        N'CHECKPOINT_QUEUE', N'CLR_AUTO_EVENT', N'CLR_MANUAL_EVENT',
        N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE', N'LAZYWRITER_SLEEP',
        N'LOGMGR_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH', N'SLEEP_TASK',
        N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'WAITFOR', N'XE_DISPATCHER_WAIT',
        N'XE_TIMER_EVENT')
    AND [waiting_tasks_count] > 0
    )
SELECT
    MAX ([W1].[wait_type]) AS [WaitType],
    CAST (MAX ([W1].[WaitS]) AS DECIMAL (16,2)) AS [Wait_S],
    CAST (MAX ([W1].[ResourceS]) AS DECIMAL (16,2)) AS [Resource_S],
    CAST (MAX ([W1].[SignalS]) AS DECIMAL (16,2)) AS [Signal_S],
    MAX ([W1].[WaitCount]) AS [WaitCount],
    CAST (MAX ([W1].[Percentage]) AS DECIMAL (5,2)) AS [Percentage],
    CAST ((MAX ([W1].[WaitS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgWait_S],
    CAST ((MAX ([W1].[ResourceS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgRes_S],
    CAST ((MAX ([W1].[SignalS]) / MAX ([W1].[WaitCount])) AS DECIMAL (16,4)) AS [AvgSig_S]
FROM [Waits] AS [W1]
INNER JOIN [Waits] AS [W2]
    ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum]
HAVING SUM ([W2].[Percentage]) - MAX ([W1].[Percentage]) < 95; -- percentage threshold
Please concentrate on the top rows first; they represent the wait types with the most accumulated wait time. Also, read up on the wait types in your top 10 rows. You need to understand them well in order to make a good decision.
If you have a lot of “PAGEIOLATCH_XX” waits, threads are waiting for data pages to be read from disk into the buffer pool (which is nothing but memory). We need to be very careful in interpreting this scenario: it does not automatically mean we have a poor I/O subsystem or too little memory. You can relieve the symptom temporarily by adding I/O capacity and memory, but to find a permanent solution we need to ask why we are reading so much data from disk. What type of SQL commands are behind this? Are we unnecessarily reading too much data instead of narrowing it with filters such as a WHERE clause? Is the heavy reading caused by table scans or index scans? Can we convert them to index seeks by implementing new indexes or modifying existing ones? Are we writing SQL queries that are misunderstood by the SQL optimizer (another program inside the complex SQL Server program)?
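To see whether such waits are happening right now rather than accumulated since the last restart, you can look at the tasks currently waiting; a minimal sketch:

```sql
-- Sketch: sessions currently waiting on PAGEIOLATCH_* (buffer page I/O) waits
SELECT [session_id],
       [wait_type],
       [wait_duration_ms],
       [resource_description],   -- which database/file/page is being read
       [blocking_session_id]
FROM sys.dm_os_waiting_tasks
WHERE [wait_type] LIKE N'PAGEIOLATCH%';
```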
We need to think from different angles and use different test cases to come up with a final solution. Each wait type needs a different solution, and a DBA should research it thoroughly before taking any action. But most of the time, 60 to 70% of the problems will be solved by finding the problematic T-SQL queries and tuning them.
2) FINDING PROBLEMATIC QUERIES
You can use the following query to find the top 20 worst-performing queries:
SELECT TOP 20
    total_worker_time/execution_count AS Avg_CPU_Time
    ,total_elapsed_time/execution_count AS Avg_Run_Time
    ,execution_count
    ,(SELECT [text]
      FROM sys.dm_exec_sql_text(sql_handle)
     ) AS Query_Text
FROM sys.dm_exec_query_stats
ORDER BY Avg_CPU_Time DESC;
What we need to observe here: even though a query may have the highest average run time, if it runs only once then its total effect on the server is low compared to a query with a medium average run time that runs many times a day. That is why execution_count is included above.
3) FINE TUNING QUERIES
Fine-tuning T-SQL queries is a very big topic. The fundamental thing to understand is how well we write T-SQL queries and implement indexes, so that the SQL optimizer (a program inside the MSSQL program) can find an optimized plan to do what we want it to do. With every new release of SQL Server we get a more sophisticated optimizer, which covers more of our mistakes in writing optimized SQL queries as well as any bugs in the previous optimizer. But no matter how intelligent a human being is, if we can't communicate with him properly, he will misunderstand us. The same goes for computer programs.
MS SQL Server uses advanced search and sorting algorithms. If you are good at search and sorting algorithms, then most of the time you can easily guess why SQL Server takes a particular action. The best book for understanding such algorithms is The Art of Computer Programming by Donald Knuth.
When we examine queries that need to be fine-tuned, we use their execution plans to find out how SQL Server is interpreting them.
I can't cover every aspect of execution plans here, but at a basic level I can explain the things we need to consider.
- First, we need to find which operators take most of the query cost.
- If an operator takes a large share of the cost, we need to see the reason for it. Most of the time, scans cost more than seeks. We need to examine why a particular scan (table scan or index scan) is happening instead of an index seek. We can often solve this by implementing proper indexes on table columns, but as with any complex computer program like SQL Server, there is no fixed solution. For example, if the table is small, a scan can be faster than a seek.
- There are approximately 78 operators, representing the various actions and decisions of a SQL Server execution plan. We need to study them in depth so that we understand them well enough to take proper action.
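As an example of the scan-to-seek point above, if the plan shows a scan caused by a filter on a single unindexed column, a non-clustered index on that column often lets the optimizer seek instead. A sketch, with hypothetical table and column names:

```sql
-- Hypothetical: a query such as
--   SELECT [name] FROM dbo.Customers WHERE [name] = N'pal'
-- scans the table if [name] is not indexed. This index lets it seek instead:
CREATE NONCLUSTERED INDEX IX_Customers_Name
    ON dbo.Customers ([name]);
```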
Even if we implement proper indexes on tables and write good T-SQL code, performance problems can still arise if execution plans are not reused.
4) EXECUTION PLAN RE-USE
After fine-tuning the queries, we need to make sure execution plans are reused where appropriate. A lot of CPU time is spent compiling an execution plan, and that cost is eliminated when we reuse the plan.
You can use the query below to check execution plan reuse:
SELECT
    ecp.usecounts
    , ecp.cacheobjtype
    , ecp.objtype
    , DB_NAME([est].[dbid]) AS [db_name]
    , [est].[text] AS [query_text]
FROM sys.dm_exec_cached_plans ecp
CROSS APPLY sys.dm_exec_sql_text ( ecp.plan_handle ) est
CROSS APPLY sys.dm_exec_query_plan ( ecp.plan_handle ) eqp
ORDER BY ecp.usecounts DESC;
Here “usecounts” shows how many times each plan has been reused.
The best way to get execution plan reuse is to implement parameterized stored procedures. When we are not in a position to implement stored procedures, we can use “sp_executesql”.
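When the application must send ad hoc statements, “sp_executesql” lets one cached plan serve every parameter value. A sketch, using a hypothetical table:

```sql
-- Parameterized ad hoc query: the same cached plan is reused for any @name
EXEC sp_executesql
    N'SELECT [name] FROM dbo.Customers WHERE [name] = @name;',
    N'@name NVARCHAR(50)',
    @name = N'sri';
```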
But as with any complex computer program, there is no fixed solution; sometimes it is better to compile the plan again. Consider these two queries:
- select name from table where name = 'sri';
- select name from table where name = 'pal';
Let us assume we have a non-clustered index on the “name” column, that half the rows in the table have the value 'sri', and that only a few rows have 'pal'. For the first query, SQL Server uses a table scan because half the table matches. But for the second query, an index seek is better because only a few rows have the value 'pal'.
So even though the queries are similar, the same execution plan may not be good for both. Most of the time this will not be the case, so we need to analyse everything carefully before deciding whether reusing the same execution plan is better or not.
If we don't want to reuse an execution plan, we can always use the “recompile” option in stored procedures.
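A sketch of the “recompile” option on a hypothetical procedure; WITH RECOMPILE makes SQL Server build a fresh plan on every execution instead of caching one:

```sql
CREATE PROCEDURE dbo.GetCustomersByName
    @name NVARCHAR(50)
WITH RECOMPILE   -- plan is rebuilt on each call, never reused
AS
BEGIN
    SELECT [name] FROM dbo.Customers WHERE [name] = @name;
END;
```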
Even when using stored procedures or “sp_executesql”, there are times when the execution plan won't be reused:
- When indexes used by the query change or are dropped
- When the statistics, structure or schema of a table used by the query change
- When we use the “recompile” option
- When a large number of insertions, updates or deletes happens
- When we mix DDL and DML within a single batch, etc.
5) REMOVING UNNECESSARY INDEXES
After fine-tuning the queries, we need to check how indexes are used overall in SQL Server. Index maintenance requires a lot of CPU and I/O: every time we insert data into SQL Server, it must update the indexes too. So it is better to remove indexes that are not used at all.
SQL Server provides the “sys.dm_db_index_usage_stats” DMV to find index statistics. You can use the query below to find index usage statistics:
SELECT
    OBJECT_NAME(IUS.[OBJECT_ID]) AS [OBJECT NAME],
    DB_NAME(IUS.database_id) AS [DATABASE NAME],
    I.[NAME] AS [INDEX NAME],
    IUS.USER_SEEKS,
    IUS.USER_SCANS,
    IUS.USER_LOOKUPS,
    IUS.USER_UPDATES
FROM SYS.DM_DB_INDEX_USAGE_STATS AS IUS
INNER JOIN SYS.INDEXES AS I
    ON I.[OBJECT_ID] = IUS.[OBJECT_ID]
    AND I.INDEX_ID = IUS.INDEX_ID;
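If an index shows only user_updates and no seeks, scans or lookups over a representative period (note these counters reset when the server restarts), it is a candidate for removal; a sketch with a hypothetical index name:

```sql
-- Drop only after confirming the index stays unused across a full business cycle
DROP INDEX IX_Customers_Name ON dbo.Customers;
```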
6) MS SQL SERVER INSTALLATION & DATABASE SETUP
Please keep data and log files on separate drives. The main reason is that access to data files is random, whereas writes to log files are sequential. If we put them on the same drive, we can't use either of them in an optimized way.
Please note that when you purchase a SAN, the vendor may give you setup recommendations, but they are not always useful. Have a detailed discussion with your hardware and networking people so that data and log files are kept separate in an optimized way.
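File placement is decided when the database is created (or later with ALTER DATABASE); a sketch with illustrative drive letters and paths:

```sql
-- Data on one drive, log on another (names and paths are hypothetical)
CREATE DATABASE Sales
ON PRIMARY
    (NAME = Sales_data, FILENAME = 'D:\SQLData\Sales.mdf')
LOG ON
    (NAME = Sales_log,  FILENAME = 'E:\SQLLogs\Sales_log.ldf');
```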
7) DON’T OVERLOAD SQL SERVER
The primary duty of any DBA is to make sure the production server runs smoothly and serves customers as well as possible. To make this happen, we need to maintain separate databases (if possible, separate machines) for the following environments: production, development, testing and analytics.
For production we need a database in Full recovery mode; for the other environments, Simple recovery mode is enough.
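Setting the recovery model is one statement per database; the database names below are illustrative:

```sql
ALTER DATABASE SalesProd SET RECOVERY FULL;    -- production: point-in-time restore
ALTER DATABASE SalesDev  SET RECOVERY SIMPLE;  -- dev/test: minimal log maintenance
```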
If we do testing on the production database, it puts a lot of load on the transaction log, indexes, CPU and I/O. That's why we need separate databases for production, development, testing and analytics.
If possible please use separate machines for these activities because it will decrease load on CPU & I/O.
8) TRANSACTION LOG, TEMPDB & MEMORY
Please make sure your log file has enough free space for normal operations, because an auto-grow operation on the log file is time-consuming and can force other operations to wait until it completes.
You can use “DBCC SQLPERF(logspace)” to find the log file size for each database and how much of it is used.
The best way to set up tempdb is to put it on a separate disk. Make the initial size as large as you can afford, because performance drops whenever tempdb hits an auto-grow.
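Presizing can be done with ALTER DATABASE; the size and growth values below are illustrative, and “tempdev” is assumed to be the logical name of the first tempdb data file (the installer's default):

```sql
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 8192MB, FILEGROWTH = 512MB);
```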
We need to make sure SQL Server runs on a dedicated machine with no other applications on it. We should leave some memory for the operating system, and some more if the machine is part of a cluster, etc. In most cases, keep around 2 GB for the OS and give the rest to SQL Server.
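That reservation is made through the “max server memory” setting; the value below is illustrative (for example, a machine with 8 GB of RAM leaving roughly 2 GB to the OS):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 6144;  -- leave the rest to the OS
RECONFIGURE;
```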
Final conclusion: the suggestions above are for basic performance tuning only. To do advanced performance tuning of SQL Server, we need to dig deeper into each of the areas above.