Skip to content
 

Choosing the Best Clustered Index

One of the most common database performance issue most companies face is the improper choice of indexes or the lack of indexes. Most essential, is the clustered index. A well-chosen clustered index will save CPU, Memory, and IO. Choosing the clustered index for a table goes hand-in-hand with the writing of the queries for a table. It is best to design the queries and the index simultaneously to keep consistency. The majority of the time this is not an option. When we have not written the queries, the best way to understand the optimal clustered index is to study the queries that are run against the table.

Once you understand the queries that run against a table, it is helpful to think of each of these queries broken down into three parts:

  • The select list – Be on the lookout for the queries that return the most columns from the table.
  • The join columns – The most common columns used to join other tables against the table.
  • The where clause – The most common columns used to filter the result sets.

Generally speaking, the join columns and the columns used in the where clause are the most appropriate candidates for the clustered index. However the question still arises as to which columns to use. The short answer is to use the columns that get queried more frequently, and if both are queried an equal number of times, use the query that returns the largest select list. The majority of times this will yield the best results.

Go here for a more comprehensive tutorial on indexes. If you are not clear on indexes, I highly recommend studying the topic.

CLUSTERED INDEX DMV

Luckily, dynamic management views (DMVs) were introduced in the new versions of SQL Server that store meta data about the queries themselves.

One of the most useful DMVs are the sys.indexes and sys.dm_db_index_usage_stats. Using these two DMVs alone can help optimize a SQL Server enough to take it from poor performance to exceptional performance. Using these views, we can differentiate what is likely the biggest mistake made in indexing. Improper clustered indexes.

The clustered index is the actual data of the table, sorted according to the columns that are selected to be the clustered index. The columns that are selected to be the clustered index, should generally be the most commonly queried fields in the table. Once the rows used in a WHERE clause or a JOIN are appropriately isolated in the query, the actual data that is inline to the row typically needs to be gathered for the SELECT list. If the rows are isolated using a non-clustered index, and the non-clustered index does not contain the appropriate columns for the select list, a lookup to the clustered index occurs. This lookup itself takes a fair amount of resources. It is much better to be able to isolate the rows using the clustered index because once the rows are isolated, the data is already inline because the clustered index is the data.

The following query helps determine improper clustered indexes by comparing the number of seeks the clustered index receives with the number of seeks the most used nonclustered index receives. It then compares the number of seeks the most used nonclustered index receives with the number of lookups the clustered index receives. If this particular nonclustered index is responsible for more that 50% of the lookups to the clustered index, it is a very likely candidate to be the more appropriate clustered index.

DECLARE @NonClusteredSeekPct float
DECLARE @ClusteredLookupFromNCPct float

-- Define percentage of usage the non clustered should
-- receive over the clustered index
SET @NonClusteredSeekPct = 1.50 -- 150%

-- Define the percentage of all lookups on the clustered index
-- should be executed by this non clustered index
SET @ClusteredLookupFromNCPct = .75 -- 75%

SELECT
	TableName					= object_name(idx.object_id)
	,NonUsefulClusteredIndex    = idx.NAME
	,ShouldBeClustered			= nc.NonClusteredName
	,Clustered_User_Seeks		= c.user_seeks
	,NonClustered_User_Seeks	= nc.user_seeks
	,Clustered_User_Lookups		= c.user_lookups
	,DatabaseName				= db_name(c.database_id)
FROM sys.indexes idx
LEFT JOIN sys.dm_db_index_usage_stats c
ON idx.object_id = c.object_id
AND idx.index_id = c.index_id
--AND c.database_id = @DBID
JOIN (
	SELECT
		idx.object_id
		,nonclusteredname = idx.NAME
		,ius.user_seeks
	FROM sys.indexes idx
	JOIN sys.dm_db_index_usage_stats ius
	ON idx.object_id = ius.object_id
		AND idx.index_id = ius.index_id
	WHERE idx.type_desc = 'nonclustered'
	AND ius.user_seeks =
	(
		SELECT MAX(user_seeks)
		FROM sys.dm_db_index_usage_stats
		WHERE object_id = ius.object_id
		AND type_desc = 'nonclustered'
	)
	GROUP BY
		idx.object_id
		,idx.NAME
		,ius.user_seeks
) nc
ON nc.object_id = idx.object_id
WHERE
	idx.type_desc IN ('clustered','heap')
-- non clustered user seeks outweigh clustered by 150%
AND nc.user_seeks > (c.user_seeks * @NonClusteredSeekPct)
-- nc index usage is primary cause of clustered lookups 80%
AND nc.user_seeks >= (c.user_lookups * @ClusteredLookupFromNCPct)
ORDER BY nc.user_seeks DESC


Popular search terms:

One Comment

  1. kumar says:

    my query is
    retrieving records from 3 tables,
    having 5 where caluse conditions
    and 2 equi joins.
    now how many indexes should be created in database?

post a comment OR Post Your Question on our ASK! Community!