SQL Server Indexes Tutorial

One of the important parts of SQL Server development and optimization is the creation of indexes. In order to create proper indexing strategies it is necessary to understand how indexes work. This tutorial will guide you step by step to understand some index basics.

There are only two different types of indexes. Clustered and NonClustered. There can only be one clustered index on a table and the reason is simple:

  • A Clustered index is the data of table sorted according to the columns you choose.
  • A NonClustered index is just like the index of a book. It contains data sorted so that it’s easy to find, then once found, it points back to the actual page that contains the data. (In other words, it points back to the clustered index)

Suppose we are reading a book about biographical information of all the U.S. Presidents, and the book itself orders the biographies starting from the first president to the latest president. This ordering of the pages would represent the clustered index.

Now suppose you asked two different people to find Franklin D. Roosevelt’s biography. Person-A was a historian and Person-B was unschooled. Person-A would quickly be able to find the presidents biography while Person-B would have to scan through each page in order to find the biography. Even if the Person-B used the book’s index (akin to the non-clustered index), he would still have to search for the page after he found the page number.

So it is always faster to find information off of the clustered index because the data in already at the “leaf-level” off the index.

The clustered index should be a key that does not get modified. It should also ideally be sequential so that the underlying data pages do not become fragmented [more information]

With this information, how do we determine what the clustered index should be? Well, it depends on the population of the people searching for the biographies. If it is mostly unschooled people, then it would be more efficient to sort the book alphabetically rather than the historical order of the presidents.

Now let’s say that 75% of the population are historians and the other 25% are unschooled. Let’s assume the data the historians will need consists of a lot of different information regarding the president’s biographies, while all the unschooled need is the president’s age at the time they took office. In this scenario, it is more plausible to keep the ordering of the book (or the clustered index) based on the order of the president, then simply add the age of the president in the back index of the book (the non clustered index). That way the unschooled people do not have to look into the front of the book (clustered index) for the president’s age. They could simply find it in the back of the book by doing one single lookup. The presidents age being stored in the rear index would be considered at the “leaf” level. This would satisfy both requirements and would be efficient for both historians and the unschooled group.

First let’s create our president’s table download and run: PresidentsTable

After running, let’s turn on the execution plan (In SQL Server Managment Studio place your mouse in the query window and select Query -> Include Actual Execution Plan)

Now execute the following query:

SELECT
    PresidentNumber
    ,President
    ,YearsInOffice
    ,YearFirstInaugurated
FROM Presidents
WHERE PresidentNumber = 32

Now let’s view the execution plan:
tablescanexecutionplan
Without a clustered index, our book is in no particular order. To find president 32, we need to scan every page.

Now let’s add a clustered index so we can organize our book according to PresidentNumber:

CREATE UNIQUE CLUSTERED INDEX IDX_C_Presidents_PresidentNumber ON Presidents(PresidentNumber)

And let’s run our query again:

SELECT
    PresidentNumber
    ,President
    ,YearsInOffice
    ,YearFirstInaugurated
FROM Presidents
WHERE PresidentNumber = 32

tableseekexecutionplan
Our execution plan now shows a “clustered index seek”. Meaning we did not have to look through every page of our book. We jumped right to page 32 and found the information on our president there.

In summary, when we look up information based on the clustered index (the way the table is physically sorted), we naturally find all the information we are looking for already there (President, YearsInOffice, YearFirstInaugurated).

Now, let’s create a non clustered index and look up the YearFirstInaugurated by president’s name:

CREATE NONCLUSTERED INDEX IDX_NC_Presidents_President ON Presidents(President)

Now let’s run our query to find the YearFirstInaugurated:

-- Force our query to use the index
-- (table is so small SQL Server bypasses it)
SELECT
    YearFirstInaugurated
FROM Presidents WITH(INDEX(IDX_NC_Presidents_President))
WHERE President = 'Franklin Roosevelt'

If we look at our execution plan now, we will see that we initially looked the president’s name up in our index, then after finding the page where the presidents biography was located, we went to that page to grab the YearFirstInaugurated. This is denoted by the “Key Lookup”. (Also known as “Bookmark Lookup”)
bookmarklookup
This is a more expensive operation because our data is not at the “leaf-level” (or inline with the index we just searched), rather it is in the clustered index instead.

So how do we fix this? In SQL Server 2005, a new feature was introduced called “included columns”. This allows us to include data at the leaf-level of an index. So rather than looking up YearFirstInaugurated in the clustered index, we can find it in the nonclustered index. Let’s drop our index and include YearFirstInagurated in our nonclustered index:

DROP INDEX Presidents.IDX_NC_Presidents_President
GO
CREATE NONCLUSTERED INDEX IDX_NC_Presidents_President ON Presidents(President) INCLUDE(YearFirstInaugurated)

And run our query one more time:

-- Force our query to use the index
-- (table is so small SQL Server bypasses it)
SELECT
    YearFirstInaugurated
FROM Presidents WITH(INDEX(IDX_NC_Presidents_President))
WHERE President = 'Franklin Roosevelt'

Now we only have an index seek. Because as soon as we looked the president up in the index, we immediately also found the YearFirstInaugurated:
indexseek

Additional Resources

For a more in-depth discussion regarding indexes, I recommend the following book SQL Server 2008 Query Performance Tuning Distilled by Sajal Dam & Grant Fritchey. It contains 50 pages dedicated to indexes and they do a good job explaining index structure and design, along with when to use what type of index.

query_tuning_distilled

As a disclaimer, I do receive a commission if you purchase this book.

Comments are closed.

Featured Articles

 Site Author