Skip to content

Configure Change Data Capture Settings

After setting up change data capture (CDC), there are a few configuration options that allow for optimal performance based upon applications load.

Using the default values, the most number of transactions that can be grabbed every 5 seconds is 5000. Remember, a transaction is not the same as a database row. One transaction can update “n” number of rows.

In order to determine latency for you CDC solution, you will need to query a DMV view:

select latency, *
from sys.dm_cdc_log_scan_sessions

The latency will be shown in seconds, if it is getting too high, start increasing maxtrans, maxscans, and decrease the polling interval.

  • polling interval – (default 5 seconds) the amount of time to wait between log scans
  • maxtrans – (default 500) the number of transactions to grab with each scan
  • maxscans – (default 10) the number of scans performed between each polling interval

In order to change these settings, use the system stored procedure (sys.sp_cdc_change_job) after changing the database which contains your capture instance. The CDC job needs to be restarted after making these changes.

-- Change to the database that contains your capture instance
use YourDatabaseHere
go
exec sys.sp_cdc_change_job @job_type = 'capture'
						,@maxtrans = 501
						,@maxscans = 10
						,@continuous = 1
						,@pollinginterval = 5

Changing the latency between the cleanup is done using the same stored procedure:

exec sys.sp_cdc_change_job @job_type = 'cleanup'
						,@retention = 4320 -- Number of minutes to retain (72 hours)
						,@threshold = 5000

To view the results of the changes, execute the following:

-- View the results of the changed parameter is the cdc_jobs table
SELECT *
FROM msdb.dbo.cdc_jobs

Get End of Month

The following short code will return the date and time for the very end of the month passed in. It works by first getting the beginning of the month for the date passed in, adding one month to that date, then subtracting 3 milliseconds. This gives the latest possible date for the end of the month as the DateTime data type has a granularity of 3 milliseconds.

DECLARE @DateIn datetime

SET @DateIn = GETDATE()

SELECT DATEADD(ms,-3,DATEADD(mm,0,DATEADD(mm,DATEDIFF(mm,0,@DateIn)+1,0)))

Beginning of Month

Possibly the easiest way to calculate the beginning of the month, is to use the DateAdd function and add a date difference of nothing to the month:

DECLARE @Date datetime
SET @Date = GETDATE()
SELECT DATEADD(mm,DATEDIFF(mm,0,GETDATE()),0)

Create Index on Table Variable

Creating an index on a table variable can be done implicitly within the declaration of the table variable by defining a primary key and creating unique constraints. The primary key will represent a clustered index, while the unique constraint a non clustered index.

	DECLARE @Users TABLE
	(
		UserID	INT PRIMARY KEY,
		UserName varchar(50),
		UNIQUE (UserName)
	)

The drawback is that the indexes (or constraints) need to be unique. One potential way to circumvent this however, is to create a composite unique constraint:

	DECLARE @Users TABLE
	(
		UserID	INT PRIMARY KEY,
		UserName varchar(50),
		FirstName varchar(50),
		UNIQUE (UserName,FirstName)
	)

Generally, temp tables perform better in situations where an index is needed. The downfall to temp tables is that they will frequently cause recompilation. This was more of an issue with SQL 2000 when compilation was performed at the procedure level instead of the statement level. SQL 2005 and above perform compilation at the statement level so if only one statement utilizes a temp table then that statement is the only one that gets recompiled.

Create Index on Temp Table

One of the most valuable assets of a temp table (#temp) is the ability to add either a clustered or non clustered index. Additionally, #temp tables allow for the auto-generated statistics to be created against them. This can help the optimizer when determining cardinality. Below is an example of creating both a clustered and non-clustered index on a temp table.

	CREATE TABLE #Users
	(
		ID			int IDENTITY(1,1),
		UserID		int,
		UserName	varchar(50)
	)

	INSERT INTO #Users
	(
		UserID,
		UserName
	)
	SELECT
		 UserID		= u.UserID
		,UserName	= u.UserName
	FROM dbo.Users u

	CREATE CLUSTERED INDEX IDX_C_Users_UserID ON #Users(UserID)

	CREATE INDEX IDX_Users_UserName ON #Users(UserName)

Even though you can implicitly create a clustered index on a table variable (@table) by defining a primary key or unique constraint, it is generally more efficient to use a temp table. The downside is recompilation, however this can be suppressed with the use of (Keep Plan) or (Keep Fixed Plan) hints.

Random Number

The following SQL script will generate a single random number:

SELECT CAST(1000000-CEILING(RAND()* 899001) + (DATEPART(ss, GETDATE()) * 1000 ) + DATEPART(ms, GETDATE()) as int)

Simplifying Security Using Schemas

One of my favorite aspects of schemas (which were introduced in 2005) is the ability easily manage permissions. For those getting caught in terminology, schemas are the new abstraction layer that replace object owners. In SQL 2000, every object had an owner. And that owner was a user. In most cases the user was dbo. So your stored procedures would have a naming convention of dbo.spExecuteMe. With schemas, the model has changed.

While object owners were actual users, schemas are not. Schemas are abstract objects used to separate owners from the objects. The beauty of this is that you can assign a user access to an entire schema, which may be comprised of hundreds of objects. Instead of assigning permissions at the object level, you can say, “Any user assigned to the role of ‘ExecApp’ will be able to execute any object that is created within the schema ‘App’. This way, by simply adding a user to the ExecApp role, he will inherently be able to execute any procedure in the application schema. Likewise, any procedure created within the schema ‘App’, will automatically be able to execute that procedure.

So how do you do it? Start by Creating a schema.


CREATE SCHEMA App

Next, transfer stored procedures to this schema, or simply create stored procedures with this schema.

To transfer:


ALTER SCHEMA App TRANSFER dbo.spGetCustomers

Lastly, create a role that will be allowed to execute procedures in the App schema; Add a user to the role; and then allow the role to execute anything on the schema App.


CREATE ROLE [ExecApp] AUTHORIZATION [dbo]

EXEC sp_addrolemember N'ExecApp', N'JohnJacobs'

GRANT EXECUTE ON SCHEMA::[App] TO [ExecApp]

Viola!

What Version of SQL Server Am I Running?

The following shows how to find the version of SQL Server you are running (described as productversion). It also shows how to query the level of the product. The level of the product indicates whether it was the initial release, a service pack, or a beta version:

  1. RTM = shipping version
  2. SPn = service pack (where n is the version)
  3. CTP = Community Technology Preview version
SELECT SERVERPROPERTY('productversion'), SERVERPROPERTY ('productlevel'), SERVERPROPERTY ('edition')

If the above statement does not work, you may be running SQL Server 7.0. In which case, use the following statement:

SELECT @@VERSION

Determine Database Owner

Determining the database owner is important if you want to take advantage of cross-database-ownership-chaining. If databases have different owners, then you have issues with accessing objects between databases.

To find the database owners:

SELECT SUSER_SNAME(owner_sid)
FROM sys.databases

To change the owner of a database:

USE database
EXEC sp_changedbowner 'sa'

The standard owner for databases is usually sa.

Find Best Clustered Index

The following query will compare the nonclustered indexes vs the clustered index and determine which index would qualify as the best clustered index based upon the DMV statistics.

If a clustered index does not exist on the table, it will also suggest one of the existing nonclustered indexes as the clustered index.

DECLARE @DBID INT
SET @DBID = db_id()

SELECT
TableName          = object_name(idx.object_id)
,UselessClusteredIndex    = idx.NAME
,ShouldBeClustered      = nc.NonClusteredName
,Clustered_User_Seeks   = c.user_seeks
,Clustered_User_Lookups   = c.user_lookups
,NonClustered_User_Seeks  = nc.user_seeks
FROM sys.indexes idx
LEFT JOIN sys.dm_db_index_usage_stats c
ON idx.object_id = c.object_id
AND idx.index_id = c.index_id
AND c.database_id = @DBID
JOIN (
SELECT
idx.object_id
,nonclusteredname = idx.NAME
,ius.user_seeks
FROM sys.indexes idx
JOIN sys.dm_db_index_usage_stats ius
ON idx.object_id = ius.object_id
AND idx.index_id = ius.index_id
AND ius.database_id = @DBID
WHERE
idx.type_desc = 'nonclustered'
AND ius.user_seeks = (SELECT MAX(user_seeks)
FROM sys.dm_db_index_usage_stats
WHERE object_id = ius.object_id
AND type_desc = 'nonclustered')
GROUP BY idx.object_id
,idx.NAME
,ius.user_seeks
) nc
ON nc.object_id = idx.object_id
WHERE idx.type_desc = 'clustered'
OR idx.type_desc = 'heap'
AND nc.user_seeks > c.user_seeks
AND (c.user_lookups * .5) <= nc.user_seeks
ORDER BY c.user_lookups DESC

The way it performs this determination is by comparing the lookups for a particular nonclustered index to the current number of seeks for the current clustered index.