Introduction to Change Data Capture (CDC)

Posted on November 1, 2010 by Derek Dieter
4

Change Data Capture (CDC) is a powerful feature included in SQL Server 2008. Using change data capture, it is possible to determine which values have been added, changed, or deleted within a table. Setting up CDC is done by specifying tables to monitor. Under the hood, CDC is written using the same repl logreader function that transactional replication uses. Don’t worry too much though. If you are like me, then when you hear the word replication, you start running for the hills. Well, CDC is not as much maintenance. The biggest thing you have to worry about with CDC, is that the disk that contains the log never gets full. When that does happen, and it should be rarely, then yes it is a pain. What you have to do in that case is flip the do not truncate bit for the log file to ‘no’. That aside, let’s get started. First thing you want to do is enable change data capture. This is done at the database level. [cc lang=”sql”] USE AdventureWorks GO DECLARE @ReturnCode int EXEC @ReturnCode = sys.sp_cdc_enable_db SELECT @ReturnCode SELECT name ,is_cdc_enabled FROM sys.databases [/cc] Even though we performed the above action, nothing will happen until we define the tables and columns that it should monitor. When defining a table to monitor, there are a few parameters that need to be passed: CDC Parameters Parameter Description @source_schema The schema name of the table to monitor (ex: dbo) @source_name The name of the table to monitor @role_name […]

How to Detect Table Changes or Changes in Data

Posted on October 20, 2010 by Derek Dieter
2

This is a handy method can be used to detect if a change has happened to any set of data. This can include an entire table or even just two rows. [cc lang=”sql”] SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM Sales.SalesOrderDetail [/cc] From our execution, we see that a number is returned. The number itself doesn’t have a particular meaning. All it is, is a hash. A hash is a number or sequence that is generated that represents the data you are hashing. The idea is that, if you change the underlying data you are hashing, then the hash will change. In this case, we are using the hash to say, “This number represents all the data contained ithin this table.” If the data changes, then the hash will change also. So let’s change a minute piece of data and see what happens. We’ll update the unitprice within one row by a penny. [cc lang=”sql”] SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM Sales.SalesOrderDetail UPDATE Sales.SalesOrderDetail SET UnitPrice = UnitPrice + .01 WHERE SalesOrderID = 43659 SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM Sales.SalesOrderDetail [/cc] Here’s what we see: So what use is this for? Well, there are plenty of uses. The most common use comes when trying to persist data. What this means, is that say we have lookup tables that we want to cache either on the web tier, or even on the client’s machines. What we can do, is send the lookup tables, along with a hash key that represents the state of the data. Then, we can have the […]

Using Synonyms to Decouple

Posted on October 8, 2010 by Derek Dieter
6

Synonyms were first introduced in SQL 2005 and prove to be a great feature so far. What they basically do is decouple the database you are working in from other databases or objects you reference. This helps because it allows you to move objects (tables, procedures) to other locations without having to change the existing objects that reference them. I used this recently in a project where I implemented a new system within an existing system. From a purist standpoint, the system should be on it’s own, but since there was no time to create subscriptions, we implemented it where most of the data was. While I did not like this method, I still to reference the data that should be external via synonyms. To create a synonym, you can either use the dialog, or code. The dialog is pretty intuitive. Synonyms are created within a database, so expand the database and right click on Synonyms. Once you do that, a dialog pops up: Synonym Name Akin to the table or procedure name Synonym Schema Akin to the schema Server Name Server to reference, blank for current Database name Database to reference Schema external schema to reference Object Type [Drop down values] Object Name [Drop down values] And the code to create a synonym is: [cc lang=”sql”] CREATE SYNONYM [Subs].[Employees] FOR [ServerX].[AdventureWorks].[dbo].[Employee] [/cc] Reference it in your code the same way you do a table: [cc lang=”sql”] SELECT * FROM Subs.Employees [/cc] *Intellisense doesn’t work with it as of this […]

Choosing a Rules Engine Design

Posted on August 18, 2010 by Derek Dieter
No

For those that don’t know. A rules engine is a way to harness powerful decision-making based on source data. These decisions are defined in rules. Rules will generally consist of a query that outputs a list of successes or failures, depending on what needs to be reported on. In my experience, I have only found two major methods for building a rules-based engine in SQL Server. They both have their pros and cons. The trick is finding the middle ground which is defined as a third method between the two different extremes that will fit your requirement. With that said we will discuss the two extremes — and the middle ground. The Three Kinds of SQL Rules Engines Extreme 1 – The Dynamic SQL Rules Engine The first method involves dynamic SQL. Using dynamic SQL, the sky is the limit in regards to possibilities. The downfall however is maintainability. Dynamic SQL is very hard to maintain and even harder to troubleshoot and debug. My experience dealing with dynamic SQL is that it often starts out small, but once you’ve gone the dynamic route, it gets bigger and bigger and harder to maintain. Therefore I try to avoid this method altogether. If you’re into it, you’ll definitely find more job security, I’m not into that game personally. Extreme 2 – The Hard-coded Rules Engine The next method is to write each rule as a hardcoded procedure. When the rule executes, it returns back the list of results that fail (or pass) […]

Extracting Data from a Source System to History Tables

Posted on April 8, 2010 by Derek Dieter
No

This is a topic I haven’t found much information written about, however nearly every system I’ve worked with needs this exact functionality. It is important that the method for extracting data be done in a way that does not hinder performance of the source system. In this example, the goal is to extract data from a source system, into another database (or server) all while requiring as little resources as possible. This is why I choose to pull from a source system in two separate stages. First Stage – Staging Import The first step is to do a very simple select statement into a staging table. This first select statement may do some ETL — mostly in regards to lookups that are needed from the source system. There could be multiple select statements pulling data into multiple staging tables. I prefer to pull tables from the source to staging in a one to one relationship. So for every table we need, we also have a corresponding staging table. See the diagram below: The reason for pulling one to one is simple. First of all, the query is a very simple select. Second, it makes troubleshooting very simple. After importing into staging, the next step is to move the records to the history table(s). Second Stage – Historical Import In the historical import, we compare what we have in our history table with what is in staging. Each record in staging is joined with the corresponding current record in the history […]

Introduction to Change Data Capture (CDC)

How to Detect Table Changes or Changes in Data

Using Synonyms to Decouple

Choosing a Rules Engine Design

Extracting Data from a Source System to History Tables

Featured Articles

Site Author

Resources