Skip to content

Choosing a Rules Engine Design

For those that don’t know. A rules engine is a way to harness powerful decision-making based on source data. These decisions are defined in rules.

Rules will generally consist of a query that outputs a list of successes or failures, depending on what needs to be reported on.

In my experience, I have only found two major methods for building a rules-based engine in SQL Server. They both have their pros and cons.

The trick is finding the middle ground which is defined as a third method between the two different extremes that will fit your requirement.

With that said we will discuss the two extremes — and the middle ground.

The Three Kinds of SQL Rules Engines

  1. Extreme 1 – The Dynamic SQL Rules Engine
  2. The first method involves dynamic SQL. Using dynamic SQL, the sky is the limit in regards to possibilities. The downfall however is maintainability. Dynamic SQL is very hard to maintain and even harder to troubleshoot and debug. My experience dealing with dynamic SQL is that it often starts out small, but once you’ve gone the dynamic route, it gets bigger and bigger and harder to maintain. Therefore I try to avoid this method altogether. If you’re into it, you definitely find more job security, I’m not into that game personally.

  3. Extreme 2 – The Hard-coded Rules Engine
  4. The next method is to write each rule as a hardcoded procedure. When the rule executes, it returns back the list of results that fail (or pass) its test. Using this method, it is also likely that you will not code yourself into a corner.
    The downfalls however is a slow turn around time in order to generate rules, along with a lack of code reuse. This lack of code-reuse opens the possibility to query the same tables using different methods — which is a negative.
    To elaborate, imagine a set rules that need to query the same set of tables in order to find similar result set. This opens a negative opportunity to write queries that join agains tables differently — which sucks.

  5. Middle Ground – The Modular Filter Rules Engine
  6. This last method is my favorite, yet it requires some inginuity. It involves creating modular procedures that take input parameters as criteria, and return back “pass-or-fail” result sets. I got this method by reading this article http://msdn.microsoft.com/en-us/library/aa964135(SQL.90).aspx by Joshua Greenberg, Ph.D. In this method, you are basically defining procedures that take multiple parameters and return back a list of objects that pass or fail. This is a powerful method because it allows code reuse, does not require extensive maintenance, and also provides a quick turn around time for new rules to be defined. Regarding coding yourself into a corner, you have an opportunity to make that mistake based on the design you choose.

    In the next article, we will provide a methodology to create a rules-based engine.

Select Distinct

Select Distinct a way to tell SQL Server to not return duplicate rows. Adding the distinct keyword makes it an expensive query because it essentially performs a group by for all of the returned columns. That being the case, it should only optimally be used on queries that return a small number of rows, or small number of columns.

When using Select Distinct, you should keep in mind that this is generally a workaround to either improper joining of tables, or an improper where clause that is not limiting the result set to only returning one instance of a record.

In order to utilize distinct, all you need to do, is place it before the select list.

SELECT DISTINCT *
FROM yourtable

This will surely only return a unique instance of every record in the result.

SPID – What is it?

A SPID in SQL Server is a Server Process ID. These process ID’s are essentially sessions in SQL Server. Everytime an application connects to SQL Server, a new connection (or SPID) is created. This connection has a defined scope and memory space and cannot interact with other SPIDs. The term SPID is synonymous with Connection, or Session.

Viewing SPIDs

In order to view all the connections in SQL Server execute the following query.

SELECT *
FROM sys.dm_exec_sessions

From here we see a session_id shown in the left hand column. This is also known as the SPID.

To find the SPID for your current execution window run this.

SELECT @@SPID

Connection Settings

Each SPID can have it’s own connection settings. Connection settings can be defined by the connection string or the default values for the SQL Server instance, so if two SPIDs come in through the same connection string, then they will also have the same connection settings. This is important because execution plans that are generated for queries are dependent on the connection settings. If two different SPIDs come in with different connection settings, then they will not use the same execution plan defined for a SQL statement. In order to view the connections settings execute the SQL Statement shown above.

Transaction Isolation

SPIDs can have their own transaction isolation levels defined globally. When set within a connection, all proceeding executions maintain the same transaction isolation. For more on isolation levels, view this

Temp Table sharing

Temp tables that are created within a SPID are accessible from any proceeding execution in that SPID. This is how separate stored procedures can share the temp table because any temp table created in a spid’s session is global to the session. This comes in useful when sharing data between stored procedures.

Blocking

Because SPIDs define an atomic operation and are independent, they can often compete with each other depending on their Transaction Isolation levels, the objects they are accessing, and the operation they are performing. The quickest way to find blocking is to use the sp_who2 procedure.

SPID Status

Because a SPID is defined as a connection, it is not always running (or executing). In order to find the status of the SPIDs, execute the following:


SELECT
    SPID                = er.session_id
    ,Status             = ses.status
    ,[Login]            = ses.login_name
    ,Host               = ses.host_name
    ,BlkBy              = er.blocking_session_id
    ,DBName             = DB_Name(er.database_id)
    ,CommandType        = er.command
    ,SQLStatement       = st.text
    ,ObjectName         = OBJECT_NAME(st.objectid)
    ,ElapsedMS          = er.total_elapsed_time
    ,CPUTime            = er.cpu_time
    ,IOReads            = er.logical_reads + er.reads
    ,IOWrites           = er.writes
    ,LastWaitType       = er.last_wait_type
    ,StartTime          = er.start_time
    ,Protocol           = con.net_transport
    ,ConnectionWrites   = con.num_writes
    ,ConnectionReads    = con.num_reads
    ,ClientAddress      = con.client_net_address
    ,Authentication     = con.auth_scheme
FROM sys.dm_exec_requests er
OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) st
LEFT JOIN sys.dm_exec_sessions ses
ON ses.session_id = er.session_id
LEFT JOIN sys.dm_exec_connections con
ON con.session_id = ses.session_id

Trunc Date in SQL Server

In Oracle there is a function (trunc) used to remove the time portion of a date. In order to do this with SQL Server, you need to use the convert function. Convert takes 3 parameters, the datatype to convert to, the value to convert, and an optional parameter for the formatting style. It is this third parameter we will use in order to customize the format of the date.

-- Month first
SELECT CONVERT(varchar(12),GETDATE(), 101)  -- 06/29/2009
SELECT CONVERT(varchar(12),GETDATE(), 110)  -- 06-29-2009
SELECT CONVERT(varchar(12),GETDATE(), 100)  -- Jun 29 2009
SELECT CONVERT(varchar(12),GETDATE(), 107)  -- Jun 29, 2009

-- Year first
SELECT CONVERT(varchar(12),GETDATE(), 102)  -- 2009.06.29
SELECT CONVERT(varchar(12),GETDATE(), 111)  -- 2009/06/29
SELECT CONVERT(varchar(12),GETDATE(), 112)  -- 20090629

-- Day first
SELECT CONVERT(varchar(12),GETDATE(), 103)  -- 29/06/2009
SELECT CONVERT(varchar(12),GETDATE(), 105)  -- 29-06-2009
SELECT CONVERT(varchar(12),GETDATE(), 104)  -- 29.06.2009
SELECT CONVERT(varchar(12),GETDATE(), 106)  -- 29 Jun 2009

-- Time only
SELECT CONVERT(varchar(12),GETDATE(), 108)  -- 07:26:16
SELECT CONVERT(varchar(12),GETDATE(), 114)  -- 07:27:11:203

Drop Temp Table If Exists

If you’re here then you’ve probably run into the situation where you’ve automatically created a temp table in your script, and every time you execute the script you have to drop the temp table manually. Yes, this can be a pain. I’ve done this myself many times.

So here’s the easy solution. We need to check if the temp table exists within the TempDB database and if it does, we need to drop it.

IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
BEGIN
	DROP TABLE #Temp
END

To replicate this, let’s run the following command in the same window multiple times:

-- (Replace the product table below with your table)
SELECT *
INTO #Temp
FROM SalesLT.Product

We get the error message:

There is already an object named '#Temp' in the database.

Now we simple add our snippet of code, and we are able to execute without having to manually drop anymore.

IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
BEGIN
	DROP TABLE #Temp
END

-- (Replace the product table below with your table)
SELECT *
INTO #Temp
FROM SalesLT.Product

Insert From Select

The method of inserting records from one table to another differs depending on whether or not the target table already exists. Neither method is difficult, however one method requires more coding.

Insert where target table does not exist

If it does not, then you are in luck, there is a method that could not be simpler using SELECT..INTO. Select into automatically creates a table based on the result of a select statement. This select statement can be as simple or complex as you like. The only think you need to do, is add the line: INTO TABLENAME right above the FROM statement. Here goes:

-- Copy all the records into a new table
SELECT *
INTO CustomerCopy
FROM SalesLT.Customer

-- View the table
SELECT *
FROM CustomerCopy

It should be noted that this method does not create any keys that may be existing from the old table to the new table.

Insert where target table does exist

It’s not that this method is more difficult, it just requires more code. To do this, you need to use the INSERT INTO statement. When using this method, you need to specify the column names of the target table you want to insert, and the column names of the source table you are inserting from. Here is an example:


INSERT INTO [AdventureWorksLT2008].[SalesLT].[Customer]
(
	[NameStyle]
	,[Title]
	,[FirstName]
	,[MiddleName]
	,[LastName]
	,[Suffix]
	,[CompanyName]
	,[SalesPerson]
	,[EmailAddress]
	,[Phone]
	,[PasswordHash]
	,[PasswordSalt]
	,[ModifiedDate]
)

SELECT
	[NameStyle]
	,[Title]
	,[FirstName]
	,[MiddleName]
	,[LastName]
	,[Suffix]
	,[CompanyName]
	,[SalesPerson]
	,[EmailAddress]
	,[Phone]
	,[PasswordHash]
	,[PasswordSalt]
	,[ModifiedDate]
  FROM [AdventureWorksLT2008].[SalesLT].[Customer2]

The brackets surrounding the column names above is optional. If you are using SQL Server Management Studio (ssms), the easiest way to do this is to right click on the table name, choose ‘Script Table As’ -> ‘Create To’ -> ‘New Query Window’. This will automatically generate the insert code for all the column names to insert into. The only thing you have to do is fill in your select statement.
Insert From One Table To Another

Convert Text String to Numbers (Int)

Here is a simple method to convert a text string to numbers (or an integer). This method evaluates each value in a column to determine if the datatype is numeric. If it is, then it converts it to an integer.

SELECT
	CASE WHEN
		ISNUMERIC(PostalCode) > 0
	THEN CAST(PostalCode AS INT)
	ELSE 0 END
FROM SalesLT.Address

Now, what if the column values contain text characters and you want to strip those characters out and then convert the value to an integers. Well, luckily, there is a way to do this. Most of the examples to do this will loop the values using a while loop, however this example uses a dynamic numbers table in conjunction with a trick shown to me by Simon Sabin from his blog. This method will replace all the non numeric characters and convert the values to an integer.

-- define max number of character values
-- in the string being evaluated
DECLARE @MaxNumber INT = 5000

;WITH Numbers AS
(
    SELECT 1 AS Num
    UNION ALL
    SELECT Num+1
    FROM Numbers
    WHERE Num <= @MaxNumber
)

SELECT
	CAST
	(
		(
			SELECT CASE
				WHEN SUBSTRING(PostalCode,Num,1) LIKE '[0-9]'
				THEN SUBSTRING(PostalCode,Num,1)
				ELSE '' END
			FROM Numbers
			WHERE Num <= LEN(PostalCode)
			FOR XML PATH('')
		) AS int
	 )
FROM SalesLT.Address

OPTION(MAXRECURSION 32767)

To make this work with your example, simply replace the value [PostalCode] with your field and the FROM Clause should be your table you are querying from.

List All Columns in Database or Server

To get a list of all columns within a database, you can use the ANSI compliant INFORMATION_SCHEMA.COLUMNS system view.

SELECT *
FROM INFORMATION_SCHEMA.COLUMNS

In order to get all the columns in all the databases however, you need to loop through the databases. To do this, you can use the undocumented sp_MSForEachDB procedure that Microsoft ships.

sp_MSForEachDB @command1='USE ?;

SELECT
	Table_Catalog
	,Table_Schema
	,Table_Name
	,Column_Name
	,Data_Type
	,Character_Maximum_Length
FROM INFORMATION_SCHEMA.COLUMNS'

List All Databases

To list all databases in a SQL Server instance, run the following:

SELECT *
FROM sys.sysdatabases

This also outputs metadata about each database including createdate, compatibility level, and the MDF filename.

List All Tables in a Database

There are a few methods for listing all the tables in a database. Some methods will be able to show more specifics than others due to the capabilities inherent in the tables being joined. I’ll show you the simplest way first which is probably the most supported way.

SELECT TABLE_SCHEMA + '.' + TABLE_NAME, *
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_SCHEMA + '.' + TABLE_NAME

This method makes use of documented INFORMATION_SCHEMA system view.

The next method makes use of two not so well documented system views.

SELECT
	FullName	= s.name + '.' + t.name
	,SchemaName	= s.name
	,TableName	= t.name
FROM sys.tables t
JOIN sys.schemas s
ON s.schema_id = t.schema_id
ORDER BY s.name, t.name

There is really no difference between these two methods.

The third method uses the sp_tables stored procedure and passes in the parameter of ‘TABLE’. Though the parameter being passes looks funny, this is how to make it work.

EXEC sp_tables @table_type = "'TABLE'"

Show all Tables with Rowcount and Dataspace

This method is the most extensive. Not only will it show you all the tables, but it will also display the rowcount and datasize in Megabytes.

SELECT *
FROM (
	SELECT
		TableName	= t.TABLE_SCHEMA + '.' + t.TABLE_NAME
		,[RowCount]	= SUM(sp.[rows])
		,Megabytes	= (8 * SUM(CASE WHEN sau.type <> 1 THEN sau.used_pages WHEN sp.index_id < 2 THEN sau.data_pages ELSE 0 END)) / 1024
	FROM INFORMATION_SCHEMA.TABLES t
	JOIN sys.partitions sp
		ON sp.object_id = OBJECT_ID(t.TABLE_SCHEMA + '.' + t.TABLE_NAME)
	JOIN sys.allocation_units sau
		ON sau.container_id = sp.partition_id
	WHERE TABLE_TYPE = 'BASE TABLE'
	GROUP BY
		t.TABLE_SCHEMA + '.' + t.TABLE_NAME
) A
ORDER BY TableName

This method uses the INFORMATION_SCHEMA system view and joins agains the sys.partitions view in order to get the counts. The megabytes calculation multiples the pages by the page size, then divides by 1024 in order to get the megabytes calculation. I took a piece of code that was executed by SSMS in order to generate this query.