In any SQL Server system, you will have jobs that run on a schedule or at specified intervals. In these cases, it's always nice to keep track of certain aspects over time, so you can compare runs when things go wrong or see how performance has changed over time. In my experience these are indispensable when it comes to troubleshooting, and for running delta jobs. Here we'll show a small example of the log tables you can create to facilitate this. First let's look at some DDL for 2 different tables:

[cc lang="sql"]
CREATE TABLE [dbo].[ProcessLogMaster](
	[process_log_master_id] [int] IDENTITY(1,1) CONSTRAINT PK_process_log_master PRIMARY KEY CLUSTERED NOT NULL,
	[process_master_name] [varchar](100) NOT NULL,
	[datetime_start] [datetime] NULL DEFAULT (getdate()),
	[datetime_end] [datetime] NULL,
	[elapsed_ms] [int] NULL,
	[rows_updated] [int] NULL,
	[rows_inserted] [int] NULL,
	[rows_deleted] [int] NULL,
	[complete] [tinyint] NULL DEFAULT ((0)),
	[success] [tinyint] NULL,
	[error_description] [varchar](max) NULL
)

CREATE TABLE [dbo].[ProcessLogDetail](
	[process_log_detail_id] [int] IDENTITY(1,1) NOT NULL CONSTRAINT [PK_process_log_detail] PRIMARY KEY,
	[process_log_master_id] [int] NOT NULL,
	[process_detail_name] [varchar](100) NOT NULL,
	[datetime_start] [datetime] NULL,
	[datetime_end] [datetime] NULL,
	[elapsed_ms] [int] NULL,
	[rows_updated] [int] NULL,
	[rows_inserted] [int] NULL,
	[rows_deleted] [int] NULL,
	[complete] [tinyint] NULL,
	[success] [tinyint] NULL,
	[error_description] [varchar](max) NULL
)
[/cc]

What we have here are two tables that can be used to describe job steps. The ProcessLogMaster table is used to record the master, or top level, of the job. If there are multiple steps in the job that report back to the master, we would enter their entries into the ProcessLogDetail table. We could then sum up the […]
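To show how these might fit together, here is a minimal sketch of a single job run writing to both tables; the job name, step name, and row counts are made up for illustration:

[cc lang="sql"]
-- Minimal sketch: one job run with a single step (names and counts are illustrative)
DECLARE @master_id int

-- open the master entry when the job starts (datetime_start defaults to getdate())
INSERT INTO dbo.ProcessLogMaster (process_master_name)
VALUES ('NightlyCustomerDelta')
SET @master_id = SCOPE_IDENTITY()

-- log a step of the job against the master entry
INSERT INTO dbo.ProcessLogDetail
	(process_log_master_id, process_detail_name, datetime_start, datetime_end,
	 elapsed_ms, rows_updated, rows_inserted, rows_deleted, complete, success)
VALUES
	(@master_id, 'UpdateCustomerDim', GETDATE(), GETDATE(), 0, 150, 25, 0, 1, 1)

-- close out the master entry when the job finishes
UPDATE dbo.ProcessLogMaster
SET datetime_end = GETDATE(),
	elapsed_ms = DATEDIFF(ms, datetime_start, GETDATE()),
	rows_updated = 150,
	rows_inserted = 25,
	rows_deleted = 0,
	complete = 1,
	success = 1
WHERE process_log_master_id = @master_id
[/cc]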
Continue reading ...
There are a lot of different design patterns that lend themselves to creating the shortest path to the data. One of the most efficient is caching stored procedure result sets. In order to do this, we need to read the incoming parameters and create a cache key. This cache key is then stored along with the stored procedure's result set as a unique identifier representing that combination of the stored procedure's parameters. The caveat with this method is that the business requirement needs to allow stale data. There are times where you will need to use values other than the passed-in parameters in order to create the cache key. Some examples include datetime data types or keys that are unique (like a CustomerKey). If the hash that gets created from the parameters is unique, then you will never reuse that dataset again. With this in mind, you would even have to determine whether the procedure is cacheable at all. Another concern to keep in mind is how long you can serve stale data. Maybe 30 seconds, 1 minute, or 1 hour? Any time increment can be accommodated by clearing the cache tables at the desired interval.

Design

Let's look at the basic workflow for how this procedure will work. First of all, we will need to hash all the parameters that are coming into the procedure (unless they are unique, in which case we may not be able to cache, or we can possibly […]
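As a rough sketch of the shape of this pattern (the cache table, the procedure, and the underlying Orders table are assumptions for illustration, and there is no concurrency handling here):

[cc lang="sql"]
-- Sketch of stored procedure result set caching (object names are assumptions)
CREATE TABLE dbo.ProcCache
(
	cache_key	varbinary(32)	NOT NULL PRIMARY KEY,
	cached_on	datetime	NOT NULL DEFAULT (GETDATE()),
	result_xml	xml		NULL	-- cached result set, serialized as XML
)
GO

CREATE PROCEDURE dbo.GetOrdersByRegion
	@Region varchar(20),
	@Status varchar(20)
AS
BEGIN
	-- build a cache key by hashing the incoming parameters
	DECLARE @key varbinary(32) = HASHBYTES('SHA2_256', CONCAT(@Region, '|', @Status))

	-- serve from cache if a fresh enough entry exists (here: 1 minute of staleness allowed)
	IF EXISTS (SELECT 1 FROM dbo.ProcCache
			   WHERE cache_key = @key
				 AND cached_on > DATEADD(minute, -1, GETDATE()))
	BEGIN
		SELECT result_xml FROM dbo.ProcCache WHERE cache_key = @key
		RETURN
	END

	-- otherwise run the real query, store it under the cache key, and return it
	DECLARE @result xml =
		(SELECT o.OrderID, o.OrderDate, o.Amount
		 FROM dbo.Orders o
		 WHERE o.Region = @Region AND o.Status = @Status
		 FOR XML PATH('Order'), ROOT('Orders'))

	DELETE FROM dbo.ProcCache WHERE cache_key = @key
	INSERT INTO dbo.ProcCache (cache_key, result_xml) VALUES (@key, @result)

	SELECT @result AS result_xml
END
[/cc]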
Continue reading ...
This is one of those topics that will get people fired up, but here goes. I am mostly an explicit temp table developer. By contrast, I am not an implicit temp table developer. What I mean by that is this: in writing SQL, you simply cannot avoid the use of tempdb. Either you write to tempdb by breaking queries out and intentionally creating temp tables, or you write to tempdb by not breaking queries out (keeping them as long, convoluted statements with a long, convoluted query plan) and letting the optimizer create "worktables". In either case you are writing to tempdb whether you like it or not. Yet the difference is this. Breaking them out:

- You can control the size of the result set being written to disk
- You can ensure that the execution plan is simple
- You can utilize the materialized temp table data throughout the entire procedure
- Temp tables contain statistics and can be indexed

To compare temp table development to CTE development is somewhat of an apples and oranges comparison. A CTE uses nothing special on the back end. It is simply a (potentially) clean way to write a query. The difference, however, is this: with a CTE, the execution plan of the main query becomes intertwined with the CTE, leaving more room for the optimizer to get confused. By contrast, when a temp table divides two queries, the optimizer is not intertwined with whatever created the temp table, and the execution plans stay simple and […]
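For example, a minimal sketch of the explicit approach, with made-up table and column names:

[cc lang="sql"]
-- Break the expensive aggregation out into an explicit temp table
SELECT c.CustomerID, SUM(o.Amount) AS total_amount
INTO #customer_totals
FROM dbo.Orders o
JOIN dbo.Customers c ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '2023-01-01'
GROUP BY c.CustomerID

-- the materialized result has statistics and can be indexed and reused by later statements
CREATE CLUSTERED INDEX ix_customer_totals ON #customer_totals (CustomerID)

SELECT ct.CustomerID, ct.total_amount
FROM #customer_totals ct
JOIN dbo.CustomerSegments s ON s.CustomerID = ct.CustomerID
WHERE s.Segment = 'Gold'
[/cc]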
Continue reading ...
Highly concurrent systems that feed off normalized data stores typically require a middle layer of logic to serve front-end needs. More often than not, this middle layer of logic lives in the same stored procedures that the web layer accesses. While sometimes this may be the right place for simple logic, for more complex calculations and joins it is simply not efficient. The answer in these cases is to create a new meta layer of data that pre-joins data and rolls up necessary aggregations. To paint a better picture: in an ideal database, each procedure that feeds the front-end would house a simple select statement from a single table. We know in real life this is not always possible; however, we should think in these terms with every web proc we write. The reason is simple: complex logic is both IO and CPU intensive. We have no control over the web traffic, but we do have control over what logic we use to serve the web. Oftentimes it is better to run jobs in the background that perform complicated logic on behalf of the web procs and dump the results into static tables. This methodology essentially creates a denormalized meta layer of data on top of the normalized data. The argument against this is that the data will not truly be real-time. However, you need to ask yourself what's more important: "real-time" data that is 5-10 times slower, or preaggregated data that is potentially seconds […]
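A minimal sketch of the idea, with made-up table and procedure names: a scheduled job rolls the aggregations into a static summary table, and the web proc becomes a single-table select:

[cc lang="sql"]
-- Background job step: rebuild the denormalized summary table
TRUNCATE TABLE dbo.ProductSalesSummary

INSERT INTO dbo.ProductSalesSummary (ProductID, sale_count, total_amount, refreshed_on)
SELECT o.ProductID, COUNT(*), SUM(o.Amount), GETDATE()
FROM dbo.Orders o
JOIN dbo.OrderStatus s ON s.StatusID = o.StatusID
WHERE s.IsComplete = 1
GROUP BY o.ProductID
GO

-- The web-facing proc is now a simple select from a single table
CREATE PROCEDURE dbo.GetProductSalesSummary
	@ProductID int
AS
SELECT ProductID, sale_count, total_amount, refreshed_on
FROM dbo.ProductSalesSummary
WHERE ProductID = @ProductID
[/cc]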
Continue reading ...
Every once in a while I'll find a field someone created where the datatype choice was not the best. This comes up relatively often with varchar vs char. All of the confusion just comes from not understanding the differences and the causes and effects of each. With that, I'll outline the only times I will use one over the other. Let's take a look at the behavior of char vs varchar:

[cc lang="sql"]
CREATE TABLE #meme
(
	first_name char(50),
	last_name varchar(50)
)

INSERT INTO #meme (first_name, last_name)
SELECT 'john', 'smith'

SELECT * FROM #meme

SELECT DATALENGTH(first_name), DATALENGTH(last_name)
FROM #meme
[/cc]

You can see from the value returned by DATALENGTH that first_name is 50 bytes long despite holding only 4 characters. The remaining characters are padded with spaces. Storage-wise, varchar is out of the gate 2 bytes larger than a char because it needs to store the length of the value. (The habit of declaring columns as varchar(255) is largely a holdover from engines where the length prefix grows past 255 characters; in SQL Server the overhead is a flat 2 bytes.) Pain-in-the-butt wise, char is far more of a pain to deal with unless all the data in the char column takes up the exact amount of space defined for the column. I typically only use char if all the values are guaranteed to be the exact same size. Even then, I'm hesitant. Partly because disk space is not as much of an issue as it was before, and mostly because if you DO have variable length fields in a char column, then the […]
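Continuing with the temp table above, a quick sketch of where the padding starts to hurt:

[cc lang="sql"]
-- The padding shows up as soon as you work with the char value
SELECT LEN(first_name)        AS len_chars,   -- 4  (LEN ignores trailing spaces)
       DATALENGTH(first_name) AS len_bytes,   -- 50 (stored padded to the column width)
       QUOTENAME(first_name)  AS quoted       -- the trailing spaces become visible
FROM #meme

-- concatenation drags the padding along unless you trim it
SELECT first_name + ' ' + last_name        AS padded_name,
       RTRIM(first_name) + ' ' + last_name AS trimmed_name
FROM #meme
[/cc]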
Continue reading ...