Highly concurrent systems that feed off normalized data stores typically require a middle layer of logic to serve front-end needs. More often than not, this middle layer of logic lives in the same stored procedures that the web layer accesses. While that may sometimes be the right place for simple logic, for more complex calculations and joins it is simply not efficient. The answer in these cases is to create a new meta layer of data that pre-joins data and rolls up the necessary aggregations. To paint a better picture – in an ideal database, each procedure that feeds the front end would house a simple select statement from a single table. We know that in real life this is not always possible; however, we should think in these terms with every web proc we write. The reason is simple – complex logic is both IO and CPU intensive. We have no control over the web traffic, but we do have control over the logic we use to serve the web. Oftentimes it is better to run background jobs that perform the complicated logic on behalf of the web procs and dump the results into static tables. This methodology essentially creates a denormalized meta layer of data on top of the normalized data. The argument against it is that the data will not be truly real-time. However, you need to ask yourself what's more important: "real-time" data that is 5-10 times slower, or preaggregated data that is potentially seconds […]
Continue reading ...
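To make the approach above concrete, here is a minimal sketch of such a background rollup. The dbo.SalesSummary table and usp_GetSalesSummary procedure are hypothetical names used only for illustration; the source tables are the AdventureWorks sales tables. The job materializes the pre-joined, pre-aggregated data, and the web-facing proc becomes a simple single-table select.

[cc lang="sql"]
-- Background job (e.g. a SQL Agent step), never called by the web layer.
-- dbo.SalesSummary is a hypothetical static rollup table.
TRUNCATE TABLE dbo.SalesSummary;

INSERT INTO dbo.SalesSummary (ProductID, OrderYear, TotalQty, TotalDue)
SELECT d.ProductID,
       YEAR(h.OrderDate),
       SUM(d.OrderQty),
       SUM(d.LineTotal)
FROM Sales.SalesOrderDetail d
JOIN Sales.SalesOrderHeader h
    ON h.SalesOrderID = d.SalesOrderID
GROUP BY d.ProductID, YEAR(h.OrderDate);
GO

-- The web proc now houses nothing but a simple select from a single table.
CREATE PROCEDURE dbo.usp_GetSalesSummary
    @ProductID INT
AS
SELECT ProductID, OrderYear, TotalQty, TotalDue
FROM dbo.SalesSummary
WHERE ProductID = @ProductID;
GO
[/cc]

The rollup is only as fresh as the last job run, which is exactly the trade-off between "real-time" and preaggregated data described above.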
Sometimes slow queries can be rectified by changing the query around a bit. One example of this can be illustrated when multiple values are compared within a WHERE clause using an OR or IN statement. Oftentimes, OR can cause a scan against an index or table, which may not be the preferable execution plan in terms of IO consumption or overall query speed. A lot of variables come into play when the query optimizer creates an execution plan. These variables include a multitude of hardware specs, instance settings, database settings, statistics (table, index, auto-generated), and also the way the query is written. The one we are changing here is the way the query is written. As unsuspecting as it may seem, even though two different queries can return the exact same results, the path they take can be entirely different based purely upon the format of the query.

UNION vs OR

In most of my experience with SQL Server, OR is generally less efficient than a UNION. What tends to happen with an OR is that it causes a scan more often. Now, this may actually be the better path to take in some cases, and I'll leave that to a separate article, but overall I've found that touching more records is the biggest cause of slowness. So let's start our comparison. Here is our OR statement: [cc lang="sql"] SELECT SalesOrderID, * FROM sales.SalesOrderDetail WHERE ProductID = 750 OR ProductID = 953 [/cc] From this execution […]
Continue reading ...
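The excerpt cuts off before the rewrite, but a UNION version of the same OR query would look something like the sketch below. Since a row cannot have both ProductID values at once, UNION ALL can be used to avoid the duplicate-removal step a plain UNION would add.

[cc lang="sql"]
-- Sketch of the UNION rewrite: each branch can seek on ProductID independently.
SELECT SalesOrderID, *
FROM sales.SalesOrderDetail
WHERE ProductID = 750

UNION ALL

SELECT SalesOrderID, *
FROM sales.SalesOrderDetail
WHERE ProductID = 953
[/cc]

Comparing the two execution plans (and SET STATISTICS IO output) shows whether the OR produced a scan while the UNION branches produced seeks.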
There is definitely a lack of awareness in the SQL world regarding the use of user-defined functions and the potential performance hit they can have when used within your queries. Don't get me wrong, I would love nothing more than to be able to centralize my commonly used code into functions for reuse. In a lot of cases this is possible; however, there are specific cases where it can cause a huge performance impact.

The Problem

The one thing we need to be aware of with SQL is that its efficiency lies in the fact that it deals with data in SETS. Meaning that its power does not come from performing row-by-row operations; rather, it wants to retrieve chunks of data and manipulate them as recordsets. Keeping this in mind, you can look out for scenarios where certain operations will cause more of a row-by-row operation and therefore impact performance. The most common no-no is the use of scalar functions within a set-based operation. It seems (but I can't prove) that SQL 2008 has actually made some great strides in being able to deal with these situations, however there will always be a negative impact. First, let's look at a common scenario.

The Test

First, let's deploy this scalar user-defined function, which calculates the end of month for a given date: [cc lang="sql"] CREATE FUNCTION [dbo].[ufn_GetLastDayOfMonth] ( @pInputDate DATETIME ) RETURNS DATETIME BEGIN DECLARE @vOutputDate DATETIME SET @vOutputDate = CAST(YEAR(@pInputDate) AS VARCHAR(4)) + '/' + CAST(MONTH(@pInputDate) […]
Continue reading ...
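The excerpt is truncated, but the shape of the comparison is worth sketching: the scalar UDF executes once for every row, while an inline expression lets the whole query remain a single set-based operation. The queries below are illustrative only, not the article's exact test, and assume the AdventureWorks Sales.SalesOrderHeader table.

[cc lang="sql"]
-- Row-by-row: the scalar UDF runs once for every row returned.
SELECT SalesOrderID,
       dbo.ufn_GetLastDayOfMonth(OrderDate) AS EndOfMonth
FROM Sales.SalesOrderHeader;

-- Set-based: the same end-of-month logic inlined as an expression,
-- so the optimizer treats it as part of one set operation.
SELECT SalesOrderID,
       DATEADD(DAY, -1, DATEADD(MONTH, DATEDIFF(MONTH, 0, OrderDate) + 1, 0)) AS EndOfMonth
FROM Sales.SalesOrderHeader;
[/cc]

Comparing CPU time and logical reads between the two runs is the quickest way to see the per-row cost of the scalar function.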
This is an invaluable trick that can sometimes be used instead of the BETWEEN operator. A common place it can be used is in a historical table that contains EffectiveFrom & EffectiveTo dates. In these kinds of tables, many historical versions of a record can be stored. Each record in this scenario contains a Start & End Date which signifies the time span when the record is active. Because two records cannot be active at the same time, we can be sure that the Start & End dates will never overlap. While this describes one particular scenario for this example, there are many other situations where this method can be used. Let's look at the typical query we would use to find the record which is active for an employee at a specific point in time. [cc lang="sql"] SELECT * FROM [HumanResources].[EmployeeDepartmentHistory] WHERE EmployeeID = 274 AND '2001-04-28' BETWEEN StartDate AND EndDate [/cc] This query will return to us the single record that is active as of '2001-04-28'. However, if you think about the constraints our table contains, we actually only need to query one column (the StartDate) in order to isolate this record. This is because the StartDate will not overlap any other StartDate & EndDate for this employee. Here's an example: [cc lang="sql"] SELECT TOP 1 * FROM [HumanResources].[EmployeeDepartmentHistory] WHERE EmployeeID = 274 AND StartDate <= '2001-04-28' ORDER BY StartDate DESC [/cc] Because only one record can be active for an employee at a single point in time, […]
Continue reading ...
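A generalized form of the same trick is sketched below, assuming an index whose leading columns are EmployeeID and StartDate so the TOP 1 can be satisfied with a single seek; the as-of date is the only thing that changes between lookups.

[cc lang="sql"]
-- Find the row active as of any point in time using only StartDate.
DECLARE @AsOfDate DATETIME
SET @AsOfDate = '2001-04-28'

SELECT TOP 1 *
FROM [HumanResources].[EmployeeDepartmentHistory]
WHERE EmployeeID = 274
AND StartDate <= @AsOfDate
ORDER BY StartDate DESC
[/cc]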
When optimizing procedures, oftentimes you need to rewrite the procedure in order to speed it up. However, when you're finished, how can you be sure that the result set is still the same? This article will show you how to run a thorough test to make sure. We'll execute the old and new stored procedures, then compare their result sets by using a checksum aggregate. We'll also compare rowcounts and elapsed execution time. The benefit of this test harness is that it allows you to loop through and execute the procedures with as many dynamic parameter values as you wish, then compare the result set of the old procedure vs the new procedure for each execution. First, we need to do a couple of things to set up our environment:

1. Set up a local linked server
2. Create a wrapper procedure

We need the local linked server in order to dynamically retrieve the result set of the stored procedure using OPENQUERY. We could skip this step and create the temp table explicitly if we knew all the columns and data types being returned, but sometimes this is a hassle because there could be many columns returned, and explicitly creating a temp table for each procedure we want to test is a pain. And that's probably why you're here in the first place, isn't it? So to create a local linked server, do the following: [cc lang="sql"] EXEC sp_addlinkedserver @server='LOCALSERVER', @srvproduct='SQLSERVER', @provider='SQLNCLI', @datasrc='Your_Server_Name_Here' [/cc] For this example, the server name […]
Continue reading ...
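The excerpt ends before the harness itself, but the way the loopback linked server gets used can be sketched as follows. Here usp_OldProc_Test, usp_NewProc_Test, and the YourDatabase database name are placeholders standing in for the wrapper procedures, and each wrapper must return a single result set for OPENQUERY to capture it.

[cc lang="sql"]
-- Capture each procedure's result set into a temp table via the loopback server.
SELECT * INTO #OldResults
FROM OPENQUERY(LOCALSERVER, 'EXEC YourDatabase.dbo.usp_OldProc_Test');

SELECT * INTO #NewResults
FROM OPENQUERY(LOCALSERVER, 'EXEC YourDatabase.dbo.usp_NewProc_Test');

-- Compare rowcounts and an aggregate checksum of the two result sets.
SELECT
    (SELECT COUNT(*) FROM #OldResults)                  AS OldRowCount,
    (SELECT COUNT(*) FROM #NewResults)                  AS NewRowCount,
    (SELECT CHECKSUM_AGG(CHECKSUM(*)) FROM #OldResults) AS OldChecksum,
    (SELECT CHECKSUM_AGG(CHECKSUM(*)) FROM #NewResults) AS NewChecksum;

DROP TABLE #OldResults;
DROP TABLE #NewResults;
[/cc]

If the rowcounts and checksums match for every set of parameter values in the loop, the rewritten procedure is returning the same data as the original.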