# How to Order Numeric Values in a Varchar Field

If you have ever tried to order numerical values in a varchar field you’ll know that the sort doesn’t occur numerically as it would if the datatype were numeric. In other words, the values of 1 and 10 will be sorted together because they both start with a leading 1. To overcome this, we have to cast the values as numeric. But this raises another issue. Since it’s a varchar field we cannot ensure that the values are all numeric which means we additionally have to take into account characters. Let’s start by creating a sample table and populating some values: [cc lang=”sql”] CREATE TABLE #varchar_field ( ID INT IDENTITY(1,1), mixed_field varchar(100), ) INSERT INTO #varchar_field (mixed_field) SELECT ‘1’ union all SELECT ‘4.9’ union all SELECT ’10’ union all SELECT ’50’ union all SELECT ‘6’ union all SELECT ‘a’ union all SELECT ‘z’ union all SELECT ‘A’ [/cc] Now let’s run a simple sort so we can see the default behavior. [cc lang=”sql”] SELECT * FROM #varchar_field ORDER BY mixed_field [/cc] Here we see everything is only sorted by it’s leading character, not taking into consideration the value of the numbers. Now let’s execute a revised version of this order by. [cc lang=”sql”] SELECT * FROM #varchar_field ORDER BY CASE WHEN ISNUMERIC(mixed_field) = 1 THEN CAST(mixed_field AS FLOAT) WHEN ISNUMERIC(LEFT(mixed_field,1)) = 0 THEN ASCII(LEFT(LOWER(mixed_field),1)) ELSE 2147483647 END [/cc] Here we check to see if the field is numeric first. If it is, we cast it to float to deal with […] Continue reading ...

# Creating Hot Swap Tables to Hide Complex Logic

Highly concurrent systems that feed off normalized data stores typically require a middle layer of logic to serve the front-end needs. More often than not, this middle layer of logic is stored in the same stored procedures that the web layer accesses. While sometimes this may be the right place for simple logic, for more complex calculations and joins it is simply not efficient. The answer in these cases is to create a new meta layer of data that pre-joins data and rolls up necessary aggregations. To paint a better picture – in an ideal database each procedure that feeds the front-end would house a simple select statement from a single table. We know in real life this is not always possible, however we should think in these terms with every web proc we write. The reason is simple – complex logic is both IO and CPU intensive. We have no control over the web traffic, but we do have control over what logic we use to serve the web. Often times it is better to run jobs in the background that perform complicated logic on behalf of the web procs and dump the results into static tables. This methodology basically creates a denormalized meta layer of data on top of the normalized data. The argument against this is that the data will not truly be real-time. However you need to ask yourself what’s more important, “real-time” data that is 5-10 times slower or preaggregated data that is potentially seconds […]

# Substring Between Two Characters

Though SQL may not be the most elegant language for string handling, it does perform most functions and in a set based manner. The SQL substring function basically has the same syntax as found in other languages. In this example we will take the common scenario of extracting a string from between two fixed characters. In our example we’ll start with a domain name that contains the subdomain. We will extract only the top level domain from the results. Keep in mind for the purposes of this example all the URLs listed in the table need to have a subdomain. Setup the Data Let’s begin by creating a new table named URLs and insert some data. [cc lang=”sql”] IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N’dbo.URLs’) AND OBJECTPROPERTY(id, N’IsUserTable’) = 1) –DROP TABLE dbo.URLs CREATE TABLE dbo.URLs( url nvarchar(1024) NULL ) INSERT dbo.URLs (url) VALUES (N’www.litwareinc.com/’) GO INSERT dbo.URLs (url) VALUES (N’www.treyresearch.net/’) GO INSERT dbo.URLs (url) VALUES (N’www.adatum.com/’) GO INSERT dbo.URLs (url) VALUES (N’www.northwindtraders.com/’) GO INSERT dbo.URLs (url) VALUES (N’www.wideworldimporters.com/’) GO INSERT dbo.URLs (url) VALUES (N’www.proseware.com/’) [/cc] Now that we have the data setup let’s start with the extraction. For the substring function, the first parameter is the string we want to parse from, the next parameter is the starting position to be extracted and the last parameter is the number of characters to extract. [cc lang=”sql”] DECLARE @first_char nvarchar(10) DECLARE @second_char nvarchar(10) SET @first_char = ‘.’; SET @second_char = ‘/’; SELECT SUBSTRING ( — column url — […] Continue reading ...

# Using Union Instead of OR

Sometimes slow queries can be rectified by changing the query around a bit.  One of these examples can be illustrated when multiple values are compared within a WHERE clause using an OR or IN statement.  Often times, OR can cause a scan against an index or table which may not be the preferable execution plan in terms of IO consumption, or overall query speed. A lot of variables come into play when the query optimizer creates an execution plan.  These variables include a multitude of hardware specs, instance settings, database settings, statistics (table, index, auto-generated), and also the way the query is written.  The one we are changing here is the way the query is written.  As unsuspecting as it may seem, even though two different queries can return the exact same results, the path at which they take can be entirely different just based upon the format of the query. UNION vs OR In most of my experience with SQL Server, the OR is generally less efficient than a UNION. What tends to happen with an OR is it causes a scan more often. Now this sometimes may actually be a better path to take in some cases and I’ll leave that to a separate article but overall I’ve found that touching more records is the biggest cause of slowness. So let’s start our comparison. Here is our OR statement: [cc lang=”sql”] SELECT SalesOrderID, * FROM sales.SalesOrderDetail WHERE ProductID = 750 OR ProductID = 953 [/cc] From this execution […]

# Selecting Top N Records per Group

When you have multiple rows for let’s say a productID, there will probably be times when you need to select the top n subset of each those productid’s based upon a sort criteria. This is a relatively common TSQL need and there are a couple different methods to do this. First let’s create some dummy data. [cc lang=”sql”] CREATE TABLE [#productzabub]( [SalesOrderID] [int] NOT NULL, [ProductID] [int] NOT NULL ) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 707) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 707) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 707) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 708) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 708) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 708) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 711) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 711) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 711) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 712) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 712) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 712) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 714) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 714) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 714) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71782, 715) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 715) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 715) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71783, 716) INSERT [#productzabub] ([SalesOrderID], [ProductID]) VALUES (71784, 716) [/cc] Now that we have our data created, let’s select the top 2 rows for each productid based upon the earliest salesorderid. This salesorderid could actually represent any sequential column in the table, be it a date, or a primary […] Continue reading ...