Highly concurrent systems that feed off normalized data stores typically require a middle layer of logic to serve the front-end needs. More often than not, this middle layer of logic is stored in the same stored procedures that the web layer accesses. While sometimes this may be the right place for simple logic, for more complex calculations and joins it is simply not efficient. The answer in these cases is to create a new meta layer of data that pre-joins data and rolls up necessary aggregations. To paint a better picture – in an ideal database each procedure that feeds the front-end would house a simple select statement from a single table. We know in real life this is not always possible, however we should think in these terms with every web proc we write. The reason is simple – complex logic is both IO and CPU intensive. We have no control over the web traffic, but we do have control over what logic we use to serve the web. Often times it is better to run jobs in the background that perform complicated logic on behalf of the web procs and dump the results into static tables. This methodology basically creates a denormalized meta layer of data on top of the normalized data. The argument against this is that the data will not truly be real-time. However you need to ask yourself what’s more important, “real-time” data that is 5-10 times slower or preaggregated data that is potentially seconds [...]
Continue reading ...
Every once in a while I’ll find a field someone created where the datatype choice was not the best. This comes up relatively often with Varchar vs Char. All of the confusion just comes from not understanding the differences and causes and effects of each. With that, I’ll outline the only times I will use one over the other. Let’s take a look at the behavior of char vs varchar: CREATE TABLE #meme ( first_name CHAR(50), last_name VARCHAR(50) ) INSERT INTO #meme (first_name, last_name) SELECT ‘john’, ‘smith’ SELECT * FROM #meme SELECT DATALENGTH(first_name), DATALENGTH(last_name) FROM #meme You can see from the value returned from DATALENGTH that first_name is 50 bytes long despite only taking 4 characters. The other characters are empty strings. Storage wise, varchar is out the gate up to 2 bytes larger than a char because it needs to store the length. It’s 1 byte larger if it’s over 255 characters. That’s why you will often see varchar definitions as varchar(255). Pain-in-the-butt wise, char is far more a pain to deal with unless all the data in the char column takes up the exact amount of space defined for the char column. I typically only use char if all the fields are going to be the exact same size guaranteed. Even then, I’m hesitant. Partly because disk space is not as much of an issue as it was before, and mostly because if you DO have variable length fields in a char column, then the remaining [...]
Continue reading ...
This is typically a hot topic and I’m going to try and tackle it with my rudimentary math. Ultimately given time and the frequency of generation, there is no such thing as a completely unique random number. There will always be some chance that a random number can be regenerated even though the chances do go way down when given a larger set of bytes. Using Numbers Let’s first look at probably the most random way to generate a random number. Using a function introduced in SQL 2008 called CRYPT_GEN_RANDOM(). This function takes a parameter of byte length, and returns a random hex based number within that range of bytes. We can easily convert this varbinary data to an integer based value. This function is based upon a low level windows API and is cryptographically secure. SELECT CAST(CRYPT_GEN_RANDOM(8) AS BIGINT) Using the method above returning an 8 byte random number and casting it as a bigint, your chances of repeating a duplicate are (roughly) between 1 and 15 billion. To break that out in terms of time, if you generated a random number every second you would without a doubt hit a duplicate random number between 31 and 47 years. However there’s no guarantee that it might not happen way before or way after however the chances go down the farther you move from the mean. Even though this may sound like a long time, the randomness that a bigint provides may not be enough for your application. In which [...]
Continue reading ...
This is an invaluable trick that can be sometimes be used instead of the BETWEEN operator. A common place this can be used is in an historical table that contains EffectiveFrom & EffectiveTo dates. In these kinds of tables, many historical versions of a record can be stored. Each record in this scenario will contain a Start & End Date which signifies the time span when a record is active. Because two records cannot be active at the same time, we can be sure that the Start & End dates will never overlap. While this describes one particular scenario for this example, there are many other situations this method can be used. Let’s look at the typical query we would use to find the record which is active for an employee at a specific point in time. SELECT * FROM [HumanResources].[EmployeeDepartmentHistory] WHERE EmployeeID = 274 AND ’2001-04-28′ BETWEEN StartDate AND EndDate This query will return to us the single record that is active as of ’2001-04-28′. However if you think about the constraints our table contains, we actually only need to query one column (the StartDate) in order to isolate this record. This is because the StartDate will not overlap any other StartDate & EndDate for this employee. Here’s an example: SELECT TOP 1 * FROM [HumanResources].[EmployeeDepartmentHistory] WHERE EmployeeID = 274 AND StartDate >= ’2001-04-28′ ORDER BY StartDate Because only one record can be active for an employee at a single point in time, we can be sure that if [...]
Continue reading ...
If you’ve ever wondered how to make a procedure available in any database, it’s actually pretty simple. If you create a procedure in the master database with the prefix of “sp_”, it will be callable from any database. I personally think this feature is great for utilities, however I would not use this for dependent objects. In other words, I would not advise creating a procedure (or function that is referenced by other procedures) within the master database. The biggest reason is that when restoring databases to other servers, you typically do not restore the master database. So it can be easily left out or forgotten. If that happens you may have a mission critical troubleshooting situation on your hands. However sometimes there may not be a way around it. But you can look into utilizing synonyms as an alternative. Another interesting observation regarding placing procedures in the master database with the sp_ prefix is that the procedure is actually executed from the context of the master database. In other words, if you create a procedure to enumerate all the other procedures within your database and place it in master, once you execute the procedure from a database other than master, it will still enumerate the procedures in the master database. Here’s an example: After creating this procedure and calling it from AdventureWorks, it still enumerates all the procedures within the master database. To get around this, we can use the undocumented procedure which will mark the procedure as a [...]
Continue reading ...