How to cache stored procedure results using a hash key

Posted on May 16, 2015 by Derek Dieter
2
1

« Overcoming Issues with TempDB Contention

Creating log tables to track running processes »

There are a lot of different design patterns that lend themselves to creating the shortest path to the data. One of the most efficient is the caching of stored procedure result sets. In order to do this, we need to read the incoming parameters and create a cache key. This cache key is then stored along with the stored procedures result set as a unique identifier representing that combination of the stored procedures parameters. The caveat with this method is that the business requirement needs to allow stale data.

There are times where you will need to use values other than the passed in parameters in order to create the cache key. Some examples include datetime data types or keys that are unique (like a customerkey). If the hash that gets created from the parameters is unique, then you will never reuse that dataset again. With this in mind you would even have determine whether the procedure is even cacheable. Another concern to keep in mind is the duration of time you can serve stale data. Maybe 30 seconds, 1 minute, or 1 hour? Any time increment is able to be worked with by clearning the cache tables at the desired interval.

Design

Let’s look at the basic workflow for how this procedure will work. First of all, we will need to hash all the parameters that are coming into the procedure (unless they are unique in which case we may not be able to cache, or we can possibly use another value).

Caching the parameters

[cc lang=”sql”]
— example parameters
DECLARE @cust_type INT = 5
DECLARE @cust_location int = 10
DECLARE @location_category int = 20
DECLARE @page_number int = 2
DECLARE @records_per_page int = 100

DECLARE @param_hash varbinary(20)

SET @param_hash = HASHBYTES(‘MD5’,
coalesce(convert(nvarchar(MAX),@cust_type),”) +
coalesce(convert(nvarchar(MAX),@cust_location),”) +
coalesce(convert(nvarchar(MAX),@location_category),”) +
coalesce(convert(nvarchar(MAX),@page_number),”) +
coalesce(convert(nvarchar(MAX),@records_per_page),”)
)

SELECT @param_hash
[/cc]

Now we have the cache key that will correspond to the recordset that we either will create, or will look to see exists.

Managing the cache

You may have thought this through enough already and realized if we store all the result sets for the procedure in one table, then when we delete or update based on whether the cache is expired, we’re going run into either contention, or an empty result set. This is where our race condition safe — caching solution comes into play. It involves using 3 tables instead of 1 or 2 in order to store the cache results. Why 3? Hold on, here comes my attempt at explaining.

If we use 1 table, then when we need to invalidate the cache, we would have to perform an update on all the records for each existant cache key as they come in. This will no doubt cause contention with blocking.

If we use 2 tables, then we can switch over to table 2 once the cache is expired for table 1, then trucate table 1. However, a subtle race condition comes into play in this scenario. When the procedure is executed and it is determined that table 1 is the valid table to use, unless you wrap the entire code in an explicit transaction, by the time it reaches the bottom of the procedure, the cache may have switched to table 2 as being the primary cache table, meaning table 1 has now been truncated. Guess what, you’re returning no data now.

So that is why we use 3 tables. We do not truncate table 1 when table 2 represents the current cache table. That way, when we reach the end of the procedure, we can still serve a result set because nothing has been truncated. Instead we truncate table 3, which is two tables back from being used (surely not being queried any longer). We then round robin through the tables in this manner.

Here’s an example of the cache invalidator job that would run at a set interval

[cc lang=”sql”]

USE cachedb
GO

DECLARE @cache_table_old int
DECLARE @transaction_error int
DECLARE @debug_mode int = 0

BEGIN TRANSACTION

— find the table we need to truncate
SET @cache_table_old =
(
SELECT CASE WHEN cache_table_current = 1 THEN cache_table_count ELSE cache_table_current – 1 END
FROM proc_mutex pm
WHERE pm.[procedure_name] = ‘pr_procedure_example’
);

— set the new cache table (the cached procedure pr_procedure_example will read this table to determine cache)
UPDATE TOP(1) pm
SET cache_table_current = CASE WHEN cache_table_current = cache_table_count THEN 1 ELSE cache_table_current + 1 END,
datetime_modified = GETDATE()
FROM proc_mutex pm
WHERE pm.[procedure_name] = ‘pr_procedure_example’

set @transaction_error = @@ERROR;

IF @transaction_error != 0
BEGIN
ROLLBACK
END
ELSE
BEGIN

COMMIT TRANSACTION
— truncate the old cache table
IF @transaction_error = 0
BEGIN

IF @cache_table_old = 1
BEGIN
TRUNCATE TABLE cachedb.tmp.pr_procedure_example_cachedb_1
END
ELSE IF @cache_table_old = 2
BEGIN
TRUNCATE TABLE cachedb.tmp.pr_procedure_example_cachedb_2
END
ELSE IF @cache_table_old = 3
BEGIN
TRUNCATE TABLE cachedb.tmp.pr_procedure_example_cachedb_3
END

END

END
[/cc]

And here is the ddl for the proc_mutex table. This table will hold all the procedure names that you cache, along with the current cache table designation.

[cc lang=”sql”]
CREATE TABLE [dbo].[proc_mutex](
[proc_mutex_id] [int] IDENTITY(1,1) NOT NULL,
[procedure_name] [varchar](255) NOT NULL,
[cache_table_current] [int] NOT NULL,
[cache_table_count] [int] NOT NULL,
[datetime_modified] [datetime] NOT NULL
)
[/cc]

It is recommended you keep the caching on a separate database which is stored on a PCIE SSD card, or at least an SSD

Creating the procedure

Now that we have created the cache invalidator mechanism, along with the proc_mutex, we will write the first half of the procedure. This will ideally be a template you can follow:

[cc lang=”sql”]
USE proddb
GO

CREATE PROCEDURE dbo.pr_procedure_example
@cust_type INT = 5,
@cust_location int = 10,
@location_category int = 20
AS
BEGIN

/**********************************************************
* setup param_hash_key
***********************************************************/
DECLARE @procedure_name sysname ;
DECLARE @cache_table_number tinyint ;

SET @procedure_name = OBJECT_NAME(@@PROCID);
SET @cache_table_number = (SELECT TOP(1) pm.cache_table_current FROM cachedb.dbo.proc_mutex pm with (nolock) WHERE pm.[procedure_name] = @procedure_name);

DECLARE @param_hash varbinary(20)

/**************************************************************
* determine which table may have our param_hash_key
* if the key does exist, we will simply return our result set
* this will result in a fast return < 5ms
***************************************************************/

IF @cache_table_number = 1
BEGIN

IF EXISTS(
SELECT 1
FROM cachedb.tmp.pr_procedure_example_cachedb_1
WHERE param_hash = @param_hash
)
BEGIN

SELECT *
FROM cachedb.tmp.pr_procedure_example_cachedb_1
WHERE param_hash = @param_hash

RETURN

END
END
ELSE IF @cache_table_number = 2
BEGIN

IF EXISTS(
SELECT 1
FROM cachedb.tmp.pr_procedure_example_cachedb_2
WHERE param_hash = @param_hash
)
BEGIN

SELECT *
FROM cachedb.tmp.pr_procedure_example_cachedb_2
WHERE param_hash = @param_hash

RETURN

END
END
ELSE IF @cache_table_number = 3
BEGIN

IF EXISTS(
SELECT 1
FROM cachedb.tmp.pr_procedure_example_cachedb_3
WHERE param_hash = @param_hash
)
BEGIN

SELECT *
FROM cachedb.tmp.pr_procedure_example_cachedb_3
WHERE param_hash = @param_hash

RETURN

END
END
[/cc]

Above, we are looking up to see if the result set has already been created for that combination of parameters. If it does, we will return the result quite fast. Depending on the size of data and the disk speed it could be somewhere between 1ms to 10ms. Let’s take a look at the full example.

Full cached procedure example

[cc lang=”sql”]
USE proddb
GO

CREATE PROCEDURE dbo.pr_procedure_example
@cust_type INT = 5,
@cust_location int = 10,
@location_category int = 20
AS
BEGIN

SET @procedure_name = OBJECT_NAME(@@PROCID);
SET @cache_table_number = (SELECT TOP(1) pm.cache_table_current FROM cachedb.dbo.proc_mutex pm with (nolock) WHERE pm.[procedure_name] = @procedure_name);

DECLARE @param_hash varbinary(20)

IF @cache_table_number = 1
BEGIN

IF EXISTS(
SELECT 1
FROM cachedb.tmp.pr_procedure_example_cachedb_1
WHERE param_hash = @param_hash
)
BEGIN

SELECT *
FROM cachedb.tmp.pr_procedure_example_cachedb_1
WHERE param_hash = @param_hash

RETURN

END
END
ELSE IF @cache_table_number = 2
BEGIN

IF EXISTS(
SELECT 1
FROM cachedb.tmp.pr_procedure_example_cachedb_2
WHERE param_hash = @param_hash
)
BEGIN

SELECT *
FROM cachedb.tmp.pr_procedure_example_cachedb_2
WHERE param_hash = @param_hash

RETURN

END
END
ELSE IF @cache_table_number = 3
BEGIN

IF EXISTS(
SELECT 1
FROM cachedb.tmp.pr_procedure_example_cachedb_3
WHERE param_hash = @param_hash
)
BEGIN

SELECT *
FROM cachedb.tmp.pr_procedure_example_cachedb_3
WHERE param_hash = @param_hash

RETURN

END
END

/**************************************************************
* procedure logic goes here
***************************************************************/
SELECT *
INTO #holding_table
FROM proc_example_table
WHERE cust_type = @cust_type
AND cust_location = @cust_location
AND location_category = @location_category

/**************************************************************
* insert result set to cache table if not exists
***************************************************************/
IF @cache_table_number = 1
BEGIN

INSERT INTO cachedb.tmp.pr_procedure_example_cachedb_1
SELECT
param_hash = @param_hash,
*
FROM #holding_table
— prevent duplicate result sets
WHERE NOT EXISTS (SELECT 1 FROM cachedb.tmp.pr_procedure_example_cachedb_1 WHERE param_hash = @param_hash);

END
ELSE IF @cache_table_number = 2
BEGIN

INSERT INTO cachedb.tmp.pr_procedure_example_cachedb_2
SELECT
param_hash = @param_hash,
*
FROM #holding_table
WHERE NOT EXISTS (SELECT 1 FROM cachedb.tmp.pr_procedure_example_cachedb_2 WHERE param_hash = @param_hash);

END
ELSE IF @cache_table_number = 3
BEGIN

INSERT INTO cachedb.tmp.pr_procedure_example_cachedb_3
SELECT param_hash = @param_hash,
*
FROM #holding_table
WHERE NOT EXISTS (SELECT 1 FROM cachedb.tmp.pr_procedure_example_cachedb_3 WHERE param_hash = @param_hash);

END

/**************************************************************
* return back result set
***************************************************************/
IF @cache_table_number = 1
BEGIN
SELECT *
FROM cachedb.tmp.pr_procedure_example_cachedb_1
WHERE param_hash = @param_hash
END
ELSE IF @cache_table_number = 2
BEGIN
SELECT *
FROM cachedb.tmp.pr_procedure_example_cachedb_2
WHERE param_hash = @param_hash
END
ELSE IF @cache_table_number = 3
BEGIN
SELECT *
FROM cachedb.tmp.pr_procedure_example_cachedb_3
WHERE param_hash = @param_hash
END

END

[/cc]

There it is. Needless to say, the respective tables (cachedb.tmp.pr_procedure_example_cachedb_1) need to have the same schema as the procedure that’s being returned. And again, put these tables on a PCIE SSD, and setup the job to invalidate the cache at a set interval, and there you have it, a procedure cache mechanism to help you scale.

Post a comment

2 comments

Joe Mucchiello 12 Mar 2018 at 12:21 am

You should add delimiters between the parameters otherwise all of the following will have the same hash value:
exec spFoo ‘AB’, ‘CD’
exec spFoo ‘A’, ‘BCD’
exec spFoo ‘ABCD’, NULL

Since you should know the kind of data expected in the parameters, picking the delimiter should be easy (a comma, a semi-colon, etc). But do put some thought into the delimiter you pick.

Reply to this comment