User Defined Functions and Performance

There is definitely a lack of awareness in the SQL world regarding the use of user defined functions and the potential performance hit they can have when using within your queries. Don’t get me wrong, I would love nothing more than to be able to centralize my commonly used code into functions for reuse. In a lot of cases this is possible, however there are specific cases where this can cause a huge performance impact.

The Problem

The one thing we need to be aware of with SQL is that its efficiency lies in the fact that it deals with data in SETS. Meaning that its power does not come in performing row-by-row operations, rather it wants to retrieve chunks of data and manipulate them as recordsets. Keeping this in mind, you can look out for scenarios where certain operations will cause more of a row-by-row operation and therefore impact performance.

The most common no no, is the use of scalar functions within a set based operation. It seems (but I can’t prove) that SQL 2008 has actually made some great strides in being able to deal with these situations, however there will always be a negative impact. First, let’s look at a common scenario.

The Test

First, let’s deploy this scalar user defined function which calculates the End of month for a given date:

CREATE FUNCTION [dbo].[ufn_GetLastDayOfMonth] ( @pInputDate    DATETIME )
RETURNS DATETIME
BEGIN

    DECLARE @vOutputDate        DATETIME

    SET @vOutputDate = CAST(YEAR(@pInputDate) AS VARCHAR(4)) + '/' +
                       CAST(MONTH(@pInputDate) AS VARCHAR(2)) + '/01'
    SET @vOutputDate = DATEADD(DD, -1, DATEADD(M, 1, @vOutputDate))

    RETURN @vOutputDate

END

Now, lets run a test against the SalesOrderDetail table in Adventureworks. In the first example, we’ll put the scalar function within the select list passing in the value of ModifiedDate from the SalesOrderDetail table. In the second example, we’ll put only the code from the inline function within the select list.

DECLARE @now DATETIME;
SET @now = GETDATE();

-- use the scalar udf function
SELECT
    EOM = dbo.ufn_GetLastDayOfMonth2(ModifiedDate)
INTO #MEME
FROM Sales.SalesOrderDetail

SELECT 'scalar timing', DATEDIFF(ms, @now, GETDATE())
SET @now = GETDATE();

-- use only the code from the scalar function
SELECT
    EOM = DATEADD(DD, -1, DATEADD(M, 1, CAST(YEAR(ModifiedDate) AS VARCHAR(4)) + '/' + CAST(MONTH(ModifiedDate) AS VARCHAR(2)) + '/01'))
INTO #MEME2
FROM Sales.SalesOrderDetail

SELECT 'non scalar timing', DATEDIFF(ms, @now, GETDATE())

From the results, we can see that the scalar function made the query about 4 times slower. The reason for this is because the optimizer could not interweave the execution plan of the external query with the execution plan of the scalar function. The scalar function is basically a black box. Because of this, the query needed to do a row-by-row operation on the values being passed in, rather than deal with the result as a SET.

The Solution

So is there anyway around this? Well, there kind of is. The best way to intermingle execution plans is to use an inline table valued function. This should not be mistaken with a multi-valued table valued function (which performs as a black box). The inline table value function is able to expose its execution plan to the external query, and thus increasing throughput. Inline functions in a nutshell do not declare variables.

One other very important thing to note before creating the inline function is the datatype you define for the input parameter. It should ideally match the datatype you are going to pass in. So if you are going to pass in a datetime, do not define the input param as a smalldatetime (or vice versa). If you do, then there will be an implicit conversion between the join of the table and the function. And you want the table to join directly, without any conversions for speed.

Let’s create the same function but this time we’ll use an inline table valued function:

CREATE FUNCTION [dbo].[ufn_GetLastDayOfMonthTABLE] ( @pInputDate    DATETIME )
RETURNS TABLE
AS RETURN

    SELECT EOMDate = DATEADD(DD, -1, DATEADD(M, 1, CAST(YEAR(@pInputDate) AS VARCHAR(4)) + '/' +
                       CAST(MONTH(@pInputDate) AS VARCHAR(2)) + '/01'))

Now let’s run the same query above, but we’ll use an apply operator to return the result:

SET @NOW = GETDATE();

SELECT
    EOM = t.EOMDate
INTO #MEME3
FROM Sales.SalesOrderDetail sod
OUTER APPLY ufn_GetLastDayOfMonthTABLE(sod.ModifiedDate) t

SELECT DATEDIFF(ms, @now, GETDATE())

Now we see that the result is actually just as fast as the inline code and we gain the benefit of being able to encapsulate it.

So the conclusion? Scalar functions by themselves don’t slow down queries, people who place them in in set based statements slow down queries.

One comment
William 06 Feb 2015 at 2:23 pm

Great blog! Can you write an example where you return more than one column from the function?

Featured Articles

 Site Author

  • Thanks for visiting!