CROSS APPLY Explained
-
Posted on November 13, 2011 by Derek Dieter
-
26
My first introduction to the APPLY operator was using the DMVs. For quite a while after first being introduced, I didn’t understand it or see a use for it. While it is undeniable that it is has some required uses when dealing with table valued functions, it’s other uses evaded me for a while. Luckily, I started seeing some code that used it outside of table valued functions. It finally struck me that it could be used as a replacement for correlated sub queries and derived tables. That’s what we’ll discuss today.
I never liked correlated subqueries because it always seemed like adding full blown queries in the select list was confusing and improper.
[cc lang=”sql”]
SELECT
SalesOrderID = soh.SalesOrderID
,OrderDate = soh.OrderDate
,MaxUnitPrice = (SELECT MAX(sod.UnitPrice) FROM Sales.SalesOrderDetail sod WHERE soh.SalesOrderID = sod.SalesOrderID)
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
[/cc]
It always seemed to me that these operations should go below the FROM clause. So to get around this, I would typically create a derived table. Which didn’t completely feel right either, but it was still just a bit cleaner:
[cc lang=”sql”]
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
JOIN
(
SELECT
max_unit_price = MAX(sod.UnitPrice),
SalesOrderID
FROM Sales.SalesOrderDetail AS sod
GROUP BY sod.SalesOrderID
) sod
ON sod.SalesOrderID = soh.SalesOrderID
[/cc]
What made this ugly was the need to use the GROUP BY clause because we could not correlate. Also, even though SQL almost always generates the same execution plan as a correlated sub query, there were times when the logic inside the derived table got so complex, that it would not limit the result set of the derived table by inferring the correlation first. This made this kind of query sometimes impractical.
Luckily, this is where the CROSS APPLY steps in so nicely. It gives us the best of both worlds by allowing us to correlate AND not have the query embedded in the select list:
[cc lang=”sql”]
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
CROSS APPLY
(
SELECT
max_unit_price = MAX(sod.UnitPrice)
FROM Sales.SalesOrderDetail AS sod
WHERE soh.SalesOrderID = sod.SalesOrderID
) sod
[/cc]
The other advantage this has over the correlated sub query is when we want to add more columns in our SELECT list, we do not have to completely repeat the entire query. We still have it in one place, making it somewhat modular. So instead of this:
[cc lang=”sql”]
SELECT
SalesOrderID = soh.SalesOrderID
,OrderDate = soh.OrderDate
,MaxUnitPrice = (SELECT MAX(sod.UnitPrice) FROM Sales.SalesOrderDetail sod WHERE soh.SalesOrderID = sod.SalesOrderID) — 1
,SumLineTotal = (SELECT SUM(LineTotal) FROM Sales.SalesOrderDetail sod WHERE soh.SalesOrderID = sod.SalesOrderID) — 2
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
[/cc]
We have this:
[cc lang=”sql”]
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
,sod.sum_line_total
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
CROSS APPLY
(
SELECT
max_unit_price = MAX(sod.UnitPrice)
,sum_line_total = SUM(sod.LineTotal)
FROM Sales.SalesOrderDetail AS sod
WHERE soh.SalesOrderID = sod.SalesOrderID
) sod
[/cc]
As for the execution plans, in my experience CROSS APPLY has always won. Not always by a lot, but it still wins.
So what is OUTER APPLY? It’s equivalent to a left join on the derived table.
[cc lang=”sql”]
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
LEFT JOIN
(
SELECT
max_unit_price = MAX(sod.UnitPrice),
SalesOrderID
FROM Sales.SalesOrderDetail AS sod
GROUP BY sod.SalesOrderID
) sod
ON sod.SalesOrderID = soh.SalesOrderID
[/cc]
[cc lang=”sql”]
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
OUTER APPLY
(
SELECT
max_unit_price = MAX(sod.UnitPrice)
FROM Sales.SalesOrderDetail AS sod
WHERE soh.SalesOrderID = sod.SalesOrderID
) sod
[/cc]
- Comments (RSS)
- Trackback
- Permalink
Simple, accurate, the best explanation that I’ve found so far.
Very… very useful.
Thank you!
I think you’ve missed one of the best uses for cross apply…
select c.newcalc
from table_a a
left join table_b on a.coln=b.coln
cross apply (select a.somecolumn+b.somecolum ‘newcalc’) c /*enter any statement in here (i.e case statements)*/
this is really powerful, as say if you need to reference the column ‘newcalc’ multiple times, you don’t need to repeat anything. Such a simple way too, and it just works 🙂
Using either of those methods would increase the complexity of the execution plan. If you are using something like a stored procedure to store the SQL code, it would make better sense to use a variable in the where clause if you are expected only one value such a single max value. If you are having multiple sets of data it would make since to have the inner query precalculated as a temporary or variable table and then join to the main query.
SQL does not like complex queries because SQL has only limited time to calculate possibly different execution plans before it must choose which one to run. If the query is too complicated with hundreds of thousands of possible ways to perform the physical operations than SQL is more than likely choose a bad performing execution plan.
I found from experience that breaking out complicated queries into multiple steps, even though the code is longer, can have much better performance than one large query because SQL thrives in simplicity. It’s very noticeable when you are dealing with millions to billions of records.
FYI: When I view the page the sql formatting appears to have been lost.
This is still one of the best CROSS APPLY articles out there some six+ years later. Great job!
The one part I’m unsure about is performance. The execution plans for all three statement versions here are 33% relative to batch (after removing the CROSS APPLY’s “sum_line_total” that was causing its cost to jump to 76% relative to batch). The reads at each step are identical in all plans. Perhaps optimizer improvements in MSSQL16 have improved the performance of the correlated subquery. If so, the argument for CROSS APPLY would seem to be readability.
Just wanted to thank you for posting this. I know sometimes blogs feel more like messages in bottles thrown into the sea. But this one found me and it was extremely timely. I spent about 6 hours working on the LEFT JOIN approach and as you described it just didn’t seem right. I now believe, based on your definition of the OUTER APPLY that I understand it.
Whoa! That’s easy and you explained it well! Many times I write a common table expression. For example, WITH blaTable AS (SELECT MAX(whatever), fk_ID FROM someTable GROUP BY fk_ID)….
While this works well, I’ve often wanted a more logical looking way especially when the queries get long.
Awesome explanation. Thank you. I’ve used CROSS APPLY before but only in a single context. I struggle with Joins that have select statements and subqueries within select statements because it’s just not how my brain works… Cross Apply just makes sense to me.
Seriously, this was so helpful! Totally allowed me to perform further calculations on calculated columns from my cross applied query in my main query. Thank you!
These examples are fantastic and very helpful! Thanks very much for this awesome article.
I don’t understand why you’re using CROSS APPLY here instead of simply the following standard join + group:
SELECT soh.SalesOrderID, soh.OrderDate,
MAX(sod.UnitPrice) AS max_unit_price,
SUM(sod.LineTotal) AS sum_line_total
FROM Sales.SalesOrderHeader soh
INNER JOIN Sales.SalesOrderDetail sod ON soh.SalesOrderID = sod.SalesOrderID
GROUP BY soh.SalesOrderID, soh.OrderDate
Can you explain why CROSS JOIN makes a difference in your examples?
He’s using a CROSS APPLY, not a CROSS JOIN.
“Can you explain why CROSS JOIN makes a difference in your examples?” – Alistair
CROSS JOIN is completely different than a CROSS APPLY. A CROSS JOIN returns a combination of all records (a Cartesian product) found in both tables. For example, if # of rows in table A = 100 and # of rows in table B = 5, a CROSS JOIN between the 2 tables (A * B) would return 500 rows total.
A CROSS APPLY functions similarly to an INNER JOIN as far as rows returned (the data and # of rows would be the same), except that performance is usually better. A derived query like the author is using isn’t how I’d typically script it (instead, you might as well use a Common Table Expression and join on the CTE, since that’s typically better for performance).
However, a CROSS APPLY can be especially useful with a table-valued function. In your query, CROSS APPLY to a table-valued function (with what would’ve been the derived query, instead, encapsulated inside of the function), and no “ON” clause needed. SalesOrderID is passed to the SQL function as a parameter. Like this…
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
,sod.sum_line_total
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
CROSS APPLY tblValuedFunction_sod (soh.SalesOrderID) AS sod
Of course, typically, the above would be done when you want to reuse the logic that’s in the function in multiple SQL views/functions (otherwise, just use a CTE).
They are correct! You should be a published author! I understood it so well!!
Why couldn’t you use
SELECT
soh.SalesOrderID
,soh.OrderDate
,MAX(sod.UnitPrice)
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID
GROUP BY soh.SalesOrderID
,soh.OrderDate
in your first example of cross apply? It seems to me even more legible.
This is a very good article on the use of CROSS APPLY. I ran these queries against an Adventureworks database at our site and I did find one thing that surprised/confused me. I set STATISTICS IO ON and when comparing the two queries below, I would have thought they would have been at least equivalent in scans and logical reads, but the latter query had counts much higher. What am I missing?
SELECT
SalesOrderID = soh.SalesOrderID
,OrderDate = soh.OrderDate
,MaxUnitPrice = (SELECT MAX(sod.UnitPrice) FROM Sales.SalesOrderDetail sod WHERE soh.SalesOrderID = sod.SalesOrderID) — 1
,SumLineTotal = (SELECT SUM(LineTotal) FROM Sales.SalesOrderDetail sod WHERE soh.SalesOrderID = sod.SalesOrderID) — 2
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
Table ‘SalesOrderHeader’. Scan count 1, logical reads 703, . . .
Table ‘SalesOrderDetail’. Scan count 2, logical reads 2482, . . .
SELECT
soh.SalesOrderID
,soh.OrderDate
,sod.max_unit_price
,sod.sum_line_total
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
CROSS APPLY
(
SELECT
max_unit_price = MAX(sod.UnitPrice)
,sum_line_total = SUM(sod.LineTotal)
FROM Sales.SalesOrderDetail AS sod
WHERE soh.SalesOrderID = sod.SalesOrderID
) sod
Table ‘SalesOrderDetail’. Scan count 31465, logical reads 101815, . . .
Table ‘SalesOrderHeader’. Scan count 1, logical reads 703, . . .
The stats obtained are the acid test about whether a technique is useful or not. In this case I think that the query optimiser is unable to work out what NEEDS to be done to satisfy the query as opposed to what you are telling it to do. It may be clearer to a human but the optimiser is obstructed. Perhaps the conclusion is that we should stick to using APPLY when table valued functions have to be used (i.e. the use for which MS recommends this compnent of SQL)
I never reply to these kinds of things either, but this was so well written and clear that I now feel pretty comfortable with cross apply. I came across it maintaining someone else’s code that was using cross apply on two function calls, and it wasn’t quite clear how it was working. This has been very helpful. Write a book on SQL 🙂
Best explanation via example I’ve seen on this! Now if only I could get you Sql Server people to use “select 1 as a” instead of “select a=1”. 🙂
No doubt.. that is a BAD habit I agree.. I’m ridding myself of it as well.
Has its use…
WITH order_counts AS (
…
),
customer_order_counts AS (
…
)
— UN-comment next line to alternate between update and select
— UPDATE COC SET /*
SELECT *, — */
order_count = COC.new_order_cnt — COC.new_order_cnt AS order_count
, … = ….
FROM customer_order_counts AS COC;
similarly
— UN-comment next line to alternate between delete and select
— DELETE /*
SELECT * FROM –*/
BRANCH_INSTPAYMENTS
WHERE [Client_RecordId] = @ACCOUNTNO
AND AgreementNo = @AGREEMENTNO
–AND StatusDate > dateadd(day, -7, getdate()
You should know that I NEVER comment on these types of sites, but this explanation was so excellent and concise that I felt the need to express my gratitude. Thank you and very well done!
I agree completely. I was reading the example provided by Microsoft online and they use a function that has a cte in it and it is going from other tables to that and the explanation was driving me crazy, I read this and it made ALL the more sense to me. Thanks Derek!
Ok, So here is what i have..
SELECT Portfolio FROM [DBNmae].[dbo].[TableNM] AS SYN
INNER JOIN(
SELECT PARENTACCOUNT
FROM [DBServer].[DBName].[dbo].[TableNM] AS ARC
WHERE CONVERT(CHAR(10),POSTDATE,101)=CONVERT(CHAR(10),GETDATE()-1,101)
AND USERNUMBER!=898
AND USERNUMBER!=907
AND USERNUMBER!=800
AND USERNUMBER!=902
AND USERNUMBER!=808
GROUP BY PARENTACCOUNT) ARC
ON SYN.Portfolio = ARC.PARENTACCOUNT
What I need to do now, is use the results of the above query to pull the USERNUMBER associated with the query on the inner join.
nobody in this world could tell the real use of Cross Apply, but you have done it and that too with a very simple example. Thanks !!!
Nice article! Thanks.
Also thanking those who explained the difference between CROSS & OUTER ‘Apply’s
AS a new dev just starting to write more complex sql queries without the use of ctes or inner joins, this was super helpful. Thanks!
Thanks for the explanation and great examples. I see this getting a lot of use.
Your explanation is great! I don’t see much of a difference between a join and a cross apply though.
I agree, it is the best and easiest explanation I found, with good usage examples. But I learned something today.
In the first example with sub query as a column, all records in SalesOrderHeader will be returned, whether there’s a match in SalesOrderDetail or not (everyone knows this).
Second example, inner join to derived table with group, only the records where SalesOrderID is matched between Header and Detail table will be returned (everyone knows).
The third example with the cross apply, it is the same as the inner join. It will return data only where there is a match in SalesOrderID, like the inner join. This is what I learned. Cross Apply will filter out data if there is no match. Therefore, the first example and third example will not necessarily return the same dataset.
thank you for your article. I use cross apply and outer apply all the time and now i have a url to give to others so I don’t have to explain it.
Your article helped me to refresh my knowledge quickly. Thanks!
The 2nd example, with GROUP BY in sod, does not yield the same result as example 3 (first stmt with CROSS APPLY.) The 2nd example will only yield SalesOrderHeader records that have details but the 3rd example will yield all headers, with or without details. Now maybe in this DB all headers have details, but that is not necessarily true for other databases in which a CROSS APPLY may be useful.
Excellent article. Very nicely explained.
However, I have a question.
I expect the following query to exclude the null values from results, but it returns null for units column when there arent any matching receipt_ids in sales_detail table.
select h.receipt_id, d.units
from sales_header h
cross apply
(
select sum(d.units) as units
from sales_details d
where d.receipt_id = h.receipt_id
) d
Great explanation indeed. It allows to correlate in a derived table and thus you can reference multiple columns in the select list.
I’ve been working in databases for years and have never come across a Cross Apply or Outer Apply. I’ve built some wild stuff with recursive querying and everything. I saw a query this morning online with an Outer Apply and had to look it up. I’ll be refactoring some of my recent work using this logic. I love the idea of not having to use a group by in a derived table. Thanks for the explanation!
I’ve been building queries for years and I keep forgetting about this. It saved my butt a year back, but this morning after a couple of hours of messing around and refusing to create double subqueries (example above, with two subqueries in the select), your article reminded me of outer apply again. It really does have a nice use at times, and is also massively faster than the execution was with the two sub-queries
came across today and it’s the simplest and best explanation of cross apply I have come across.thanks
The explanation was pretty good. Just the right one I needed. Thank you!
Thanks for your reply . I made some progress, after using non-cached cecioltlon at least it does not say No Data Available butIn the Recommended Tables Based on Contention ( this time I used exactly your example with AdentureWorks2012 and EXEC test1 ) the tables now appear but when I click on Sales Order Details I get Error A data source instance has not been supplied for the data source Scan Details’ . Since this is the same CTP1 , same database, same procedure this looks strange . Any idea?
Love the way you explain this. Definitely very helpful, thank you.
This explanation was really excellent and much better than many of the others I’ve seen on this topic. I never liked having to repeat the same correlated subquery multiple times in the select statement. Now I don’t have to! Thanks!
Great explanation and example! Thanks. I am wondering if the same effect might be accomplished using a common table expression? In other words, the part that would be the correlated subquery were defined as a CTE grouped as required and then joined into the main query. I only ask because I need to do something like this and the technical spec actually used the Crossapply approach so I am wondering if I am missing something or a CTE could also work.
Thanks,
Bryan
I was thinking the same thing.
More readable than most other approaches, and extremely flexible.
For example:
WITH MaxPrice as
(SELECT
SalesOrderID,
MAX(sod.UnitPrice) as max_unit_price
From Sales.SalesOrderDetail
GROUP BY
SalesOrderID
)
SELECT
soh.SalesOrderID
,soh.OrderDate
,mp.max_unit_price
FROM AdventureWorks.Sales.SalesOrderHeader AS soh
LEFT JOIN
MaxPrice mp
ON soh.SalesOrderID = mp.SalesOrderID