Decorrelating Correlated Subquery
Decorrelating Correlated Subquery
By : Kasim Wirama, MCDBA
I have written posting about correlated and noncorrelated subquery. Now let’s take a look correlated subquery but optimizer generated noncorrelated query-like execution plan.
Here is query to display order for each customer who lives in Madrid
select orderid
from orders as o
where exists
(
select *
from Customers as c
where c.CustomerID = o.CustomerID
and c.City = 'Madrid'
)
Execution plan for the correlated subquery is :
|--Nested Loops(Inner Join, OUTER REFERENCES:(
.[CustomerID]))
|--Index Seek(OBJECT:([Northwind].[dbo].[Customers].[City] AS
), SEEK:(
.[City]=N'Madrid') ORDERED FORWARD)
|--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[CustomerID] AS
), SEEK:(
.[CustomerID]=[Northwind].[dbo].[Customers].[CustomerID] as
.[CustomerID]) ORDERED FORWARD)
The execution above is same as execution plan for the following query :
select orderid
from orders as o
join Customers as c
on o.CustomerID = c.CustomerID
where c.City = 'Madrid'
Second example here is query returning freight and its average from all orders within same customer.
select o1.OrderID, o1.Freight,
(select AVG(freight) from orders as o2
where o2.CustomerID = o1.CustomerID
) as AverageFreight
from dbo.orders as o1
Its execution plan is :
|--Compute Scalar(DEFINE:([Expr1006]=[Expr1004]))
|--Hash Match(Right Outer Join, HASH:([o2].[CustomerID])=([o1].[CustomerID]), RESIDUAL:([Northwind].[dbo].[Orders].[CustomerID] as [o2].[CustomerID]=[Northwind].[dbo].[Orders].[CustomerID] as [o1].[CustomerID]))
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1013]=(0) THEN NULL ELSE [Expr1014]/CONVERT_IMPLICIT(money,[Expr1013],0) END))
| |--Stream Aggregate(GROUP BY:([o2].[CustomerID]) DEFINE:([Expr1013]=COUNT_BIG([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight]), [Expr1014]=SUM([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight])))
| |--Sort(ORDER BY:([o2].[CustomerID] ASC))
| |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o2]))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o1]))
The item immediately below Hash Match operator is scanning all orders and sort by customerid before they are computed for its average freight with grouping by CustomerId. To get freight column, optimizer decides to do clustered index scan, if you remove freight column, optimizer will choose index scan rather than clustered index scan, which is more efficient. There is possible that one customer may have more than one orders. For example, a customer has 5 orders. In this case optimizer will calculate average freight once for the customer, not 5 times for the customer. It is reflected with calculation of aggregation first before matched with the inner execution of clustered index scan of alias O1. The more orders for each customers, the more efficient execution plan, because it just calculates average value once for each customers.
Most cases, noncorrelated subquery is more efficient than correlated subquery. If you have options to change correlated subquery into non correlated subquery, it is better to do so.