SQL Server knowledge center

everything about SQL Server
See also: Other Geeks@INDC

Decorrelating Correlated Subquery

Decorrelating Correlated Subquery

By : Kasim Wirama, MCDBA

 

I have written posting about correlated and noncorrelated subquery. Now let’s take a look correlated subquery but optimizer generated noncorrelated query-like execution plan.

Here is query to display order for each customer who lives in Madrid

select orderid

from orders as o

where exists

     (

          select *

          from Customers as c

          where c.CustomerID = o.CustomerID

          and c.City = 'Madrid'        

     )

 

Execution plan for the correlated subquery is :

  |--Nested Loops(Inner Join, OUTER REFERENCES:(Coffee.[CustomerID]))

       |--Index Seek(OBJECT:([Northwind].[dbo].[Customers].[City] AS Coffee), SEEK:(Coffee.[City]=N'Madrid') ORDERED FORWARD)

       |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[CustomerID] AS Time), SEEK:(Time.[CustomerID]=[Northwind].[dbo].[Customers].[CustomerID] as Coffee.[CustomerID]) ORDERED FORWARD)

 

The execution above is same as execution plan for the following query :

select orderid

from orders as o

join Customers as c

on o.CustomerID = c.CustomerID

where c.City = 'Madrid'

 

Second example here is query returning freight and its average from all orders within same customer.

 

select o1.OrderID, o1.Freight,

     (select AVG(freight) from orders as o2

      where o2.CustomerID = o1.CustomerID

     ) as AverageFreight

from dbo.orders as o1

 

Its execution plan is :

 

  |--Compute Scalar(DEFINE:([Expr1006]=[Expr1004]))

       |--Hash Match(Right Outer Join, HASH:([o2].[CustomerID])=([o1].[CustomerID]), RESIDUAL:([Northwind].[dbo].[Orders].[CustomerID] as [o2].[CustomerID]=[Northwind].[dbo].[Orders].[CustomerID] as [o1].[CustomerID]))

            |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1013]=(0) THEN NULL ELSE [Expr1014]/CONVERT_IMPLICIT(money,[Expr1013],0) END))

            |    |--Stream Aggregate(GROUP BY:([o2].[CustomerID]) DEFINE:([Expr1013]=COUNT_BIG([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight]), [Expr1014]=SUM([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight])))

            |         |--Sort(ORDER BY:([o2].[CustomerID] ASC))

            |              |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o2]))

            |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o1]))

 

The item immediately below Hash Match operator is scanning all orders and sort by customerid before they are computed for its average freight with grouping by CustomerId. To get freight column, optimizer decides to do clustered index scan, if you remove freight column, optimizer will choose index scan rather than clustered index scan, which is more efficient. There is possible that one customer may have more than one orders. For example, a customer has 5 orders. In this case optimizer will calculate average freight once for the customer, not 5 times for the customer. It is reflected with calculation of aggregation first before matched with the inner execution of clustered index scan of alias O1. The more orders for each customers, the more efficient execution plan, because it just calculates average value once for each customers.

 

Most cases, noncorrelated subquery is more efficient than correlated subquery. If you have options to change correlated subquery into non correlated subquery, it is better to do so.

 

Share this post: | | | |

Comments

Decorrelating Correlated Subquery | Freight News said:

Pingback from  Decorrelating Correlated Subquery | Freight News

# June 22, 2008 12:31 PM
Leave a Comment

(required) 

(required) 

(optional)

(required) 
Are you human?:  


Enter the numbers above: