Rebind and Rewind on Non Equal Correlated Subquery
Rebind and Rewind on Non Equal Correlated Subquery
By : Kasim Wirama, MCDBA
This posting, I would show other execution plan operator that has caching capability regarding to non correlated subquery that involved non equality relationship between inner and outer query.
What is the behaviour difference for optimizer to generate plan between correlated and noncorrelated subquery? In noncorrelated subquery, inner query doesn’t depend on outer query thus optimizer compute inner query first then outer query, whereas in correlated subquery, optimizer compute every row in outer query then to inner query because inner query depends on value of outer query.
Example of nonequality-correlated subquery is :
SELECT
orderid
FROM orders AS o1
WHERE freight <
(
SELECT avg(freight)
FROM orders AS o2
WHERE o2.orderdate < o1.orderdate
);
Execution plan for this query is :
|--Filter(WHERE:([Northwind].[dbo].[Orders].[Freight] as [o1].[Freight]<[Expr1004]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([o1].[OrderDate]))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o1]))
|--Index Spool (SEEK:([o1].[OrderDate]=[Northwind].[dbo].[Orders]. [OrderDate] as [o1].[OrderDate]))
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1011]=(0) THEN NULL ELSE [Expr1012]/CONVERT_IMPLICIT(money,[Expr1011],0) END))
|--Stream Aggregate (DEFINE:([Expr1011]=COUNT_BIG ([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight]), [Expr1012]=SUM([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight])))
|--Index Spool(SEEK:([o2].[OrderDate] < [Northwind].[dbo]. [Orders].[OrderDate] as [o1].[OrderDate]))
|--Clustered Index Scan (OBJECT:([Northwind].[dbo]. [Orders].[PK_Orders] AS [o2]))
First index spool is lazy spool and second one is eager spool (look at logicalOp column). For first access into inner query, optimizer scan orders table then create eager index spool (second spool). Eager index spool is also called as index on the fly spool because it eagerly build temporary index on entire outer input on first access. On subsequent access from outer query, average aggregation is handled by computer scalar and stream aggregate operator and then it is cached by first index spool (lazy index spool). It is called lazy index spool, because it cache result from immediate previous execution. When next input row has the same orderdate, then the result is taken from lazy index spool only. This behaviour is called rewind. But if next input row has different orderdate, then the result is taken from accumulated value in lazy index spool or if it is not in lazy index spool, the operator request input from stream aggregate and computer scalar operator. Such behaviour is called rebind because it rebinds new value into lazy index spool.
If you check number of rows of rebind and rewind at property of lazy index spool, their total will be equal to total number of rows in orders table.
Optimizer tends to create index spool operator when there are many duplicate value on orderdate column and there are a lot of input rows from outer query. So if you reduce the number of input column, index spool operator will be disappear. Look at this example and its execution plan below :
SELECT
orderid
FROM orders AS o1
WHERE freight <
(
SELECT avg(freight)
FROM orders AS o2
WHERE o2.orderdate < o1.orderdate
)
AND shipcity = 'Caracas';
Execution plan of the query is :
|--Filter(WHERE:([Northwind].[dbo].[Orders].[Freight] as [o1].[Freight]<[Expr1004]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([o1].[OrderDate]))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o1]), WHERE:([Northwind].[dbo].[Orders].[ShipCity] as [o1].[ShipCity]=N'Caracas'))
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1011]=(0) THEN NULL ELSE [Expr1012]/CONVERT_IMPLICIT(money,[Expr1011],0) END))
|--Stream Aggregate(DEFINE:([Expr1011]=COUNT_BIG([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight]), [Expr1012]=SUM([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight])))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o2]), WHERE:([Northwind].[dbo].[Orders].[OrderDate] as [o2].[OrderDate]<[Northwind].[dbo].[Orders].[OrderDate] as [o1].[OrderDate]))
Observe that there is no index spool operator.