Study of Stream Aggregation
Study of Stream Aggregation
By : Kasim Wirama, MCDBA
Last article, I talk about scalar aggregation and also I gave example of scalar and stream aggregation. On last article, I also give possible execution plan for scalar aggregation. This article, I examine possible execution plan for stream aggregate. On next article, I would talk about hash aggregation. Now let’s take a look at execution plan of stream aggregate more closely.
Given the group by query below :
SELECT shipcity,count(*)
FROM orders
GROUP BY shipcity
Execution plan for query above is :
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))
|--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[ShipCity]) DEFINE:([Expr1006]=Count(*)))
|--Sort(ORDER BY:([Northwind2].[dbo].[Orders].[ShipCity] ASC))
|--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))
Compare execution with query scalar aggregation below :
SELECT count(*) FROM orders
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))
|--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
|--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[OrderDate]))
Notice that stream aggregation query contains Sort operator. If you create index on ShipCity, the execution plan will be similar with execution plan of scalar aggregation query :
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))
|--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[ShipCity]) DEFINE:([Expr1006]=Count(*)))
|--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[ix_shipcity]), ORDERED FORWARD)
If you just issue GROUP BY without aggregate function, there is no Stream Aggregate operator, it has Sort operator instead, as the following query and its execution plan.
SELECT shipcity
FROM orders
GROUP BY shipcity
|--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[ShipCity] ASC))
|--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))
Now I progress to more complex query with combination between GROUP BY and DISTINCT aggregate here :
SELECT employeeid, count(DISTINCT customerid)
FROM orders
GROUP BY employeeid
Execution plan for the above query is :
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))
|--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1006]=COUNT([Northwind2].[dbo].[Orders].[CustomerID])))
|--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[EmployeeID] ASC, [Northwind2].[dbo].[Orders].[CustomerID] ASC))
|--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))
The query plan above is similar with first query plan. If you create index on employeeid and customerid, sort operator will be replaced with Stream Aggregate which is a query improvement.
Now how about execution plan for multiple distinct? Here is the query example :
SELECT employeeid, COUNT(DISTINCT customerid), COUNT(*)
FROM orders
GROUP BY employeeid
Query plan for above query is :
|--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))
|--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Northwind2].[dbo].[Orders].[EmployeeID])=([Northwind2].[dbo].[Orders].[EmployeeID]), RESIDUAL:([Northwind2].[dbo].[Orders].[EmployeeID] = [Northwind2].[dbo].[Orders].[EmployeeID]))
|--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))
| |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1011],0)))
| |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1011]=COUNT([Northwind2].[dbo].[Orders].[CustomerID])))
| |--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[EmployeeID] ASC, [Northwind2].[dbo].[Orders].[CustomerID] ASC))
| |--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))
|--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))
|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1012],0)))
|--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1012]=Count(*)))
|--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[EmployeeID]), ORDERED FORWARD)
Notice that there are 2 groups of Compute Scalar operator below Merge Join operator. On my previous article, there is nested loop operator for scalar aggregation. For stream aggregation the operator here is merge join because customerid has been sorted on these two groups. Notice that type of Merge Join operator is many-to-many merge join. Actually it is one-to-many merge join, it means there is a little overhead issue.
Please don’t memorize for each pattern, but try to understand why it results an execution plan.