SQL Server knowledge center

everything about SQL Server
See also: Other Geeks@INDC

Study of Stream Aggregation

Study of Stream Aggregation

By : Kasim Wirama, MCDBA

 

Last article, I talk about scalar aggregation and also I gave example of scalar and stream aggregation. On last article, I also give possible execution plan for scalar aggregation. This article, I examine possible execution plan for stream aggregate. On next article, I would talk about hash aggregation. Now let’s take a look at execution plan of stream aggregate more closely.

 

Given the group by query below :

 

SELECT shipcity,count(*)
FROM orders
GROUP BY shipcity

Execution plan for query above is :

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[ShipCity]) DEFINE:([Expr1006]=Count(*)))

            |--Sort(ORDER BY:([Northwind2].[dbo].[Orders].[ShipCity] ASC))

                 |--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))

Compare execution with query scalar aggregation below :

SELECT count(*) FROM orders

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))

       |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))

            |--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[OrderDate]))

Notice that stream aggregation query contains Sort operator. If you create index on ShipCity, the execution plan will be similar with execution plan of scalar aggregation query :

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[ShipCity]) DEFINE:([Expr1006]=Count(*)))

            |--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[ix_shipcity]), ORDERED FORWARD)

If you just issue GROUP BY without aggregate function, there is no Stream Aggregate operator, it has Sort operator instead, as the following query and its execution plan.

SELECT shipcity
FROM orders
GROUP BY shipcity

 

  |--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[ShipCity] ASC))

       |--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))

Now I progress to more complex query with combination between GROUP BY and DISTINCT aggregate here :

SELECT employeeid, count(DISTINCT customerid)
FROM orders
GROUP BY employeeid

Execution plan for the above query is :

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1006]=COUNT([Northwind2].[dbo].[Orders].[CustomerID])))

            |--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[EmployeeID] ASC, [Northwind2].[dbo].[Orders].[CustomerID] ASC))

                 |--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))

The query plan above is similar with first query plan. If you create index on employeeid and customerid, sort operator will be replaced with Stream Aggregate which is a query improvement.

Now how about execution plan for multiple distinct? Here is the query example :

SELECT employeeid, COUNT(DISTINCT customerid), COUNT(*)

FROM orders

GROUP BY employeeid

Query plan for above query is :

  |--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))

       |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Northwind2].[dbo].[Orders].[EmployeeID])=([Northwind2].[dbo].[Orders].[EmployeeID]), RESIDUAL:([Northwind2].[dbo].[Orders].[EmployeeID] = [Northwind2].[dbo].[Orders].[EmployeeID]))

            |--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))

            |    |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1011],0)))

            |         |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1011]=COUNT([Northwind2].[dbo].[Orders].[CustomerID])))

            |              |--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[EmployeeID] ASC, [Northwind2].[dbo].[Orders].[CustomerID] ASC))

            |                   |--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))

            |--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))

                 |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1012],0)))

                      |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1012]=Count(*)))

                           |--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[EmployeeID]), ORDERED FORWARD)

 

Notice that there are 2 groups of Compute Scalar operator below Merge Join operator. On my previous article, there is nested loop operator for scalar aggregation. For stream aggregation the operator here is merge join because customerid has been sorted on these two groups. Notice that type of Merge Join operator is many-to-many  merge join. Actually it is one-to-many merge join, it means there is a little overhead issue.

 

Please don’t memorize for each pattern, but try to understand why it results an execution plan.

 

Share this post: | | | |

Comments

Rangga Praduwiratna said:

Good article. Anyway, is this article that you meaned later in SQL Server User Group Indonesia milis? Good post, I'm still collecting and learning about join function and other T-SQL languange in SQL Server, since I'm still weak in SQL Server Programming. Thanks again!

Regards,

Rangga P.

# May 30, 2008 11:46 PM

Kasim.Wirama said:

About rewind and rebinds? no, it isn't. I will let you know about rewind and rebind discussion. You're welcomed.

regards,

Kasim Wirama

# June 2, 2008 4:46 PM
Leave a Comment

(required) 

(required) 

(optional)

(required) 
Are you human?:  


Enter the numbers above: