Performance Study of Scalar Aggregation

Performance Study of Scalar Aggregation

By : Kasim Wirama, MCDBA

 

You might use aggregate function in SQL Server such as COUNT, MIN, MAX, AVG, SUM and DISTINCT. Let’s look into these aggregate functions, and its execution plan. There is scalar aggregation and stream aggregation. A query is called scalar aggregation if only aggregate functions exist, while there is one or more columns besides aggregate function, it is called stream aggregation. I show everything about scalar aggregation in this article.

Examples of scalar aggregation is shown below :

SELECT COUNT(*) FROM orders;

SELECT MIN(orderdate), MAX(orderdate) FROM orders;

SELECT COUNT(DISTINCT  customerid) FROM orders;

Example of stream aggregation is shown below :

SELECT orderid, COUNT(*) FROM orders
GROUP BY orderid;

SELECT customerid, MAX(orderdate) FROM ORDERS
GROUP BY customerid;

SELECT DISTINCT customerid FROM orders;

SELECT employeeid, COUNT(DISTINCT customerid) FROM orders
GROUP BY  employeeid;

Let’s examine this query :

SELECT COUNT(*) FROM orders;

Execution plan for above query is :

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))

       |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))

            |--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[ShippersOrders]))

Notice that count(*) is recognized as Stream Aggregate operator.  How about more than one column aggregations? For example :

SELECT MIN(orderdate), MAX(orderdate) FROM orders;

The execution plan is :

  |--Stream Aggregate(DEFINE:([Expr1003]=MIN([Northwind].[dbo].[Orders].[OrderDate]), [Expr1004]=MAX([Northwind].[dbo].[Orders].[OrderDate])))

       |--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[OrderDate]))

 

Notice that there is only one stream aggregate even though there is more than one aggregation function even it refers to different column. So what does execution plan look like if it aggregates different column? Take a look this example :

 

SELECT COUNT(DISTINCT  customerid), COUNT(DISTINCT  shipcity) FROM orders;

The execution plan is :

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1005],0), [Expr1004]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(DEFINE:([Expr1005]=COUNT([Northwind].[dbo].[Orders].[CustomerID]), [Expr1006]=COUNT([Northwind].[dbo].[Orders].[ShipCity])))

            |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]))

 

Still you get one stream aggregate. It is consistent for scalar aggregate with no DISTINCT in query statement. Now let’s take a look for multi scalar aggregate with distinct in its query statement :

SELECT COUNT(DISTINCT  customerid), COUNT(DISTINCT  shipcity) FROM orders;

  |--Nested Loops(Inner Join)

       |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1009],0)))

       |    |--Stream Aggregate(DEFINE:([Expr1009]=COUNT([Northwind].[dbo].[Orders].[CustomerID])))

       |         |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[Orders].[CustomerID]))

       |              |--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[CustomersOrders]), ORDERED FORWARD)

       |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))

            |--Stream Aggregate(DEFINE:([Expr1010]=COUNT([Northwind].[dbo].[Orders].[ShipCity])))

                 |--Sort(DISTINCT ORDER BY:([Northwind].[dbo].[Orders].[ShipCity] ASC))

                      |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]))

Notice that, for DISTINCT in a query statement, there will be 2 big branches, and each of them will have at least one stream aggregate and 2 branches is joined with nested loop operator.

 

Now compare 2 query statement below :

SELECT COUNT(DISTINCT customerid) FROM orders

SELECT COUNT(DISTINCT shipcity) FROM orders                                                            

Execution plan for each of them are:

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(DEFINE:([Expr1006]=COUNT([Northwind].[dbo].[Orders].[CustomerID])))

            |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[Orders].[CustomerID]))

                 |--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[CustomersOrders]), ORDERED FORWARD)

 

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(DEFINE:([Expr1006]=COUNT([Northwind].[dbo].[Orders].[ShipCity])))

            |--Sort(DISTINCT ORDER BY:([Northwind].[dbo].[Orders].[ShipCity] ASC))

                 |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]))

 

Notice that fonts that I make it bold, why there is sort operator besides stream aggregate operator after index scan operator? Stream aggregate happens when there is index on the associated column and vice versa for sort operator. Cost for stream aggregateoperator  is significantly smaller than sort operator, so stream aggregate operator in eliminating duplicate rows is more efficient than sort operator.

 

Share this post: | | | |
Published Saturday, May 24, 2008 2:12 PM by Kasim.Wirama
Filed under:

Comments

No Comments
Powered by Community Server (Commercial Edition), by Telligent Systems