SQL Server knowledge center

everything about SQL Server
See also: Other Geeks@INDC

Parallelism in Hash Join

Parallelism in Hash Join

By : Kasim Wirama, MCDBA 

In this posting, I would talk about parallelism in hash join. If you would like to know algorithm in hash join, you can refer to my past posting titled Nested Loop, Merge Join, and Hash Join Algorithm. (http://geeks.netindonesia.net/blogs/kasim.wirama/archive/2008/05/18/nested-loop-merge-join-and-hash-join-algorithm.aspx)

 

Hash join has 2 input sets, build rows and probe rows. As in single hash join, parallelism in hash join doesn’t require input sets are in sorted order. So parallelism in hash join is categorized as non-merging exchange which is better than merging exchange (for example: parallelism in merge join uses merging exchange).  To be able to make build rows and probe rows are in the same threads, optimizer use hash partitioning to distribute them to threads. Each execution is thread independent. This type of hash join parallelism is called hash partitioning. Other type of hash join parallelism is broadcast partitioning. Both of them distribute input sets to consumer threads but, as the name implies, broadcast partitioning will broadcast every build input sets to all consumer threads. Implication of broadcast partitioning is memory consumption is proportional when more threads are allocated whereas the number of memory is same when a number of threads increase. When does broadcast partitioning in hash join happen? It tends to happen when a number of rows in build input set is small so that some threads don’t have build rows to join with probe rows.

 

Here the query example of hash partitioning in parallel hash join.

 

SELECT t1.col1, t1.col2, t2.col1

FROM bigtable t1

JOIN anotherbigtable t2

ON t1.col2 = t2.col2

 

Execution plan for query above is :

 

  |--Parallelism(Gather Streams)

       |--Hash Match(Inner Join, HASH:([t1].[col2])=([t2].[col2]), RESIDUAL:([Northwind].[dbo].[anotherbigtable].[col2] as [t2].[col2]=[Northwind].[dbo].[bigtable].[col2] as [t1].[col2]))

            |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([t1].[col2]))

            |    |--Clustered Index Scan (OBJECT:([Northwind].[dbo].[bigtable].[PK__bigtable__49C3F6B7] AS [t1]))

            |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([t2].[col2]))

                 |--Clustered Index Scan (OBJECT:([Northwind].[dbo].[anotherbigtable]. [pk_1] AS [t2]))

 

Notice that there are Hash Partitioning attributes (bold ones) for both parallelism operators and they are join with Hash Match operator (hash join operator). Now change the query slightly here.

 

SELECT t1.col1, t1.col2, t2.col1

FROM bigtable t1

JOIN anotherbigtable t2

ON t1.col2 = t2.col2

WHERE t1.col1 < 100

 

Execution plan for query above is :

 

  |--Parallelism(Gather Streams)

       |--Hash Match(Inner Join, HASH:([t1].[col2])=([t2].[col2]), RESIDUAL:([Northwind].[dbo].[anotherbigtable].[col2] as [t2].[col2]=[Northwind].[dbo].[bigtable].[col2] as [t1].[col2]))

            |--Parallelism(Distribute Streams, Broadcast Partitioning)

            |    |--Clustered Index Seek(OBJECT:([Northwind].[dbo].[bigtable].[PK__bigtable__49C3F6B7] AS [t1]), SEEK:([t1].[col1] < (100)) ORDERED FORWARD)

            |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[anotherbigtable].[pk_1] AS [t2]))

 

The execution plan above shows that build input set is constructed with broadcast partitioning. The question is why there is no exchange operator (parallelism in probe input sets (anotherbigtable). Implicitly, optimizer will distribute scanning result of anotherbigtable into all threads, so no exchange operator required or in other words, no parallelism algorithms (broadcast/hash/round robin/demand/range) explicitly required.

Share this post: | | | |
Posted: Jul 23 2008, 06:16 AM by Kasim.Wirama | with no comments
Filed under:

Comments

No Comments

Leave a Comment

(required) 

(required) 

(optional)

(required) 
Are you human?:  


Enter the numbers above: