SQL Server knowledge center

everything about SQL Server
See also: Other Geeks@INDC

More about Merge Join

More about Merge Join

By : Kasim Wirama, MCDBA
 

I would like to tell you further about merge join operator in execution plan. There are some categorizations for merge loop, based on sorting and relationship between 2 input set. And I would describe what kind of join predicate that will give optimizer more options to choose physical join operation for merge join.

 

Based on sorting categorization, there are index merge join and sort merge join. As I ever mentioned in my recent article about join algorithm, there must be 2 input set in order before merge join is performed. There might be already indexes for both input set, so no explicit sort operator or SQL Server optimizer will add sort operator to one of input set. Because sort operator is kind of blocking operator, index merge join is considered more efficient than sort merge join.

 

Based on relationship between 2 inputs in a set, there are 2 scenarios. First scenario is where first input contains no duplicate rows and second input contains duplicate rows. Merge join for this scenario is one to many merge join. Second scenario is where both inputs have duplicate rows, so that merge join for the scenario is many to many merge join. When scanning both input rows for many to many merge join, next row of left input might be the same value so that optimizer should cache result of corresponding right input as long as left input is still in same value. The cache is built in worktable located at tempdb database, and it will be discarded when optimizer is sure that there will no more same input for next row in left input. The cache size depends on size of duplicate value in right input. One to many merge join doesn’t require worktable so that it is more efficient than many to many merge join. It is possible you can change many to many merge join to one to many merge join by changing left input to no duplicate rows, you can issue DISTINCT or GROUP BY to achieve this, but DISTINCT or GROUP BY might present sort operator.

 

What kind of input join predicate that will give more options to optimizer to favor merge join? If you have join predicate such as table1.cola = table2.cola, make sure both columns are indexed. What about if you have more complex join predicate such as table1.cola = table2.cola and table1.colb = table2.colb, make sure not only both cola and colb get indexed but also index order on both tables should be same as well. What about if you have inequality operator such as table1.cola = table2.cola and table1.colb > table2.colb? SQL Server optimizer still consider merge join as long as at least there is one equality join operator, and inequality join will be residual predicate. Merge join might be used in the following logical join operator, outer join and semi join. For outer and semi join, merge join will stop scanning when it scan the whole rows either of input set depending whether it is left or right outer join/semi join. It is different compared to inner join because merge join will stop scanning whether it reaches one of the end of input set. Be careful for full outer join when considering merge join because full outer join will use many to many merge join.

 

Share this post: | | | |
Posted: May 24 2008, 02:14 PM by Kasim.Wirama | with no comments
Filed under:

Comments

No Comments

Leave a Comment

(required) 

(required) 

(optional)

(required) 
Are you human?:  


Enter the numbers above: