Parallelism in Nested Loop
Parallelism in Nested Loop
By : Kasim Wirama, MCDBA
This posting, I would show behavior characteristic of parallelism in nested loop join. Let’s build these 2 big tables first, with the following TSQL script :
CREATE TABLE bigtable
(
col1 INT PRIMARY KEY,
col2 INT,
col3 CHAR(2000)
);
GO
SET NOCOUNT ON;
DECLARE @i INT;
SET @i = 1;
WHILE @i <=300000
BEGIN
INSERT INTO bigtable VALUES (@i , @i + 100, 'SOME VALUE');
SET @i = @i + 1;
END
GO
SELECT *
INTO anotherbigtable
FROM bigtable;
GO
ALTER TABLE anotherbigtable
ADD CONSTRAINT pk_1 PRIMARY KEY (col1);
GO
I retrieve result from columns of both tables by joining them on col1 column with filter criteria col2 column less than 500. The sql query for achieve parallelism in nested loop join is :
SELECT
t1.*, t2.*
FROM bigtable AS t1 JOIN anotherbigtable AS t2
ON t1.col1 = t2.col1
WHERE t2.col2 < 500
The execution plan for the query (run first : SET STATISTICS PROFILE ON) is :
|--Parallelism(Gather Streams)
|--Nested Loops(Inner Join, OUTER REFERENCES:([t2].[col1], [Expr1004]) OPTIMIZED WITH UNORDERED PREFETCH)
|--Clustered Index Scan(OBJECT:([tempdb].[dbo].[anotherbigtable].[pk_1] AS [t2]), WHERE:([tempdb].[dbo].[anotherbigtable].[col2] as [t2].[col2]<(500)))
|--Clustered Index Seek(OBJECT:([tempdb].[dbo].[bigtable].[PK__bigtable__18EBB532] AS [t1]), SEEK:([t1].[col1]=[tempdb].[dbo].[anotherbigtable].[col1] as [t2].[col1]) ORDERED FORWARD)
With retrieve only 500 rows from 300,000 rows tables, optimizer need to use parallelism. Why? It is because no index applied on col2 column, so optimizer will search from first until end of join predicate on clustered index col1 on both bigtable and anotherbigtable. To speed up the nested looping processing time, optimizer decides to apply parallelism on Nested Loops operator. With parallelism on nested loop, optimizer use more than one thread, and group of rows would be independent across threads. Outer query (Clustered Index Scan operator) will be on each of threads. How about inner query (Clustered Index Seek operator)? Is it also separated on each threads? Here is something a bit tricky. The answer is no, inner query will be executed in serial mode because data on inner query (bigtable) is shared among threads on anotherbigtable (outer query) so parallelism on inner query couldn’t be applied whereas outer query could.