May 2008 - Posts
Study of Stream Aggregation
By : Kasim Wirama, MCDBA
Last article, I talk about scalar aggregation and also I gave example of scalar and stream aggregation. On last article, I also give possible execution plan for scalar aggregation. This article, I examine possible execution plan for stream aggregate. On next article, I would talk about hash aggregation. Now let’s take a look at execution plan of stream aggregate more closely.
Given the group by query below :
SELECT shipcity,count(*)
FROM orders
GROUP BY shipcity
Execution plan for query above is :
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))
|--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[ShipCity]) DEFINE:([Expr1006]=Count(*)))
|--Sort(ORDER BY:([Northwind2].[dbo].[Orders].[ShipCity] ASC))
|--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))
Compare execution with query scalar aggregation below :
SELECT count(*) FROM orders
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))
|--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
|--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[OrderDate]))
Notice that stream aggregation query contains Sort operator. If you create index on ShipCity, the execution plan will be similar with execution plan of scalar aggregation query :
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))
|--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[ShipCity]) DEFINE:([Expr1006]=Count(*)))
|--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[ix_shipcity]), ORDERED FORWARD)
If you just issue GROUP BY without aggregate function, there is no Stream Aggregate operator, it has Sort operator instead, as the following query and its execution plan.
SELECT shipcity
FROM orders
GROUP BY shipcity
|--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[ShipCity] ASC))
|--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))
Now I progress to more complex query with combination between GROUP BY and DISTINCT aggregate here :
SELECT employeeid, count(DISTINCT customerid)
FROM orders
GROUP BY employeeid
Execution plan for the above query is :
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))
|--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1006]=COUNT([Northwind2].[dbo].[Orders].[CustomerID])))
|--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[EmployeeID] ASC, [Northwind2].[dbo].[Orders].[CustomerID] ASC))
|--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))
The query plan above is similar with first query plan. If you create index on employeeid and customerid, sort operator will be replaced with Stream Aggregate which is a query improvement.
Now how about execution plan for multiple distinct? Here is the query example :
SELECT employeeid, COUNT(DISTINCT customerid), COUNT(*)
FROM orders
GROUP BY employeeid
Query plan for above query is :
|--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))
|--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Northwind2].[dbo].[Orders].[EmployeeID])=([Northwind2].[dbo].[Orders].[EmployeeID]), RESIDUAL:([Northwind2].[dbo].[Orders].[EmployeeID] = [Northwind2].[dbo].[Orders].[EmployeeID]))
|--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))
| |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1011],0)))
| |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1011]=COUNT([Northwind2].[dbo].[Orders].[CustomerID])))
| |--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[EmployeeID] ASC, [Northwind2].[dbo].[Orders].[CustomerID] ASC))
| |--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))
|--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))
|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1012],0)))
|--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1012]=Count(*)))
|--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[EmployeeID]), ORDERED FORWARD)
Notice that there are 2 groups of Compute Scalar operator below Merge Join operator. On my previous article, there is nested loop operator for scalar aggregation. For stream aggregation the operator here is merge join because customerid has been sorted on these two groups. Notice that type of Merge Join operator is many-to-many merge join. Actually it is one-to-many merge join, it means there is a little overhead issue.
Please don’t memorize for each pattern, but try to understand why it results an execution plan.
More about Merge Join
By : Kasim Wirama, MCDBA
I would like to tell you further about merge join operator in execution plan. There are some categorizations for merge loop, based on sorting and relationship between 2 input set. And I would describe what kind of join predicate that will give optimizer more options to choose physical join operation for merge join.
Based on sorting categorization, there are index merge join and sort merge join. As I ever mentioned in my recent article about join algorithm, there must be 2 input set in order before merge join is performed. There might be already indexes for both input set, so no explicit sort operator or SQL Server optimizer will add sort operator to one of input set. Because sort operator is kind of blocking operator, index merge join is considered more efficient than sort merge join.
Based on relationship between 2 inputs in a set, there are 2 scenarios. First scenario is where first input contains no duplicate rows and second input contains duplicate rows. Merge join for this scenario is one to many merge join. Second scenario is where both inputs have duplicate rows, so that merge join for the scenario is many to many merge join. When scanning both input rows for many to many merge join, next row of left input might be the same value so that optimizer should cache result of corresponding right input as long as left input is still in same value. The cache is built in worktable located at tempdb database, and it will be discarded when optimizer is sure that there will no more same input for next row in left input. The cache size depends on size of duplicate value in right input. One to many merge join doesn’t require worktable so that it is more efficient than many to many merge join. It is possible you can change many to many merge join to one to many merge join by changing left input to no duplicate rows, you can issue DISTINCT or GROUP BY to achieve this, but DISTINCT or GROUP BY might present sort operator.
What kind of input join predicate that will give more options to optimizer to favor merge join? If you have join predicate such as table1.cola = table2.cola, make sure both columns are indexed. What about if you have more complex join predicate such as table1.cola = table2.cola and table1.colb = table2.colb, make sure not only both cola and colb get indexed but also index order on both tables should be same as well. What about if you have inequality operator such as table1.cola = table2.cola and table1.colb > table2.colb? SQL Server optimizer still consider merge join as long as at least there is one equality join operator, and inequality join will be residual predicate. Merge join might be used in the following logical join operator, outer join and semi join. For outer and semi join, merge join will stop scanning when it scan the whole rows either of input set depending whether it is left or right outer join/semi join. It is different compared to inner join because merge join will stop scanning whether it reaches one of the end of input set. Be careful for full outer join when considering merge join because full outer join will use many to many merge join.
Performance Study of Scalar Aggregation
By : Kasim Wirama, MCDBA
You might use aggregate function in SQL Server such as COUNT, MIN, MAX, AVG, SUM and DISTINCT. Let’s look into these aggregate functions, and its execution plan. There is scalar aggregation and stream aggregation. A query is called scalar aggregation if only aggregate functions exist, while there is one or more columns besides aggregate function, it is called stream aggregation. I show everything about scalar aggregation in this article.
Examples of scalar aggregation is shown below :
SELECT COUNT(*) FROM orders;
SELECT MIN(orderdate), MAX(orderdate) FROM orders;
SELECT COUNT(DISTINCT customerid) FROM orders;
Example of stream aggregation is shown below :
SELECT orderid, COUNT(*) FROM orders
GROUP BY orderid;
SELECT customerid, MAX(orderdate) FROM ORDERS
GROUP BY customerid;
SELECT DISTINCT customerid FROM orders;
SELECT employeeid, COUNT(DISTINCT customerid) FROM orders
GROUP BY employeeid;
Let’s examine this query :
SELECT COUNT(*) FROM orders;
Execution plan for above query is :
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))
|--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
|--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[ShippersOrders]))
Notice that count(*) is recognized as Stream Aggregate operator. How about more than one column aggregations? For example :
SELECT MIN(orderdate), MAX(orderdate) FROM orders;
The execution plan is :
|--Stream Aggregate(DEFINE:([Expr1003]=MIN([Northwind].[dbo].[Orders].[OrderDate]), [Expr1004]=MAX([Northwind].[dbo].[Orders].[OrderDate])))
|--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[OrderDate]))
Notice that there is only one stream aggregate even though there is more than one aggregation function even it refers to different column. So what does execution plan look like if it aggregates different column? Take a look this example :
SELECT COUNT(DISTINCT customerid), COUNT(DISTINCT shipcity) FROM orders;
The execution plan is :
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1005],0), [Expr1004]=CONVERT_IMPLICIT(int,[Expr1006],0)))
|--Stream Aggregate(DEFINE:([Expr1005]=COUNT([Northwind].[dbo].[Orders].[CustomerID]), [Expr1006]=COUNT([Northwind].[dbo].[Orders].[ShipCity])))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]))
Still you get one stream aggregate. It is consistent for scalar aggregate with no DISTINCT in query statement. Now let’s take a look for multi scalar aggregate with distinct in its query statement :
SELECT COUNT(DISTINCT customerid), COUNT(DISTINCT shipcity) FROM orders;
|--Nested Loops(Inner Join)
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1009],0)))
| |--Stream Aggregate(DEFINE:([Expr1009]=COUNT([Northwind].[dbo].[Orders].[CustomerID])))
| |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[Orders].[CustomerID]))
| |--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[CustomersOrders]), ORDERED FORWARD)
|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))
|--Stream Aggregate(DEFINE:([Expr1010]=COUNT([Northwind].[dbo].[Orders].[ShipCity])))
|--Sort(DISTINCT ORDER BY:([Northwind].[dbo].[Orders].[ShipCity] ASC))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]))
Notice that, for DISTINCT in a query statement, there will be 2 big branches, and each of them will have at least one stream aggregate and 2 branches is joined with nested loop operator.
Now compare 2 query statement below :
SELECT COUNT(DISTINCT customerid) FROM orders
SELECT COUNT(DISTINCT shipcity) FROM orders
Execution plan for each of them are:
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))
|--Stream Aggregate(DEFINE:([Expr1006]=COUNT([Northwind].[dbo].[Orders].[CustomerID])))
|--Stream Aggregate(GROUP BY:([Northwind].[dbo].[Orders].[CustomerID]))
|--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[CustomersOrders]), ORDERED FORWARD)
|--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))
|--Stream Aggregate(DEFINE:([Expr1006]=COUNT([Northwind].[dbo].[Orders].[ShipCity])))
|--Sort(DISTINCT ORDER BY:([Northwind].[dbo].[Orders].[ShipCity] ASC))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]))
Notice that fonts that I make it bold, why there is sort operator besides stream aggregate operator after index scan operator? Stream aggregate happens when there is index on the associated column and vice versa for sort operator. Cost for stream aggregateoperator is significantly smaller than sort operator, so stream aggregate operator in eliminating duplicate rows is more efficient than sort operator.
Quote of the day
Query statement doesn't guarantee result in sorted order.
Nested Loop on Partitioned Table/Index
By : Kasim Wirama, MCDBA
This article I would like to show you the behavior of nested loop in partitioned table/index in SQL Server 2005. Partitioning has been introduced since SQL Server 2005. This feature will be available on next release of SQL Server.
Let’s say you have a table whose values of one of the column is range between 1 and 1000 and you want to put these values into 4 partitions, first partition from 1 to 199, second one from 200 to 399, third one from 400 to 599, and the rest value (after 599) is assigned to fourth partition. Here is the script to create partitioned table:
CREATE partition FUNCTION pf_1(INT) AS range RIGHT FOR VALUES (200,400,600);
CREATE partition scheme ps_1 AS partition pf_1 ALL TO ([PRIMARY]);
CREATE TABLE test_partition(id INT PRIMARY KEY, DATA VARCHAR(10))
ON ps_1(id);
Now issue select statement below and get its query plan :
SELECT * FROM test_partition
SELECT * FROM test_partition
|--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1003]) PARTITION ID:([PtnIds1003]))
|--Constant Scan(VALUES:(((1)),((2)),((3)),((4))))
|--Clustered Index Scan(OBJECT:([tempdb].[dbo].[test_partition].[PK__test_partition__0BC6C43E]))
Notice that there is phrase : PARTITION ID instead of correlated parameter. PtnIds1003 shows partition id column (in this example, it is id column). SQL Server will do constant scan iterative from first until last partition, and does scanning for each iterations of partition member.
Query Plan for Nested Loop
By : Kasim Wirama, MCDBA
This article, I would like to show possible query plan for nested loop and also which types of query plan for nested loop is considered in optimization form. Let’s take a look for this physical join operation.
I use northwind database and issue query to get orders that belong to customers who comes from Berlin.
SELECT o.orderid
FROM customers AS c
JOIN orders AS o
ON c.customerid = o.customerid
WHERE c.city='berlin'
Query plan for this query is :
select o.orderid from customers as c join orders as o on c.customerid = o.customerid where c.city='berlin'
|--Nested Loops(Inner Join, OUTER REFERENCES:(
.[CustomerID]))
|--Index Seek(OBJECT:([Northwind].[dbo].[Customers].[City] AS
), SEEK:(
.[City]=N'berlin') ORDERED FORWARD)
|--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[CustomerID] AS
), SEEK:(
.[CustomerID]=[Northwind].[dbo].[Customers].[CustomerID] as
.[CustomerID]) ORDERED FORWARD)
Rows in customers table have their match at orders table with join key customerid, notice that there is “outer references” keyword which contains CustomerID column from customers table. This column would be input parameter for Orders.CustomerID column. Outer reference in nested loop is called correlated parameter and inner side of nested loop shows index seek which boosts nested loop performance. If you find Lazy Spool operator on inner side of nested loop, you also have good type of nested loop because it caches search results for inner side of join. You will find Lazy Spool operator when there are many duplicate values and inner side of join is expensive to evaluate.
What if join keys are not indexed or no equijoin, what does the query plan look like? Here is the query example:
SELECT e1.employeeid, count(*)
FROM employees e1 JOIN employees e2
ON e1.hiredate > e2.hiredate
GROUP BY e1.employeeid
Query plan for this query is :
select e1.employeeid, count(*) from employees e1 join employees e2 on e1.hiredate > e2.hiredate group by e1.employeeid
|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))
|--Stream Aggregate(GROUP BY:([e1].[EmployeeID]) DEFINE:([Expr1007]=Count(*)))
|--Nested Loops(Inner Join, WHERE:([Northwind].[dbo].[Employees].[HireDate] as [e1].[HireDate]>[Northwind].[dbo].[Employees].[HireDate] as [e2].[HireDate]))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e1]))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e2]))
Notice that there is no “outer references” keyword. This query could be rewritten so that there is correlation between outer query and inner query as the following query:
SELECT e1.employeeid, D.N
FROM employees e1
CROSS apply
(
SELECT count(*) AS N
FROM employees e2
WHERE e1.hiredate > e2.hiredate
) AS D
WHERE N>0
Query plan for the rewritten query above is shown below:
select e1.employeeid, D.n from employees e1 cross apply ( select count(*) as n from employees e2 where e1.hiredate > e2.hiredate ) as D where n>0
|--Filter(WHERE:([Expr1004]>(0)))
|--Nested Loops(Inner Join, OUTER REFERENCES:([e1].[HireDate]))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e1]))
|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))
|--Stream Aggregate(DEFINE:([Expr1007]=Count(*)))
|--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e2]), WHERE:([Northwind].[dbo].[Employees].[HireDate] as [e1].[HireDate]>[Northwind].[dbo].[Employees].[HireDate] as [e2].[HireDate]))
The above query plan is optimized nested loop, if you apply index on hiredate column.
Performance characteristics of Nested Loop, Merge Join and Hash Join
By : Kasim Wirama, MCDBA
Nested loop, merge join and hash join is common operators which you will find out in execution plan as long as your query contains any logical joins (inner join, outer join, cross join, semi join). These 3 brothers are classified into physical joins. For performance perspective, they fit in some conditions. I would tell you characteristics for each of them.
Nested loop join is best to handle small input sets with index on inner input set specified by join key. It is suitable for OLTP where there are a large number of concurrent users. It supports equijoin though not required, and either left outer join and left (anti) semi join. If you have dynamic cursor, make sure nested loop is the only brother that is in. you can rely ordering issue if you want it gets ordered based on outer input.
Merge join is best to handle medium to large input sets. A better query plan will come up with index one- to-many merge join so that it would be able to support large number of concurrent users. Notice that at least one equijoin required in join predicate, so that optimizer will prefer merge join. Planning carefully to your tempdb database capacity is recommended if you found many-to-many merge join. Compared to join spectrum at nested loop, merge join supports all logical join. Because no inner and outer hierarchy like nested loop, merge join can preserve order of either of input sets.
Last brother of physical join is hash join. The join is best to large input sets. This typical characteristics will be found in datawarehouse environment with parallel query execution, so it supports limited number of concurrent users. Hash join will hold its output until all rows of first input set are completely build into hash table, and remember that hash join will be in-memory storage so it takes memory. If memory is not sufficient, it will spill out to tempdb as temporary repository of build table, in worst case it turns out to bailout. And like merge join, it supports all logical joins and at least one equijoin required.
Nested Loop, Merge Join, and Hash
Join algorithm
By : Kasim Wirama, MCDBA
If you observe execution plan on your query using join
types, you will find out the following operators: nested loop, merge loop and
hash join. I would like to describe algorithm for each of them.
Nested loop
As its name implies, it will iterate each rows at outer
input set to query rows at inner input set. The cost will be multiplication
between inner input and outer input rows. The algorithm for nested loop is
shown below :
For each R1 in outer input set
For each R2 in inner input set
If R1 match with R2
Output (R1, R2)
R1 and R2 are rows for outer and inner input sets.
Merge Join
Merge join will output rows from both input sets that have their
matches by comparing 2 inputs sets that have been already sorted. Cost of merge
loop is sum of number of both inputs. Algorithm for merge join is shown below :
Read first R1 at first input set
Read first R2 at second input set
While not reach the end of either of the inputs
begin
if R1 match R2
begin
output (R1, R2)
get next R2
end
else if R1 less than R2
get next R1
else
get next R2
end
Hash join
Hash join will output qualified resultset by comparing hash
value of second input set to hash value of first input set. It will create hash
table (called build table) for first input set, then compare hash value of
second input set to the build table. It is undesirable operator in most OLTP
scenarios. The algorithm for hash join is shown below :
Create build table
For each R1
begin
generate hash value of R1 join key
insert into build table to appropriate hash bucket
end
for each R2
begin
generate hash value of R2 join key
for each R1 in corresponding hash bucket
if match R1 and R2
output (R1,R2)
end
Types of Join
By : Kasim Wirama, MCDBA
In this article, I would like to introduce types of join in SQL Server. I give examples inner join, cross join, left outer join, full outer join and left semi join and left anti semi join.
Before I progress to each joins, here is the tables script as example :
CREATE TABLE customer
(custid CHAR(3) NOT NULL PRIMARY KEY,
firstname VARCHAR(10) NOT NULL);
GO
CREATE TABLE sales
(salesID INT NOT NULL PRIMARY KEY,
custid CHAR(3) NULL,
qty INT);
GO
INSERT INTO customer (custid, firstname) VALUES ('ant','anton');
INSERT INTO customer (custid, firstname) VALUES ('rng','rangga');
INSERT INTO customer (custid, firstname) VALUES ('joh','johan');
INSERT INTO sales (salesID, custid, qty) VALUES (1,'ant',10);
INSERT INTO sales (salesID, custid, qty) VALUES (2,'rng',20);
INSERT INTO sales (salesID, custid, qty) VALUES (3,'rng',25);
INSERT INTO sales (salesID, custid, qty) VALUES (4,'smi',10);
Inner join
Inner join will join 2 input sets and output them if both of them match the join criteria. To display customer and sales related to the customer, here is the query and output
SELECT c.custid, c.firstname, s.salesID, s.custid, qty
FROM customer AS c
INNER JOIN sales AS s
ON c.custid = s.custid
custid firstname salesID custid qty
------ ---------- ----------- ------ -----------
ant anton 1 ant 10
rng rangga 2 rng 20
rng rangga 3 rng 25
(3 row(s) affected)
Cross join
Cross join will join 2 input sets and each rows in both 2 inputs will have combination for each other. Most business requirement doesn’t use this join frequently. Cross join doesn’t need join predicate. The example and result for this join are as below :
SELECT c.custid, c.firstname, s.salesID, s.custid, qty
FROM customer AS c
CROSS JOIN sales AS s
custid firstname salesID custid qty
------ ---------- ----------- ------ -----------
ant anton 1 ant 10
ant anton 2 rng 20
ant anton 3 rng 25
ant anton 4 smi 10
joh johan 1 ant 10
joh johan 2 rng 20
joh johan 3 rng 25
joh johan 4 smi 10
rng rangga 1 ant 10
rng rangga 2 rng 20
rng rangga 3 rng 25
rng rangga 4 smi 10
(12 row(s) affected)
Notice that number of rows in cross join is the multiplication of number of rows in customer and sales table.
Left outer join
Left outer join will output all rows from left input sets based on specified join predicate, even though rows from left input sets doesn’t necessarily have its match at right input sets. For example : display all customers with existing and non existing sales.
SELECT c.custid, c.firstname, s.salesID, s.custid, qty
FROM customer AS c
LEFT OUTER JOIN sales AS s
ON c.custid = s.custid
custid firstname salesID custid qty
------ ---------- ----------- ------ -----------
ant anton 1 ant 10
joh johan NULL NULL NULL
rng rangga 2 rng 20
rng rangga 3 rng 25
(4 row(s) affected)
Notice that custid Joh doesn’t have sales. A LEFT OUTER JOIN B will be same as B RIGHT OUTER JOIN A.
SELECT c.custid, c.firstname, s.salesID, s.custid, qty
FROM sales AS s
RIGHT OUTER JOIN customer AS c
ON c.custid = s.custid
Full outer join
This join will join both input sets, and will output each rows of both of them even though they don’t have pairs for each other. Let’s say, I would like to display all customer and all sales information, with the following example :
SELECT c.custid, c.firstname, s.salesID, s.custid, qty
FROM customer AS c
FULL OUTER JOIN sales AS s
ON c.custid = s.custid
custid firstname salesID custid qty
------ ---------- ----------- ------ -----------
ant anton 1 ant 10
joh johan NULL NULL NULL
rng rangga 2 rng 20
rng rangga 3 rng 25
NULL NULL 4 smi 10
(5 row(s) affected)
Notice that custid joh doesn’t have sales, and salesid 4 doesn’t have corresponding customer information in customer table.
Left Semi join
Left semi join will output outer input set which exists in inner input set and remove duplication of resultset, it is look like inner join. SQL Server doesn’t have command for semi join, the semi join is introduced in EXISTS or IN keyword.
Illustration for semi join is to display customer information that has sales.
SELECT c.custid, c.firstname
FROM customer AS c
WHERE EXISTS
(
SELECT * FROM sales AS s
WHERE s.custid = c.custid
);
custid firstname
------ ----------
ant anton
rng rangga
(2 row(s) affected)
Or using IN keyword :
SELECT