SQL Server knowledge center

everything about SQL Server
See also: Other Geeks@INDC

May 2008 - Posts

Study of Stream Aggregation

Study of Stream Aggregation

By : Kasim Wirama, MCDBA

 

Last article, I talk about scalar aggregation and also I gave example of scalar and stream aggregation. On last article, I also give possible execution plan for scalar aggregation. This article, I examine possible execution plan for stream aggregate. On next article, I would talk about hash aggregation. Now let’s take a look at execution plan of stream aggregate more closely.

 

Given the group by query below :

 

SELECT shipcity,count(*)
FROM orders
GROUP BY shipcity

Execution plan for query above is :

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[ShipCity]) DEFINE:([Expr1006]=Count(*)))

            |--Sort(ORDER BY:([Northwind2].[dbo].[Orders].[ShipCity] ASC))

                 |--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))

Compare execution with query scalar aggregation below :

SELECT count(*) FROM orders

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))

       |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))

            |--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[OrderDate]))

Notice that stream aggregation query contains Sort operator. If you create index on ShipCity, the execution plan will be similar with execution plan of scalar aggregation query :

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[ShipCity]) DEFINE:([Expr1006]=Count(*)))

            |--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[ix_shipcity]), ORDERED FORWARD)

If you just issue GROUP BY without aggregate function, there is no Stream Aggregate operator, it has Sort operator instead, as the following query and its execution plan.

SELECT shipcity
FROM orders
GROUP BY shipcity

 

  |--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[ShipCity] ASC))

       |--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))

Now I progress to more complex query with combination between GROUP BY and DISTINCT aggregate here :

SELECT employeeid, count(DISTINCT customerid)
FROM orders
GROUP BY employeeid

Execution plan for the above query is :

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1006]=COUNT([Northwind2].[dbo].[Orders].[CustomerID])))

            |--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[EmployeeID] ASC, [Northwind2].[dbo].[Orders].[CustomerID] ASC))

                 |--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))

The query plan above is similar with first query plan. If you create index on employeeid and customerid, sort operator will be replaced with Stream Aggregate which is a query improvement.

Now how about execution plan for multiple distinct? Here is the query example :

SELECT employeeid, COUNT(DISTINCT customerid), COUNT(*)

FROM orders

GROUP BY employeeid

Query plan for above query is :

  |--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))

       |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Northwind2].[dbo].[Orders].[EmployeeID])=([Northwind2].[dbo].[Orders].[EmployeeID]), RESIDUAL:([Northwind2].[dbo].[Orders].[EmployeeID] = [Northwind2].[dbo].[Orders].[EmployeeID]))

            |--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))

            |    |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1011],0)))

            |         |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1011]=COUNT([Northwind2].[dbo].[Orders].[CustomerID])))

            |              |--Sort(DISTINCT ORDER BY:([Northwind2].[dbo].[Orders].[EmployeeID] ASC, [Northwind2].[dbo].[Orders].[CustomerID] ASC))

            |                   |--Clustered Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[PK_Orders]))

            |--Compute Scalar(DEFINE:([Northwind2].[dbo].[Orders].[EmployeeID]=[Northwind2].[dbo].[Orders].[EmployeeID]))

                 |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1012],0)))

                      |--Stream Aggregate(GROUP BY:([Northwind2].[dbo].[Orders].[EmployeeID]) DEFINE:([Expr1012]=Count(*)))

                           |--Index Scan(OBJECT:([Northwind2].[dbo].[Orders].[EmployeeID]), ORDERED FORWARD)

 

Notice that there are 2 groups of Compute Scalar operator below Merge Join operator. On my previous article, there is nested loop operator for scalar aggregation. For stream aggregation the operator here is merge join because customerid has been sorted on these two groups. Notice that type of Merge Join operator is many-to-many  merge join. Actually it is one-to-many merge join, it means there is a little overhead issue.

 

Please don’t memorize for each pattern, but try to understand why it results an execution plan.

 

Share this post: | | | |
More about Merge Join

More about Merge Join

By : Kasim Wirama, MCDBA
 

I would like to tell you further about merge join operator in execution plan. There are some categorizations for merge loop, based on sorting and relationship between 2 input set. And I would describe what kind of join predicate that will give optimizer more options to choose physical join operation for merge join.

 

Based on sorting categorization, there are index merge join and sort merge join. As I ever mentioned in my recent article about join algorithm, there must be 2 input set in order before merge join is performed. There might be already indexes for both input set, so no explicit sort operator or SQL Server optimizer will add sort operator to one of input set. Because sort operator is kind of blocking operator, index merge join is considered more efficient than sort merge join.

 

Based on relationship between 2 inputs in a set, there are 2 scenarios. First scenario is where first input contains no duplicate rows and second input contains duplicate rows. Merge join for this scenario is one to many merge join. Second scenario is where both inputs have duplicate rows, so that merge join for the scenario is many to many merge join. When scanning both input rows for many to many merge join, next row of left input might be the same value so that optimizer should cache result of corresponding right input as long as left input is still in same value. The cache is built in worktable located at tempdb database, and it will be discarded when optimizer is sure that there will no more same input for next row in left input. The cache size depends on size of duplicate value in right input. One to many merge join doesn’t require worktable so that it is more efficient than many to many merge join. It is possible you can change many to many merge join to one to many merge join by changing left input to no duplicate rows, you can issue DISTINCT or GROUP BY to achieve this, but DISTINCT or GROUP BY might present sort operator.

 

What kind of input join predicate that will give more options to optimizer to favor merge join? If you have join predicate such as table1.cola = table2.cola, make sure both columns are indexed. What about if you have more complex join predicate such as table1.cola = table2.cola and table1.colb = table2.colb, make sure not only both cola and colb get indexed but also index order on both tables should be same as well. What about if you have inequality operator such as table1.cola = table2.cola and table1.colb > table2.colb? SQL Server optimizer still consider merge join as long as at least there is one equality join operator, and inequality join will be residual predicate. Merge join might be used in the following logical join operator, outer join and semi join. For outer and semi join, merge join will stop scanning when it scan the whole rows either of input set depending whether it is left or right outer join/semi join. It is different compared to inner join because merge join will stop scanning whether it reaches one of the end of input set. Be careful for full outer join when considering merge join because full outer join will use many to many merge join.

 

Share this post: | | | |
Posted: May 24 2008, 02:14 PM by Kasim.Wirama | with no comments
Filed under:
Performance Study of Scalar Aggregation

Performance Study of Scalar Aggregation

By : Kasim Wirama, MCDBA

 

You might use aggregate function in SQL Server such as COUNT, MIN, MAX, AVG, SUM and DISTINCT. Let’s look into these aggregate functions, and its execution plan. There is scalar aggregation and stream aggregation. A query is called scalar aggregation if only aggregate functions exist, while there is one or more columns besides aggregate function, it is called stream aggregation. I show everything about scalar aggregation in this article.

Examples of scalar aggregation is shown below :

SELECT COUNT(*) FROM orders;

SELECT MIN(orderdate), MAX(orderdate) FROM orders;

SELECT COUNT(DISTINCT  customerid) FROM orders;

Example of stream aggregation is shown below :

SELECT orderid, COUNT(*) FROM orders
GROUP BY orderid;

SELECT customerid, MAX(orderdate) FROM ORDERS
GROUP BY customerid;

SELECT DISTINCT customerid FROM orders;

SELECT employeeid, COUNT(DISTINCT customerid) FROM orders
GROUP BY  employeeid;

Let’s examine this query :

SELECT COUNT(*) FROM orders;

Execution plan for above query is :

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))

       |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))

            |--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[ShippersOrders]))

Notice that count(*) is recognized as Stream Aggregate operator.  How about more than one column aggregations? For example :

SELECT MIN(orderdate), MAX(orderdate) FROM orders;

The execution plan is :

  |--Stream Aggregate(DEFINE:([Expr1003]=MIN([Northwind].[dbo].[Orders].[OrderDate]), [Expr1004]=MAX([Northwind].[dbo].[Orders].[OrderDate])))

       |--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[OrderDate]))

 

Notice that there is only one stream aggregate even though there is more than one aggregation function even it refers to different column. So what does execution plan look like if it aggregates different column? Take a look this example :

 

SELECT COUNT(DISTINCT  customerid), COUNT(DISTINCT  shipcity) FROM orders;

The execution plan is :

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1005],0), [Expr1004]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(DEFINE:([Expr1005]=COUNT([Northwind].[dbo].[Orders].[CustomerID]), [Expr1006]=COUNT([Northwind].[dbo].[Orders].[ShipCity])))

            |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]))

 

Still you get one stream aggregate. It is consistent for scalar aggregate with no DISTINCT in query statement. Now let’s take a look for multi scalar aggregate with distinct in its query statement :

SELECT COUNT(DISTINCT  customerid), COUNT(DISTINCT  shipcity) FROM orders;

  |--Nested Loops(Inner Join)

       |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1009],0)))

       |    |--Stream Aggregate(DEFINE:([Expr1009]=COUNT([Northwind].[dbo].[Orders].[CustomerID])))

       |         |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[Orders].[CustomerID]))

       |              |--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[CustomersOrders]), ORDERED FORWARD)

       |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))

            |--Stream Aggregate(DEFINE:([Expr1010]=COUNT([Northwind].[dbo].[Orders].[ShipCity])))

                 |--Sort(DISTINCT ORDER BY:([Northwind].[dbo].[Orders].[ShipCity] ASC))

                      |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]))

Notice that, for DISTINCT in a query statement, there will be 2 big branches, and each of them will have at least one stream aggregate and 2 branches is joined with nested loop operator.

 

Now compare 2 query statement below :

SELECT COUNT(DISTINCT customerid) FROM orders

SELECT COUNT(DISTINCT shipcity) FROM orders                                                            

Execution plan for each of them are:

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(DEFINE:([Expr1006]=COUNT([Northwind].[dbo].[Orders].[CustomerID])))

            |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[Orders].[CustomerID]))

                 |--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[CustomersOrders]), ORDERED FORWARD)

 

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1006],0)))

       |--Stream Aggregate(DEFINE:([Expr1006]=COUNT([Northwind].[dbo].[Orders].[ShipCity])))

            |--Sort(DISTINCT ORDER BY:([Northwind].[dbo].[Orders].[ShipCity] ASC))

                 |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]))

 

Notice that fonts that I make it bold, why there is sort operator besides stream aggregate operator after index scan operator? Stream aggregate happens when there is index on the associated column and vice versa for sort operator. Cost for stream aggregateoperator  is significantly smaller than sort operator, so stream aggregate operator in eliminating duplicate rows is more efficient than sort operator.

 

Share this post: | | | |
Posted: May 24 2008, 02:12 PM by Kasim.Wirama | with no comments
Filed under:
Quote of the day

Quote of the day

Query statement doesn't guarantee result in sorted order.

Share this post: | | | |
Posted: May 23 2008, 01:36 AM by Kasim.Wirama | with no comments
Filed under:
Nested Loop on Partitioned Table/Index

Nested Loop on Partitioned Table/Index

By : Kasim Wirama, MCDBA

 

This article I would like to show you the behavior of nested loop in partitioned table/index in SQL Server 2005. Partitioning has been introduced since SQL Server 2005. This feature will be available on next release of SQL Server.

Let’s say you have a table whose values of one of the column is range between 1 and 1000 and you want to put these values into 4 partitions, first partition from 1 to 199, second one from 200 to 399, third one from 400 to 599, and the rest value (after 599) is assigned to fourth partition. Here is the script to create partitioned table:

 

CREATE partition FUNCTION pf_1(INT) AS range RIGHT FOR VALUES (200,400,600);

 

CREATE partition scheme ps_1 AS partition pf_1 ALL TO ([PRIMARY]);

 

CREATE TABLE test_partition(id INT PRIMARY KEY, DATA VARCHAR(10))
ON ps_1(id);

 

Now issue select statement below and get its query plan :

 

SELECT * FROM test_partition

 

  SELECT * FROM test_partition

  |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1003]) PARTITION ID:([PtnIds1003]))

       |--Constant Scan(VALUES:(((1)),((2)),((3)),((4))))

       |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[test_partition].[PK__test_partition__0BC6C43E]))

 

Notice that there is phrase : PARTITION ID instead of correlated parameter. PtnIds1003 shows partition id column (in this example, it is id column). SQL Server will do constant scan iterative from first until last partition, and does scanning for each iterations of partition member.

Share this post: | | | |
Posted: May 22 2008, 12:37 PM by Kasim.Wirama | with no comments
Filed under:
Query Plan for Nested Loop

Query Plan for Nested Loop

By : Kasim Wirama, MCDBA

 

This article, I would like to show possible query plan for nested loop and also which types of query plan for nested loop is considered in optimization form. Let’s take a look for this physical join operation.

 

I use northwind database and issue query to get orders that belong to customers who comes from Berlin.

 

SELECT o.orderid

FROM customers AS c

JOIN orders AS o

ON c.customerid = o.customerid

WHERE c.city='berlin'

 

Query plan for this query is :

 

select o.orderid  from customers as c  join orders as o  on c.customerid = o.customerid  where c.city='berlin'

  |--Nested Loops(Inner Join, OUTER REFERENCES:(Coffee.[CustomerID]))

       |--Index Seek(OBJECT:([Northwind].[dbo].[Customers].[City] AS Coffee), SEEK:(Coffee.[City]=N'berlin') ORDERED FORWARD)

       |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[CustomerID] AS Time), SEEK:(Time.[CustomerID]=[Northwind].[dbo].[Customers].[CustomerID] as Coffee.[CustomerID]) ORDERED FORWARD)

 

Rows in customers table have their match at orders table with join key customerid, notice that there is “outer references” keyword which contains CustomerID column from customers table. This column would be input parameter for Orders.CustomerID column. Outer reference in nested loop is called correlated parameter and inner side of nested loop shows index seek which boosts nested loop performance. If you find Lazy Spool operator on inner side of nested loop, you also have good type of nested loop because it caches search results for inner side of join. You will find Lazy Spool operator when there are many duplicate values and inner side of join is expensive to evaluate.

 

What if join keys are not indexed or no equijoin, what does the query plan look like? Here is the query example:

 

SELECT e1.employeeid, count(*)

FROM employees e1 JOIN employees e2

ON e1.hiredate > e2.hiredate

GROUP BY e1.employeeid

 

Query plan for this query is :

 

select e1.employeeid, count(*)  from employees e1 join employees e2  on e1.hiredate > e2.hiredate  group by e1.employeeid

  |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))

       |--Stream Aggregate(GROUP BY:([e1].[EmployeeID]) DEFINE:([Expr1007]=Count(*)))

            |--Nested Loops(Inner Join, WHERE:([Northwind].[dbo].[Employees].[HireDate] as [e1].[HireDate]>[Northwind].[dbo].[Employees].[HireDate] as [e2].[HireDate]))

                 |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e1]))

                 |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e2]))

 

Notice that there is no “outer references” keyword. This query could be rewritten so that there is correlation between outer query and inner query as the following query:

 

SELECT e1.employeeid, D.N

FROM employees e1

CROSS apply

   (

     SELECT count(*)  AS N

     FROM employees e2

     WHERE e1.hiredate > e2.hiredate

   ) AS D

WHERE N>0

 

Query plan for the rewritten query above is shown below:

 

select e1.employeeid, D.n  from employees e1  cross apply      (       select count(*)  as n       from employees e2       where e1.hiredate > e2.hiredate     ) as D  where n>0

  |--Filter(WHERE:([Expr1004]>(0)))

       |--Nested Loops(Inner Join, OUTER REFERENCES:([e1].[HireDate]))

            |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e1]))

            |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))

                 |--Stream Aggregate(DEFINE:([Expr1007]=Count(*)))

                      |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees] AS [e2]), WHERE:([Northwind].[dbo].[Employees].[HireDate] as [e1].[HireDate]>[Northwind].[dbo].[Employees].[HireDate] as [e2].[HireDate]))

 

The above query plan is optimized nested loop, if you apply index on hiredate column.

 

Share this post: | | | |
Posted: May 20 2008, 11:21 PM by Kasim.Wirama | with no comments
Filed under:
Performance characteristics of Nested Loop, Merge Join and Hash Join

Performance characteristics of Nested Loop, Merge Join and Hash Join

By : Kasim Wirama, MCDBA

 

 

Nested loop, merge join and hash join is common operators which you will find out in execution plan as long as your query contains any logical joins (inner join, outer join, cross join, semi join). These 3 brothers are classified into physical joins. For performance perspective, they fit in some conditions. I would tell you characteristics for each of them.

 

Nested loop join is best to handle small input sets with index on inner input set specified by join key. It is suitable for OLTP where there are a large number of concurrent users. It supports equijoin though not required, and either left outer join and left (anti) semi join. If you have dynamic cursor, make sure nested loop is the only brother that is in. you can rely ordering issue if you want it gets ordered based on outer input.

 

Merge join is best to handle medium to large input sets. A better query plan will come up with index one- to-many merge join so that it would be able to support large number of concurrent users. Notice that at least one equijoin required in join predicate, so that optimizer will prefer merge join. Planning carefully to your tempdb database capacity is recommended if you found many-to-many merge join. Compared to join spectrum at nested loop, merge join supports all logical join. Because no inner and outer hierarchy like nested loop, merge join can preserve order of either of input sets.

 

Last brother of physical join is hash join. The join is best to large input sets. This typical characteristics will be found in datawarehouse environment with parallel query execution, so it supports limited number of concurrent users. Hash join will hold its output until all rows of first input set are completely build into hash table, and remember that hash join will be in-memory storage so it takes memory. If memory is not sufficient, it will spill out to tempdb as temporary repository of build table, in worst case it turns out to bailout. And like merge join, it supports all logical joins and at least one equijoin required.

 

Share this post: | | | |
Nested Loop, Merge Join, and Hash Join algorithm

Nested Loop, Merge Join, and Hash Join algorithm

By : Kasim Wirama, MCDBA

 

If you observe execution plan on your query using join types, you will find out the following operators: nested loop, merge loop and hash join. I would like to describe algorithm for each of them.

Nested loop

As its name implies, it will iterate each rows at outer input set to query rows at inner input set. The cost will be multiplication between inner input and outer input rows. The algorithm for nested loop is shown below :

For each R1 in outer input set
                For each R2 in inner input set
                                If R1 match with R2
                                                Output (R1, R2)

R1 and R2 are rows for outer and inner input sets.

Merge Join

Merge join will output rows from both input sets that have their matches by comparing 2 inputs sets that have been already sorted. Cost of merge loop is sum of number of both inputs. Algorithm for merge join is shown below :

Read first R1 at first input set
Read first R2 at second input set
While not reach the end of either of the inputs
begin
                if R1 match R2
                begin
                                output (R1, R2)
                                get next R2
                end
                else if R1 less than R2
                                get next R1
                else
                                get next R2
end

Hash join

Hash join will output qualified resultset by comparing hash value of second input set to hash value of first input set. It will create hash table (called build table) for first input set, then compare hash value of second input set to the build table. It is undesirable operator in most OLTP scenarios. The algorithm for hash join is shown below :

Create build table
For each R1
begin
                generate hash value of R1 join key
                insert into build table to appropriate hash bucket
end
for each R2
begin
                generate hash value of R2 join key
                for each R1 in corresponding hash bucket
                                if match R1 and R2
                                                output (R1,R2)
end

Share this post: | | | |
Posted: May 18 2008, 02:31 AM by Kasim.Wirama | with no comments
Filed under:
Types of Join

Types of Join

By : Kasim Wirama, MCDBA

 

In this article, I would like to introduce types of  join in SQL Server. I give examples inner join, cross join, left outer join, full outer join and left semi join and left anti semi join.

Before I progress to each joins, here is the tables script as example :

CREATE TABLE customer

(custid CHAR(3) NOT NULL PRIMARY KEY,

 firstname VARCHAR(10) NOT NULL);

GO

 

CREATE TABLE sales

(salesID INT NOT NULL PRIMARY KEY,

 custid CHAR(3) NULL,

 qty    INT);

GO

 

INSERT INTO customer (custid, firstname) VALUES ('ant','anton');

 

INSERT INTO customer (custid, firstname) VALUES ('rng','rangga');

 

INSERT INTO customer (custid, firstname) VALUES ('joh','johan');

 

INSERT INTO sales (salesID, custid, qty) VALUES (1,'ant',10);

 

INSERT INTO sales (salesID, custid, qty) VALUES (2,'rng',20);

 

INSERT INTO sales (salesID, custid, qty) VALUES (3,'rng',25);

 

INSERT INTO sales (salesID, custid, qty) VALUES (4,'smi',10);

Inner join

Inner join will join 2 input sets and output them if both of them match the join criteria. To display customer and sales related to the customer, here is the query and output

SELECT c.custid, c.firstname, s.salesID, s.custid, qty

FROM customer AS c

INNER JOIN sales AS s

ON c.custid = s.custid

custid firstname  salesID     custid qty

 

------ ---------- ----------- ------ -----------

 

ant    anton      1           ant    10

 

rng    rangga     2           rng    20

 

rng    rangga     3           rng    25

 

(3 row(s) affected)

Cross join

Cross join will join 2 input sets and each rows in both 2 inputs will have combination for each other. Most business requirement doesn’t use this join frequently. Cross join doesn’t need join predicate. The example and result for this join are as below :

SELECT c.custid, c.firstname, s.salesID, s.custid, qty

FROM customer AS c

CROSS JOIN sales AS s

 

custid firstname  salesID     custid qty

 

------ ---------- ----------- ------ -----------

 

ant    anton      1           ant    10

 

ant    anton      2           rng    20

 

ant    anton      3           rng    25

 

ant    anton      4           smi    10

 

joh    johan      1           ant    10

 

joh    johan      2           rng    20

 

joh    johan      3           rng    25

 

joh    johan      4           smi    10

 

rng    rangga     1           ant    10

 

rng    rangga     2           rng    20

 

rng    rangga     3           rng    25

 

rng    rangga     4           smi    10

 

(12 row(s) affected)

Notice that number of rows in cross join is the multiplication of number of rows in customer and sales table.

Left outer join

Left outer join will output all rows from left input sets based on specified join predicate, even though rows from left input sets doesn’t necessarily have its match at right input sets. For example : display all customers with existing and non existing sales.

 

SELECT c.custid, c.firstname, s.salesID, s.custid, qty

FROM customer AS c

LEFT OUTER JOIN sales AS s

ON c.custid = s.custid

 

custid firstname  salesID     custid qty

 

------ ---------- ----------- ------ -----------

 

ant    anton      1           ant    10

 

joh    johan      NULL        NULL   NULL

 

rng    rangga     2           rng    20

 

rng    rangga     3           rng    25

 

 

 

(4 row(s) affected)

Notice that custid Joh doesn’t have sales. A LEFT OUTER JOIN B will be same as B RIGHT OUTER JOIN A.

SELECT c.custid, c.firstname, s.salesID, s.custid, qty

FROM sales AS s

RIGHT OUTER JOIN customer AS c

ON c.custid = s.custid

Full outer join

This join will join both input sets, and will output each rows of both of them even though they don’t have pairs for each other. Let’s say, I would like to display all customer and all sales information, with the following example :

 

SELECT c.custid, c.firstname, s.salesID, s.custid, qty

FROM customer AS c

FULL OUTER JOIN sales AS s

ON c.custid = s.custid

 

custid firstname  salesID     custid qty

 

------ ---------- ----------- ------ -----------

 

ant    anton      1           ant    10

 

joh    johan      NULL        NULL   NULL

 

rng    rangga     2           rng    20

 

rng    rangga     3           rng    25

 

NULL   NULL       4           smi    10

 

 

(5 row(s) affected)

Notice that custid joh doesn’t have sales, and salesid 4 doesn’t have corresponding customer information in customer table.

Left Semi join

Left semi join will output outer input set which exists in inner input set and remove duplication of resultset, it is look like inner join. SQL Server doesn’t have command for semi join, the semi join is introduced in EXISTS or IN keyword.
Illustration for semi join is to display customer information that has sales.

SELECT c.custid, c.firstname

FROM customer AS c

WHERE EXISTS

  (

    SELECT * FROM sales AS s

    WHERE s.custid = c.custid  

  );

 

custid firstname

 

------ ----------

 

ant    anton

 

rng    rangga

 

 

(2 row(s) affected)

 

Or using IN keyword :

SELECT