October 2007 - Posts
Checkpointing
in SSIS
By :
Kasim Wirama, MCDBA, MVP
SQL Server
If you ever developed
sophisticated DTS package, and there is another error in one of the task,
usually you have to reexecute the package from
beginning. For complex package and big data that get involved in ETL process,
it will consume considerably amount of time wasted. In SSIS, you have option
not to start package from beginning in case the package gets error. The feature
in SSIS to address this issue is checkpoint.
Checkpoint will note execution
information to a file, when the task in a package gets failure to execute. When
you correct the task, and re-execute the package, the beginning of execution
will start at the failed task not the first task of a package. The package will
read execution in the file, and resume executing the task and the rest. Once it
is successfully executed, the package will remove the file. Nice feature that
help you a lot when you develop a sophisticated package.
To enable checkpointing
of a package, you define package’s properties in Checkpoints section, you
specify Checkpointfilename, set CheckPointUsage
to IfExists or always, and set to true for SaveCheckpoint. Then you set FailPackageOnFailure
property of a task to true, otherwise the package will not create checkpoint
file if it fails.
To simulate the problem, create 3
execute SQL Task package, set second package property name ForceExecutionResult
to failure, and run the package. If you notice the output window, package
status will have failure status, then you will find checkpoint file that you specify
on Checkpointfilename property. Let’s say you have
corrected the task, for demonstration purpose, set ForceExecutionResult
property of second task from Failure to success. Run the package again, this
time you notice that package will be executed from the second task (failed
task) rather than first task. Once it is successfully executed, the checkpoint
file will no longer exist, because the package deletes
the checkpoint file that no longer useful.
Checkpoint feature in SSIS is
brand new feature that worth to be implemented, to make your SSIS package is
more reliable and manageable.
Managing Transaction in SSIS
Managing Transaction in SSIS
By
: Kasim Wirama,
MCDBA, MVP SQL Server
SSIS supports transactions as you
usually use in TSQL, there are 2 kinds of transactions in SSIS
:
- Distributed Transaction
This
transaction uses 2 phase commit transaction with Distributed Transaction
Coordinator (DTC) service. Make sure this service runs before you run your
package that uses the service. You don’t have to explicitly state begin tran…commit/rollback
tran, because they are handled automatically by DTC
service
- Native SQL Server engine transaction
This
transaction uses explicit BEGIN TRAN/COMMIT/ROLLBACK TRAN, explicitly in
Execute SQL Task component. You must setting up the involved connection
manager’s RetainSameConnection property from false to
true.
For first alternative, there are
3 options for TransactionOption properties
:
- NotSupported
The component
will not join to transaction scope.
- Supported
The component
will join parent’s transaction scope if exists
- Required
The component
will join parent’s transaction scope if exists, otherwise it will start new
component.
Let’s start by create 3 Execute
SQL Tasks that refers to same connection manager. Point to AdventureWorks,
and you can create dummy table called test with 1 character column. Put there 3
each of these 3 statements into each of Execute SQL Tasks.
Create table dbo.dummy
(cola varchar(10) not null);
Insert into dbo.dummy
values (‘a’);
Drop table dbo.dummy;
Set up property TransactionOption from Supported (default) to Required for package level. Make sure DTC service already
runs, otherwise, the package will fail.
Run your package, and there will
be a table created and populated in AdventureWorks
database.
For second alternative, you don’t
have dependency to DTC service. DTC service is used when you want span transaction
over more than 1 connection, tasks even more than 1 package, otherwise,
it is simple to use native transaction TSQL statement.
Managing One Temporary Table in SSIS
Managing One Temporary Table in SSIS
By
: Kasim Wirama,
MCDBA, MVP SQL Server
In T-SQL, you must have used
temporary table to keep temporary resultset.
Temporary table is unique per connection, so if there is
2 users creating connection to create temporary table, even though the name is
same, but two of them doesn’t interact each other. How it becomes possible? In
engine level, SQL Server create temporary table with suffix with connection id.
If you create temporary table in
SSIS, you must refers it to the same connection manager, otherwise SSIS will
consider the temporary table is created and manipulated under different
connection id.
You can create connection
manager, pointing to tempdb database. Now I would
like to do temporary table creation , insert new
record, and drop the table, each of them is put on separate Execute SQL Task.
If you run the package, it will
display red background in second task, the error message is :
"Invalid object name '<temporary table>’.". It seems that SSIS
consider the connection of the second task is different from the connection of
the first task, even though both tasks refers to the
same connection manager.
You should see property of the
connection manager called RetainSameConnection. The
default value is false, meaning that SSIS will create another connection
session. To just make current connection of one task is shared to other
connection session of previous task, you can change the default value from
false to true.
Run you package again. This time,
SSIS recognises connection session among three packages refers to one
connection session.
Setting Up dynamic property value at SSIS
Setting Up
dynamic property value at SSIS
By
: Kasim Wirama,
MCDBA, MVP SQL Server
SSIS offers more flexibility and
debugging feature that doesn’t exists in previous ETL version (DTS). This
article I will let you know one of property of each component or task that
makes your SSIS package more flexible through Expression Editor and more
configurable ever than before. Even you can change your parameter value from
XML file.
For example, I would like to
iterate each files in particular folder, and I make this particular folder is
configurable, so I have change input parameter from outside SSIS package, input
parameter will be got from outside SSIS environment for example, from XML file
or through command line.
Let’s jump into new SSIS package
for making it clear for you. First, you define package level type SSIS
variable. Name it DirectoryName and FullFileName, and their types are string.
Then you drag For Loop Each
container component, and configure its property through Foreach
Loop Editor. At Collection section, you define whatever string inside Folder
textbox, and make sure Fully Qualified radio button at Retrieve File Name
section is selected. Now the dynamic part is located in Expressions of the
component. Click on it to display Property Expression Editor. On Property
column select Directory and on Expression click ellipsis button to display
Expression Builder, and drag variable DirectoryName
from Variables list box to Expression textbox, you can click Evaluate
Expression button to make sure before you go back to Expression Editor window
by clicking OK. Now you can change DirectoryName not
directly into Foreach Loop component, but from
variable.
The task has output of full file
name (with folder path), and you need to provide variable that will receive
return string value from the task. Let’s go to Variable Mappings, and set
Variable column to variable FileName, and click OK to
go back to Control Flow designer. Drag Script Task into Foreach
Loop Container task, and open its editor, on Script section specify FileName in ReadOnlyVariables
because I just want to pop up its value through message box. Click on design
script button to open another new Visual Studio IDE, and type MsgBox(Dts.Variables(“User::FileName”). Value.ToString()),
close it.
Now you fill in DirectoryName variable, that you
want to pop up all files inside the directory. Run the package. And it will pop
up file names whose files reside in the directory.
Now you need to expand your DirectoryName variable to outside SSIS package, so each
time you don’t have to edit your SSIS when directory name changes. Choose SSIS
menu and select Package Configurations. Click Add button. Choose XML configuration file type and
specify its name, on next screen you choose Value property of DirectoryName.
Now you can specify command line
that calls the package as below:
DTEXEC /File “your dts file path” /CONFIGFILE “your xml config
file path” /MAXCONCURRENT “-1” /CHECKPOINTING OFF /REPORTING N
With command line specification
above, you can achieve same behaviour as it is in Visual Studio IDE. Your
package parameter doesn’t tightly coupled with its
package by extracting out to XML file, so you can change the value in XML file
without having to change to package.
Slowly Changing Dimension in SSIS
Slowly Changing Dimension in SSIS
By: Kasim Wirama, MCDBA, MVP SQL Server
Another SSIS component that is
interesting to examine is Slowly Changing Dimension. Term of SCD (slowly
changing dimension) is used to describe changing row attribute in dimension or
lookup table, such as change of product price. Changing of these row attributes
has some implications of table design, depending whether business user want to
keep its history or just override the destination the cell value.
In this case, there are 3 solutions
depending on how scenario business users would like to choose:
- do not keep historical data
it’s simple to be implemented, but users couldn’t see
historical value, because old value is overridden by new value
- keep historical data
users can see its historical value, but table will grow
bigger, and transaction table that refers to lookup table should be updated its
foreign key reference to lookup table (SCD table)
- keep historical data in one column
users can see last history value, but you will implement by
your own code because the component do not provide for this kind of solution.
Let’s take a look below sample. I
create simple table called dbo.EmployeeSalary that
keeps salaries of each employee. You can take scripts below :
create table dbo.EmployeeSalary
(
EmployeeID int not null identity(1,1) primary key,
EmployeeNumber
char(3) not null ,
EmployeeName varchar(10) not null,
SalaryPerMonth int not null,
[Address] varchar(15) not null,
SalaryStatus varchar(10) not null default ('current')
);
Now, there are some changes,
first change is to raise salary of John from 3500 to 4000, second one is to
change smiley address from garden road to Sudirman
road. Users want to keep history of salary changes but override address
changes. So for this case, you will examine how to use this component. But here
is the text file that contains data changes, named it list.txt
empnumber empname salary permonth address
001 john 4000 orchad road
002 smiley 3000 sudirman road
Create SSIS package, add Flat
file source and SCD component to Data Flow designer, connect two of them, point the Flat File source component to the text file
(list.txt), change column type of salary permonth
from string to integer. On SCD wizard,
you create destination OLEDB point to table dbo.EmployeeSalary,
and setting up address (Input column) to Address (Dimension column) as Not a
key column, similar to empname, salary per month, and
business key for empnumber because it uniquely
identifies each instance of employee from business user perspective.
Next, you specify SalaryPerMonth as HistoricalAttribute
and Address as ChangingAttribute.
On HistoricalAttributeOptions,
select SalaryStatus on Columns to indicate current
record, Current at Value When Current, and Expired at Expiration Value.
When you complete the wizard, run
the package. If the package runs successfully, you will get 3 rows at dbo.EmployeeSalary. With expired row on salary changes belonged to John and Smiley’s updated
address.
If
you see final result of the package, SCD component will expand to several
component, so it will make implementation of SCD much more easier in SSIS
world.
Unpivot columns in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
If you see my previous article about pivoting data using SSIS component called Pivot, you will agree with me that it is easy to use, all you need to do is to define relational query that in turns will be converted to the pivot format. Now here is the vice versa condition, you are given table in pivot format, and there is a requirement that need to convert from pivot format to unpivot format. SSIS provides the component to make it possible and easy to use, the component name is Unpivot component.
Some properties of the component that you must configure are :
1. input columns
columns that are needed as input column stream
2. destination column
a column name that you will unpivot from pivoted columns to unpivoted rows
3. pivot key value
Unique name for each pivot columns, usually same with input columns.
4. pivot key value column name
New column name in unpivoted form that contains caption name of pivoted columns.
Let’s look pivoted form at AdventureWorks database, and you query select statement from view called [Sales].[vSalespersonsalesbyfiscalyears]. You notice that it displays sales amount for each sales person from 2002 until 2004.
And final results of unpivoted as follows :
SalesYear AmountSales SalesPersonID FullName Title SalesTerritory
2002 1951086.8256 275 Michael G Blythe Sales Rep Northeast
2003 4743906.8935 275 Michael G Blythe Sales Rep Northeast
2004 4557045.0459 275 Michael G Blythe Sales Rep Northeast
2002 2800029.1538 276 Linda C Mitchell Sales Rep Southwest
2003 4647225.4431 276 Linda C Mitchell Sales Rep Southwest
2004 5200475.2311 276 Linda C Mitchell Sales Rep Southwest
…..so forth
SalesYear and AmountSales are new columns in unpivoted format. Pivot key values are pivoted column (2002-2004). Pivot key value column name is caption name of pivot column name that is mapped to SalesYear unpivot column name. Destination columns are AmountSales. Input columns are all columns including pivoted columns.
You can see how it works by starting new packages, and dragging OLEDB source component, OLEDB destination component, and Unpivot component. Connect each of them from Source – Unpivot – Destination, and specify query for the view at source component. If you open transformation editor of unpivot component, you will see 4 properties as I mention above, configure it. Configure columns of salespersonid, fullname, title, salesterritory as pass through column, the rest (pivoted columns) as input columns.
Run your package, and the final results similar to what I display above.
With fewer steps of configurations, you will do unpivot data extremely easily.
Referring To Your Reference Table with Fuzzy Lookup
By: Kasim Wirama, MCDBA, MVP SQL Server
One of problem of ETL world is referencing non existent data. For example when one column of fact table that is foreign key to a dimension. It is usually caused by bad data in the fact table. So how to handle it more appropriately in cleansing data with less effort. The answer is that using Fuzzy Lookup. This component will help most of painful process in dealing with data cleansing.
This component provides 2 important informations besides cleansed data:
1. Similarity
Is a number between 0 and 1 representing similarity. You can define how close the similarity is. The close the number to 1, the closer similarity should be met.
2. Confidence
Is a number between 0 and 1 representing confidence of match relative to set of matched results.
I will show you how to use this component.
First you make a flat file named employeelist.txt, its content is as below, separated with tab among columns.
EMPNAME EMPTITLE
HOWARD GEN MANAGER
MATTHEW GENERAL MANAGER
LINDA GENERAL MGR
SUSAN JUNIOR MANAGER
ERIC JUNIOR MGR
JOHN JUN MANAGER
Next you created below script in tempdb database, a lookup table :
create table dbo.OccupationTitle
(
OccupationTitleID int not null,
OccupationTitleDesc varchar(30) not null
);
go
insert into dbo.OccupationTitle values (1,'GENERAL MANAGER');
insert into dbo.OccupationTitle values (2,'JUNIOR MANAGER');
Now, you create SSIS package with Visual Studio 2005, more details how to create this, just refer to early links about SSIS.
Drag Data Flow task into Control Flow designer, double click on the task to switch to Data Flow designer. Drag Flat File Source, double click on it, create new Flat File Connection Manager by clicking New button. And create your flat file connection manager as previous article I ever write before, with Column Names in the first data row checked.
Drag Fuzzy Lookup and connect to it. Double click on it, specify OLEDB connection manager and select reference table that you have just created on previous step
On columns tab connect arrow between EMPTITLE and Occupation TitleDesc, specify input column to EMPTITLE, Lookup column to OccupationTitleDesc, Mapping Type to Fuzzy, and check all comparison Flag check box. And check all column on Available Lookup Column and Available Input Column.
On Advanced tab, you can define your own Similarity threshold, if you have no idea, specify it to 0.5.
Drag Recordset Destination, connect Fuzzy Lookup component to it. Create variable with type Object. Double click on Recordset Destination, specify VariableName property to the variable. On Available Input Columns, check all columns. And click OK.
If you want to see contenct of Recordset Destination, you can add Data Viewer component by right clicking arrow between Recordset Destination and Fuzzy Lookup. Now you can run the package.
You can compare original data and cleansed data between EMPTITLE column and OccupationTitleDesc column, and Fuzzy Lookup will find its OccupationTitleID.
What a useful component for data cleansing.
Pivoting data with SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
SSIS provides other interesting component for data pivoting. It is called Pivot component. It will automate pivot requirement. There are some mandatory properties that you should set up to make it display your data correctly.
1. Pivot Column
Is a column that you would like to change it from rows to columns.
2. Row Column
Is a column that comes from rows in a column.
3. Value column
Is an aggregated value based on row column.
For example, you will display total quantity for first 3 months of one year sales quantity for each product. In this case you will specify month as pivot column, content of month as Row column, and total quantity as Value column, as query below :
Select
p.[name],
p.productnumber,
datepart(mm,th.transactiondate) as month,
sum(th.quantity) as total
from production.product as p
inner join production.transactionhistory as th
on p.productid = th.productid
where th.transactiondate between '20040101' and '20040301'
group by p.[name],p.productnumber,datepart(mm,th.transactiondate)
order by p.[name],p.productnumber,datepart(mm,th.transactiondate)
You can try to use it by dragging OLEDB destination and Pivot component to Data Flow designer and connect OLEDB to Pivot component.
In Input tab of Pivot component, you check all columns, and in Input and Output Properties tab, you can see all input columns that you specify in second tab. Now you specify PivotUsage property for Name column of input columns to 0 (row), productNumber column to 1 (key row), month column to 2 (pivot column), total column to 3 (pivot value).
Expand Pivot Default tab, and add output column by clicking Add Column button. Name it ProductName, set Source Column property to LineageID of Name column (input column). Add new column, name it ProductNumber, set its Source Column property to LineageID of ProductNumber column (input column).
Add new output column, name it January, set Pivot Key Value to 1, and Source Column to LineageID of total column (input column), similar to new columns for February and March, the only difference is setting up their PivotKeyValue properties.
Last step is add RecordSet destination, and connect Pivot component to it, add variable with object type and assign It to VariableName property of first tab, check all columns at the second tab.
You run the package. You will see month columns at row level becomes column names.
I am thinking of writing SQL Server e-book focusing on T-SQL functionality of SQL Server 2005 through project Otak (http://otak.csharpindonesia.net/).
.NET Developers especially Indonesia .NET developers should know how to leverage hidden power of T-SQL SQL 2005.
More brains are more appreciated than only one brain. So through my blog, I would like to encourage your community spirit to contribute to Indonesia SQL Server community to contribute your knowledge in respect of my proposed topic.
I look forward to hearing your enthusiasm by sending to my email (kasim_wirama@yahoo.com).
Recognize Similar Word with Built in Fuzzy Functionality in SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
SSIS has one most interesting component that is very handy in cleansing your data. You might face terrible moment when cleansing out your data. In DTS, no other way than manual way to recognize a word actually have one reference to one value, let say “Vice President” actually same with “Vice Pres”. Now this kind of manual task is made efficient by fuzzy logic wrapped to Fuzzy Grouping component in SSIS.
To understand how useful this component is, lets make some preparation before creating new SSIS package.
You create file text named Similar word.txt, save it to a directory you want. The content of the file is :
pres director
president director
vice president
vice pres
Create a table, to keep original data and clean data as results from Fuzzy Grouping component.
Create table dbo.Fuzzy
(
original_data varchar(30) not null,
clean_data varchar(30) not null
);
Now, you open Visual Studio 2005 or Business Intelligence Development Studio (BIDS), open/create SSIS project and create new package.
You create 2 connections manager. One is flat file connection manager intended for use of Flat File Source component and the other one is OLE DB Connection intended for use of OLE DB Destination.
For OLE DB Destination, set database where your tables that is just created to.
Drag Data Flow task to Control Flow designer, double click on it to switch to Data Flow designer. Drag Flat File Source component and set its Flat File Connection Manager to the Flat File Connection Manager that you have just created on previous step.
Drag Fuzzy Grouping component and connect arrow to it from Flat File Source. Having clicked on instance of Fuzzy Grouping component, on first tab, specify OLEDB connection manager where your table resides (Fuzzy table). On second tab, click checkbox on first column that you would like clean its contents. And the third tab (Advanced tab) you can set Similarity Threshold to 0.70 (the more bigger value the more distinct values, the more less value the more less distinct value, you can experience it until you feel it fits for your case). After that, save its setting by clicking OK button.
Drag OLE DB Destination to Data Flow designer. Connect arrow from previous component to it. Establish mapping of its property (OLE DB Destination Editor, Mappings item on left list box), Column 0 to original_data and Column 0_clean to clean_data. Close OK button.
Now you’ve done it, run the package. Then your table (Fuzzy table) is populated with clean data. If you query the table there is only 2 distinct values of clean_data column, i.e. : pres director and vice president.
After you tried it, you will value this component. And your hectic day for data cleansing effort will much more efficient than before.
Extract Particular Word/Phrase with SSIS
By: Kasim Wirama, MCSD.NET, MVP SQL Server
SSIS provides filtering some particular words at free text with Term Lookup component. It will read Unicode text to find words that are in its list by looking up to lookup table.
To understand how useful this component is, lets make some preparation before creating new SSIS package.
You create file text named Term.txt, save it to a directory you want. The content of the file is :
My mother buys me ice cream.
I like to eat ice cream.
But, my brother doesn't like to eat ice cream.
Anyway, my mother still buys 2 ice creams for each of us.
Create 2 tables, one is for lookup and the other is for keep results after looking up.
create table TermLookup
(
word varchar(20) not null
);
GO
create table dbo.Results
(
term nvarchar(30) not null,
score int not null
);
GO
Insert some records into TermLookup table as below shown
insert into TermLookup values ('ice');
insert into TermLookup values ('cream');
insert into TermLookup values ('ice cream');
insert into TermLookup values ('mother');
insert into TermLookup values ('brother');
Now, you open Visual Studio 2005 or Business Intelligence Development Studio (BIDS) if you haven’t installed Visual Studio 2005. Actually BIDS IDE is derived from Visual Studio 2005’s IDE. Then open/create SSIS project and create new package.
You create 2 connections manager. One is flat file connection manager intended for use of Flat File Source component and the other one is OLE DB Connection intended for use of OLE DB Destination.
For Flat File Connection Manager, just set Data Type property of first ordinal column (and only one column) as Unicode String. For OLE DB Destination, set database where your tables that is just created to.
Drag Data Flow task to Control Flow designer, double click on it to switch to Data Flow designer. Drag Flat File Source component and set its Flat File Connection Manager to the Flat File Connection Manager that you have just created on previous step.
Drag Term Lookup and connect arrow to it from Flat File Source. Having clicked on instance of Term Lookup component, on first tab of Term Lookup Transformation Editor window, specify reference table name (TermLookup table). On second tab, drag arrow from first column of Available Input Columns to first column to Available Reference Columns. And save its setting by clicking OK button.
Drag OLE DB Destination to Data Flow designer. Connect arrow from previous component to it. Establish mapping of its property (OLE DB Destination Editor, Mappings item on left list box), Term to term, Frequency to score. Close OK button.
Now you’ve done it, run the package. Then your table (Results table) is populated records based on entries lookup of LookupTerm table.
Identifying Occurrence of Some Words with SSIS
By: Kasim Wirama, MCDBA, MVP SQL Server
If you ever did some text analysis, you must ever collect occurrence frequency of words in free text. SSIS makes it very easy for you by automating such a task that you manually did it before by providing Term Extraction component.
The component actually counts number of occurrence of words in free text. It has 2 columns as result of analysis, i.e. Term and Score. Term contains word/phrase or combination of the two. Score gives information number of occurrence of the word. It has 2 forms, first is number of occurrence (integer type), second is a bit complex real number, called TFIDF (Term Frequency and Inverse Document Frequency). TFIDF formula is (frequency of term) * log ((number of rows in free text by sample)/(number of rows with term or phrase)).
The component can display number of occurrence that fulfill minimal occurrence threshold that you can consider and also maximum length of term that you still consider as well.
To get clear how the component works, lets make text file sample, called term.txt, fill in on the following text below :
My mother buys me ice cream.
I like to eat ice cream.
But, my brother doesn't like to eat ice cream.
Anyway, my mother still buys 2 ice creams for each of us.
Now, you create new SSIS package with Visual Studio 2005, drag Control Flow task to Control Flow designer. Double click on it to switch to Data Flow designer window.
Create Flat File Connection Manager that point to the flat file that you created before adding new package on previous step. Specify connection manager name, and fill out or browse File Name text box. Make sure you uncheck Column Names in The First Data Row. Choose Advanced item on left list box, click Suggest Types button, then change data type from string to Unicode String (DT_WSTR) and click OK to save setting and go back to Data Flow designer area.
Drag Flat File source to Data Flow designer area, double click on it to open Flat File Source Editor window, choose connection manager item that you just created on dropdown list box, click Columns item at left list box. Give check mark on item of Available External Columns, and click OK to save the setting and get back to Data Flow designer.
Drag Term Extraction component to Data Flow designer, connect green arrow connectivity from Flat File Source to it. Double click to open Term Extraction Transformation Editor window. Click Advanced tab, choose Noun and Noun phrase to find frequent occurrence, choose source type to Frequency, and set value 2 on Frequency Threshold parameter. It means that I would like to filter out occurrence that is less than 2, and make sure to uncheck Use Case Sensitive Term Extraction. Click OK to finish configuring the component.
You can create table to receive results from the component below, executed in SSMS.
create table dbo.Results
(
term nvarchar(30) not null,
score int not null
);
Create OLEDB connection manager for SQL Server
Drag OLE DB Destination to Data Flow designer area, connect connectivity from Term Extraction component to it. Double click on it, set OLE DB connection manager properties to item of previous step, and table name to dbo.Results. On Mappings item of left list box, set connectivity from items of Available Input Columns to items of Available Destination Columns, and click OK button to save its settings.
Now you run the package. After the package runs successfully, inspect your table, there are 2 rows that conforms to your term extraction frequency threshold.
SSIS makes check task of term occurrence very easily compared to DTS on previous version of SQL Server 2005.
Execute Row per Row Basis with OLEDB Command Component of SSIS
By : Kasim Wirama, MCDBA, MVP SQL Server
If you have ever used Query Driven Task in DTS, you must be familiar with event for execute different stored procedures for insert, update, delete or select query. In SSIS, they are unified to one kind of statement, so SSIS makes it centralized and manageable.
This functionality is provided by SSIS through OLE DB Command component. The component is available in Data Flow designer.
I will show you how to use it, especially import sample from outside database, for example plain text file.
First, you can make dummy text file as below
Name,Address
Susan,Oxford street
Eric,Orchard road
Save it as Customers.txt, I intentionally make it very simple, because I would like to stress out how to use the component, in context of Extract, Transform, Load from heterogeneous data source.
Next you make a table, named Customers in tempdb database or whatever database you prefer. Its schema is as below
Create table dbo.customers
(
Name varchar(10) not null,
Address varchar(50) not null
);
Next, you create SSIS package with Visual Studio 2005, add new package by right clicking on SSIS Packages folder in Solution Explorer.
Make sure you are on Control Flow designer, drag Data Flow Task and double click on it to switch to Data Flow Designer. Drag Flat File Source Task, double click on it to open Flat File Source Editor window, click New button to create flat file connection manager, Flat File Connection Manager Editor is displayed.
Fill out Connection Manager Name, File Name pointed to customers.txt text file and check Column Names In the First Data Row, and choose Preview Item on left list box to make sure output stream is correct as you expect. Close the window by click OK button to return back to Flat File Source Editor window. Close the window to return back to Data Flow designer window.
Create new SQL Server OLEDB connection manager, that point to database tempdb and table Customers.
Drag OLEDB component to Data Flow designer window. Connect green arrow from Flat File Source to it.
Double click OLE DB component to open Advanced Editor for OLE DB command. On the first tab (Connection managers), choose the OLEDB connection manager that you have created on previous step, on second tab (Component Properties), at SqlCommand property, type script to insert row to Customers table : insert into dbo.Customers values (?,?).
? sign is placeholder for columns of Customers table.
Go to third tab (Column Mappings), you will see Available Input Columns that contains 2 columns (Name and Address from flat file). On Available Destination Columns you will see 2 columns as well (Param_0, Param_1) as representation of two ? signs. Connect Name column to first parameter, and Address column to second parameter. Click OK to go save its setting and get back to Data Flow designer.
Test you package that has been just created by right clicking on it, and choose Execute Button from context menu.
After your package is executed without error, check your table (select * from dbo.Customers) at SSMS (SQL Server Management Studio), it will display 2 records that are imported from flat text file.
Implementing Conditional Task Execution with SSIS
By: Kasim Wirama, MCDBA, MVP SQL Server
If you have ever developed DTS package, you must be familiar with Precedence Constraint (Success, Failure, Completion) move from one task to another task. Some enhancements for these Precedence Constraint is added with conditional that you can apply with expression such as @var1 > 5, that should be evaluated before move execution from one task to another task.
In SSIS, you can define execution condition not only by constraint (success or failure) but also by conditional expression (<, >, <= , >= , <>) and mixed (constraint and expression) with AND and OR operator.
Let say, you have 2 tasks at Control Flow Designer, first task is Data Flow Tasks, second task is Execute SQL Tasks, you want second task is executed when first tasks contains records and it is executed without error.
You can achieve this scenario by using RowCount Component that counts number of rows returned from previous tasks. This component requires a variable that saves value of number of rows.
You can try it out, by make new package. Make OLEDB connection at connection manager area. Specify OLEDB source as SQL Server , database AdventureWorks.
Next you create SSIS variables, name it Var1 with Int32 type. And drag Data Flow Tasks into Control Flow designer area, then double click to switch to Data Flow designer. Drag OLE DB source, specify connection manager from OLEDB connection that you have just created, and point to table AWBuildVersion.
Drag RowCount component, connect OLEDB Source’s green arrow to it, and then double click RowCount. At Component Properties tab, assign Var1 variable to VariableName property, close the window by clicking OK button.
Switch back to Control Flow designer, drag Execute SQL task from toolbox to below Data Flow tasks, and connect Data Flow tasks to Execute SQL task. Edit Execute SQL task, and assign ConnectionType property as OLEDB, Connection property as OLEDB connection, and SQLStatement as select ‘dummy’. Edit precedence constraint (green arrow between 2 tasks), and in Precedence Constraint Editor choose Expression and Constraint, fill out Expression entry with @Var1 > 1 (assume number of rows in AWBuildVersion table is 1, so the expected behaviour is Execute SQL Task has been never got executed because it doesn’t pass from conditional expression @Var1 > 1).
Close Precedence Constraint Editor by click OK button. And you execute package now, for first time second task doesn’t get executed because number of rows is 1
End package execution, edit the precedence constraint, this time you change Expression to @Var1 <=1, execute the package again. This time you see that second task gets executed. The usage of RowCount component comes into play for this scenario.
SSIS gives you more choice by providing Expression criteria, besides constraint (success, failure, completion) that must be fulfilled by next task before allowing the task gets executed and combination between constraint and expression.