Extract Particular Word/Phrase with SSIS
Extract Particular Word/Phrase with SSIS
By: Kasim Wirama, MCSD.NET, MVP SQL Server
SSIS provides filtering some particular words at free text with Term Lookup component. It will read Unicode text to find words that are in its list by looking up to lookup table.
To understand how useful this component is, lets make some preparation before creating new SSIS package.
You create file text named Term.txt, save it to a directory you want. The content of the file is :
My mother buys me ice cream.
I like to eat ice cream.
But, my brother doesn't like to eat ice cream.
Anyway, my mother still buys 2 ice creams for each of us.
Create 2 tables, one is for lookup and the other is for keep results after looking up.
create table TermLookup
(
word varchar(20) not null
);
GO
create table dbo.Results
(
term nvarchar(30) not null,
score int not null
);
GO
Insert some records into TermLookup table as below shown
insert into TermLookup values ('ice');
insert into TermLookup values ('cream');
insert into TermLookup values ('ice cream');
insert into TermLookup values ('mother');
insert into TermLookup values ('brother');
Now, you open Visual Studio 2005 or Business Intelligence Development Studio (BIDS) if you haven’t installed Visual Studio 2005. Actually BIDS IDE is derived from Visual Studio 2005’s IDE. Then open/create SSIS project and create new package.
You create 2 connections manager. One is flat file connection manager intended for use of Flat File Source component and the other one is OLE DB Connection intended for use of OLE DB Destination.
For Flat File Connection Manager, just set Data Type property of first ordinal column (and only one column) as Unicode String. For OLE DB Destination, set database where your tables that is just created to.
Drag Data Flow task to Control Flow designer, double click on it to switch to Data Flow designer. Drag Flat File Source component and set its Flat File Connection Manager to the Flat File Connection Manager that you have just created on previous step.
Drag Term Lookup and connect arrow to it from Flat File Source. Having clicked on instance of Term Lookup component, on first tab of Term Lookup Transformation Editor window, specify reference table name (TermLookup table). On second tab, drag arrow from first column of Available Input Columns to first column to Available Reference Columns. And save its setting by clicking OK button.
Drag OLE DB Destination to Data Flow designer. Connect arrow from previous component to it. Establish mapping of its property (OLE DB Destination Editor, Mappings item on left list box), Term to term, Frequency to score. Close OK button.
Now you’ve done it, run the package. Then your table (Results table) is populated records based on entries lookup of LookupTerm table.