SQL Server knowledge center

everything about SQL Server
See also: Other Geeks@INDC

September 2007 - Posts

Bagaimana mengubah stored procedure yang berjalan 16 menit menjadi 1 detik?

Bagaimana mengubah stored procedure yang berjalan 16 menit menjadi 1 detik?

 

By : Kasim Wirama, MCDBA, MVP SQL Server

 

SQL Server 2005 memiliki banyak fitur TSQL yang memberikan lebih banyak pilihan dibandingkan dengan versi sebelumnya, karena SQL Server 2005 mengimplementasikan beberapa standard yang ditetapkan dalam ANSI SQL 2003 setelah standard ANSI SQL 99.

 

Kasus query tuning yang menarik di sini berkenaan dengan implementasi SQL Server 2005 ke dalam stored procedure yang menyebabkan stored procedure tersebut berubah execution time-nya dari 16 menit menjadi 1 detik dengan row kembalian sebanyak 1 row.

 

Sebagai gambarannya, stored procedure tersebut memiliki beberapa query, dimana sebetulnya yang paling berat execution cost di sini terjadi pada sebuah query yang melakukan join antara 2 tabel, dimana tabel pertama memiliki row sebanyak 600 ribu sedangkan tabel kedua memiliki row sebanyak 53 juta. Join antara kedua tabel tersebut bersifat equijoin, ditambah dengan kondisi yang mengandung function SQL Server seperti replace/coalesce di sisi kolom seperti WHERE coalesce(kolomA,’’)=’’ dikombinasikan dengan filtering dengan input parameter. Sehingga sebetulnya query tersebut sederhana menjadi sangat kompleks.

 

Semula saya mengira ini hanya persoalan index, setelah dicheck ternyata kolom untuk melakukan joining antara kedua tabel tersebut telah dalam bentuk clustered index. Ada 3 kemungkinan join diluar kolom-kolom yang tidak di index, yaitu

  1. join antara clustered index dengan clustered index
  2. join antara non clustered index dengan clustered index
  3. join antara non clustered index dengan non clustered index

 

Diantara ketiga join tersebut yang ideal adalah kemungkinan pertama. Berarti seharusnya hasil kembalian 1 row tidak memakan waktu yang lama. Setidaknya itu menurut pengalaman saya melakukan query tuning di beberapa kasus project SQL Server. Saya menduga kerumitan kondisi WHERE yang tidak memenuhi kondisi SEARCH ARGUMENT yang merupakan bottleneck query tidak tidak scalable. Setelah saya normalisasi kondisi WHERE ke dalam bentuk SEARCH ARGUMENT, tetap ada perbaikan kecepatan eksekusi (meski sedikit) dari 16 menit ke 14 menit. L

 

Berarti normalisasi dan clustered index pada kolom untuk kedua tabel tersebut tidak banyak berpengaruh. Apa yang sebetulnya merupakan cause root lamanya query tersebut? Itu yang menarik untuk di telusuri lebih lanjut. Saya kemudian eksekusi ulang untuk melihat query plan, eksekusi dengan menampilkan query plan biasanya memakan waktu yang lebih lama, dalam kasus ini eksekusi berjalan selama 18 menit. Bila anda sering melihat query plan, tentu anda akan familiar sekali dengan operator di dalam execution plan seperti : nested loop, hash join, merge join, parallelism, dan lain-lain. Untuk kasus ini terdapat nested loop dan juga parallelism. Yang perlu saya kemukakan di sini, bila anda mendapatkan query anda berada dalam lingkup parallelism, berarti ada 2 kemungkinan, kemungkinan pertama adalah anda melakukan searching ke dalam setiap row di dalam sebuah tabel (performance problem akan terjadi bila jumlah row dalam tabel tersebut sangat besar) sehingga query optimizer memutuskan untuk membagi peran row retrieving ke sejumlah prosesor à berarti ada pemborosan resource server, kemungkinan kedua adalah query tersebut yang tidak optimal, sehingga akses data ke tabel-tabel yang terkait melebihi dari apa yang seharusnya dibaca, sebagai contoh untuk mendapatkan 300 row, query optimizer melakukan pembacaan lebih dari 300 row à berarti juga pemborosan resource server. Kedua solusi permanent untuk kedua kemungkinan tersebut adalah refactor query atau yang anda perlu tahu adalah query tuning. Operator lainnya yang perlu anda perhatikan adalah nested loop. Nested loop merupakan indicator yang tidak cukup baik karena apabila anda melihat jumlah row yang besar di dalam nested loop, berarti terjadi looping row dalam jumlah besar sehingga yang perlu saya lakukan di sini adalah memeriksa apakah penerapan index yang terkait sudah benar (mungkin juga belum tercreate index) atau anda membatasi jumlah looping di dalam nested loop. Trick dalam query tuning yang perlu anda ketahui. J

 

Saya melakukan perubahan dari inner join menjadi cross apply untuk meminimalisir akses row yang berlebihan, dan memindahkan criteria yang kompleks yang telah dinormalisasi (SEARCH ARGUMENT) dari dalam criteria WHERE ke dalam Common Table Expression (CTE).

 

Setelah saya coba banding execution cost antara query lama dengan query yang saya buat, perbandingannya adalah 99%:1%. Saya amati ternyata operator parallelism telah hilang dan jumlah query hasil implementasi cross apply menjadi berkurang dari 41000 row menjadi hanya 1 row. Saya coba jalankan query baru tersebut, waktu eksekusi memakan waktu 1 detik.

 

Tampaknya fitur T-SQL 2005 memberikan nilai tambah bagi performance database query.

Share this post: | | | |
Importing Image into SQL Server with SSIS

Importing Image into SQL Server with SSIS

 

Written by Kasim Wirama, MCDBA, MVP SQL Server

 

This article will show you how to importing image into SQL Server with SSIS. Component relating to this job is Import Column component. It will extract file from file system based on full path of the file, and do importing job into table. Of course, the destination table should have LOB type (nvarchar(max), varchar(max), image).

 

You can try it out with these steps I show you below:

  1. Create SSIS type project with Visual Studio, named Import Column.
  2. Then you create image directory, let say : C:\images
  3. Create some images in the folder (1.jpg, 2.jpg, 3.jpg)
  4. Create text file (list.txt) inside the folder. Insert content into it below :

C:\images\1.jpg

C:\images\2.jpg

C:\images\3.jpg

  1. You create table inside AdventureWorks database with table creation script below :

CREATE TABLE ImportImage

(

Picture image

);

  1. Get back to SSIS project that you have created in step 1, make sure you get focus to Control Flow tab, drag Data Flow Task into the Control Flow designer window, then double clik the task to switch focus to Data Flow tab.
  2. Drag Flat File Source into Data Flow designer, and double click to configure its properties to point to the flat file created at step 4.
  3. At Flat File Source Editor, click New button, to create new connection manager. Specify your connection manager name and browse your text file by clicking Browse button or directly fill in into File Name text box.
  4. After that, you click Columns item that is located inside left list box at Flat File Connection Manager Editor window to make sure content of the text file is loaded properly.
  5. Close all remaining window to return back to Data Flow Designer, now you have created your source text file. Next you drag Import Column component into Data Flow designer window, and connect green arrow of the Flat File Source that you have just configured in previous step to the Import Column component.
  6. Double click Import Column component to configure its properties so that it can import images into destination (table ImportImage at AdventureWorks database), that will be specified at step 17.
  7. In Advanced Editor for Import Column, check for column and then move to Input and Output Properties.
  8. At left list box, expand all item below Import Column Input, you will see column that you have checked in step 12. The column name may vary depending how you name it.
  9. expand all item below Import Column Output, you see no column defined under it, click Add Column button, name it Picture, look its properties and make sure DataType is image, and note its LineageID
  10. return back to input column that you see at step 13, find property named FileDataColumnID, and fill it with LineageID that you look at step 14.
  11. Click all OK button to return back to Data Flow designer, drag OLEDB destination, double click on it to configure its properties.
  12. Create new OLEDB connection manager to point to your AdventureWorks database. Choose data access mode : Table or view. And select your table that is created on step 5.
  13. Move to Mappings item on left list box, and make sure there is relationship between Picture column (from Available Input Columns) to Picture column (from Available Destination Columns).
  14. Click OK button to return back to Data Flow designer. And Press F5 to execute the package.

 

You will see green on each of the component (source – import column – destination) with 3 rows flow between each of them.

 

You can open SQL Server Management Studio, and focus to Adventureworks, and check whether any picture content exists in ImportImage table. It should have 3 rows in the table.

 

Happy exploring in SSIS world. J

Share this post: | | | |
Posted: Sep 16 2007, 11:39 PM by Kasim.Wirama | with no comments
Filed under:
Select Non Existent Record Without (NOT) Keyword for EXISTS/IN

Select Non Existent Record Without (NOT) Keyword for EXISTS/IN

 

By Kasim Wirama, MCDBA, MVP SQL Server

 

 

Usually, I am taught to display non existent record on one table by referencing some columns to the other tables with NOT EXISTS/NOT IN help.

 

From performance perspective, it gives suboptimal performance, because it doesn’t give efficient index usage (index scan).

 

So, how to avoid index scan, and change to better query to populate index seek instead?

 

Let’s consider this scenario below :

 

Table A contains records (1,’a’), (2,’b’), (3,’c’) with primary key first column (1,2 and 3).

 

Table B contains records (1,’a’), (2,’b’), (3,’c’), (4,’d’) with primary key first column (1,2,3,4)

 

I am asked to display records that exist in table B, but it does not exist in table A.

 

As my bad habit without got aware before, I give solution:

select b.cola, b.colb from “table B” where not exists (select cola from “table A” on “table A”.cola = “table B”.cola);

The same question has ever been asked in technical interview, I had no idea at that time. J

 

Finally, my colleague suggested to use left/right join, I think it was other alternative besides using NOT EXISTS/NOT IN, I came up with this query:

Select b.cola, b.colb from “table B” left outer join “table A” on “table A”.cola = “table B”.cola where “table A”.cola is null;

 

Be careful when you think, as I thought before, that the third query below will give same results as second query does.

Select b.cola, b.colb from “table B” left outer join “table A” on “table A”.cola = “table B”.cola and “table A”.cola is null;

 

Remember that filtering happens after all joining process among tables is done, if you use WHERE, otherwise all condition is processed before WHERE will be finalised.

 

Just aware about sequence of query engine process a query, it will help you avoid incorrect result between second and third query.

 

Without compromise query result, the second query gives better alternative compared to first one.

Share this post: | | | |
How to get clustered index information with DMV in SQL Server 2005

 

How to get clustered index information with DMV in SQL Server 2005

 Written by : Kasim Wirama, MCDBA, MVP SQL Server

I would like to share my code to query clustered index information for a table in SQL Server 2005 by leveraging dynamic management view (DMV).

 You just supply table name dan it results clustered index name together with its columns.

declare @tablename sysname;
Set @tablename = 'HumanResources.Employee';

@tablename sysname;
Set @tablename = 'HumanResources.Employee';

with cte (index_name,column_name, index_type) as
(

cte (index_name,column_name, index_type) as
(

select i.name , c.name , case when i.index_id = 1 then 'clustered' else 'non clustered' end

select i.name , c.name , case when i.index_id = 1 then 'clustered' else 'non clustered' end

from sys.index_columns ic join sys.columns c

on ic.object_id = c.object_id

from sys.index_columns ic join sys.columns c

on ic.object_id = c.object_id

on ic.object_id = c.object_id

and ic.column_id = c.column_id

join sys.indexes i

on i.object_id = ic.object_id

and ic.column_id = c.column_id

join sys.indexes i

on i.object_id = ic.object_id

join sys.indexes i

on i.object_id = ic.object_id

on i.object_id = ic.object_id

and i.index_id = ic.index_id

where i.index_id>0 and i.object_id=object_id(@tablename)

),

and i.index_id = ic.index_id

where i.index_id>0 and i.object_id=object_id(@tablename)

),

where i.index_id>0 and i.object_id=object_id(@tablename)

),

),

cte1 (index_name_cte1) as

(

(index_name_cte1) as

(

(

select index_name

from cte

where column_name in ('')

select index_name

from cte

where column_name in ('')

from cte

where column_name in ('')

where column_name in ('')

group by index_name

),

group by index_name

),

),

cte2 (index_name_include) as

(index_name_include) as

(

(

select index_name

from cte

except

select index_name

from cte

except

from cte

except

except

select index_name_cte1

from cte1

)

select index_name_cte1

from cte1

)

from cte1

)

)

select index_name_include,column_name from

(

select index_name_include,column_name from

(

(

select cte2.index_name_include, (select top 1 index_type from cte where index_name = cte2.index_name_include ) as index_type from cte2

) as D1 join cte on D1.index_name_include = cte.index_name

where D1.index_type = 'clustered';

select cte2.index_name_include, (select top 1 index_type from cte where index_name = cte2.index_name_include ) as index_type from cte2

) as D1 join cte on D1.index_name_include = cte.index_name

where D1.index_type = 'clustered';

) as D1 join cte on D1.index_name_include = cte.index_name

where D1.index_type = 'clustered';

where D1.index_type = 'clustered';

Share this post: | | | |