SQL Server knowledge center

everything about SQL Server
See also: Other Geeks@INDC

February 2008 - Posts

Indexing Tips for Optimal Database Performance

Indexing Tips for Optimal Database Performance

By : Kasim Wirama, MCDBA

 

I would like to give some tips how to do proper indexing that gives balance between SELECT query and DML (INSERT/UPDATE/DELETE) query.

 

Here are the tips:

 
  1. If you have surrogate key as primary key, make sure you create UNIQUE constraints.
  2. Add non clustered index as needed by testing it to make sure it will give significant performance on frequently executed query.
  3. If parent table has tight relationship with child table such as SalesOrderHeader and SalesOrderDetail, it is better to create indexing on foreign key column at SalesOrderDetail.
  4. Create “thin” index, “thin” here means that you not include so many columns, if you would like to have covered columns into the index, put them inside INCLUDE() keyword of an index. (see BOL about CREATE/ALTER INDEX in SQL 2005).
  5. If your user queries table with different search predicate, it is better to implement some thin indexes rather one “fat” index.
  6. Index gives SELECT responsiveness, but only cost to INSERT, partial cost and benefit to UPDATE and DELETE.
  7. Apply unique indexes if there is only unique constraints, because unique indexes would tell the optimizer that there is only one unique record in equality search predicate.

So manage your indexes wisely.

Share this post: | | | |
Would You Like To Be a MVP?

Would You Like To Be a MVP?

By : Kasim Wirama, MCDBA

 

Most of you must hear about MVP. What is MVP? MVP stands for Most Valuable Professional, it is a form of recognition from Microsoft on individual who has excellent skill and passionate to sharing through community contribution such as active member in mailing list, blog posting, participating in local/international events held by Microsoft or Microsoft community, and make some publications. You can see MVP website by navigating to http://www.microsoft.com/mvp

 

I know some of you have these criteria above, if you keep sharing consistently, Microsoft will know and recognize your contribution and community hopes that a MVP doesn’t lose his persistent effort to keep sharing and interacting with his community from time to time.

 

As a MVP, you will be the first customer that Microsoft will deliver most up to date content for latest technology, so it is expected you are very ready with everyone else in your fields, and your role will accelerate Microsoft technology adoption by customers.

 

All you do to your community eventually will go back to your self, you will have solid knowledge foundation for yourself and community, and you will be known well by your community. And if you are looking for IT job, I am sure you will have your dream job. It is a long journey and hard work but worth it, so don’t just work merely for your company but spare some time to contribute to IT community, it has good impact for yourself and the others. INDC makes me aware how valuable to transform my SQL Server skill from just a SQL hobbist to active member. I give my thanks to community through INDC.

 

Once again, I encourage and give you motivation and passion as IT roles to work and play harder to support Microsoft local community to strengthen your skill set. INDC is there, and it is waiting for your participation.

Share this post: | | | |
Full Scan, Partial Scan, Index Seek in Performance Perspective

Full Scan, Partial Scan, Index Seek in Performance Perspective

By : Kasim Wirama, MCDBA

 

This article, I would like to share about scanning and seek in SQL Server. Those three terminologies from above title contribute significant database performance.  You can see the information from textual or graphical execution plan. Each step is represented by operator in execution plan. Full scan is represented by Clustered Index Scan, or TableScan operator. Index seek could be in clustered/nonclustered index seek. And partial scan is between two of them.

Clustered Index scan is considered good when you access all or almost all data, so SQL Server engine access the table directly rather to index and expensive bookmark lookup. As information, bookmark lookup is found at SQL Server 2000/7.0, and new name for the operator in SQL Server 2005 is Key Lookup. If you have no clustered index, the operator name is RID Lookup instead of key lookup. A table that has no clustered index is called heap table. A table will perform better if it has clustered index, especially there is extensive update activity rather than heap table.

Other operator is seek operator. Generally seek operator is considered more efficient than scan operator because it doesn't have to access from beginning item to end of item of an index. For selective item (returning small result from a big table), it has good performance if SQL Server chooses index seek rather than index scan.

Actually there is a subtle operator, it is partial scan. A partial scan is scanning within a range, but no all record in that range is really relevant to search query. If you see in execution plan, they are identified as index seek, but if you examine more detail in text-based execution plan, usually it has "residual" keyword. Partial scan will perform a bit less efficient compared to index seek when the number of relevant rows is almost as many as number of rows in end results and vice versa.

In real scenario, from my experience, it is hard to achieve index seek for all problematic query, so usually I try to achieve partial scan if seek is not possible in some cases.

 

Share this post: | | | |
Posted: Feb 24 2008, 01:49 AM by Kasim.Wirama | with no comments
Filed under:
Implement Optimal I/O Subsystem for SQL Server

Implement Optimal I/O Subsystem for SQL Server

By : Kasim Wirama, MCDBA

Database performance is always considered as prone point of performance problem, one of I usually encounter is not optimal I/O subsystem. This article I would share points of planning optimal I/O subsystem.

 

Here are the points :

 

1.     If your disc subsystem is based on spindle, use faster spindles and distribute database workload evenly across those spindles. One of the way is use multiple filegroups. With multiple file groups, you can backup for each filegroup so backup and restore window period gets shortened.

 

2.     For fast write, implement RAID 1+0 or RAID 1. So place your transaction log, data file and tempdb to the RAID.

 

3.     Isolate tempdb with dedicated physical disc.

 

4.     Isolate your log file from data file, if possible isolate nonclustered index from clustered index if your table is heavily access with read and write, for example TransactionOrder table, this table is heavily written by front end, but it is also intensively read by back end (for example. Accounting staff need to generate point in time financial reporting).

 

5.     If you have n-core processor, for example 2-core processor, create n-files on tempdb.

 

6.     Always update HBA driver from your vendor.

 

7.     Performing I/O benchmarking against your disc to know maximum capacity that it could handle on peak hours with I/O stress test tool (for example : SQLIO tool - http://www.microsoft.com/downloads/details.aspx?familyid=9a8b005b-84e4-4f24-8d65-cb53442d9e19&displaylang=en)

 

8.     If you suspects your I/O subsystem performs suboptimal, you can make sure by monitoring this performance monitor counter below :
a.     Average read bytes/sec, average write bytes/sec b.    Reads/sec, writes/sec c.     Disk read bytes/sec, disk write bytes/sec d.    Average disk sec/read, average disk sec/write e.     Average disk queue length
Share this post: | | | |
Posted: Feb 24 2008, 01:35 AM by Kasim.Wirama | with no comments
Filed under:
SQL Server 2008 February 2008 CTP is released!

SQL Server 2008 February 2008 CTP is Released!

 

SQL Server 2008 February 2008 CTP is ready to be downloaded, check this link http://blogs.msdn.com/euanga/archive/2008/02/19/sql-server-2008-feb-ctp-get-it-while-it-is-hot.aspx , follow the link to get the installer.

 

Try it, and get ready for launching this new technology and polish your database skill to job market. Smile

Share this post: | | | |
Trigger is Database Evil? Not Really..

Trigger is Database Evil? Not Really..

by : Kasim Wirama, MCDBA

Some people said that trigger should be avoided to whatsoever because it will slow down your database performance. Actually, I tell you that trigger has so many benefits, if you have good understanding when and how to use them properly, trigger will pay off its potential problem that might arise. I agree but don't agree to their opinion as well, it depends on data and query environment. After I know the internal engine works to handle trigger, I love trigger, and consider it as the feasible component solution beside stored procedure and functions. So before I agree and don't agree to unpleasant opinion about trigger, I would tell you what is how the internal SQL Server engine dealing with user defined trigger.

 What is the difference in internal engine between SQL 2005 and previous version? SQL 2005 handle trigger especially after trigger in tempdb database rather in transaction log. From I/O characteristics, it improves performance because tempdb database give random access rather than sequential access that transaction log provides. You can enhance tempdb performance by adding number of data files that equals to number of processor, in this case dual processor not hyperthreading processor.

 

Is there any overhead that moves trigger load from transaction log to tempdb database? Yes, there is overhead, your tempdb database most likely get bigger, because there is architectural changes in tempdb on SQL Server 2005. Tempdb consists of 3 parts. They are user object store, version store object, internal object. Trigger in SQL 2005 uses version store object in tempdb and row versioning is handled in version store object, so even though you do not use row versioning in snapshot isolation level feature of SQL 2005, yet you still use row version in trigger.

 

Another thing is that I find about trigger is about pseudo table deleted and inserted, those tables actually don’t have indexes on it, if those tables joins to other materialized tables, and the materialized table don’t have indexes on join predicate, it would degrade performance. It is worse if there are a bunch of records with pattern (INSERT SELECT ... UNION ....UNION.....so forth) get executed. It means that your trigger handles multiple records in one operation. The performance is good when there are a few records, but not for a bunch of records. So limit the number of records that your trigger should handle. There is no certain threshold of number of records, it depends the complexity of the trigger.

 

Make sure you use your trigger only for validating data, not for implement modification operation inside after trigger. If you need modification automation, it is better to implement with instead of trigger. Validation data that couldn’t be handled by check constraint, can be handled by after trigger, for example validating data against multirow, or you need to implement Referential Integrity across database.

 

Hope this information will give you enough grasp what happened under the hood when a trigger is executed and what situations when trigger is considered useful.
Share this post: | | | |
Posted: Feb 21 2008, 11:32 PM by Kasim.Wirama | with no comments
Filed under:
SPARSE data type in SQL Server 2008

Sparse data type in SQL Server 2008

By : Kasim Wirama, MCDBA

 

This article will introduce another new data type in SQL Server 2008, SPARSE type. In my sight, this sparse type is similar to numeric type with NULL. I am surprised that a conditional index could be created upon the sparse column. A conditional index? It looks weird? How come an index could be conditional? And how the index is useful in some query context? Let’s take a look a bit further by looking at the following example.

 

CREATE TABLE myTable (col1 INT SPARSE);

 

CREATE INDEX mySparseIndex on myTable (col1) WHERE col1 = 1;

 

INSERT INTO myTable VALUES (1);

INSERT INTO myTable VALUES (2);

INSERT INTO myTable VALUES (NULL);

 

SELECT * FROM myTable WHERE col1 = 1; -- the index is used, non 1 value will not use this index.

 

This index will be usefull when col1 is dominated by other value, and 1 value is a minority, so it will give a hand for optimizer to generate a good plan.

 

What happened in database engine level? I couldn’t answer your deep question for this time, another SQL Server 2008 documentation will give me more insight about internal work in core database engine.

Share this post: | | | |
Recover Damaged Database From Data File

Recover Damaged Database From Data File

By : Kasim Wirama, MCDBA

 

This article will show you how to recover database if your backup is not considered valid by SQL Server engine by saying that your database is not clearly shutdown.

 

Typical action you need to do is to restore from full backup and subsequent transaction log backup. Usually this approach works well until you meet some condition that you cannot use your transaction log backup.

 

If you cannot restore transaction log backup, your last chance is recover directly from data file (mdf file), but it doesn’t guarantee database consistency, but still you get your latest data instead of your recover from last full backup.

 

This is unconventional approach, and it should be used as last effort if your database cannot be recovered with typical restore.

 

First you create database with same size, logical and physical name of data file. Next step is that you shutdown SQL Server, override the new data file with the old data file (valid one) and start the SQL Server.

 

After step two is done, your database will be in suspect mode, turn your database mode into emergency and single user only with ALTER DATABASE command.

 

Run DBCC CHECKDB ( your database name, REPAIR_ALLOW_DATA_LOSS) command. After that, bring online your database, and set your database mode to multi user with ALTER DATABASE command.

 

Now your database has been recovered, don’t forget to do full backup soon after your database is online.

Share this post: | | | |
Assigning Sequence Number from SSIS

Assigning Sequence Number from SSIS
(SQL Server Integration Service)

By : Kasim Wirama, MCDBA

 

This article will show you how to assign sequence number from SSIS when the destination table has integer column intended for sequencing, but not set it to identity column’s attribute for some reasons. I assume you have good basic understanding, and could create ad hoc SSIS package in Business Intelligence Development Studio (BIDS) and you know BIDS IDE such as Control Flow designer and Data Flow designer, creating variables, and so on.

 

Let’s say I create source data in form of query like this below :

SELECT ‘mydata’ AS col0, ‘xx’ AS col1, 0 AS Id

UNION ALL

SELECT ‘mydata1’, ‘xx1’, 0

UNION ALL

SELECT ‘mydata2’, ‘xx2’, 0

UNION ALL

SELECT ‘mydata3’, ‘xx3’, 0

UNION ALL

SELECT ‘mydata4’, ‘xx4’, 0

UNION ALL

SELECT ‘mydata5’, ‘xx5’, 0

UNION ALL

SELECT ‘mydata6’, ‘xx6’, 0

 

From the query above, I need to change value of Id column from 0 to sequence number with 1,2,3,…so on same results if you assign identity attribute on a column.

For demonstration purpose, I just create Recordset Destination type for the final destination table. That’s the situation above I describe, now I tell you how to achieve it by SSIS.

 

Create new package in BIDS, and create new variable (let’s say v variable at package level with object type). Create data flow task, and inside the data flow task, put above query into OLEDB Source and connect it to Recordset destination type. You assign variable v into VariableName property of the recordset destination component. Don’t forget to check all column (at least one column) at Input Columns tab.

 

Now back to Control Flow designer, drag Script Task component, and connect it with previous component (Data Flow task). The Script Task component has ReadWriteVariables with v variable, then add the script to update Id column incrementally, one caveat here, you cannot convert ComObject type to DataSet type, instead you instantiate OleDbDataAdapter object and call its Fill method to fill records to new DataTable object.

 

Until this step, the job is 80% job done, now you need to add one more DataFlow Task into Control Flow designer. I would like to display updated value in the second one. Inside second data flow task, drag Script Component, and set it to source component.

 

Double click Script component, add output columns to col0, col1 and id respectively, and adjust the DataType property for each of them. Click Script section, and set ReadOnlyVariables to v variable. And, inside the script, you iterate for each item inside the v variable, and call AddRow method of Output0Buffer, and assign columns of Output0Buffer to each columns of current iteration item of the v variable.

 

Next you add RecordSetDestination and connect arrow from the script component to RecordsetDestination component, and add data viewer in the arrow so you can observe the Id changes before they flow into RecordsetDestination.

 

Share this post: | | | |
FileStream Data in SQL Server 2008

FileStream Data in SQL Server 2008

By : Kasim Wirama, MCDBA

 

Upcoming SQL Server 2008 (Katmai) introduces Filestream data. With this filestream, you can save file, video or any other file that may have size exceeding 2 GB. Under the hood, they are not really saved in database, but they are still in files, protected by GUID-name folder. This takes advantage of database transaction consistency, while provide native fast access to file through operating system API.

 

To enable and use these feature you need to make it enable first in master database, then you create database filegroup that contains FILESTREAM, and specify logical and physical name. Physical name here just specify folder where the file will be saved and streamed for later I/O operation. Then you create the table that contains varbinary(max) type with additional FILESTREAM clause, and add one unique identifier column with rowguidcol clause, make sure you give the default value to the uniqueidentifier by NEWID() function. Here is the example :

 

USE master;

GO

 

EXEC sp_filestream_configure @enable_level = 3;

 

Here enable level parameter has value 0,1,2, and 3.

Value 0 means the filestream feature is turned off.

Value 1 means handle only TSQL.

Value 2 handles TSQL and local file only.

Value 3 handles TSQL, local file and remote file operation.

 

Next the create database with FILESTREAM filegroup below :

 

CREATE DATABASE myDB ON PRIMARY

(

NAME = myDB_data,

FILENAME = ‘D:\myDB_data.mdf’,

SIZE = 100MB,

MAXSIZE = UNLIMITED,

FILEGROWTH = 20%

),

FILEGROUP filestream_fg CONTAINS FILESTREAM

(

NAME = myDB_fs,

FILENAME = ‘D:\fileData’

) LOG ON

(

NAME = myDB_log,

FILENAME = ‘D:\myDB_log.ldf’,

SIZE= 10MB,

MAXSIZE=UNLIMITED,

FILEGROWTH = 15%

);

GO

 

Notice that the phrase that I make it bold above. Other thing that you will have default file and folder get created in D:\filedata folder, those file will manage interaction between operating system level and database level.

 

Then, you can create the table that contains the filestream as below :

 

USE myDB;

GO

 

CREATE TABLE movie

(

MovieRowID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL PRIMARY KEY DEFAULT (NEWID()),

MovieName NVARCHAR(100) NOT NULL,

MovieFile VARBINARY(MAX) FILESTREAM

);

GO

Share this post: | | | |
Posted: Feb 11 2008, 09:57 PM by Kasim.Wirama | with no comments
Filed under:
SQL Server 2008 Reporting Services New Feature

SQL Server 2008 Reporting Services New Feature

By : Kasim Wirama, MCDBA

 

Another area for enhancement in SQL Server 2008 is at reporting services. On November CTP, I try another new feature of matrix component called tablix feature. At glance, tablix display is a bit similar to matrix display when previewed. But in tablix you can place 2 or more grouping equal to one another.

 

I will show you how to use it.

 

TBA

Share this post: | | | |
Posted: Feb 10 2008, 10:44 PM by Kasim.Wirama | with no comments
Filed under:
Declarative Management Framework in SQL Server 2008 (Katmai)

Declarative Management Framework in SQL Server 2008 (Katmai)

By : Kasim Wirama, MCDBA

 

SQL Server 2008 (Katmai) introduces DMF (Declarative Management Framework). It is framework to make your SQL Server database objects (from table until server level) complies with set of definition that you specify in DMF.

 

DMF can be used to make sure your SQL Server environment comply and secure all the time. Another use is for auditing purpose. For example, you define that no CLR enabled is enabled by default, the DMF can scan the server and check whether your SQL Server is CLR enabled or not, you can have DMF to do it manually or by schedule.

 

In DMF, you need to define condition and attach the condition to Policy. Then you enable the Policy to apply.

 

You can access DMF in Katmai, by expanding Management node, there is folder name called Policy, expand it and there is Facet node. Inside the facet node, you define DMF condition by right clicking on some item under Facet folder and choose New Condition. After you specify condition for the Facet’s item, right click again and choose New Policy on the Facet’s item, inside the new policy dialog form, you need to enable the policy that associates with the condition.

 

I think DMF is useful tool to make sure your SQL Server is always ready to auditing purpose and another tool to secure your SQL Server environment.

Share this post: | | | |
Posted: Feb 10 2008, 10:34 PM by Kasim.Wirama | with no comments
Filed under:
Passing Table Value Parameter into SQL Server 2008 (Katmai) from .NET Application

Passing Table Value Parameter into SQL Server 2008 (Katmai) from .NET Application

By : Kasim Wirama, MCDBA

 

In last article, I show use of Table Value parameter in SQL Server 2008. This article will discuss how you use this functionality from .NET application.

 

Table value parameter functionality is supported by ADO.NET from .NET Framework 3.5. All you need to do is :

  1. declare ADO.NET datatable variable.
  2. add column definition from the datatable variable.
  3. declare sqlcommand variable and instantiate it to object, set CommandType to StoredProcedure.
  4. call Parameters collection from the sqlcommand variable and call AddWithValue method where input parameter is the datatable variable.
 

Not only datatable type but you can also use DbDataReader or IList<SqlDataRecord>, or you can derive it from SqlDbType.Structured.

 

Remember that it may change before SQL Server 2008 (Katmai) is formally launched.

Share this post: | | | |
Posted: Feb 10 2008, 10:00 PM by Kasim.Wirama | with no comments
Filed under:
Table Value Parameter in SQL Server 2008

Table Value Parameter in Katmai (SQL Server 2008)

By : Kasim Wirama, MCDBA

Katmai (Code name for SQL Server 2008) introduces some new features in TSQL. One of them is Table Value parameter, it means that you can put table type parameter in stored procedure or function). Benefit of table value parameters inherits from table type variable, such as no need to request lock, no recompilation because no statistics maintained. But the most interesting point is that the feature reduces round trip between client and server side because you can embed several rows inside table variable before it is passed to stored procedure.

You need to create table value type (new feature in Katmai), before you can use it with table variable or input parameter. And you need to add READONLY modifier to the table value type parameter. This is missing functionality that you cannot modify value inside table value parameter. Let’s see the sample below.

Create table dbo.Orders

(

OrderID int not null,

Orderdate datetime not null,

Customerid int not null

);

GO

 

Create procedure InsertOrder

(

    @OrderID  INT,

    @Orderdate INT,

    @Customerid INT,

)

AS

Begin

                INSERT into dbo.Orders (OrderID, Orderdate, Customerid) VALUES

                (@OrderID , @Orderdate, @Customerid);

End;

GO

 

EXEC InsertOrder  1,’20 Jan 2008’,1000;

EXEC InsertOrder  2,’21 Jan 2008’,2000;

EXEC InsertOrder  3,’22 Jan 2008’,3000;

This example above is to add new orders, there are 3 statement to call stored procedure execution, it means there is 3 times round trip between client side and server side. With table value parameter as example below shown, it shows that you just need only 1 time only execution for 3 new orders.

CREATE TYPE OrderType AS TABLE

(

OrderID  INT,

OrderDate DATETIME,

CustomerID INT

);

GO

Create procedure InsertOrderWithTableValueParameter

(

    @NewOrder OrderType READONLY

)

AS

Begin

                INSERT into dbo.Orders (OrderID, Orderdate, Customerid)

                  SELECT OrderID, Orderdate, CustomerID FROM @ NewOrder;

End;

GO

 

Now you declare table variable and execute the stored procedure above:

DECLARE @NewOrder OrderType;

Insert into @NewOrder values (1,’20 Jan 2008’,1000);

Insert into @NewOrder values (2,’21 Jan 2008’,2000);

Insert into @NewOrder values (3,’22 Jan 2008’,3000);

 

EXEC InsertOrderWithTableValueParameter @ NewOrder;

GO

 

Example above may not work when SQL Server 2008 is released in a couple of weeks between end of February or early weeks of March 2008. There is always possible that the above feature gets changed, anyway it is better for you to get familiar so you are ready to use the feature when the product formally launches.

 Table value parameter feature adds more TSQL arsenals in Katmai. An interesting stuff!
Share this post: | | | |
Posted: Feb 09 2008, 11:42 AM by Kasim.Wirama | with no comments
Filed under:
Database Structure Implementation Best Practice

Database Structure Implementation Best Practice

By : Kasim Wirama, MCDBA

 

This article, I would let you know best practices in implementing database structure from logical database modelling. It applies to SQL Server and probably to any other database platform as well. The important thing that you should do when implementing database structure is that you should maintain consistency throughout database objects.

 

Here are best practice points

  1. Make sure your database is in at least third normal form, so it can adapt easily to business change and keep your data consistent and efficiently manipulated.
 
  1. When you have subtypes as I describe in previous article, be cautiously whether you want to separate subtype entities into tables or you just roll it up. Some pro and cons when you separate into tables, the pro is that user will easily identifies the difference among subtypes and there is more flexible for extensibility in future time, and the cons is that more tables that developer should involve more tables.
 
  1. Implement your database naming convention consistently.
    You can implement singular or plural form of table/column name. It is preferable to have singular form on column name and table name as well.
 
  1. Document your template domain to ensure domain characteristics are consistent to corresponding columns throughout tables.
 
  1. Choose data type and nullability suitable for your columns.
    Improper choose of either data type or nullability will result to precision and performance problem.
 
  1. Make sure, if your table has identity column as surrogate key, you have unique column for the table.
 
  1. Avoid data type alias, because it is difficult to be recognized and will prone to data access error.
 
  1. Use CLR data type (CLR feature provided since SQL Server 2005) when it is really necessary to implement otherwise it is better to stick with native data type in SQL Server because it is faster and more standard to other database platform.
 
  1. Foreign key is best constraint in performance terms and data integrity area. Try to include them in your tables.
 
  1. Document your database not only in external document but also in the tables/schema/column metadata itself, because it will be useful when it is read by DBA in QA environment and production environment.
Share this post: | | | |
More Posts Next page »