Change Data Capture in SQL Server 2008 CTP
By : Kasim Wirama, MCDBA
To record data changes in a table, we have to make another table and manually implement trigger on source table that capture data changes into change table. This is done in SQL Server 2005 or older versions. But with SQL Server 2008, I don’t have to implement manually this mechanism because SQL Server 2008 introduces feature called CDC (change data capture) that captures data changes on table and put the changes into change table that structurally is similar to source table. I try this feature on SQL Server 2008 November 2007 CTP, probably there is some changes in February 2008 CTP. I think CDC feature is quite stable on November 2007, or there might be just some couple of CDC minor changes.
Okay let’s take a look CDC feature and play around with this feature. By default CDC is not enabled in database, you as sysadmin role only, can enable CDC feature at database level. After enable CDC at database level, database owner role can enable CDC on tables in the database.
You can enable CDC on database level by issuing TSQL command :
Exec sys.sp_cdc_enable_db_change_data_capture;
Once you enable it, you can query whether CDC is enabled or not by querying catalog view sys.databases as follows:
SELECT NAME, IS_CDC_ENABLED FROM SYS.DATABASES WHERE DATABASE_ID = DB_ID();
If CDC is enabled, the value is 1 otherwise the value is 0.
When CDC is enabled, you can see new schema and user called cdc. You can find schema and user directory under security folder.
Okay, for exploration purpose, I create one table here :
CREATE TABLE dbo.test (cola int primary key, colb varchar(10) null);
CDC requires source table , that will be tracked with CDC, must have primary key and SQL Server Agent must be in running state.
Now I enable CDC on the table by issuing TSQL command :
Exec sys.sp_cdc_enable_table_change_data_capture ‘dbo’,’test’,@role_name=null, @supports_net_changes = 1;
When the above command has been executed, you will have some tables besides dbo.Test, all the tables have cdc schema, and there is change table which corresponds to source table (dbo.test), namely cdc.dbo_test_CT (name convention is CDC.<source schema>_<source table name>_CT).
You can also get information which tables are CDC tracked and not by issuing the query to catalog view sys.tables
SELECT NAME, IS_TRACKED_BY_CDC FROM SYS.TABLES;
If Is_tracked_by_cdc is 1, it means the source table has corresponding change table, otherwise 0.
You can query directly to the change table or you can query through CDC function called cdc.fn_cdc_get_all_changes_<source schema>_<source_table name>
Here how I can query changes on particular range of LSN (log sequence number) in dbo.test table :
Declare @from_lsn binary(10);
Declare @to_lsn binary(10);
Set @from_lsn = sys.fn_get_max_lsn(‘dbo.test’);
Set @to_lsn = sys.fn_get_min_lsn();
Select * from cdc.fn_cdc_get_all_changes (@from_lsn , @to_lsn , ‘all’); --àreturn several records
You can get net changes for a period of time by issuing the query below:
Select * from cdc.fn_cdc_get_net_changes(@from_lsn , @to_lsn , ‘all’); -- àreturn 1 record only.
Both cdc.fn_cdc_get_all_changes and cdc.fn_cdc_get_net_changes CDC table value function returns changed record with __$operation column, the value of __$operation could be 1 for delete, 2 for insert, 3 for old value before update, 4 for new value after update.
You can also manually clean up records in change table through CDC stored procedure by issuing this query script :
Declare @endtime datetime;
Set @endtime = getdate();
Declare @lsn binary(10);
Set @lsn = sys.fn_cdc_map_time_to_lsn(‘largest less than or equal’,@endtime);
Exec sys.sp_cdc_cleanup_change_table ‘dbo_test’, @lsn;
The script above will delete all record in change table called dbo_test (i.e. cdc.dbo_test_CT).
You can disable CDC feature on particular table (for example : dbo.test table) by issuing this command :
Exec sys.sp_cdc_disable_table_change_data_capture ‘dbo’,’test’,’dbo_test’;
and if you want to disable CDC feature on database level, you issue this query :
exec sys.sp_cdc_disable_db_change_data_capture;
I think CDC is very relevant to database auditing without compromising database performance, because CDC operates in asynchronous manner and it is separate process from main user DML activity.