Microsoft Techies: Slowly Changing Dimensions (SCD) in Dimensional Modeling

A dimension is a term in data management and data warehousing that refers to a logical grouping of data, such as geographical location, customer information, or product information. Slowly Changing Dimensions (SCDs) are dimensions whose data changes slowly, rather than on a regular, time-based schedule.

In a transactional system, a change often simply overwrites the old value, and the record of the change is lost. A data warehouse, however, needs to maintain the full history, because a key benefit of a warehouse is providing historical information for trend analysis.

The SCD types below are the most commonly implemented methods for handling these changing dimensions in a warehouse. Let’s understand them with an example.

Example:

You have a dimension table with customer ID 'C01' whose marital status is 'Single', as shown below. Over time, the customer gets married and also moves to a new location.

Let’s see how this scenario is managed with different SCD types.

Initial Data Record:

Surrogate Key	Customer ID (Natural Key)	Date Valid	Marital Status	Date of Birth	City
100	C01	Jan 23, 2008	Single	Jan 8, 1982	Palo Alto

SCD Type 1:

The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all. It is appropriate when the historical values of an attribute will not be analyzed.

Surrogate Key	Customer ID	Date Valid	Marital Status	Date of Birth	City
100	C01	July 7, 2012	Married	Jan 8, 1982	Francisco

The record is simply overwritten, and no history is maintained.
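As a rough illustration of the overwrite, here is a minimal Python sketch. The `CustomerRow` class and the `apply_type1` helper are hypothetical names invented for this example; a real warehouse would normally do this with an UPDATE statement in its ETL process.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class CustomerRow:
    # Hypothetical in-memory stand-in for one row of the customer dimension.
    surrogate_key: int
    customer_id: str  # natural key
    date_valid: date
    marital_status: str
    date_of_birth: date
    city: str


def apply_type1(row: CustomerRow, changed_on: date, **changes) -> CustomerRow:
    """Overwrite the changed attributes; the old values are not kept anywhere."""
    return replace(row, date_valid=changed_on, **changes)


initial = CustomerRow(100, "C01", date(2008, 1, 23), "Single",
                      date(1982, 1, 8), "Palo Alto")
current = apply_type1(initial, date(2012, 7, 7),
                      marital_status="Married", city="Francisco")
```

The surrogate key stays at 100 because no new row is created; only the attribute values change.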

SCD Type 2:

The Type 2 method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. With Type 2, we have unlimited history preservation as a new record is inserted each time a change is made.

This is implemented using either a version column or effective date columns, so that the active record can be identified.

Implemented through a Version Column

Surrogate Key	Customer ID	Date Valid	Marital Status	Date of Birth	City	Version
100	C01	Sept 23, 2004	Single	Jan 8, 1982	Palo Alto	0
101	C01	Sept 23, 2004	Married	Jan 8, 1982	Francisco	1

Implemented through Effective Start and End Date Columns

Surrogate Key	Customer ID	Date Valid	Marital Status	Date of Birth	City	Start-Date	End-Date
100	C01	Sept 23, 2004	Single	Jan 8, 1982	Palo Alto	01-Sep-2000	23-Sep-2004
101	C01	Sept 23, 2004	Married	Jan 8, 1982	Francisco	24-Sep-2004	31-Dec-9999

A new record is added every time there is a change in the source, and the version or effective date columns are updated accordingly.
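The close-and-insert step can be sketched in Python as follows. This is only an illustration under assumptions: `CustomerVersion`, `apply_type2`, and the `OPEN_END` sentinel are hypothetical names, the table is a plain list, and the next surrogate key is simply the current maximum plus one.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import List

OPEN_END = date(9999, 12, 31)  # sentinel end date that marks the active row


@dataclass(frozen=True)
class CustomerVersion:
    # Hypothetical row shape combining the version and effective date columns.
    surrogate_key: int
    customer_id: str
    marital_status: str
    city: str
    version: int
    start_date: date
    end_date: date


def apply_type2(rows: List[CustomerVersion], customer_id: str,
                changed_on: date, **changes) -> List[CustomerVersion]:
    """Close the active row for customer_id and append a new version row."""
    active = next(r for r in rows
                  if r.customer_id == customer_id and r.end_date == OPEN_END)
    # The old row ends the day before the change takes effect.
    closed = replace(active, end_date=changed_on - timedelta(days=1))
    new_row = replace(
        active,
        surrogate_key=max(r.surrogate_key for r in rows) + 1,
        version=active.version + 1,
        start_date=changed_on,
        **changes,
    )
    return [closed if r is active else r for r in rows] + [new_row]


table = [CustomerVersion(100, "C01", "Single", "Palo Alto", 0,
                         date(2000, 9, 1), OPEN_END)]
table = apply_type2(table, "C01", date(2004, 9, 24),
                    marital_status="Married", city="Francisco")
```

After the call, row 100 ends on 23-Sep-2004 and row 101 is the active record, which matches the effective date table above.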

SCD Type 3:

The Type 3 method tracks changes using separate columns. Whereas Type 2 preserves unlimited history, Type 3 preserves only limited history: it is restricted to the number of columns designated for storing historical data, and so holds only the most recent change. And whereas the table structure in Type 1 and Type 2 is very similar, Type 3 adds columns to the table.

Implemented through additional Original Columns

Surrogate Key	Customer ID	Date Valid	Original Marital Status	Marital Status	Date of Birth	Original City	City
100	C01	Sept 23, 2004	Single	Married	Jan 8, 1982	Palo Alto	Francisco

Original Columns have been added to capture the most recent historical change.
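A minimal Python sketch of this column shift is shown below. The `CustomerType3` class and `apply_type3` helper are hypothetical, and the sketch assumes every tracked column has a matching `original_` column.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CustomerType3:
    # Hypothetical row shape: each tracked column has an "original_" partner.
    surrogate_key: int
    customer_id: str
    marital_status: str
    city: str
    original_marital_status: Optional[str] = None
    original_city: Optional[str] = None


def apply_type3(row: CustomerType3, **changes: str) -> CustomerType3:
    """Move each changed value into its original_ column, then overwrite it."""
    updates = {}
    for column, new_value in changes.items():
        old_value = getattr(row, column)
        if new_value != old_value:
            updates["original_" + column] = old_value
            updates[column] = new_value
    return replace(row, **updates)


row = CustomerType3(100, "C01", "Single", "Palo Alto")
row = apply_type3(row, marital_status="Married", city="Francisco")
```

A second change to the same column would overwrite the `original_` value again, which is why this approach keeps only the most recent change.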

Additional SCDs which are occasionally used:

SCD Type 0:

· The Type 0 method is a passive approach to managing dimension value changes, in which no action is taken.

· Values remain as they were at the time the dimension record was first entered.

SCD Type 4:

· The Type 4 method is usually referred to as using "history tables", where one table keeps the current data.

· An additional table is used to keep a record of some or all changes.
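The two-table idea can be sketched in Python like this. It is an assumption-laden illustration: the current table is a dict keyed by customer ID, the history table is a list, and `apply_type4` and the `archived_on` field are hypothetical names.

```python
from datetime import date
from typing import Dict, List


def apply_type4(current: Dict[str, dict], history: List[dict],
                customer_id: str, changed_on: date, **changes) -> None:
    """Copy the current row into the history table, then overwrite it."""
    old = current[customer_id]
    history.append({**old, "customer_id": customer_id, "archived_on": changed_on})
    current[customer_id] = {**old, **changes}


current_table = {"C01": {"marital_status": "Single", "city": "Palo Alto"}}
history_table: List[dict] = []
apply_type4(current_table, history_table, "C01", date(2012, 7, 7),
            marital_status="Married", city="Francisco")
```

Queries that only need the latest values read `current_table`, while trend analysis reads `history_table`.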

SCD Type 6 / hybrid:

· The Type 6 method combines the approaches of types 1, 2 and 3.

· This method is also called "Unpredictable Changes with Single-Version Overlay" in The Data Warehouse
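One common way to combine the three approaches is sketched below in Python, for a single tracked attribute. The column names `historical_city` and `current_city`, the `OPEN_END` sentinel, and the `apply_type6` helper are all assumptions made for this illustration, not a fixed standard.

```python
from datetime import date, timedelta
from typing import List

OPEN_END = date(9999, 12, 31)  # sentinel end date that marks the active row


def apply_type6(rows: List[dict], customer_id: str,
                changed_on: date, new_city: str) -> None:
    """Add a version row (Type 2) and refresh current_city on all versions (Type 1)."""
    mine = [r for r in rows if r["customer_id"] == customer_id]
    active = next(r for r in mine if r["end_date"] == OPEN_END)
    active["end_date"] = changed_on - timedelta(days=1)  # Type 2: close the active row
    for r in mine:
        r["current_city"] = new_city  # Type 1: overwrite on every version
    rows.append({
        "surrogate_key": max(r["surrogate_key"] for r in rows) + 1,
        "customer_id": customer_id,
        "historical_city": new_city,  # Type 3: historical and current side by side
        "current_city": new_city,
        "start_date": changed_on,
        "end_date": OPEN_END,
    })


dim = [{"surrogate_key": 100, "customer_id": "C01",
        "historical_city": "Palo Alto", "current_city": "Palo Alto",
        "start_date": date(2000, 9, 1), "end_date": OPEN_END}]
apply_type6(dim, "C01", date(2004, 9, 24), "Francisco")
```

Each row still records the city that was valid during its own date range, while `current_city` lets every historical row be grouped by the customer's latest city.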

Now that we have covered slowly changing dimensions (SCD) and how to design tables depending on whether history must be maintained for the important business columns, we move on to a feature that was introduced in SQL Server 2005 and has been used extensively for large-volume tables from SQL Server 2005 through SQL Server 2008/2012.

Wednesday, 17 September 2014
