Wednesday 3 September 2014

SURROGATE KEYS

In this article we will look into another concept of DWH which are:

SURROGATE KEYS

“A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key”.

The surrogate key is an automatically generated sequence number which will act as primary key for a dimension and also used in referencing from fact tables.

Org_key
Org_ID
Org_Name
Head Quarters
1
101
XYZ Ltd
Sydney
2
102
PQR Ltd
Cape Town
3
103
ABC Ltd
Beijing

The question!!!! –What is use of generating a special key (surrogate) when we already have a natural key coming in from source???.

Let us find the answer to this through the following scenario:

Scenario - 1:

System1 and System2 have a products table which we integrate in our Dim_Products table in the DWH.

Both the systems have their own unique identifiers for the Products which is Product_ID column.

However, when we try to consolidate the data, we see a conflict in the Natural keys;

For e.g. Product_ID=1 represents “Rice” in the System1, but there is also a Product_ID=1 in System2 (which represents a different product).

Here is pictorial representation of the scenario (with and without surrogate keys)

*The use of Surrogate key, eliminates this confusion and gives an us a clear-cut attribute that can be used in referencing the product table.

Scenario - 2:

There are also other scenarios e.g.”98xxxxx256” is a phone number associated with person A. Later person A has changed his number and this number is allotted to person B and then person C; the surrogate key comes into the picture helping us to maintain the history of that original mapping.                     

 Some recommended characteristics of a Surrogate key in DWH:

·         System generated (not composed of several values from different domains).

·         Unique system-wide, hence never reused.

·         Should not be not editable by the user or application.