Sunday 31 August 2014

What are facts and dimensions in a data warehouse

FACTS & DIMENSIONS

Data warehouses are built using dimensional data models which consist of Fact and Dimension tables”.

In simpler words, Facts and Dimensions are the two major categories into which we divide our entities (tables) in a DWH.

The architecture and efficiency of the DWH depends on how we identify/categorize/design the Facts and Dimensions.

Before getting into any further technical explanation of what facts and dimensions actually are, let us look at the following statement:

“It is 10122 runs!!!

Though we know that 10122 runs is something related to cricket, we have no clue what the above statement represents.


However, there is a whole new meaning when we say that “
10122 are the total runs scored by Sunil Gavaskar in Test cricket”.

Let’s relate this: “10122 runs” is a raw FACT, and it has no meaning until we mention the DIMENSIONS which are the Player-Name (Gavaskar) and the Format (Test cricket).

 Here is another example that we encounter in an enterprise:
 “Quickie-Mart registered Total Sales of 2.2 million on 12th-Apr-2011”.

Based on the earlier scenario, we can make out that, here - “Total-Sales” is a Fact whereas “Quickie-Mart” (Organization Dimension) and “12th-Apr-2011” (Date dimension) are the dimensions.

We can segregate the above information by adding further dimensions.

Example:

·  Quickie-Mart registered Total-Sales of $1.3 million on Product A on 12th-Apr-2011 --- (adding a product Dimension)
·  Quickie-Mart registered Total-Sales of $0.5 million of Product A in the Asia-Pacific region on 12th-Apr-2011 --- (adding a Sales-Geography Dimension)
 
After the above examples we can deduce the following statements:

Fact Tables consist of the measurements, metrics or facts of a business process.
Dimension Tables provide a structured information to quantify (give a meaning to) those facts.

Addition Info:

·   Fact tables form the Center of Data Warehouse surrounded by Dimensions. Each fact table would have a Foreign-key that would have a corresponding Primary key in the dimension tables.

·    Fact table keeps on growing in size whereas the growth of Dimensions is generally slower (There are number of Sales transactions in a day but cannot imagine product category to grow at that pace).

·    Since the fact tables keep on growing in size very rapidly, it is desirable that the data redundancy is reduced to the max, hence the fact tables are normalized to the maximum extent, and however it may not be the same with Dimension tables.

* In the coming posts we shall look into other aspects of DWH with more examples which shall serve in getting more insights into Facts and Dimensions.