FACTS &
DIMENSIONS
“Data
warehouses are built using dimensional data models which consist of Fact and
Dimension tables”.
In simpler words,
Facts and Dimensions are the two major categories into which we divide our entities
(tables) in a DWH.
The architecture
and efficiency of the DWH depends on how we identify/categorize/design the
Facts and Dimensions.
Before getting
into any further technical explanation of what facts and dimensions actually
are, let us look at the following statement:
“It
is
10122 runs!!!”
Though we know that 10122 runs is something related to cricket, we have no clue what the above statement represents.
However, there is a whole new meaning when we say that “10122 are the total runs scored by Sunil Gavaskar in Test cricket”.
Though we know that 10122 runs is something related to cricket, we have no clue what the above statement represents.
However, there is a whole new meaning when we say that “10122 are the total runs scored by Sunil Gavaskar in Test cricket”.
Let’s relate this: “10122 runs” is
a raw FACT, and it has no meaning until we mention the DIMENSIONS
which are the Player-Name (Gavaskar) and the Format (Test cricket).
Here is another
example that we encounter in an enterprise:
“Quickie-Mart
registered Total Sales of 2.2 million on 12th-Apr-2011”.
Based on the
earlier scenario, we can make out that, here - “Total-Sales” is a Fact whereas
“Quickie-Mart” (Organization Dimension) and
“12th-Apr-2011” (Date
dimension) are the dimensions.
We can segregate
the above information by adding further dimensions.
Example:
· Quickie-Mart registered Total-Sales of
$1.3 million on Product A on 12th-Apr-2011 --- (adding a
product Dimension)
· Quickie-Mart registered Total-Sales of
$0.5 million of Product A in the Asia-Pacific region on 12th-Apr-2011 --- (adding a
Sales-Geography Dimension)
After the above
examples we can deduce the following statements:
Dimension Tables provide a
structured information to quantify (give a
meaning to) those facts.
Addition Info:
· Fact tables form the Center of Data
Warehouse surrounded by Dimensions. Each fact table would have a
Foreign-key that would have a corresponding Primary key in the dimension
tables.
· Fact table keeps
on growing in size whereas the growth of Dimensions is generally slower (There are number of
Sales transactions in a day but cannot imagine product category to grow at that
pace).
· Since the fact
tables keep on growing in size very rapidly, it is desirable that the data
redundancy is reduced to the max, hence the fact tables are normalized to the
maximum extent, and however it may not be the same with Dimension tables.
* In the coming posts we shall look into other aspects of DWH with more examples which shall
serve in getting more insights into Facts and Dimensions.