Cognos Interview questions with answers
what is the difference between star schema and snow flake schema ?and when we use those schema’s?
Star Schema means A centralized fact table and surrounded by different dimensions Snowflake means In the same star schema dimensions split into another dimensions
Star Schema means A centralized fact table and surrounded by different dimensions Snowflake means In the same star schema dimensions split into another dimensions
Star Schema contains Highly Denormalized Data
Snow flake contains Partially normalized data
Star cannot have parent table But snow flake contain parent tables. Need to go for Star: Here
Simply database
Support drilling up options
Need for Snowflake schema: Here some times we used to provide? seperate dimensions from existing dimensions that time we will go to snowflake
Dis Advantage Of snowflake: Query performance is very low because more joiners are present.
explain the flow of data starting with OLTP to OLAP including staging ,summary tables,Facts and dimensions.
OLTP(1)—->ODS(2)——>DWH(3)——–>OLAP(4)————>Reports(5)——>decision(6)
1-2 (extraction)
2-3 (Transformation and here ODS is itself staging area )
3-4-5 (Use of reporting tool and generate reports)
6-decision is taken i,e purpose of data ware house is served.
1-2 (extraction)
2-3 (Transformation and here ODS is itself staging area )
3-4-5 (Use of reporting tool and generate reports)
6-decision is taken i,e purpose of data ware house is served.
What is the role of surrogate keys in data warehouse and how will u generate them?
The Surrogate key role is it links the Dimension and Fact table. It avoids smart keys and Production keys.
Surrogate key is nothing but a primary key to identify records uniquely in dimension and fact tables. It is basically a running system-generated sequence number.
Surrogate key is nothing but a primary key to identify records uniquely in dimension and fact tables. It is basically a running system-generated sequence number.
What is the difference between Data modelling and Dimensional modelling?
When data modeling we are structuring and organizing data. These data structures are then typically implemented in a database management system. In addition to defining and organizing the data data modeling will impose (implicitly or explicitly) constraints or limitations on the data placed within the structure.
Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe structured data for storage in data management systems such as relational databases.
Data warehouses are typically developed using dimensional models rather than the traditional entity/relationship models associated with conventional relational databases.
Dimensional Modelling is a design concept used by many data warehouse
designers to build their datawarehouse. In this design model all the data is stored in two types of tables – Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of
measurements ie the dimensions on which the facts are calculated.
Data is
modeled as a hypercube and the schema is a so-called star schema with a centralised fact table surrounded by smaller dimensional tables representing key scientific objects.
Dimensional database systems allow multidimensional data to be
modeled natively. Or they can be modeled using the star schema or snowflake schema.
Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe structured data for storage in data management systems such as relational databases.
Data warehouses are typically developed using dimensional models rather than the traditional entity/relationship models associated with conventional relational databases.
Dimensional Modelling is a design concept used by many data warehouse
designers to build their datawarehouse. In this design model all the data is stored in two types of tables – Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of
measurements ie the dimensions on which the facts are calculated.
Data is
modeled as a hypercube and the schema is a so-called star schema with a centralised fact table surrounded by smaller dimensional tables representing key scientific objects.
Dimensional database systems allow multidimensional data to be
modeled natively. Or they can be modeled using the star schema or snowflake schema.
Why are OLTP database designs not generally a good idea for a Data Warehouse?
OLTP cannot store historical information about the organization. It is used for storing the details of daily transactions while a datawarehouse is a huge storage of historical information obtained from different datamarts for making intelligent decisions about the organization.
Because OLTP databases are transactional data related databases. The meaning of this is these databases are used in real time to insert update and delete data. To accomplish these tasks in real time The model used in OLTP databases is highly normalized. The problem of using this model in Data Warehousing is we have to join multiple tables to get a single piece of data. With the amount of historical data we deal with in DW it is highly desirable not to have a highly normalized data model like OLTP.
Because OLTP databases are transactional data related databases. The meaning of this is these databases are used in real time to insert update and delete data. To accomplish these tasks in real time The model used in OLTP databases is highly normalized. The problem of using this model in Data Warehousing is we have to join multiple tables to get a single piece of data. With the amount of historical data we deal with in DW it is highly desirable not to have a highly normalized data model like OLTP.
What is data mining?
Data mining is a process of extracting hidden trends within a datawarehouse. For example an insurance dataware house can be used to mine data for the most high risk people to insure in a certain geographial area..
Simple stated that data mining is the KDD Knowledge discovery in large databases. In this hidden patterns are meant to be discovered.
The typical query of data mining is “Tell me something I don’t know”. How can you tell some one the thing he doesn’t know all about, means you cannot write query to answer this question. So you try to discover some thing using intelligent techniques. As the volume of data is very large for mining so we need fast algorithms, usually run time for mining is O(n) or O(nlogn). O(n2-[n square]) is a kill.
Data mining is a concept of deriving/discovering the hidden unexpected information from the existing data.
A datawarehouse typically supplies answer to a question like ‘who is buying our products/ . A data mining approach would seek answer to questions like Who is NOT buying our products? .
Simple stated that data mining is the KDD Knowledge discovery in large databases. In this hidden patterns are meant to be discovered.
The typical query of data mining is “Tell me something I don’t know”. How can you tell some one the thing he doesn’t know all about, means you cannot write query to answer this question. So you try to discover some thing using intelligent techniques. As the volume of data is very large for mining so we need fast algorithms, usually run time for mining is O(n) or O(nlogn). O(n2-[n square]) is a kill.
Data mining is a concept of deriving/discovering the hidden unexpected information from the existing data.
A datawarehouse typically supplies answer to a question like ‘who is buying our products/ . A data mining approach would seek answer to questions like Who is NOT buying our products? .