Summary:
Differences between operational Database and Data Warehouse
Differences between operational Database and Data Warehouse
Operational Database
1. It is designed to support operational monitoring.
2. Data is volatile
3. It has current data
4. Detailed data
5. Normalization
6. Designed to support Entity Relational Modelling
7. More number of Joins
8. Few Indexes
9. Designed for running the business
Data warehouse
1. It is designed to support decision making process
2. Data is non-volatile
3. It has historical data
4. Summarized data
5. De-normalization
6. Designed to support dimensional modelling
7. Few joins
8. More Indexes
9. Designed for analyzing.
Data Extraction:
It is a process of reading the data from various types of sources such as relational sources ERP sources, Flat File sources, XML File sources, mainframe sources etc.,
Data Transformation:
It is a process of cleansing data and transforming the data into a required business format.
The following data transforming activities takes place in staging area
i) Data merging
ii) Data cleansing
iii) Data scrubbing
iv) Data aggregation
Data Loading:
It is the process of inserting the data into the target system. There are two types of data loads.
1.Initial or Full load:
It is the a process of loading all the required data at very fast load
2.Incremental or Delta load
It is the process of loading only new records after initial load
Data warehouse Approaches
There are two types of approaches
i) Top-Down approach (W.H Inmon)
ii) Bottom-Up approach (ralph Kimball)
Meta data
1. It describes what, when, who, where, how of the data warehouse
2. Meta data is more comprehensive and transcends the data
3. Meta data provides the format and name of data items
4. It provides the context in which the data element exists
5. It provides information such as the domain of possible values
6. The relation that data element has to others
7. The data's business rules and origin of the data
Data Mart
1. A Data Mart is a subject oriented database which supports the business needs of individual departments in the Enterprise
2. Data Mart are known as high performance query structures
3. There are two types of Data mart
i) Dependent Data mart (Inmon)
ii) Independent Data mart (Kimball)
4. Data Mart are logical which never contains data
5. Data Mart contains only views.
Data warehouse Life Cycle:
1. Project Preparation
2. Rapid/Blueprint
3. Rapid/Production implementation
4. Final Preparation
5. Acceptance Test
6. Project Review
Data warehouse Database Design:
A Data warehouse is designed with the following types of schemas:
i) Star Schema
a. A star schema is a logical database design which contains centrally located fact table which is surrounded by dimension tables.
b. Since the database design looks like a star hence it is called as a star schema database design
ii) Snow Flake Schema
In a Snow Flake Schema de-normalized dimensional table is split into one or more tables which results in normalization of dimensions
-- Disadvantage: Query performance hampers because of number of joins
-- Advantage: Table space can be minimized
iii) Integrated Schema or galaxy Schema or Constellation Schema
It is the process of joining two fact tables
--> Fact Table
i) A fact table contains composite key where each candidate key is a foreign key to the dimension table
ii) A fact table contains "facts". Facts are numeric.
iii) Not every numeric is a fact but numeric which are of type "key performance indicators" are known as facts.
iv) A fact contains at the lowest level of granularity.
--> Fact less fact table
A fact less fact table without any facts is known as fact less fact table
--> Dimension Table
i) A dimension is a descriptive data about the major aspects of business
ii) The dimensions are used to describe key performance indicator known as facts
iii) A dimension table contains dimension which are de-normalized
--> Junk Dimensions
A dimension with a type of text description, Boolean and flags are known as Junk Dimensions
--> Dirty Dimensions
A dimension table if record exits more than once with difference in non-key attributes is known as dirty dimension
--> Slowly Changing Dimensions (SCD)
SCD captures the changes which takes place over the period of time. There are three types of SCD's
i) Type 1 SCD: A type 1 SCD keeps only the current values and does not maintains history.
ii) Type 2 SCD: A type 2 SCD maintains the full history in the target. For each update it inserts a new record in the target
iii) Type 3 SCD: A type 3 SCD maintains current and previous information (partial history)
--> Dimensional Modeling
Dimensional Modeling is a methodology or approach used for designing the star schema. A dimensional modeling contains the following phases in designing the star schema
i) Conceptual Modeling
ii) Logical Modeling
iii) Physical Modeling
--> Online Analytical Processing (OLAP)
It is a set of specifications which allows the client applications in retrieving the data from data warehouse. The different types of OLAP's are:
i) DOLAP (Desktop OLAP)
ii) ROLAP (Realtional OLAP)
The information that is stored in the data warehouse is held in a relational structure. Aggregations are performed on the fly either by the database or in the analysis tool
iii) MOLAP (Multi dimentional OLAP)
This information is aggregated in a predefined manner based on the characteristics of the measures and the defined hierarchy of the dimensions. Since the data is pre-aggregated, navigating through the hierarchies is instantaneous. The user is simply navigating to a point within the multidimensional cube and not performing any on the fly aggregations
iv) HOLAP (Hybrid OLAP)
This is a combination of MOLAP and ROLAP. A portion of the data is predefined and aggregated. This would typically be the set of information that is accessed most frequently. Additional detail can held in a ROLAP structure and allow a user to drill through the MOLAP structure into the ROLAP structure
Client Perspective MOLAP HOLAP ROLAP
Query performance Fastest Faster Fast
Storage Consumption High Medium Low
1. It describes what, when, who, where, how of the data warehouse
2. Meta data is more comprehensive and transcends the data
3. Meta data provides the format and name of data items
4. It provides the context in which the data element exists
5. It provides information such as the domain of possible values
6. The relation that data element has to others
7. The data's business rules and origin of the data
Data Mart
1. A Data Mart is a subject oriented database which supports the business needs of individual departments in the Enterprise
2. Data Mart are known as high performance query structures
3. There are two types of Data mart
i) Dependent Data mart (Inmon)
ii) Independent Data mart (Kimball)
4. Data Mart are logical which never contains data
5. Data Mart contains only views.
Data warehouse Life Cycle:
1. Project Preparation
2. Rapid/Blueprint
3. Rapid/Production implementation
4. Final Preparation
5. Acceptance Test
6. Project Review
Data warehouse Database Design:
A Data warehouse is designed with the following types of schemas:
i) Star Schema
a. A star schema is a logical database design which contains centrally located fact table which is surrounded by dimension tables.
b. Since the database design looks like a star hence it is called as a star schema database design
ii) Snow Flake Schema
In a Snow Flake Schema de-normalized dimensional table is split into one or more tables which results in normalization of dimensions
-- Disadvantage: Query performance hampers because of number of joins
-- Advantage: Table space can be minimized
iii) Integrated Schema or galaxy Schema or Constellation Schema
It is the process of joining two fact tables
--> Fact Table
i) A fact table contains composite key where each candidate key is a foreign key to the dimension table
ii) A fact table contains "facts". Facts are numeric.
iii) Not every numeric is a fact but numeric which are of type "key performance indicators" are known as facts.
iv) A fact contains at the lowest level of granularity.
--> Fact less fact table
A fact less fact table without any facts is known as fact less fact table
--> Dimension Table
i) A dimension is a descriptive data about the major aspects of business
ii) The dimensions are used to describe key performance indicator known as facts
iii) A dimension table contains dimension which are de-normalized
--> Junk Dimensions
A dimension with a type of text description, Boolean and flags are known as Junk Dimensions
--> Dirty Dimensions
A dimension table if record exits more than once with difference in non-key attributes is known as dirty dimension
--> Slowly Changing Dimensions (SCD)
SCD captures the changes which takes place over the period of time. There are three types of SCD's
i) Type 1 SCD: A type 1 SCD keeps only the current values and does not maintains history.
ii) Type 2 SCD: A type 2 SCD maintains the full history in the target. For each update it inserts a new record in the target
iii) Type 3 SCD: A type 3 SCD maintains current and previous information (partial history)
--> Dimensional Modeling
Dimensional Modeling is a methodology or approach used for designing the star schema. A dimensional modeling contains the following phases in designing the star schema
i) Conceptual Modeling
ii) Logical Modeling
iii) Physical Modeling
--> Online Analytical Processing (OLAP)
It is a set of specifications which allows the client applications in retrieving the data from data warehouse. The different types of OLAP's are:
i) DOLAP (Desktop OLAP)
ii) ROLAP (Realtional OLAP)
The information that is stored in the data warehouse is held in a relational structure. Aggregations are performed on the fly either by the database or in the analysis tool
iii) MOLAP (Multi dimentional OLAP)
This information is aggregated in a predefined manner based on the characteristics of the measures and the defined hierarchy of the dimensions. Since the data is pre-aggregated, navigating through the hierarchies is instantaneous. The user is simply navigating to a point within the multidimensional cube and not performing any on the fly aggregations
iv) HOLAP (Hybrid OLAP)
This is a combination of MOLAP and ROLAP. A portion of the data is predefined and aggregated. This would typically be the set of information that is accessed most frequently. Additional detail can held in a ROLAP structure and allow a user to drill through the MOLAP structure into the ROLAP structure
Client Perspective MOLAP HOLAP ROLAP
Query performance Fastest Faster Fast
Storage Consumption High Medium Low
No comments:
Post a Comment