Data Warehouse Concepts
Contents Data
& Information Introduction to Data warehouse (DWH) Characteristics of DWH Operational System Vs DWH DWH Architectures Data Marts Metadata
Data & Information
A fundamental concept of data warehouse is the distinction between data and information. Data is composed of observable and recordable facts that are often found in operational or transactional systems. In a data warehouse environment, data only comes to have value to end-users when it is organized and presented as information. Information is an integrated collection of facts and is used as the basis for decision making.
Introduction to Data Warehouse Definitions: "A data warehouse is a subject oriented, integrated, time-variant, non volatile collection of data in support of management's decision making process". A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. A Data Warehouse is a structured repository (Subject Oriented) of Historic Data.
Data warehouses separate analysis part from transactional part and enables the organization to collect data from several sources.
Characteristics of Data Warehouse Data Warehouse is usually: Subject Oriented Integrated Non-Volatile Time-Variant Accessible & Process Oriented
Subject Oriented Sales
DWH
Marketing
Finance
Information Information is is presented presented according according to to specific specific subjects subjects or or areas areas of of interest. interest. Data Data is is manipulated manipulated to to provide provide information information about about aa particular particular subject. subject.
Integrated Operational Systems
Appln A – m/f Appln B – 1/0 Appln C – Male/Female Appln A – Bal_On_Hand Appln B – Current_Balance Appln C – Cash_On_Hand
DWH m/f
Current_Balance
Though the data in the data warehouses is scattered around different tables, databases or even servers but the data is integrated consistently in the values of variables, naming conventions and physical data definitions (datatype).
Time-Variant Operational Systems
View of Business Today
DWH
Designated Time Frame (3 – 10 years). DWH stores historical data.
Contains Contains aa history history of of the the subject, subject, as as well well as as current current information. information. Historical Historical information information is is an an important important component component of of aa data data warehouse. warehouse.
Non-Volatile Operational Systems
DWH
Insert Create Read
Update
Read Delete
Read Load
Read
Read Read Read Only
Stable Stable information information that that doesn’t doesn’t change change each each time time an an operational operational process process is is executed. executed. Information Information is is consistent consistent regardless regardless of of when when the the warehouse warehouse is is accessed. accessed. There There exist exist only only two two operations operations –– time time based based loading loading of of data, data, accessing accessing the the loaded loaded data. data.
Accessible & Process Oriented Accessible:
The primary purpose of a data warehouse is to provide readily accessible information to end-users. Process-Oriented: It is important to view data warehousing as a process for delivery of information.
Operational System Vs Data Warehouse Operational System
Data Warehouse
Characteristics
Data Focused, Transaction Processing focused system.
Subject Oriented, Integrated, Non-Volatile, Time-Variant.
Age of the data
Current, Near-term (Today, Last week).
Historic (Last month, Quarterly, Five years).
Primary Use
Day-to-day decisions, Current operational results.
Long-term decisions, Reporting, Trend detection.
Frequency of load
Twice daily, Daily, Weekly.
Weekly, Monthly, Quarterly.
DWH Architectures Data
Warehouse Architecture (Basic) Data Warehouse Architecture (with a Staging Area) Data Warehouse Architecture (with a Staging Area and Data Marts)
DWH Architectures (contd..) Operational Systems
Data Warehouse Data Extraction
Operational System
Data Storing
Users Data Access
Meta Data Data Transformation
Analysis
DWH Reporting
Data Loading Legacy Systems
Data Warehouse Architecture (Basic)
Mining
DWH Architectures (contd..) Operational Systems
Data Warehouse Data Storing
Data Extraction Operational System
Staging Area Data Transformation
Users Data Access
Meta Data
Analysis
DWH Reporting
Data Loading Legacy Systems
Data Warehouse Architecture (with a Staging Area)
Mining
DWH Architectures (contd..) Operational Systems
Data Warehouse Data Storing
Data Extraction Operational System Data Transformation
Staging Area
Meta Data
Users
Data Marts
Data Access Analysis
Sales
DWH Marketing
Reporting
Data Loading Legacy Systems
Finance Mining
Data Warehouse Architecture (with a Staging Area and Data Marts)
Data Marts
Data Marts: Data mart is a subset of DWH. A data mart is a specialized version of a DWH. A data mart configuration emphasizes easy access to relevant information.
DWH
Data Marts
Data Marts (contd..) Dependent
data mart: Data can be derived from an enterprise-wide data warehouse. Independent data mart: Data can be collected directly from sources.
Data Marts (contd..) Reasons Eases
for creating a Data mart
access to frequently needed data Creates collective view by a group of users Improves end-user response time Ease of creation Lower cost than implementing a full Data warehouse
Metadata
Metadata: Metadata is data about data. Something can be data and metadata at the same time. It is possible to create meta-meta-...-metadata.
Metadata is used to speed up and enrich searching for resources.
E.g: Browsers automatically download and locally cache metadata, to improve the speed at which files can be accessed and searched.
Questions ?
Thank You !