Datawarehousing

  • Uploaded by: Swathi Velisetty
  • 0
  • 0
  • November 2019
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Datawarehousing as PDF for free.

More details

  • Words: 2,160
  • Pages: 57
DATA WAREHOUSING

Agenda  Introduction  Process  DSS

 Information  Dimensions

processing

 OLAP

 Architecture  Best

Practise  Case

Types

Shilpa Surve

Introduction

Definition  Data Warehouse is

• • • •

a

Subject-Oriented Integrated Time-Variant Non-volatile

12/04/08

4

What are Data Warehouses? Data warehouses store large volumes of data which are frequently used by DSS  It is maintained separately from the organization’s operational databases  Data warehouses are relatively static with only infrequent updates  A data warehouse is a stand-alone repository of information, integrated from several, possibly heterogeneous operational databases 

12/04/08

5

Steps in Building a Warehouse  Identify

key business drivers, sponsorship, risks, ROI  Survey information needs and identify desired functionality and define functional requirements for initial subject area.  Architect long-term, data warehousing architecture  Evaluate and Finalize DW tool & technology  Conduct Proof-of-Concept 12/04/08

6

Steps in building Data Warehouse  Design target data base schema  Build data mapping, extract, transformation,

cleansing and aggregation/summarization rules  Build initial data mart, using exact subset of enterprise data warehousing architecture and expand to enterprise architecture over subsequent phases  Maintain and administer data warehouse 12/04/08

7

The Three Views of Data Warehousing 

Strategic or Business view • Define key business drivers of data warehouse • How can business-driven approach achieve high ROI?



Architectural or Technology view

• Alternative data warehousing architectures • How can the right architecture achieve a high ROI?



Methodology or Implementation view

• Development and implementation methodology • How can the right methodology achieve a rapid ROI? 12/04/08

8

Swathi Velisetty

Process

DW Components Metadata Layer Extraction FS1 FS2

. . .

Transmission N

FSn

E T W O R K

Data Mart Population

Cleansing

S T A G I N G A R E A

Aggregation Summarization Transformation

ODS

DM1

DW

DM2

DMn

OLAP ANALYSIS

Legacy System 12/04/08

10

Cleansing process Process Metadata Raw data (Staging Area)

Cleansing Process

Good Control Metadata

•Clean the Raw Data •Mark it Good/Bad •Generate the cleansing Reports and mail to the DWA and Feed System representatives 12/04/08

Clean data

Bad Cleansing Reports

11

Transformation Process

Clean Operationa l Data

Process Metadata •Mapping Detail •Transformation Rule

Operational Data Store

Transformatio n Process Control Metadata

•Transform the cleaned Operational Data into DSS Data •Load the DSS data into ODS •ODS contains the current DSS data at the lowest level of granularity 12/04/08

12

Summarization Process DW

ODS Summarizatio n Process

Weekly

Monthly

Yearly

Control Metadata •Summarize and aggregate ODS data and Populate to the Warehouse • Periodicity of Summarization Process depends upon the level of summarization at Warehouse ( weekly, monthly, daily ) 12/04/08

13

Enterprise Data Warehouse Legacy

Select

Extract

Clien t/ Serve

Transform

OLTP

Integrate

External Operational Systems Enterprise wide 12/04/08

Metadata Repositor y DATA WAREHOUSE

A P I

U S E R S

Maintain Data Preparation

14

Distributed Data Marts Legacy

Select

Extract

Clien t/ Serve

Transform

OLTP

Integrate

External Operational Systems Data 12/04/08

Data Mart

Maintain Data Preparation

Data Mart

A P I

U S E R S

Data Mart

15

Multi-tiered Data Warehouse Legac y

Clien t/ Serve

Select

Extract

Transform O LTP

Extern al

Integrate Maintain

Operational Systems Enterprise wide 12/04/08

Data Mart

Metadat a Reposito

DATA WAREHOUSE

Data Mart

A P I

U S E R S

Data Mart

16

Example Monthly sales by region for 1991Weekly sales by region for

Monthly Sales by Product for 1991-94

Weekly sales by product/subproduct for 1991-94 Sales Detail for 1991-

Metadata

Sales Detail for 1985-90 12/04/08

17

Atul zade

Decision support system

What is DSS? Decision Support Systems (DSS) are interactive computer-

based systems intended to help decision makers utilize data and models to identify and solve problems and make decisions. Data Warehouse is the foundation of DSS process. It is a Strategy and a Process for Staging Corporate Data. ➭ Enable users to get a “Business View” of the data ➭ Facilitate Data based Decision Making that would drive and improve the Business ➭ Discover “Hidden Trends”

12/04/08

19

Driving Forces for DSS ➭ Changes in the Business Environment Business Speed

N O I T I ET

Customer s

RESULT:

P M O C 12/04/08

Refor m

Technology

20

How to answer these Business Queries? What is the sales distribution region wise?

How did my revenue improve in the past 5 years?

What are the slow movers in my product line?

Which channel costs me more and pays less?

Which of my Sales Agents are doing better?

Strategic Planning / Budgeting Currency Risk, Interest Rate Risk, Liquidity Risk

What is Defaulter’s Profile? Who are my profitable customers? 12/04/08

21

OLTP v/s DSS Environment  OLTP

Environment

• get data IN • large volumes of simple transaction queries • continuous data changes • low processing time • mode of processing • transaction details • data inconsistency • mostly current data • high concurrent usage • highly normalized data structure • static applications • automates routines



DSS Environment

• get information OUT • small number of diverse queries • periodic updates only • high processing time • mode of discovery • subject oriented - summaries • data consistency • historical data is relevant • low concurrent usage • fewer tables, but more columns per table • dynamic applications • facilitates creativity

Benefits for Business User • • • • • •

Flexible Information Access High Availability Ease of Use Quality & Completeness of Data Focus on Information Processing Information Base for Knowledge Discovery

12/04/08

23

Classification of Business Users Executives/Managers

• •

Multi-dimensional analysis, reporting tools

Knowledge Worker

• •

Ad hoc queries, detail & summary data, application focus

Power-Analyst

• •

Ad hoc queries, Data Analysis & Data Mining

Customer Contacts

• • 12/04/08

Detail Data at specific levels 24

Prem Sequera

Information processing

Data Processing to Information Processing Business Objectives & Goals Application Domains and Business Functions B U SI N E S S T

T R A C E

Query Processin g Report Generatio

R A

E L E M E N T S

Operationa l Data Store (ODS)

C E

OLAP/ Query Tools

Enterprise Data Warehouse

OLAP Appl.

Data Mart A

Appl. Spec. Analysis

Data Mart B

Appl. Spec. Analysis

Data Mart N

Appl. Spec. Analysis

D A T A E L E M E N T S Feed Systems and External Sources

KNOWLEDGE DISCOVERY Data Mining Applications KNOWLEDGE MANAGEMENT

Management Decision: Value Chain

Data Processing

Heterogeneous Data Sources Information Processing Knowledge Processing 12/04/08

26

Subject Oriented Analysis Process Oriented

Subject Oriented

Entry Sales Rep Quantity Sold Part Number Date Customer Name Product Description Unit Price Mail Address Transactional Storage 12/04/08

Sales

Customers Products

Data Warehouse Storage 27

Integration of Data Appl. A - M, F Appl. B - 1, 0 Appl. C - X, Y

M, F

Unit of Attributes

Appl. A - pipeline cm. Appl. B - pipeline inches Appl. C - pipeline mcf

pipeline cm

Physical Attributes

Appl. A - balance dec(13,2) Appl. B - balance PIC 9(9)V99 Appl. C - balance float Appl. A - bal-on-hand

Naming Appl. B - current_balance ConventionsAppl. C - balance Data Consistenc y

Appl. A - date (Julian) Appl. B - date (yymmdd) Appl. C - date (absolute)

Transactional Storage 12/04/08

Integration

Encodin g

balance dec(13, 2)

balance

date (Julian)

Data Warehouse Storage 28

Volatility of Data Volatil e Inse rt

NonVolatile Chan ge

Acces s

Delet e Chan ge

Inse rt

Acces s Record-by-Record Data Manipulation

Transactional Storage 12/04/08

L oad

Mass Load / Access of Data

Data Warehouse Storage 29

Time Variant Data Analysis Current Data

Transactional Storage 12/04/08

Historical Data

Data Warehouse Storage 30

Kairav Parikh

Dimension

What is a Dimension? Data Warehouse is •Subject-Oriented • •Integrated •Time-Variant •Non-volatile collection of data in support of management’s decision.

Subject

12/04/08

Dimension

32

Dimensional Hierarchy

Re

Europe

Asia

nt

America

Country Level

Pa

re

Continent Level

la tio

n

World Level

Geography Dimension World

USA

State Level

FL

City Level Attributes:

Miami

Canada GA

Population, Tourist’s Place

12/04/08

VA

Tamp a

CA Orland o

Argenti na WA Dimension Naple s Member /

Business Entity 33

Types of Dimensions • Simple Dimensions (e.g. Time) • Related Dimensions (e.g. Gender of a Customer) • Spool Dimensions (e.g. Account as an interaction between Customer and Product) • Bucket Dimensions (e.g. Income Ranges of a Customer) • Slowly Changing Dimensions (e.g. changes in Organization) • Fast Varying Dimensions (e.g. changes Retail Customers attributes) • Unused Dimensions (e.g. Order No., Invoice No.)

12/04/08

34

Dimensional Modeling STEP 1 •Identify Subjects (Dimensions) •Identify Hierarchies of a Dimension •Identify Attributes of levels in Hierarchies •Define Grain

Industry Segment Industry Type

Countr y State City

Fin. Class

Customer

12/04/08

35

Dimensional Modeling STEP 2 •Use KPIs to identify the Facts •Group the Facts in a logical set

12/04/08

Financial Transactions

Non-Financial Transactions

Trans. Amount

No. of Cheques Cleared

No. of Bonds

No. of Visits to a Branch

No. of Transactions Service Cost ...

No. of DEMAT Transactions

...

36

Dimensional Modeling STEP 3 •Link the Group of Facts to the Dimensions that participate in the Facts

Customer Time

Product Financial Transactions

Organization

Channel

12/04/08

37

Dimensional Modeling STEP 4 •Define Granularity for each Group of Facts

Customer (Customer) Time (Day-Hour)

Product (Scheme) Financial Transactions

Organization (Branch)

Channel (Channel) 12/04/08

38

Data Warehouse Schemas Star Schema •A Group of Facts connected to Multiple Dimensions

Channel

Financial Transactions

Time

Customer

12/04/08

Organization

Product

39

Data Warehouse Schemas Snow-flake Schema (= Extended Star Schema) •A Group of Facts connected to Dimensions, which are split across multiple hierarchies and attributes

Time

Channel

Product Financial Transactions

Organization

Customer Segment 12/04/08

Geograp hy 40

Data Warehouse Schemas Galaxy Schema •Multiple Groups of Facts links by few common dimensions

Dimension

Dimension Fact1

Dimension

Dimension Fact2

Dimension 12/04/08

Dimension

Fact3

Dimension 41

Akshay Shiveshwarkar

OLAP

On-Line Analytical Processing  OLAP can

be defined as a technology which allows the users to view the aggregate data across measurements (like Maturity Amount, Interest Rate etc.) along with a set of related parameters called dimensions (like Product, Organization, Customer, etc.)

12/04/08

43

What is MDDB? A multidimensional database is a computer software system designed to allow for efficient and convenient storage and retrieval of data that is • •

12/04/08

intimately related and stored, viewed and analyzed from different perspectives (Dimensions).

44

RDBMS v/s MDDB MODEL

COLOR SALES VOL.

MINI VAN BLUE 6 MINI VAN RED 5 MINI VAN WHITE 4 SPORTS COUPE BLUE 3 SPORTS COUPE RED 5 SPORTS COUPE WHITE 5 SEDAN BLUE 4 SEDAN RED 3 SEDAN WHITE 2

9 x 3 = 27 cells

12/04/08

3 x 3 = 9 cells

45

Benefits of MDDB over RDBMS 

Ease of Data Presentation & Navigation Intuitive, Spreadsheet / Crosstab like data views



Storage Space

Very low Space Consumption compared to Relational DB



Performance

Gives much better performance. Relational DB may give comparable results only through database tuning (indexing, keys etc), which may not be possible for ad-hoc queries.



Ease of Maintenance

No overhead as data is stored in the same way it is viewed. In Relational DB, indexes, sophisticated joins etc. are used which require considerable storage and maintenance 12/04/08

46

Issues with MDDB •





Sparsity

– Controlled Sparsity – Random Sparsity

Data Explosion

– Due to Sparsity – Due to Summarization

Performance

– Doesn’t perform better than RDBMS at high data volumes (>20-30 GB)

12/04/08

47

OLAP Features Subject oriented approach to Decision Support Calculations applied across dimensions, through hierarchies and/or across members  Trend analysis over sequential time periods,  What-if scenarios.  Slicing / Dicing subsets for on-screen viewing  Rotation to new dimensional comparisons in the viewing area  Drill-down/up along the hierarchy  Reach-through / Drill-through to underlying detail data  

12/04/08

48

Features of OLAP - Drill Down / Up Sales at region/District/Dealership Level



Moving Up and moving down in a hierarchy is referred to as “drill-up” / “roll-up” and “drill-down” 12/04/08

49

Ritesh Raushan

Architecture Types

Implementation Techniques OLAP Architectures • MOLAP - Multidimensional OLAP • Multidimensional Databases for database and application logic layer

• ROLAP - Relational OLAP

• Access Data stored in relational Data Warehouse for OLAP Analysis. • Database and Application logic provided as separate layers

• HOLAP - Hybrid OLAP

• OLAP Server routes queries first to MDDB, then to RDBMS and result processed on-the-fly in Server

• DOLAP - Desk OLAP

• Personal MDDB Server and application on the desktop

12/04/08

51

MOLAP - MDDB storage Web Browser

OLAP

Cube OLAP Calculation Engine

OLAP Tools

OLAP Applications

12/04/08

52

ROLAP - Standard SQL storage Relational DW

MDDB - Relational Mapping

Web Browser

SQL

OLAP Calculation Engine

OLAP Tools

OLAP Applications 12/04/08

53

HOLAP - Combination of RDBMS and MDDB OLAP Cube

Relational DW

Any Client

Web Browser

SQL

OLAP Calculation Engine

OLAP Tools

OLAP Applications 12/04/08

54

Architecture Comparison

12/04/08

55

Kiran Naik

Case

Thank you

Related Documents


More Documents from ""

Planning
November 2019 42
Dr, Reddy - Betapharm
November 2019 26
Tata Sky Marketing Plan
November 2019 18
Merger & Acquisition
November 2019 27
Novartis Analysis
November 2019 13
Financial Services - M&m
November 2019 13