Data Warehousing

  • Uploaded by: venkataswamynath channa
  • 0
  • 0
  • April 2020
  • PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA


Overview

Download & View Data Warehousing as PDF for free.

More details

  • Words: 982
  • Pages: 19
Data warehousing and mining Session VII (Part 1) 15:45 - 16:10 Sunita Sarawagi School of IT, IIT Bombay

Introduction • Organizations getting larger and amassing ever increasing amounts of data • Historic data encodes useful information about working of an organization. • However, data scattered across multiple sources, in multiple formats. • Data warehousing: process of consolidating data in a centralized location • Data mining: process of analyzing data to find useful patterns and relationships Dr. Sunita Sarawagi

Data Warehousing & Mining

2

Typical data analysis tasks • Report the per-capita deposits broken down by region and profession. • Are deposits from rural coastal areas increasing over last five years? • What percent of small business loans were cleared? • Why is it less than last year’s? How did similar businesses that did not take loans perform? • What should be the new rules for loan eligibility? Dr. Sunita Sarawagi

Data Warehousing & Mining

3

Decision support tools Direct Query

Reporting tools Crystal reports

Merge Clean Summarize Detailed transactional data

OLAP Essbase

Intelligent Miner

Relational DBMS+ e.g. Redbrick

Data warehouse

GIS data

Operational data

Bombay branch Delhi branch Oracle Dr. Sunita Sarawagi

Mining tools

Calcutta branch IMS

Data Warehousing & Mining

Census data SAS 4

Data warehouse construction • Heterogeneous data integration – merge from various sources, fuzzy matches – remove inconsistencies

• Data cleaning: – missing data, outliers, clean fields e.g. names/addresses – Data mining techniques

• Data loading: summarize, create indices • Products: Prism warehouse manager, Platinum info refiner, info pump, QDB, Vality Dr. Sunita Sarawagi

Data Warehousing & Mining

5

Warehouse maintenance • Data refresh – when to refresh, what form to send updates?

• Materialized view maintenance with batch updates. • Query evaluation using materialized views • Monitoring and reporting tools – HP intelligent warehouse advisor Dr. Sunita Sarawagi

Data Warehousing & Mining

6

Decision support tools Direct Query

Reporting tools Crystal reports

Merge Clean Summarize Detailed transactional data

OLAP Essbase

Intelligent Miner

Relational DBMS+ e.g. Redbrick

Data warehouse

GIS data

Operational data

Bombay branch Delhi branch Oracle Dr. Sunita Sarawagi

Mining tools

Calcutta branch IMS

Data Warehousing & Mining

Census data SAS 7

OLAP Fast, interactive answers to large aggregate queries. • Multidimensional model: dimensions with hierarchies – Dim 1: Bank location: • branch-->city-->state

– Dim 2: Customer: • sub profession --> profession

– Dim 3: Time: • month --> quarter --> year

• Measures: loan amount, #transactions, balance Dr. Sunita Sarawagi

Data Warehousing & Mining

8

OLAP • Navigational operators: Pivot, drill-down, roll-up, select. • Hypothesis driven search: E.g. factors affecting defaulters – view defaulting rate on age aggregated over other dimensions – for particular age segment detail along profession

• Need interactive response to aggregate queries.. Dr. Sunita Sarawagi

Data Warehousing & Mining

9

OLAP products • About 30 OLAP vendors • Dominant ones: – Oracle Express: largest market share: 20% – Arbor Essbase: technology leader – Microsoft Plato: introduced late last year, rapidly taking over...

Dr. Sunita Sarawagi

Data Warehousing & Mining

10

Microsoft OLAP strategy • Plato: OLAP server: powerful, integrating various operational sources • OLE-DB for OLAP: emerging industry standard based on MDX --> extension of SQL for OLAP • Pivot-table services: integrate with Office 2000 – Every desktop will have OLAP capability.

• Client side caching and calculations • Partitioned and virtual cube • Hybrid relational and multidimensional storage Dr. Sunita Sarawagi

Data Warehousing & Mining

11

Data mining • Process of semi-automatically analyzing large databases to find interesting and useful patterns • Overlaps with machine learning, statistics, artificial intelligence and databases but – more scalable in number of features and instances – more automated to handle heterogeneous data Dr. Sunita Sarawagi

Data Warehousing & Mining

12

Some basic operations • Predictive: – Regression – Classification

• Descriptive: – Clustering / similarity matching – Association rules and variants – Deviation detection Dr. Sunita Sarawagi

Data Warehousing & Mining

13

Classification • Given old data about customers and payments, predict new applicant’s loan eligibility. Previous customers Age Salary Profession Location Customer type

Classifier

Decision rules Salary > 5 L Prof. = Exec

Good/ bad

New applicant’s data Dr. Sunita Sarawagi

Data Warehousing & Mining

14

Classification methods • Nearest neighbor • Regression: (linear or any polynomial) – a*salary + b*age + c = eligibility score.

• Decision tree classifier • Probabilistic/generative models • Neural networks Dr. Sunita Sarawagi

Data Warehousing & Mining

15

Clustering • Unsupervised learning when old data with class labels not available e.g. when introducing a new product. • Group/cluster existing customers based on time series of payment history such that similar customers in same cluster. • Key requirement: Need a good measure of similarity between instances. • Identify micro-markets and develop policies for each Dr. Sunita Sarawagi

Data Warehousing & Mining

16

Association rules • Given set T of groups of items • Example: set of item sets purchased • Goal: find all rules on itemsets of the form a-->b such that

T Milk, cereal Tea, milk Tea, rice, bread

– support of a and b > user threshold s – conditional probability (confidence) of b given a > user threshold c

• Example: Milk --> bread • Purchase of product A --> service B Dr. Sunita Sarawagi

Data Warehousing & Mining

cereal

17

Mining market • Around 20 to 30 mining tool vendors • Major players: – – – –

Clementine, IBM’s Intelligent Miner, SGI’s MineSet, SAS’s Enterprise Miner.

• All pretty much the same set of tools • Many embedded products: fraud detection, electronic commerce applications Dr. Sunita Sarawagi

Data Warehousing & Mining

18

Conclusions • The value of warehousing and mining in effective decision making based on concrete evidence from old data • Challenges of heterogeneity and scale in warehouse construction and maintenance • Grades of data analysis tools: straight querying, reporting tools, multidimensional analysis and mining. Dr. Sunita Sarawagi

Data Warehousing & Mining

19

Related Documents

Data Warehousing
April 2020 35
Data Warehousing
October 2019 40
Data Warehousing
June 2020 23
Data Warehousing
June 2020 24
Data Warehousing
June 2020 33
Data Warehousing
June 2020 17

More Documents from ""

Slim 21
June 2020 14
Spss
April 2020 41