4 Data Warehousing Data Mining

  • Uploaded by: Pranav Jain
  • October 2019

Data Warehousing Data Mining

Course: Basics of Management Information Systems, BBA
Symbiosis Centre for Management Studies, Noida
Dr. Tarun Kumar Singhal


Preface of Data Warehousing

Many organizations have amassed vast amounts of data that employees can analyze to uncover insights that help the organization compete successfully. Some organizations do this extremely well, but others are quite ineffective. To use analytic tools to improve organizational decision-making, a foundational data architecture and enterprise architecture must be in place to support effective decision analysis.


Preface of Data Warehousing

Enabling decision analysis through access to all relevant information is known as business intelligence. Business intelligence includes data warehousing, online analytical processing, data mining, visualization, and multidimensionality.


Introduction of Data Warehousing

Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources so that it can support analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidation.


Characteristics of Data Warehousing

  • Subject-oriented
  • Integrated
  • Time-variant (time series)
  • Non-volatile
  • Summarized
  • Not normalized
  • Sources
  • Metadata


Functions of Data Warehousing

  • Data Extraction − Involves gathering data from multiple heterogeneous sources.
  • Data Cleaning − Involves finding and correcting the errors in data.
  • Data Transformation − Involves converting the data from legacy format to warehouse format.
  • Data Loading − Involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.
  • Refreshing − Involves updating from data sources to warehouse.
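The functions listed above can be sketched as a small pipeline. This is a minimal illustration in Python, not a real warehouse loader: the two source lists, their field names, and all values are hypothetical, and "loading" is reduced to sorting and summarizing into a dictionary.

```python
# Hypothetical source records; field names and values are illustrative only.
crm_rows = [{"customer": " Alice ", "amount": "120.50"},
            {"customer": "Bob", "amount": "n/a"}]
billing_rows = [{"cust_name": "alice", "total": 80.0}]


def extract():
    """Extraction: gather rows from multiple heterogeneous sources."""
    unified = [{"customer": r["customer"], "amount": r["amount"]} for r in crm_rows]
    unified += [{"customer": r["cust_name"], "amount": r["total"]} for r in billing_rows]
    return unified


def clean(rows):
    """Cleaning: drop rows whose amount cannot be parsed as a number."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"customer": row["customer"], "amount": float(row["amount"])})
        except (TypeError, ValueError):
            continue
    return cleaned


def transform(rows):
    """Transformation: convert customer names to one warehouse format."""
    return [{"customer": r["customer"].strip().title(), "amount": r["amount"]}
            for r in rows]


def load(rows, warehouse):
    """Loading: sort the rows and summarize the amount per customer."""
    for row in sorted(rows, key=lambda r: r["customer"]):
        warehouse[row["customer"]] = warehouse.get(row["customer"], 0.0) + row["amount"]
    return warehouse


warehouse = load(transform(clean(extract())), {})
print(warehouse)
```

Refreshing would correspond to calling `load` again with newly extracted rows and the existing `warehouse` dictionary, so that new source data is folded into what is already stored.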


Types of Data Warehousing

1. Enterprise Data Warehouse: An Enterprise Data Warehouse is a centralized warehouse. It provides decision support services across the enterprise. It offers a unified approach for organizing and representing data.
2. Operational Data Store: An Operational Data Store (ODS) is used when a data warehouse cannot support an organization's reporting needs. An ODS is refreshed in real time. Hence, it is widely preferred for routine activities like storing employee records.
3. Data Mart: A data mart is a subset of the data warehouse. It is specially designed for a particular line of business, such as sales or finance. In an independent data mart, data can be collected directly from sources.


Components of Data Warehousing

Load Manager: The load manager is also called the front component. It performs all the operations associated with the extraction and loading of data into the warehouse. These operations include transformations to prepare the data for entry into the data warehouse.

Warehouse Manager: The warehouse manager performs operations associated with the management of the data in the warehouse. These include analyzing data to ensure consistency, creating indexes and views, generating denormalizations and aggregations, transforming and merging source data, and archiving and backing up data.


Components of Data Warehousing

Query Manager: The query manager is also known as the backend component. It performs all the operations related to the management of user queries. Its operations include directing queries to the appropriate tables and scheduling the execution of queries.

End-user access tools: These are categorized into five groups:

1. Data reporting tools
2. Query tools
3. Application development tools
4. EIS tools
5. OLAP tools and data mining tools


Online Analytical Processing (OLAP)

OLAP is a category of software that allows users to analyze information from multiple database systems at the same time. OLAP is a powerful technology for data discovery, including capabilities for limitless report viewing, complex analytical calculations, and predictive “what if” scenario (budget, forecast) planning.

OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling. It is the foundation for many kinds of business applications for Business Performance Management, Planning, Budgeting, Forecasting, Financial Reporting, Analysis, Simulation Models, Knowledge Discovery, and Data Warehouse Reporting.

OLAP enables end-users to perform ad hoc analysis of data in multiple dimensions, thereby providing the insight and understanding they need for better decision making.


Advantages of OLAP

The more data a company can access about a specific activity, the more likely it is that a plan to improve that activity will be effective. All businesses collect data using many different systems, and the challenge remains: how to bring all the data together to create accurate, reliable, fast information about the business. A company that can take advantage of its data and turn it into shared knowledge, accurately and quickly, will be better positioned to make successful business decisions and rise above the competition.

OLAP technology has been defined as the ability to achieve “fast access to shared multidimensional information.” Given OLAP technology’s ability to create very fast aggregations and calculations of underlying data sets, one can understand its usefulness in helping business leaders make better, quicker, informed decisions.


Basic Operations of OLAP

Four types of analytical operations in OLAP are:

  • Roll-up
  • Drill-down
  • Slice and dice
  • Pivot (rotate)


Roll-up

Roll-up is also known as "consolidation" or "aggregation." The roll-up operation can be performed in two ways:

  • Reducing dimensions
  • Climbing up a concept hierarchy

A concept hierarchy is a system of grouping things based on their order or level.

Example: the city-level members New Jersey and Los Angeles are rolled up into the country USA. The sales figures of New Jersey and Los Angeles are 440 and 1560 respectively, and they become 2000 after roll-up. In this aggregation, the location hierarchy moves up from city to country. When roll-up is performed by reducing dimensions, one or more dimensions are removed.
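The roll-up example can be reproduced in a few lines of Python. This is a sketch that assumes the cube holds only the two figures quoted in the example; the `roll_up` helper and the dictionary layout are illustrative names, not part of any OLAP product.

```python
from collections import defaultdict

# Sales by city-level member, using the figures from the example.
sales_by_city = {"New Jersey": 440, "Los Angeles": 1560}

# Concept hierarchy for the location dimension: city level -> country level.
city_to_country = {"New Jersey": "USA", "Los Angeles": "USA"}


def roll_up(sales, hierarchy):
    """Aggregate a measure one level up the given concept hierarchy."""
    totals = defaultdict(int)
    for member, value in sales.items():
        totals[hierarchy[member]] += value
    return dict(totals)


sales_by_country = roll_up(sales_by_city, city_to_country)
print(sales_by_country)  # {'USA': 2000}
```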


Drill-down

In drill-down, data is fragmented into smaller parts. It is the opposite of the roll-up process. It can be done by:

  • Moving down the concept hierarchy
  • Adding a dimension

Example: quarter Q1 is drilled down to the months January, February, and March, and the corresponding sales are broken down by month. In this example, the dimension “months” is added.
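A drill-down from quarter to month might look like the following Python sketch. The slides give no sales figures for this example, so the monthly values below are hypothetical, as are the helper and variable names.

```python
# Hypothetical monthly sales; the example in the text gives no figures.
monthly_sales = {"January": 150, "February": 100, "March": 200}

# Concept hierarchy for the time dimension: quarter -> months.
quarter_to_months = {"Q1": ["January", "February", "March"]}


def drill_down(quarter, hierarchy, detail):
    """Return the finer-grained values that make up one quarter."""
    return {month: detail[month] for month in hierarchy[quarter]}


q1_detail = drill_down("Q1", quarter_to_months, monthly_sales)
q1_total = sum(q1_detail.values())
print(q1_detail, q1_total)
```

Summing the monthly values gives back the quarterly figure, which is why drill-down is described as the opposite of roll-up.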


Slice

Here, one dimension is selected, and a new sub-cube is created. For example, the Time dimension is sliced with Q1 as the filter, and a new sub-cube is created from the matching data.

Dice

This operation is similar to a slice. The difference is that in dice you select two or more dimensions, which results in the creation of a sub-cube.
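Slice and dice can both be sketched as filters over a cube stored as a list of rows. The cube contents, dimension names, and function names below are hypothetical and chosen only to illustrate the two operations.

```python
# A tiny cube stored as rows; all members and values are hypothetical.
cube = [
    {"time": "Q1", "location": "Toronto", "item": "Mobile", "sales": 100},
    {"time": "Q1", "location": "Vancouver", "item": "Modem", "sales": 200},
    {"time": "Q2", "location": "Toronto", "item": "Mobile", "sales": 300},
    {"time": "Q2", "location": "Vancouver", "item": "Mobile", "sales": 400},
]


def slice_cube(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dimension] == value]


def dice_cube(rows, criteria):
    """Dice: restrict two or more dimensions to sets of allowed values."""
    return [r for r in rows
            if all(r[dim] in allowed for dim, allowed in criteria.items())]


q1_slice = slice_cube(cube, "time", "Q1")
toronto_dice = dice_cube(cube, {"time": {"Q1", "Q2"}, "location": {"Toronto"}})
print(len(q1_slice), sum(r["sales"] for r in toronto_dice))
```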


Pivot

In pivot, you rotate the data axes to provide an alternative presentation of the data.
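For a two-dimensional view, rotating the axes amounts to swapping rows and columns. The Python sketch below assumes a hypothetical table with locations as rows and quarters as columns; the values and the `pivot` helper are illustrative only.

```python
# Hypothetical 2-D view: rows are locations, columns are quarters.
view = {
    "Toronto": {"Q1": 100, "Q2": 300},
    "Vancouver": {"Q1": 200, "Q2": 400},
}


def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


by_quarter = pivot(view)
print(by_quarter)
```

The data itself is unchanged; only the presentation differs, and pivoting a second time restores the original layout.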


Data Mining

Data mining is the process of sorting through large data sets to identify patterns and establish relationships to solve problems through data analysis. Data mining tools allow enterprises to predict future trends.

In data mining, association rules are created by analyzing data for frequent if/then patterns, then using the support and confidence criteria to locate the most important relationships within the data. Support is how frequently the items appear in the database, while confidence is how often the if/then statements are found to be true.
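Support and confidence can be computed directly from a list of transactions. The sketch below uses a hypothetical set of four shopping baskets and expresses both measures as fractions, which is the usual convention; the rule tested is "if bread, then milk".

```python
# Hypothetical transactions; each one is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent (undefined if the antecedent never occurs)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

Here bread and milk appear together in 2 of 4 baskets, and milk appears in 2 of the 3 baskets that contain bread.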


Data Mining

Other data mining parameters include Sequence or Path Analysis, Classification, Clustering and Forecasting.

Sequence or Path Analysis parameters look for patterns where one event leads to another later event. A sequence is an ordered list of sets of items, and it is a common type of data structure found in many databases.

A Classification parameter looks for new patterns, and might result in a change in the way the data is organized. Classification algorithms predict variables based on other factors within the database.


Data Mining Tools and Techniques

Data mining techniques are used in many research areas, including mathematics, cybernetics, genetics and marketing. They are a means to drive efficiencies and predict customer behavior; used correctly, they let a business set itself apart from its competition through predictive analysis.

Web mining, a type of data mining used in customer relationship management, integrates information gathered by traditional data mining methods and techniques with information gathered over the web. Web mining aims to understand customer behavior and to evaluate how effective a particular website is.


Data Mining Tools and Techniques

Other data mining techniques include network approaches based on multitask learning for classifying patterns, ensuring parallel and scalable execution of data mining algorithms, the mining of large databases, the handling of relational and complex data types, and machine learning. Machine learning is a type of data mining tool that designs specific algorithms from which to learn and predict.


Benefits of Data Mining

In general, the benefits of data mining come from the ability to uncover hidden patterns and relationships in data that can be used to make predictions that impact businesses. Specific data mining benefits vary depending on the goal and the industry.

  • Sales and marketing departments can mine customer data to improve lead conversion rates or to create one-to-one marketing campaigns.
  • Data mining information on historical sales patterns and customer behaviors can be used to build prediction models for future sales, new products and services.
  • Companies in the financial industry use data mining tools to build risk models and detect fraud.
  • The manufacturing industry uses data mining tools to improve product safety, identify quality issues, manage the supply chain and improve operations.


OLTP

OLTP (online transaction processing) is a class of software programs capable of supporting transaction-oriented applications on the Internet. Typically, OLTP systems are used for order entry, financial transactions, customer relationship management (CRM) and retail sales.

Such systems have a large number of users who conduct short transactions. Database queries are usually simple, require subsecond response times and return relatively few records. An important attribute of an OLTP system is its ability to maintain concurrency. To avoid single points of failure, OLTP systems are often decentralized.


OLTP vs OLAP

We can divide IT systems into transactional (OLTP) and analytical (OLAP). OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.


OLTP vs OLAP

OLTP (On-line Transaction Processing) is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is on very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second. An OLTP database holds detailed, current data.

OLAP (On-line Analytical Processing) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the measure of effectiveness. OLAP applications are widely used by data mining techniques. An OLAP database holds aggregated, historical data.
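The contrast between the two workloads can be illustrated with Python's built-in `sqlite3` module. This is only a sketch: a single in-memory SQLite database stands in for both systems, whereas in practice the OLTP database and the warehouse queried by OLAP tools are separate. The `orders` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: many short transactions, each touching very few rows.
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("North", 100.0))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("North", 50.0))
with conn:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", ("South", 70.0))
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (60.0, 2))

# OLAP-style work: one read-only query that aggregates over all rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
conn.close()
```

Each `with conn:` block commits one short transaction, while the final query scans the whole table and returns summarized data.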
