INDEX
• Introduction to data warehousing
• Graphical representation of data warehousing
• History of data warehousing
• Uses and advantages of data warehousing
• Disadvantages of data warehousing
• Meta data
• Advantages of meta data
• Data mining
• Function of data mining
• Graphical representation of data mining
• Advantages of data mining
• Disadvantages of data mining
• Complete process of warehousing, meta data, data mining
DATA WAREHOUSE
What is a data warehouse?
Data warehouses are computer-based information systems that are home to "secondhand" data, that is, data that originated in another application or in an external system or source.
Fig 1 Data warehousing analysis
HISTORY
In the late 1980s, IBM researchers Barry Devlin and Paul Murphy developed the "business data warehouse". The data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The concept attempted to address the various problems associated with this flow, mainly the high costs associated with it. In the absence of a data warehousing architecture, an enormous amount of redundancy was required to support multiple decision support environments. In larger corporations it was typical for multiple decision support environments to operate independently.
As a result, separate computer databases began to be built that were specifically designed to support management information and analysis purposes. These data warehouses were able to bring in data from a range of different data sources, such as mainframe computers, minicomputers, personal computers and office automation software such as spreadsheets, and integrate this information in a single place. This capability, coupled with user-friendly reporting tools and freedom from operational impacts, has led to the growth of this type of computer system.
Data warehouses often hold large amounts of information, which is sometimes subdivided into smaller logical units called dependent data marts. Dependent data marts allow for easier reporting by keeping relevant data together in one location.
As technology improved (lower cost for more performance) and user requirements increased (faster data load cycle times and more features), data warehouses have evolved through several fundamental stages:
• Offline Operational Databases - Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server, where the processing load of reporting does not affect the operational system's performance.
• Offline Data Warehouse - Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems, and the data is stored in an integrated, reporting-oriented data structure.
• Real Time Data Warehouse - Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order, a delivery or a booking).
• Integrated Data Warehouse - Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.
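The difference between the offline and real-time stages can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `Warehouse` class, its method names and the order records are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Toy warehouse holding integrated order rows (hypothetical structure)."""
    rows: list = field(default_factory=list)

    def batch_load(self, operational_rows):
        """Offline stage: on a fixed cycle, copy every row not yet loaded."""
        known = {r["order_id"] for r in self.rows}
        new_rows = [r for r in operational_rows if r["order_id"] not in known]
        self.rows.extend(new_rows)
        return len(new_rows)

    def on_transaction(self, row):
        """Real-time stage: update as soon as an operational event occurs."""
        self.rows.append(row)


# Hypothetical rows from an operational order system.
operational = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": 2, "amount": 75.5},
]

offline = Warehouse()
loaded = offline.batch_load(operational)   # e.g. the nightly run

realtime = Warehouse()
for event in operational:                  # each order triggers an update
    realtime.on_transaction(event)
```

In the batch case the warehouse lags behind the operational system until the next cycle runs; in the event-driven case each transaction reaches the warehouse immediately, at the cost of tighter coupling to the source system.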
USES OF DATA WAREHOUSE:
• A data warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.
• Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
• Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.
• Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
• Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.
• Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
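As a rough sketch of the first two points and of a trend report, the Python example below maps two differently shaped sources onto one common model, resolves inconsistencies before loading, and then asks which items sold most in one area. The source records, field names, the `to_common` helper and the `sales` table are all assumptions made for illustration; an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Two hypothetical source systems that describe sales differently.
invoices = [
    {"item": "Widget", "region": "north", "total": "100.50", "date": "2023-03-01"},
    {"item": "gadget ", "region": "North", "total": "40.00", "date": "2023-05-12"},
]
order_receipts = [
    {"product": "WIDGET", "area": "NORTH", "amount": 60.0, "day": "2024-01-20"},
    {"product": "Gadget", "area": "south", "amount": 25.0, "day": "2024-02-02"},
]


def to_common(item, area, amount, date):
    """Map a source record onto the common model, resolving case,
    whitespace and type inconsistencies before loading."""
    return (item.strip().title(), area.strip().title(), float(amount), date)


rows = [to_common(r["item"], r["region"], r["total"], r["date"]) for r in invoices]
rows += [to_common(r["product"], r["area"], r["amount"], r["day"]) for r in order_receipts]

# Load the cleaned rows into one table (the common data model).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, area TEXT, amount REAL, sale_date TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Trend report: total sales per item in one area over a date range.
report = conn.execute(
    """
    SELECT item, SUM(amount) AS total
    FROM sales
    WHERE area = ? AND sale_date BETWEEN ? AND ?
    GROUP BY item
    ORDER BY total DESC
    """,
    ("North", "2023-01-01", "2024-12-31"),
).fetchall()
```

Because "WIDGET" and "Widget" are normalized to the same value before loading, the report can group them together without any special handling in the query.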
DISADVANTAGES
• Because data must be extracted, transformed and loaded into the warehouse, there is an element of latency in data warehouse data.
• Over their life, data warehouses can have high costs. Maintenance costs are high.
• Data warehouses can get outdated relatively quickly. There is a cost of delivering suboptimal information to the organization.
• Data owners lose control over their data, raising ownership (responsibility and accountability), security and privacy issues.
• Limited flexibility of use and types of users - requires multiple separate data marts for multiple uses and types of users.
• Typically, data is static and dated.
• Typically, no data drill-down capabilities.
• Difficult to accommodate changes in data types and ranges, data source schema, indexes and queries.
• Typically, cannot actively monitor changes in data.
Meta data
Metadata is often called "data about data". More precisely, it is a structured description of the content, quality, condition or other characteristics of data. Metadata needs to accompany data; otherwise the data being transmitted or communicated cannot be understood.
It is well accepted in the world of statistics and large databases that metadata leads to better data. This is because it enables everyone collecting, using and exchanging data to share the same understanding of its meaning and representation. There are two basic types of metadata: technical metadata and business metadata. Technical metadata consists of technical descriptions of data such as tables, attributes, indexes and so forth. These technical types of metadata are found in data dictionaries, directories, and repositories. Business metadata is made up of non-technical definitions, formulae, descriptions, and so forth, and relies on context to give the data its meaning and shades of meaning. In addition, valuable reference tables are stored in master data management facilities. The value of metadata is often not apparent.
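To make the two types concrete, here is a minimal Python sketch of a data dictionary entry that keeps technical and business metadata side by side. The class names, the `sales.net_amt` column and the formula are hypothetical examples, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class TechnicalMetadata:
    """Technical description: where the data lives and how it is stored."""
    table: str
    column: str
    data_type: str
    indexed: bool


@dataclass
class BusinessMetadata:
    """Non-technical description: what the data means to the business."""
    name: str
    definition: str
    formula: str


@dataclass
class DictionaryEntry:
    technical: TechnicalMetadata
    business: BusinessMetadata


# Hypothetical entry for one warehouse column.
entry = DictionaryEntry(
    technical=TechnicalMetadata("sales", "net_amt", "DECIMAL(12,2)", indexed=False),
    business=BusinessMetadata(
        name="Net sales amount",
        definition="Invoice value after discounts, before tax.",
        formula="gross_amount - discount",
    ),
)


def describe(e):
    """Combine both kinds of metadata into one human-readable line."""
    t, b = e.technical, e.business
    return f"{t.table}.{t.column} ({t.data_type}): {b.name} = {b.formula}"
```

Calling `describe(entry)` gives a reader of a report both the physical location of the column and its business meaning in one place.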
Advantages
• Metadata is most useful when integrated and end-to-end, promoting efficient data warehouse development and maintenance.
• The simplest and most practical use of metadata is to provide business descriptions of the data to BI tools and analytic applications.
• Metadata is used to facilitate the understanding, usage, and management of data, both by humans and by computers.
• Metadata is used to speed up and enrich searching for resources. In general, search queries using metadata can save users from performing more complex filter operations manually.
• Metadata provides additional information to users of the data it describes. This information may be descriptive or algorithmic.
• Metadata helps to bridge the semantic gap. By telling a computer how data items are related and how these relations can be evaluated automatically, it becomes possible to process even more complex filter and search operations.
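The point about searching can be sketched in a few lines of Python. The catalog below and its fields (`subject`, `owner`, `keywords`) are invented for illustration; the idea is only that a query against metadata finds the right data sets without anyone inspecting their contents by hand.

```python
# Hypothetical catalog: one metadata record per data set in the warehouse.
catalog = [
    {"name": "sales_2023", "subject": "sales", "owner": "finance",
     "keywords": {"invoice", "revenue"}},
    {"name": "crm_contacts", "subject": "customers", "owner": "marketing",
     "keywords": {"contact", "segment"}},
    {"name": "ledger", "subject": "accounting", "owner": "finance",
     "keywords": {"revenue", "charges"}},
]


def search(catalog, owner=None, keyword=None):
    """Return the names of data sets whose metadata matches every given filter."""
    hits = []
    for entry in catalog:
        if owner is not None and entry["owner"] != owner:
            continue
        if keyword is not None and keyword not in entry["keywords"]:
            continue
        hits.append(entry["name"])
    return hits


finance_revenue = search(catalog, owner="finance", keyword="revenue")
```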
Data mining
Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
Data mining consists of five major elements:
• Extract, transform, and load transaction data onto the data warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology professionals.
• Analyze the data by application software.
• Present the data in a useful format, such as a graph or table.
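The five elements can be walked through in a compact Python sketch. Everything here is a simplifying assumption: the transaction tuples are made up, a dictionary keyed by (month, region, product) stands in for a multidimensional database, and the helper names are hypothetical.

```python
from collections import defaultdict

# 1. Extract, transform and load transaction data (hypothetical records:
#    date, region, product, quantity).
transactions = [
    ("2024-01-15", "north", "widget", 3),
    ("2024-01-20", "north", "gadget", 1),
    ("2024-02-03", "south", "widget", 5),
    ("2024-02-10", "north", "widget", 2),
]

# 2. Store the data in a multidimensional structure: a cube keyed by
#    (month, region, product).
cube = defaultdict(int)
for date, region, product, qty in transactions:
    cube[(date[:7], region, product)] += qty


# 3. Provide access: let analysts slice the cube along any dimension.
def slice_cube(cube, month=None, region=None, product=None):
    wanted = (month, region, product)
    return {
        key: qty
        for key, qty in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


# 4. Analyze: total units sold per product.
def units_per_product(cube):
    totals = defaultdict(int)
    for (_, _, product), qty in cube.items():
        totals[product] += qty
    return dict(totals)


# 5. Present the result in a useful format, here a small text table.
def as_table(totals):
    lines = [f"{'product':<10}{'units':>6}"]
    for product, qty in sorted(totals.items()):
        lines.append(f"{product:<10}{qty:>6}")
    return "\n".join(lines)


totals = units_per_product(cube)
print(as_table(totals))
```

A real system would use dedicated ETL tooling and an OLAP engine for steps 1 and 2; the sketch only shows how the five steps hand data to one another.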
Functions of data mining
Data mining is primarily used today by retail, financial, communication, and marketing organizations with a strong consumer focus. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It enables them to determine the impact on sales, customer satisfaction, and corporate profits.
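One simple way to quantify such a relationship is a correlation coefficient. The Python sketch below computes Pearson's correlation between an internal factor (price) and sales, and between price and an external indicator. The figures are invented for illustration, and real data mining tools use far richer techniques than a single coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly observations.
price = [10.0, 12.0, 14.0, 16.0, 18.0]        # internal factor
units_sold = [200, 180, 150, 130, 100]         # outcome
consumer_confidence = [95, 97, 96, 99, 98]     # external factor

price_vs_sales = pearson(price, units_sold)
price_vs_confidence = pearson(price, consumer_confidence)
```

A value close to -1 for `price_vs_sales` would suggest that higher prices go with lower sales in this sample; a correlation alone does not show that one causes the other.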
Fig 2 Data mining
Advantages
• Marketing/Retailing: Data mining can aid direct marketers by providing them with useful and accurate trends about their customers' purchasing behavior.
• Banking/Crediting: Data mining can assist financial institutions in areas such as credit reporting and loan information.
• Law enforcement: Data mining can aid law enforcers in identifying criminal suspects, as well as apprehending these criminals, by examining trends in location, crime type, habit, and other patterns of behavior.
• Researchers: Data mining can assist researchers by speeding up their data analysis, allowing them more time to work on other projects.
Disadvantages
• Privacy issues: For example, according to The Washington Post, in 1998 CVS sold its patients' prescription purchase data to a different company.
• Security issues: Although companies hold a lot of personal information about us online, they often do not have sufficient security systems in place to protect that information.
• Misuse of information: Some companies will answer your phone call based on your purchase history. If you have spent a lot of money or bought a lot of products from one company, your call will be answered sooner. So you should not assume that your call is being answered in the order in which it was received.
Whole picture of warehousing, meta data, data mining
(Figure; surviving labels: Historical data, Nature of data)
Thanks