Visit> www.geocities.com/chinna_chetan05/forfriends.html
D ATA WA R E H O U S I N G An enterprise level IT strategy for business solutions
INTRODUCTION: IT is reshaping the basics of business. Customer services, operations, products and marketing strategies and distribution are heavily or some times even entirely dependent on Information Technology. IT and its expense have become an every day part of life. Businesses have used IT for many years to automate business processes and support analysis and presentation of information for managerial decision-making. Many organizations are using IT to develop cross-functional information systems in order to reengineer and improve vital businesses processes. The examples for suit of cross-functional enterprise resource planning software for the IT industry are ORACLE, SAP, BAAN, and PEOPLESOFT. The information systems view of business organizations can be depicted as a managerial pyramid as shown below in Fig1.
Fig1: Information System view of business organization.
Email:
[email protected]
Visit> www.geocities.com/chinna_chetan05/forfriends.html The competitive and dynamic nature of today’s transnational business environment is driving demands by business managers and analysts for IT that can provide fast answers to business queries. The IT industry has been responding to these demands with developments like data warehouses and data mining with specialized severs and software products that supports OLAP (On Line Analytical Processing). These are also called DSS (Decision Support Systems), a major category of management support systems which support managers in the decision making process. This paper attempts to project the basic details of data warehousing which is considered the enterprise solutions for many of the enterprise strategies. A data warehouse stores data from current from current and previous years that has been extracted from the previous operational database of an organization. It is a central source of data that has been screened, edited, standardized and integrated so that it can be used by managers and other end user professionals for a variety of forms of analysis and decision support in various discipline viz., the engineering, management, marketing…etc. Data Marts are specific subsets of data warehouse and Data mining is the process of extracting Knowledge that is of great relevance to the business is done by identify key factors and trends in historical patterns of business activity. Fig: 2 – How data mining extracts knowledge from a data warehouse
DATA
TARGET DATA
PRE PROCESSED
TRANSFORMED
PATTERNS
KNOWLEDGE
DATA
Source: James A. O’Brien “ Management Information Systems – Managing Information Technology in the Internet worked Enterprise. Email:
[email protected]
Visit> www.geocities.com/chinna_chetan05/forfriends.html WHAT IS DATA WAREHOUSING? An information database designed to support one or more classes of analytical tasks performed by knowledge workers such as Maintaining and reporting Analysis and diagnosis Simulation and planning (By Ramana Gogula, country managers, Sybase) Data warehousing is the process of integrating enterprise-wide corporate data into a single repository. The resulting data warehouse may then support a variety of decision analysis functions as well as strategic operational functions. (By K.Shiva Sankar, Director-Alliances, and Informix) A data warehouse is a subject oriented, integrated, time oriented, non-volatile collection of data in support of management decision-making process. (By W.H.Inmon, Data Warehousing) WHEN DO YOU DATA WAREHOUSE? Questions from CEO of an organization will be as follows: Why are our customers moving to competitors? How much of our advertising budget should be allocated to certain region? Why that is the southern-region sales have gone down? Can we analyze any hidden trends in our business directions? I want to know more about our customers buying patterns? WHEN DO ORGANIZATIONS NEED DATA WAREHOUSE TECHNOLOGIES? •
SITUATION: The nature of the business is too dynamic to deal with. SOLUTION: New business realities-need to cope-up with change.
•
SITUATION: Huge amount of heterogeneous disorganized data that needs to be understood. SOLUTION: Integrating heterogeneous data sources – A need for data abstraction technology.
Email:
[email protected]
Visit> www.geocities.com/chinna_chetan05/forfriends.html •
SITUATION: Business models are to be designed from historical and current data. SOLUTION: Too much data, not enough information – Need for meta-data management and Build business models.
•
SITUATION: There is requirement for high performance interactive decision support. SOLUTION: Think big, start small – Simple, easy and elegant way to data warehousing.
IMPLEMENTING DATA WAREHOUSE ARCHITECTURE: Enterprise – wide approach: Requires large investment initially for hardware, modeling, and data analysis. Consumes time. Require co-ordination skills across many different organizations within the company. Build reports using corporate data. Data mart approach: Select one or two key applications to focus in first, define the data marts required to support them and start with manageable project. As and when new applications are brought on-line, merge them into the enterprise-wide data warehouse. More modular. More linear cost curve. May have some contention problems. Build reports using data related to a functional growth initially, subsequent derivation from different functional groups.
Data Sources
Meta data
Extract Transform Maintain Legacy data
VISA Operational data
Email:
[email protected]
Enterprise Data warehouse
Visit> www.geocities.com/chinna_chetan05/forfriends.html Fig: 3 Architecture of Data Warehouse
IMPLEMENTATION ISSUES AND TASKS: Set attainable goals: Know what kind of data marts/data warehouses and how they are required. Validate overall requirement: Make sure that the related personnel of the organization are working towards achieving the above goal. Identify data sources: List all the data sources; find its complexity and size. Capacity planning and technology evaluation: Estimation of amount of data Nature of the data Disk space requirement Hardware, software, supporting tools requirements. Define data warehouse schemas: Star – schema Snow – flake schema Star – flake schema Data acquisition/transformation/cleansing: Transformation of data Consolidation of data Define load and maintenance strategy: Database loading and maintenance strategy automatic or manual? Stored procedures, triggers. Performance – Indexing: Performance tuning of database by properly selecting columns of the tables to be indexed Bit map indexes etc. Performance – aggregation: Computing summarized results from base table details to support user queries at various hierarchical levels. Performance sampling: Deriving estimated results from a small subset of actual detailed data. Email:
[email protected]
Visit> www.geocities.com/chinna_chetan05/forfriends.html OLAP/DSS deployment: Check whether a single interface is appropriate for all users WHAT MAKES A DATA WAREHOUSE SUCCESSFUL? System integration Choosing the right tools - (refer to Table1) Deciding on what to warehouse - (customer satisfaction) Implementation methodology WHY DATA WAREHOUSE FAIL? Business factors: Lack of business support and participation Inadequate requirements and unrealistic expectations Lack of implementation experience and methodology Inappropriate technical architecture Lack of integrated tools, no metadata Poor quality of data Technical factors: Cost overruns, caused by wrong estimation of hardware/network resources Time overruns and changing priorities Scope escalation Lack of focus on main problems Non co-operation of one or more groups
Email:
[email protected]
Visit> www.geocities.com/chinna_chetan05/forfriends.html
Data Company Aperatus Technology Brio Technology Business Objects Carleton Crystal Services DB Star Dimensional Insight ETI Holistic Systems IBM Informatica IBI Informix Software Integral Solutions Kenan Technologies KPMG Netware Oracle Platinum Technology Price Waterhouse Prism Solutions Red Brick SAS Institute Software AG Sybase Torrent Systems Vality Technology
Extraction X
Data RDBMS
OLAP
Business
Mining Intelligence Tools
Consulting
X X X X X X X X X X X X
X X X X
X X X X
X X X
X X
X X
X X
X X X
X X X X X X X
X X
X
X X
Table1: Some Data Warehouse Tools in IT Industry
Data warehouses – The ‘Ideal’ Solution v/s Data marts: ADVANTAGES OF DATA WAREHOUSE: Centralized storage of information reducing redundancy Email:
[email protected]
X X X
X X
Visit> www.geocities.com/chinna_chetan05/forfriends.html Ensures data integrity Common understanding of data across the enterprise Effort of data extraction and loading is done once More efficient use of hardware and networking resources DISADVANTAGES OF DATA WAREHOUSE: Long term, longer time for building High number of users has to agree to spend time and money for the process to succeed. One person can stop the process Differences among persons handling and functional areas can bring the project to a halt Common meta data cannot be achieved when common understanding of business does not exist More types of users, means complex summaries and applications need to be build Scope escalation, as requirements change through the length of the project Higher risk as number of variables increases ADVANTAGES OF DATA MARTS: Quicker return on investment Less costly (time, money, personnel) Lesser number of variables to control and hence lower risk Less chance of scope escalation Possible to implement even when common understanding of business across departments does not exist Lesser data and conflicts, and simpler models Less complex in terms of the types of users and hence designing of summary tables More focused on specific business problems DISADVANTAGES OF DATA MARTS: Duplication of effort in data extraction and cleansing Duplication of data at times with integrity problems Different understanding of data between departments Possibility of different standards MYTHS ABOUT DATA WAREHOUSES: Email:
[email protected]
Visit> www.geocities.com/chinna_chetan05/forfriends.html Myth: Data warehouse is a repository of all historical data. Reality: Data warehouse is historical data required for decision support, arranged by subject area. Myth: Data warehouse is always a very large database. Reality: Data warehouse could be small. It depends on the kind of business, and the amount of information required for solving business problems. Myth: Complexity of data warehouse comes from the size of data and the number of users. Reality: Complexity comes from the multiplicity of data sources, and the different types of users/subject areas. Myth: Data marts are smaller data warehouses. Reality: Data marts are focused subject specific data warehouses. Actual database size does not determine whether it’s a warehouse or a mart. Myth: Data marts have fewer users. Reality: Data marts have fewer types of users. If a solution has to be hundred sales manager its still a data mart. If it is used by 5 sales managers, 5 production managers and CFO, it turns into a warehouse. CONCLUSION: Thus we conclude that data warehousing is indispensable information technology strategy for solving enterprise- wide business problems especially when combined with data mining tools.
REFERENCES: 1. Ralph Kimball – The Data warehouse Tool kit 2. James A. O’Brien – Management Information System 3. Dataquest supplement – 1998 Jan
Email:
[email protected]