INFORMATION WAREHOUSE IN THE RETAIL INDUSTRY
Document Number GG24-4342-00
August 1994
International Technical Support Organization San Jose
Take Note! Before using this information and the product it supports, be sure to read the general information under “Special Notices” on page xiii.
First Edition (August 1994)

This edition applies to the following products:
• DataPropagator Relational Version 1 Release 1, Program Number 5622-244
• DataHub/2 Version 1 Release 1, Program Number 5667-134
• DataGuide/2 Version 1 Release 1, Program Numbers 5622-487 and 5622-488
• FlowMark Version 1 Release 2, Program Number 5621-290
• DataPropagator NonRelational Version 1 Release 2, Program Number 5696-705
• DataRefresher Version 3 Release 1, Program Number 5696-703
• Visualizer Query Version 1 Release 0, Program Number 5871-BBB
• S/390 Parallel Query Server
Order publications through your IBM representative or the IBM branch office serving your locality. Publications are not stocked at the address given below.

An ITSO Technical Bulletin Evaluation Form for readers' feedback appears facing Chapter 1. If the form has been removed, comments may be addressed to:

IBM Corporation, International Technical Support Organization
Dept. 471, Building 070B
5600 Cottle Road
San Jose, California 95193-0001

When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

Copyright International Business Machines Corporation 1994. All rights reserved.

Note to U.S. Government Users: Documentation related to restricted rights. Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.
Abstract

This publication is one of three publications that relate Information Warehouse architecture and products to industry applications and requirements. These three publications are:
• Information Warehouse in the Finance Industry
• Information Warehouse in the Insurance Industry
• Information Warehouse in the Retail Industry.

The publications describe the Information Warehouse Architecture I and emphasize the following products:
• DataPropagator Relational
• DataHub
• DataGuide
• FlowMark
• DataPropagator NonRelational
• DataRefresher
• Visualizer.

These products provide a variety of functions defined in Information Warehouse Architecture I. This publication is intended for business analysts acting as consultants to an Information Warehouse implementation project and technical professionals who are designing Information Warehouse solutions in the Retail industry. A knowledge of the Information Warehouse framework is assumed.

DS (137 pages)
The Retail Industry IW
Contents

PART 1. INTRODUCTION

Chapter 1. Industry Library Introduction
  1.1 Library at a Glance
  1.2 Terminology
  1.3 Introduction to Solution Threads

PART 2. THE BUSINESS VIEW

Chapter 2. Retail Industry Perspective
  2.1 Retail Industry Trends
  2.2 Challenges
  2.3 Retail Industry Organization
    2.3.1 The Store Environment
  2.4 Corporate Environment
  2.5 Retail Enterprise Network
    2.5.1 Target Business Units for Solution
    2.5.2 Data Processing Functions
  2.6 Key Systems
    2.6.1 Store Systems
    2.6.2 Corporate Systems

Chapter 3. Retail Industry Business Requirements
  3.1 Value of Information
    3.1.1 Precision
    3.1.2 Discovery
    3.1.3 Business Process Reengineering
  3.2 Solution Thread Requirements
    3.2.1 View of Data Everywhere
    3.2.2 Access to Data Everywhere
    3.2.3 Access to Local In-Store Processor Data
    3.2.4 Access Summary and Detail Data
    3.2.5 Four Years of Trend Data
    3.2.6 Dissemination of Analysis Results
  3.3 Origin of Business Requirements
    3.3.1 Inhibitors to Business Growth
    3.3.2 Qualifying an Information Warehouse Approach

Chapter 4. The Retail Solution Thread
    4.1.1 Information Warehouse Framework in Retail
    4.1.2 New Scenarios for Profit
    4.1.3 Requirements Mapping: Summarized → Detailed

PART 3. THE TECHNOLOGY VIEW

Chapter 5. Retail Industry Architecture
  5.1 Retail Application Architecture
  5.2 The Store Logical Data Model
    5.2.1 The IBM In-Store Processing Strategy
    5.2.2 The Model
    5.2.3 SLDM Benefits

Chapter 6. Information Warehouse Framework
  6.1 Value of the Information Warehouse Framework
  6.2 Why Data Replication
    6.2.1 Operational Systems
    6.2.2 Database Technology
    6.2.3 Cost of Data Access
    6.2.4 Historical Data
    6.2.5 Ownership
    6.2.6 Point-in-Time Data
    6.2.7 Reconciliation
  6.3 The Information Warehouse Architecture
  6.4 Using the Information Warehouse Architecture
  6.5 Access Enablers
    6.5.1 Embedded SQL
    6.5.2 SQL Call Level Interface
    6.5.3 Distributed Relational Database Architecture
  6.6 The Retail Industry
  6.7 Information Catalog
    6.7.1 Function
    6.7.2 Interfaces
  6.8 Information Warehouse Architecture Products
    6.8.1 The DataGuide Family
    6.8.2 S/390 Parallel Query Server
    6.8.3 DataPropagator Relational
    6.8.4 Personal AS/2
  6.9 Why Use the Information Warehouse Architecture

Chapter 7. Organization Asset Data
  7.1 The Solution
  7.2 S/390 Parallel Query Server
    7.2.1 Software Configuration
    7.2.2 Information Maintenance
    7.2.3 Retail Enterprise Operations
  7.3 Technical Issues
    7.3.1 Types of Parallelism
    7.3.2 S/390 Parallel Query Server Design
    7.3.3 Query Splitting
    7.3.4 Front-End MVS System

Chapter 8. Information Catalog
  8.1 Information Catalog Function
  8.2 DataGuide
    8.2.1 Basic Structure
    8.2.2 Knowledge Worker Functions
    8.2.3 Search
    8.2.4 Launch Applications
    8.2.5 Create Collections
    8.2.6 Display Contact Information
    8.2.7 View Current News
    8.2.8 View Glossary
    8.2.9 Administrator Functions
    8.2.10 Extending DataGuide/2
    8.2.11 Meta-data Management
    8.2.12 DataGuide Data Model
    8.2.13 Interfaces

Chapter 9. Conclusions

Appendix A. Models and Modeling
  A.1 The Construction Model
    A.1.1 Entity: Things
    A.1.2 Entity: Agreements
  A.2 The Annual Report As a Model
  A.3 Information Warehouse and Modeling

List of Abbreviations

Index
Figures

1. Basic Set of Business Objects
2. Current Business Environment of the Retail Enterprise
3. EFT Transaction in the Retail Enterprise
4. Generic Department Store Network
5. Retail Enterprise Network
6. Retail Enterprise Network
7. Business Modeling of Key Retail Entities
8. GSA's Item File Format
9. File Redirection Services
10. Retail Industry Study
11. Selective Marketing Campaign
12. RAA Business Activities
13. The Information Warehouse Architecture and Information Processing Architecture
14. In-Store Processing Application Development Strategy
15. SLDM Model and Database
16. Information Warehouse Architecture
17. Information Warehouse Framework
18. Access Enablers
19. Future Retail Enterprise Network: Connectivity and Operating Systems
20. Organization Asset Data in the Retail Industry
21. Query Cost as a Function of Query Complexity and Data Volume
22. Special and General-Purpose Solutions
23. Databases and Files in Retail Enterprise Network
24. Connectivity between Personal AS/2 and S/390 Parallel Query Server
25. Parallel CPU Processing Environments
26. Information Catalog
27. A Card Catalog Information Catalog
28. DataGuide/2 Initial Panel
29. Search Specification
30. Search Results
31. Save Search
32. New Initial Work Area Panel
33. Navigation Tree
34. A Collection
35. Initial and DataGuide Administrator Panels
36. DataGuide Administrator Panel
37. Object Types
38. Create Object Type
39. Add Object Type Property
40. DataGuide Categories
Tables

1. Library of Information Warehouse: Products Covered
2. Store Responsibilities and Functions
3. Corporate Responsibilities and Functions
4. Relationship between Scenarios and RAA Constructs
5. Logical Design Points for an Information Warehouse
6. Physical Design Points for Information Warehouse Implementation
7. DataGuide Configurations
8. DataGuide Services
Special Notices

This publication is intended to help:
• Business analysts understand Information Warehouse (IW) architecture concepts
• IBM technical professionals understand industry environments
• Customer data processing personnel understand industry environments.

The information in this publication is not intended as the specification of any programming interfaces that are provided by a variety of products that perform functions described in the Information Warehouse architecture. See the PUBLICATIONS section of the IBM Programming Announcement for these products for more information about what publications are considered to be product documentation.

References in this publication to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent program that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program or service.

Information in this book was developed in conjunction with use of the equipment specified, and is limited in application to those specific hardware and software products and levels.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to the IBM Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY 10594 USA.

The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The information about non-IBM (VENDOR) products in this manual has been supplied by the vendor and IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.
The following terms, which are denoted by an asterisk (*) in this publication, are trademarks of the International Business Machines Corporation in the United States and/or other countries:
• BookManager
• CICS/ESA
• Common User Access
• CUA
• DataGuide
• DataHub
• DB2
• DB2/2
• DB2/6000
• IBM
• ImagePlus
• IMS/ESA
• MVS/ESA
• PS/2
• QMF
• RISC System/6000

The following terms, which are denoted by a double asterisk (**) in this publication, are trademarks of other companies:
• Apple: Apple Computer, Inc.
• BRIDGE/FASTLOAD: Bridge Technology, Inc.
• KnowledgeWare Application Development Workbench: KnowledgeWare, Inc.
• MacIntosh: Apple Computer Company
• Microsoft, Windows: Microsoft Corporation
• Motif: Open Software Foundation
• OSF: Open Software Foundation, Inc.
• ObjectStore Database: Object Design, Inc.
• 1-2-3, Lotus, Freelance, Freelance Graphics: Lotus Development Corporation
• OMegamon: Candle Corporation
• Fast Load: PLATINUM Technology, Inc.
• Fast Unload: PLATINUM Technology, Inc.
• Rapid Reorg: PLATINUM Technology, Inc.
• Quick Copy: PLATINUM Technology, Inc.
• In2itive: LEGENT

Other trademarks are trademarks of their respective companies.
Preface

This document is intended to merge industry analysis, industry architecture, the Information Warehouse architecture, new product discussion, and specific solutions to industry requirements. It contains discussion of specific industry issues, industry architecture for data processing, Information Warehouse architecture, and solutions. This document is intended for business analysts and data processing professionals.

How This Document Is Organized

The document is organized as follows:
• Introduction
  This part introduces the library within which this particular book is included.
• The Business View
  This part establishes the business-oriented level set from which business requirements and Information Warehouse solutions are developed. The first chapter presents a perspective on the Retail industry that includes trends and challenges, key systems, and information technology's position in the Retail industry.
• The Technology View
  This part presents the technology solutions for the business requirements established in the business view. It includes an overview of the industry application architecture and the Information Warehouse architecture. It then discusses the individual components of the Information Warehouse architecture and the solutions to those components, according to the needs of the industry.
Related Publications

The following publications are considered particularly suitable for a more detailed discussion of the topics covered in this document:
• “An Architecture for a Business and Information System,” B.A. Devlin and P.T. Murphy, IBM Systems Journal, Vol. 27, No. 1 (1988)
• “Building Business and Application Systems with the Retail Application Architecture,” P. Stecher, IBM Systems Journal, Vol. 32, No. 2 (1993)
• Client-Server Computing: The Design and Coding of a Business Application, GG24-3899-00
• DataGuide/2 V1: Using DataGuide/2, SC26-3365
• Delivering Data to the Information Warehouse, Rob Goldring, InfoDB, Summer 1992
• Financial Application Architecture: FAA Concepts of Application and System Architectures, LY38-4402-0
• Information Technology and the Management Difference: A Fusion Map, IBM Systems Journal, Vol. 32, No. 1 (1993)
• Financial Application Architecture Introduction, GC31-3932-0
• “The Future of Health Care Information Systems,” Michael Carrigan, Hospital Materiel Management Quarterly, August 1993
• Information Warehouse Architecture I, SC26-3244
• Insurance Industry Futures: Directions for the 21st Century, Andersen Consulting and LOMA, 1993
• “Loaning Banks Some Courage,” Information Week, August 12, 1993
• “The Model Business,” IW Today, August 12, 1993
• Principles of Life and Health Insurance, G. Morton, LOMA, 1988
• “The Spectrum of Data Delivery for Business Information Systems,” Rob Goldring, DB/Expo92
International Technical Support Organization Publications

• Information Warehouse Architecture and Info. Catalog, GG24-4019
• Information Warehouse Storage Management Guidelines and Considerations, GG24-4336
• Information Warehouse in the Finance Industry, GG24-4340
• Information Warehouse in the Insurance Industry, GG24-4341
• Library for System Solutions: Data Reference, GG24-4103

A complete list of International Technical Support Organization publications, with a brief description of each, may be found in:

Bibliography of International Technical Support Organization Technical Bulletins, GG24-3070.

To get a catalog of ITSO technical publications (known as “redbooks”) online, VNET users may type:

TOOLS SENDTO WTSCPOK TOOLS REDBOOKS GET REDBOOKS CATALOG

How to Order ITSO Technical Publications

IBM employees in the USA may order ITSO books and CD-ROMs using PUBORDER. Customers in the USA may order by calling 1-800-879-2755 or by faxing 1-800-284-4721. Visa and MasterCard are accepted. Outside the USA, customers should contact their local IBM office.

Customers may order hardcopy ITSO books individually or in customized sets, called GBOFs, which relate to specific functions of interest. IBM employees and customers may also order ITSO books in online format on CD-ROM collections, which contain books on a variety of products.
Acknowledgments

The advisor for this project was:
• Steve Schaffer, International Technical Support Organization, San Jose

The authors of this document were:
• Normand Brin, IBM Canada
• Wojeich Zagala, IBM Australia
• Steve Schaffer, International Technical Support Organization, San Jose

This publication is the result of a residency conducted at the International Technical Support Organization, San Jose.

Thanks to the following people for the invaluable advice and guidance provided in the production of this document:
• Paul Englefield, IBM Warwick
• Rob Goldring, IBM SWS, Santa Teresa
• Eileen Hiltbrand, IBM US
• Jacques Labrie, IBM SWS, Santa Teresa
• Bill Martin, IBM US
• Mark Mauriello, IBM Charlotte Finance Industry
• Rita Neuberg, IBM Charlotte
• Bill Payne, IBM Charlotte Insurance Industry

Thanks to the following people for reviewing this document:
• Thomas Bilfinger, IBM ITSO, San Jose
• Don Cameron, IBM ITSO, San Jose
• Don Murray, IBM US
• Ralph Naegeli, IBM Switzerland
• Tom Romeo, IBM US
• Michele Schwartz, IBM SWS, Santa Teresa

Special thanks to Ueli Wahli for developing the tool to generate margin comments.
Part 1. Introduction
Chapter 1. Industry Library Introduction

This volume is one of three that look at Information Warehouse architecture and products in the finance, insurance, and retail industries. The three studies yielded somewhat different results but are presented in a standard structure. This introductory chapter describes the library so that readers can use it easily and effectively.
1.1 Library at a Glance

The study of the finance, insurance, and retail industries is available as a library of three books. Each book takes the same approach to discussing its industry, though the aspects of the Information Warehouse architecture covered vary from book to book. Table 1 on page 4 gives an overview of this library.

The library's structure rests on two elements: a common set of topics that is addressed consistently across the industry studies, and a different set of Information Warehouse architecture aspects that is addressed in each study. This structure minimizes redundancy. Consequently, a complete review of Information Warehouse architecture functions and products may require reading all three industry studies.

The common set of topics takes each study from a high-level perspective of the industry down to a discussion of the Information Warehouse product technology. These topics are divided into the business view and the technology view as follows:
• Business view
  − Industry perspective
  − Business requirements
• Technology view
  − Industry application architecture
  − Information Warehouse architecture
  − Information Warehouse framework components.
Table 1. Library of Information Warehouse: Products Covered

Industry     Product
Finance      DataPropagator Relational
Insurance    FlowMark, Visualizer
Retail       DataGuide*, S/390 Parallel Query Server
1.2 Terminology

The finance, insurance, and retail industry studies address general requirements of the respective industry. They are not studies of a real-world or a contrived enterprise. Rather, they are studies of industry issues and requirements put in the context of an industry enterprise. For this reason, we use the terms financial enterprise, insurance enterprise, and retail enterprise, respectively, to reflect this approach.

We begin our study by identifying the knowledge worker as the primary focus of Information Warehouse technology. Knowledge workers are the individuals in an enterprise who make decisions. They exist at every organizational level and have one thing in common: they all need information to make decisions. They get the information through informational applications, also called executive information systems, decision support software, and decision support tools. Informational applications display information provided by the data replication products in the Information Warehouse solution. The objective of the studies is therefore to describe the knowledge worker's business need for information and to show how the Information Warehouse architecture and products address that need.
The knowledge worker is the focus
1.3 Introduction to Solution Threads

A solution thread is a vehicle for applying architectures and products to a generic requirement of the industry. It is a generalized approach, in that the Information Warehouse architecture it uses is generalized for a given data processing function (for example, decision support) across industries, and the industry architecture it uses is generalized for enterprises within a given industry. The products used by the solution thread are generalized for that data processing function, rather than for a business requirement or hardware platform. The solution thread meets the business requirement by customizing the product to the needs of the business.
Solution thread: applying technology to a problem
The Information Warehouse architecture is a generalized architectural approach to managing data and information in a complex business environment. The industry architectures (financial, insurance, and retail) discussed in the three books of this library represent structured approaches to analyzing specific business environments. This analysis leads to a specification of requirements, expressed in business terms, which data processing technologies must address.

This library presents the Information Warehouse architecture as an architected approach to data processing functions required across the industries, and across the lines of business within each industry. The value of using both the industry-specific architectures and the generalized data processing architecture (Information Warehouse architecture) is to leverage the Information Warehouse architecture functions across the lines of business and to leverage the resources already invested in the industry-specific architectures.

Not every feature of the Information Warehouse architecture, and not every product, is considered in each of the three industry studies chosen for this library. Reviewing the three studies as a set will provide information on most of the Information Warehouse architecture features and Information Warehouse framework products. We discuss IBM's published Information Warehouse architecture as a template for connecting business requirements to actual technical implementations, including products plugged into an Information Warehouse framework.

The following example, taken from the insurance industry, illustrates the leverage of the Insurance Application Architecture (IAA) together with the Information Warehouse architecture. Figure 1 on page 6 shows a set of business objects for the insurance industry. These business objects are used as examples of modeling entities (OBJECT, AGREEMENT, and DEMAND_FOR_DELIVERY) found in the IAA model. IAA defines a modeling entity called OBJECT. This entity is used to symbolically represent anything that can be insured, such as a life. IAA defines another modeling entity called AGREEMENT. We could use this modeling entity to represent the physical insurance entity called Policy in either personal or property and casualty insurance. This is an example of the generality of the IAA being leveraged across lines of business. We could further use another entity, DEMAND_FOR_DELIVERY, to represent Claims, and PREMIUMS to represent Premiums.
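The reuse of generic modeling entities across lines of business can be illustrated with a minimal sketch. It is not part of the IAA or of any IBM product: the class names, the `CATALOG` list, and the `lines_of_business_for` helper are hypothetical, and only the pairings of a life with OBJECT, Policy with AGREEMENT, and Claims with DEMAND_FOR_DELIVERY are taken from the discussion above.

```python
from dataclasses import dataclass
from enum import Enum


class ModelingEntity(Enum):
    """The three IAA modeling entities named in the text."""
    OBJECT = "OBJECT"                            # anything that can be insured
    AGREEMENT = "AGREEMENT"                      # for example, a Policy
    DEMAND_FOR_DELIVERY = "DEMAND_FOR_DELIVERY"  # for example, a Claim


@dataclass(frozen=True)
class BusinessObject:
    """A line-of-business object mapped onto one generic modeling entity."""
    name: str              # business name used by the line of business
    line_of_business: str  # illustrative label, not an IAA term
    entity: ModelingEntity


# Hypothetical catalog, invented for illustration only.
CATALOG = [
    BusinessObject("Insured life", "Personal", ModelingEntity.OBJECT),
    BusinessObject("Policy", "Personal", ModelingEntity.AGREEMENT),
    BusinessObject("Policy", "Property and Casualty", ModelingEntity.AGREEMENT),
    BusinessObject("Claim", "Personal", ModelingEntity.DEMAND_FOR_DELIVERY),
    BusinessObject("Claim", "Property and Casualty",
                   ModelingEntity.DEMAND_FOR_DELIVERY),
]


def lines_of_business_for(catalog, entity):
    """Return the sorted lines of business that reuse one modeling entity."""
    return sorted({obj.line_of_business for obj in catalog
                   if obj.entity is entity})


if __name__ == "__main__":
    for entity in ModelingEntity:
        print(entity.value, "->", lines_of_business_for(CATALOG, entity))
```

In this sketch a single AGREEMENT entity serves the Policy object of two lines of business, which is the kind of generality the paragraph above attributes to the IAA.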
Chapter 1. Industry Library Introduction
The Information Warehouse architecture is a generalized approach
Figure 1. Basic Set of Business Objects. Dashed arrows indicate data flow.
These basic entities correspond to the real-world objects. The Claims entity corresponds to the many claims made by insurees. The Premiums entity corresponds to the many premium payments made by insurees. These claims and payments are implemented as records in an operational database, say, IMS/DB, DB2*, or VSAM.

The prime concern of the insurance company is profitability. In this simple exercise, profitability is defined as total premiums minus total claims. Assuming claims records and premiums records are kept in separate databases, there is a need to bring those two sets of data together. There is an additional need to reconcile this data as it is brought together. The Information Warehouse architecture defines the process by which this can be done in an architected manner. This architected approach to extracting data is called data replication. Data replication is generalized, so that it can be a solution for bringing together Claims and Premiums data for Life insurance, Property and Casualty, or any other insurance product that takes in premiums and pays out claims.

The Information Warehouse architecture provides guidance for data access as well as data replication functions. Data access applies to operational data or informational data copies of the operational data, generated through data replication. Access to operational data (direct access) or informational data shares certain requirements for implementation. These requirements can be best understood in terms of the business requirements for data access.

The lines of business are assumed to have their own data stores containing records of insurees, sometimes called client files. The Life insurance line of business has an interest in using the client file owned by the property and casualty line of business for prospecting purposes, and vice versa. This requirement brings up two issues relevant to the Information Warehouse
framework: what data is available, and how to access it. The first requirement is resolved through the Information Catalog, while the second is resolved through access enablers or data replication. The point here is that both lines of business have the same needs to access data, and both needs can be resolved by solutions based on the Information Warehouse architecture.
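The profitability exercise described above can be sketched in a few lines. The following is a minimal illustration only: the record layouts, field names, line-of-business codes, and amounts are hypothetical and are not taken from IAA or from any Information Warehouse product. It shows premiums and claims held in separate stores being brought together and reconciled to a common line-of-business code before total claims are subtracted from total premiums.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical extracts from two separate operational data stores.
# Field names and values are illustrative assumptions only.
PREMIUM_RECORDS = [
    {"policy": "L-001", "line": "LIFE", "amount": "1200.00"},
    {"policy": "P-104", "line": "P&C", "amount": "800.00"},
    {"policy": "L-002", "line": "Life", "amount": "950.50"},
]
CLAIM_RECORDS = [
    {"policy_no": "P-104", "lob": "property and casualty", "paid": "300.00"},
    {"policy_no": "L-001", "lob": "life", "paid": "500.25"},
]

# Reconciliation table: each source spells the line of business differently,
# so every spelling is mapped to one agreed code (assumed codes).
LINE_CODES = {"life": "LIFE", "p&c": "PC", "property and casualty": "PC"}


def reconcile_line(raw):
    """Translate a source-specific line-of-business value to the common code."""
    return LINE_CODES[raw.strip().lower()]


def profitability(premiums, claims):
    """Return total premiums minus total claims for each line of business."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at zero
    for rec in premiums:
        totals[reconcile_line(rec["line"])] += Decimal(rec["amount"])
    for rec in claims:
        totals[reconcile_line(rec["lob"])] -= Decimal(rec["paid"])
    return dict(totals)


if __name__ == "__main__":
    result = profitability(PREMIUM_RECORDS, CLAIM_RECORDS)
    for line, profit in sorted(result.items()):
        print(line, profit)
```

The same two functions apply unchanged to any line of business that takes in premiums and pays out claims, which is the sense in which the approach is generalized.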
Part 2. The Business View

Chapter 2. Retail Industry Perspective
  2.1 Retail Industry Trends
  2.2 Challenges
  2.3 Retail Industry Organization
    2.3.1 The Store Environment
  2.4 Corporate Environment
  2.5 Retail Enterprise Network
    2.5.1 Target Business Units for Solution
    2.5.2 Data Processing Functions
      2.5.2.1 Merchandising and Operations Roles
      2.5.2.2 Retail Enterprise Data Flows
  2.6 Key Systems
    2.6.1 Store Systems
    2.6.2 Corporate Systems

Chapter 3. Retail Industry Business Requirements
  3.1 Value of Information
    3.1.1 Precision
    3.1.2 Discovery
    3.1.3 Business Process Reengineering
  3.2 Solution Thread Requirements
    3.2.1 View of Data Everywhere
    3.2.2 Access to Data Everywhere
      3.2.2.1 Ease of Use
      3.2.2.2 Ad Hoc Query Capability
    3.2.3 Access to Local In-Store Processor Data
    3.2.4 Access Summary and Detail Data
    3.2.5 Four Years of Trend Data
    3.2.6 Dissemination of Analysis Results
  3.3 Origin of Business Requirements
    3.3.1 Inhibitors to Business Growth
      3.3.1.1 Faster Market Analysis and Reaction
      3.3.1.2 Rapidly changing criteria for POS data as information
      3.3.1.3 Increased Volume of Valuable Data
    3.3.2 Qualifying an Information Warehouse Approach

Chapter 4. The Retail Solution Thread
    4.1.1 Information Warehouse Framework in Retail
    4.1.2 New Scenarios for Profit
      4.1.2.1 Scenario 1: Differential Marketing by Preference Group
      4.1.2.2 Scenario 2: Quick Market Response and Follow-on Adjustments
      4.1.2.3 Scenario 3: Sales Analysis for Promotion Negotiations
      4.1.2.4 Scenario 4: Buying on Key Product Attributes
    4.1.3 Requirements Mapping: Summarized → Detailed
Chapter 2. Retail Industry Perspective
Retailing is arguably the oldest profession in the world. From its inception in bartering and community trading to neighborhood corner stores, retailing has evolved into one of the largest, most competitive businesses across the globe. Retailing has traditionally offered fewer barriers to entry than industries such as manufacturing, publishing, or transportation. This ease of entry has led to new formats, concepts, and competitive forces in retailing that constantly challenge existing retailers.

Retailers are moving toward an unprecedented level of sophistication. The act of buying was until recently considered an art form, a “feel,” and an intuition. The best merchants were measured by their instinct to know or ability to predict what the customer would want in the future. The financial side of retail, however, was more scientific in nature and was not concerned with trends, demographics, or merchandising. It focused instead on gross margin return on assets (GMROA), return on investments (ROI), and dollar sales per square foot of space as the measure of success.

The changes in retail in the 1980s brought about greater transformation in the industry than in the entire previous century. The retail segments have become increasingly blurred, making it difficult to uniquely identify a full-service provider, discounter, or specialty store. Retailers now exist who offer multiple services and formats, and they have evolved from regional
players to national and international powerhouses. Thus, the intuition upon which merchants and operational and financial executives have relied to make their business and investment decisions has often proven insufficient to compete effectively in the new global retail marketplace. Operational and merchandising executives need to understand and embrace more detailed information to compete.

Retail information systems were formerly deemed sufficient to capture transactional levels of data, such as what sold yesterday, where, and at what price. This transactional level of data, while critical, is no longer sufficient to make informed retail investment decisions. Today's market, a buyer's market, forces wholesalers and retailers to compete intensely for customers. In the retail industry, this means developing customer loyalty and influencing customer purchase decisions concurrently with expense reduction. Retailers today must be able to take historical data (what sold, where, and at what price) and infer from that data what will sell tomorrow:
• Did the blouse that sold in Boston sell because it was blue?
• Did it sell because it was silk?
• Did it sell because of its ruffled collar?
Alternatively, if the blue silk blouse with ruffles sold, does that mean the retailer could exploit this emerging trend by allocating more open-to-buy to the silk dress category? Also, if it sold in Boston yesterday, how, based on trends, will that blouse sell in Columbus tomorrow? These kinds of questions demand, for their resolution, a tremendous amount of data and information. More importantly, they demand access to and manipulation of that data throughout and between all levels of the retail organization.

The role of providing information has traditionally rested with the information systems department. The demands being placed on retail information systems departments for information, and for the ability to provide that information in a form that lets the user manipulate and query it, present a formidable challenge to information systems professionals. These information demands require new approaches to the capture and storage of, and access to, data. This document offers insights into those new approaches.

The Information Warehouse framework supplies the tools that enable the retailer to provide consistent, clean data and information to all of its users throughout the organization. The Information Warehouse implementation is built on widely accepted standards for the storage and transmission of and access to information. Furthermore, the Information Warehouse architecture is not proprietary. It is open to any and all users and information technology providers, so that a retailer is not bound to a specific information technology vendor for database, application, operating system, or hardware solutions. The Information Warehouse architecture is built on the use of relational databases (the evolving standard for informational data storage) and Structured Query Language (SQL) for accessing informational data across the enterprise.
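The blouse questions above amount to grouping sales by one product attribute at a time. As a hedged sketch of what such an SQL query might look like, the following uses Python's built-in sqlite3 module as a stand-in for a relational informational database. The table names, column names, and sample rows are assumptions made for this illustration and do not come from any product described in this book.

```python
import sqlite3

# Columns that may be used as the grouping attribute (assumed schema).
ATTRIBUTES = {"color", "fabric", "collar"}


def build_sample_db():
    """Create an in-memory database with a hypothetical item and sales table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (item_id INTEGER PRIMARY KEY,
                           color TEXT, fabric TEXT, collar TEXT);
        CREATE TABLE sales (item_id INTEGER, store TEXT, units INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO item VALUES (?, ?, ?, ?)",
        [(1, "blue", "silk", "ruffled"),
         (2, "blue", "rayon", "plain"),
         (3, "white", "silk", "plain")],
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(1, "Boston", 40), (2, "Boston", 5),
         (3, "Boston", 30), (1, "Columbus", 10)],
    )
    return conn


def units_by_attribute(conn, attribute):
    """Return (attribute value, total units sold) pairs, best sellers first."""
    # Only whitelisted column names are placed into the SQL text.
    if attribute not in ATTRIBUTES:
        raise ValueError(f"unknown attribute: {attribute}")
    sql = (
        f"SELECT i.{attribute}, SUM(s.units) AS total_units "
        "FROM sales AS s JOIN item AS i ON i.item_id = s.item_id "
        f"GROUP BY i.{attribute} "
        f"ORDER BY total_units DESC, i.{attribute}"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    db = build_sample_db()
    for attr in sorted(ATTRIBUTES):
        print(attr, units_by_attribute(db, attr))
```

Running the same query for color, fabric, and collar lets an analyst compare which attribute accounts for the most units sold. A grouping like this shows association only; it does not by itself establish why an item sold.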
The ability to establish an accepted standard for an architecture allows all vendors and users to participate in the delivery of solutions that can benefit the entire industry. The Information Warehouse allows the retailer′s current “legacy” systems, such as payroll and accounts receivable, to
participate in the solution, so an entire rewrite of established mission-critical systems is unnecessary. The ability of a retailer to develop information systems that not only allow it to know what happened yesterday, but also provide insights into what will happen tomorrow, is an unparalleled competitive advantage. This book describes how the retailer can develop this capability through the Information Warehouse architecture.

A great deal of the initial historical effort to streamline retail operations has focused on the product resource. Today, in many retail businesses, point-of-sale (POS) systems are the primary feeder of data into the retail enterprise. Credit card purchases and electronic funds transfer (EFT) transactions are financial processes that complement the POS system and generate a great deal of detailed data on the client resource. Retail operations capable of exploiting credit card information can be found in many market sectors (for example, food, automotive, and pharmacy).
The retail industry focus: the product resource
The POS, credit card, and EFT systems feed data to software packages that manage merchandising, inventory, labor, price, and store productivity. Software packages—fed by the POS systems—execute on the POS systems, an in-store processor, and the large server. All of these packages contribute to product management or controlling the infrastructure of the retail business. They also keep costs low and thereby enable effective marketplace competition. Specialized program development, geared to product and enterprise infrastructure, goes hand-in-hand with customer support systems.
The POS, credit card, and EFT systems feed other systems
With POS data added to markets resource data, the collective information represents a large and continuously growing asset. Trend analysis is seen as the methodology for understanding client and market direction, hidden in the large volumes of data. Therefore, large volumes of data need to be managed.
Trend analysis discovers the business
The organization of this retail industry study follows that of the end-to-end design of a robust business system:
• General business issues are discussed.
• A brief modeling of enterprise business is undertaken.
• Required business systems are identified.
• Detailed requirements are gathered for those business units most in need of a solution.
• Requirements are refined down to a level that can be directly translated into technology.
• Maintenance issues are resolved by automation and procedures.
The difference between the development of a single business application and the development of an approach to developing business applications lies in the end result: the business system. In the traditional approach, the end result is an application program or multiple programs delivering end-user function and selective data access. In the case of the retail solution thread used in this book, we deliver an architecture-based system to support decision making. The recipient of the Information Warehouse solution, whom we will sometimes call a knowledge worker or analyst, determines function and required data access.
Enterprises from different segments of the retail marketplace face common challenges. This book uses an Information Warehouse solution thread to address selected challenges of a representative retail business.
2.1 Retail Industry Trends

Today's retail enterprises are subject to many of the forces other industrial sectors face:
• The drive to improve quality of service
• The drive to improve profitability
• Competition
• Government regulation
• Managing mergers with other companies
• The desire to exploit cheaper technology.
IBM has involved representative enterprises from many retail market sectors in an effort to respond to these forces. The result is the development of the Retail Application Architecture as a beginning point for the solution. That solution is an overall information systems strategy for managing the business environment.

IBM discovered during its work with retail enterprises that retailers identified three basic business resources as key to their business. Furthermore, they attached a specific hierarchy of importance to these resources for their daily operations, as follows:
• Product
• Client
  The client resource refers to specific individual customers.
• Market
  The market resource is made up of customers treated as one or more groups. Suppliers to the retail industry are a subset of this resource.
2.2 Challenges

The retail enterprise's corporate mission is to provide quality service to satisfied customers and maintain corporate profit. The objectives in the current market environment are to:
• Reduce the cost of doing business
• Attract and retain customers
• Improve business processes
• Reduce cycle times.

Typical strategies are to:
• Increase profitability from within by:
  − Leveraging external resources
  − Focusing on costs
  − Emphasizing creative uses of technology (for example, database mining)
  − Reengineering business processes.
• Differentiate customer service from the competition
• Improve merchandise flow management through:
  − Quick response: reacting to trends and getting merchandise to the floor quickly
  − Electronic data interchange
  − Effective inventory management
  − Profit contribution analysis: focus on specific product profit, not overall revenue.
2.3 Retail Industry Organization

Figure 2 shows the positioning of business roles (units) within the retail enterprise. Technology that addresses the business strategies listed above must have the potential to integrate these major business units. The business roles are typical of many market segments in the retail industry. The need to connect these business units to the same source of information will affect an Information Warehouse implementation.
Figure 2. Current Business Environment of the Retail Enterprise. Arrows indicate current business unit relationships. Not all relationships are implemented by electronic connections.
Roles can be within or across units
Empowering workers changes roles in the organization
Some business roles exist only at the corporate level (for example, finance), while others exist at the store level (for example, department manager). Specific business units or departments within the retail enterprise (for example, the purchasing department) may include tasks that cross these levels.

Within the retail enterprise, there is a division of business responsibility between store and corporate headquarters personnel. That distinction, however, is becoming more blurred as management initiative seeks to empower workers. Today, store personnel are primarily responsible for the day-to-day operations of their assigned store. Table 2 shows the responsibilities at the store level and the functions performed within the scope of those responsibilities.

Table 2. Store Responsibilities and Functions

Responsibility      Function
Product             • Controlling store inventory
                    • Local pricing initiatives
                    • Replenishment ordering
Administration      • Labor management (primarily shift scheduling)
                    • Time and attendance reporting
Customer            • Customer credit management
                    • Customer service
Store operations    • Checkout
                    • Loss prevention
                    • Receiving
                    • Financial records
                    • Training
Table 3 on page 17 shows the responsibilities of corporate personnel and the functions performed within the scope of those responsibilities. These responsibilities tend to be concerned with tasks that are common to all stores.
Table 3. Corporate Responsibilities and Functions

Responsibility      Function
Market              • Advertising and promotions
                    • Marketing campaigns
                    • Site selection
Product             • Distribution
                    • Merchandising
                    • Volume purchasing
Operations          • Employee statistics
                    • Employee welfare and career planning
                    • Training not covered in the store
Administration      • Corporate finances, taxes, and legal matters
                    • Fixed asset management: buildings, equipment, insurance
                    • Payroll
                    • Translating mission, objectives, and strategy into company initiatives and measurement systems
A particular business unit may be responsible for several functions. More than one business unit may share responsibility for a given task. For example, the merchandising business unit in Figure 2 on page 15 is primarily responsible for managing marketing and product range. It must create marketing campaigns based on trends in consumption, the fashion of the day, or competitor activity. A campaign requires the involvement of staff from the operations business unit who create the advertising copy, inform the stores of price changes and promotion terms and conditions, and ensure that stock levels are adequate at stores affected by the campaign. Finally, operations must coordinate with the distribution business unit to ensure timely delivery of campaign items. This marketing campaign scenario illustrates the process-oriented relationships among business units. Figure 2 on page 15 depicts these relationships with arrows. Given the sheer number of relationships among business units, it is easier to represent the relationships by connecting each unit to its level of business processing (for example, corporate or store level). The scenario also implies a high degree of information sharing. It implies that the merchandising unit has access to the trend information held by corporate that was derived from sales information held by stores. The store and corporate network is the vehicle by which data and information flow through the enterprise to support information sharing.
Roles can span organization structures
2.3.1 The Store Environment

POS systems bring in 85% of the data
The store depicted in Figure 5 on page 22 represents a standard configuration found in all other stores of the network. The POS system is the main source of data within the retail enterprise. Approximately 85% of all data in the company is derived, in some form, from POS data. POS systems use a scanner to read in the universal product code (UPC) from the merchandise, support dial-in credit card purchase verification, and handle EFT transactions for purchases made with a bank debit card.

POS machines are connected to the store LAN shown in Figure 5 on page 22. One particular POS machine, the POS controller, performs some intermediate processing of entered data, such as financial summarization by day and week, and functions as a backup server in the event that the in-store processor fails. The in-store processor is the main POS server. It stores the item file used for all price lookups. It also runs software to support operations activities in the store, as follows:

Labor management
  Labor management includes scheduling of shifts, breaks, and workload assignment.

Inventory management
  Inventory management includes setting of order points for all items, creating replenishment orders, transmitting orders where electronic data interchange has been enabled, receiving merchandise, and scheduling returns.

Time and attendance
  Time and attendance includes clock-in, clock-out, and related functions.

Inventory checks
  Inventory checks use a hand-held terminal (HHT) (see Figure 4 on page 21) to scan and transmit current quantities from the shelf or storeroom area to the in-store processor and compare them with expected quantities.
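Two of the functions listed above lend themselves to a short illustration: comparing scanned quantities with expected quantities, and creating a replenishment order from order points. The sketch below is ours and makes several assumptions: the item identifiers and quantities are invented, and the order-up-to level used to size each order is a hypothetical policy that the text does not specify.

```python
def inventory_discrepancies(expected, counted):
    """Compare HHT counts with expected quantities.

    Both arguments map item identifier -> quantity. Items missing from
    either side are treated as quantity 0. Only items that differ are reported.
    """
    report = {}
    for item in sorted(set(expected) | set(counted)):
        exp = expected.get(item, 0)
        cnt = counted.get(item, 0)
        if exp != cnt:
            report[item] = {"expected": exp, "counted": cnt, "variance": cnt - exp}
    return report


def replenishment_order(on_hand, order_points, order_up_to):
    """Order enough to reach the order-up-to level for every item whose
    on-hand quantity is at or below its order point (assumed policy)."""
    order = {}
    for item, point in order_points.items():
        qty = on_hand.get(item, 0)
        if qty <= point:
            order[item] = order_up_to[item] - qty
    return order


if __name__ == "__main__":
    expected = {"blouse-01": 10, "blouse-02": 4}
    counted = {"blouse-01": 8, "blouse-02": 4, "blouse-03": 1}
    print(inventory_discrepancies(expected, counted))
    print(replenishment_order(counted,
                              {"blouse-01": 8, "blouse-02": 2},
                              {"blouse-01": 20, "blouse-02": 6}))
```

In this sketch the counted quantities feed the replenishment calculation directly; a real in-store package would presumably also account for merchandise already on order.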
Software turns today's data into tomorrow's reports
Some of these software packages (for example, inventory management) execute automatically at the end of the shopping day. They generate reports for store personnel to read and file the next day and transmit data to the large server to be used for further processing and reports. The start of in-store processing is controlled remotely by large server operations staff. In the event of an in-store processor problem, some or all stages of in-store processing can be completed at the host by uploading data already processed and raw POS data.
The store LAN is a communications vehicle
The store LAN configuration does more than support POS communications: it enables communication between programmable workstations (PWSs) in the store and in-store processor and large server programs. These PWSs contain spreadsheet and word processing software. Communications software is also installed on each PWS, enabling sessions to be established with the large server. A variety of software is available on the large server system. Common spreadsheet files and backlevel copies of the item file from the in-store processor are stored on the LAN file server. Some of these spreadsheets are downloaded and maintained by the corporate large server.
EFT is a complex transaction that involves many external organizations. It delivers convenience to customers, improved cash flow to the store, and documentation of individual client preferences and buying patterns for business analysts. Figure 3 shows the communications required between the store and the many outside agencies involved in an EFT transaction. These communications support the following major processes in an EFT transaction:
• Credit and debit card validation
• Withdrawal of funds from client account
• Crediting of funds to retailer account
• Imprinting transaction information (franking) of POS documents
• Validation of third-party checks.
Figure 3. EFT Transaction in the Retail Enterprise
EFT is a vehicle for data creation and movement
The in-store processor serves multiple purposes
In summary, the in-store processor produces the reports, forms, and orders required to support the store′s immediate business function. It is also the primary file server for POS systems in the store network. Large server processing adds similar information objects and data stores to the large server environment. The only difference is that large server information supports the additional business responsibilities of corporate staff. Some of this host information is sent back to the store when it reflects regional trends of interest to store managers for strategic decision making.
2.4 Corporate Environment

Information flows in two directions

The large server provides additional processing of transmitted store data to support tasks that are common to all stores. As part of that processing, some data is filtered back to the in-store processor (for example, price changes), from which it may be passed on to the POS machines. Additional reports such as upcoming promotion announcements, names of customers having written bad checks, and interstore transfer requests are also sent to the store for processing and printing by the in-store processor.

Value of Summary Data
  Summary data helps control the business while detailed information helps management analyze how to improve it.
Workstations have common and specialized software
Workstations on the corporate LAN also contain the same basic decision support applications as the stores for support of spreadsheets, word processing, and communications. In addition, some staff have specialized software to support their tasks. For example, the operations business unit has software that enables them to design advertising copy layout. Software supporting graphics creation (for example, presentation charts and graphs) is maintained on the corporate LAN.
2.5 Retail Enterprise Network

Figure 4 on page 21 shows a common configuration in retail department store networks. A variety of communications protocols move data within the store network and to other participants in the store's business: corporate, financial institutions, distributors, and suppliers.
Figure 4. Generic Department Store Network
We limit the scope of this study to three distinct levels of data processing, as follows:
• Retail application processing
• In-store processing
• Large server processing.
The network shown in Figure 5 on page 22 indicates where some of the personnel roles execute their data processing tasks.
Figure 5. Retail Enterprise Network
This chapter discusses the general placement and flow of information in this network environment. For a more detailed description of specific data flows and the products used, see 3.3, “Origin of Business Requirements” on page 42.
2.5.1 Target Business Units for Solution

The Merchandising and Operations business units were selected as the pilot test recipients of the first phase of the retail solution thread. Selection was based on interviews with key corporate executives and managers from each business unit. Data from the interviews was incorporated into a business modeling tool. The tool recommends business systems with a certain data usage profile. This profile was then mapped against the capabilities of systems currently in existence. Selection of target business units was then based on the following considerations, in highest to lowest priority sequence:
• Corporate versus business unit urgency
• Absence of data processing function in current business systems
• Volume of data required
• Nature of data analysis demanded by the solution
• Sharing potential of Information Warehouse-generated data among business units.
The driving business urgency for both business units was support of the reports described in section 4.1.2, “New Scenarios for Profit” on page 49.
The objectives of the pilot test and the strategies established to meet those objectives are as follows:

Objective                       Strategy
Attract and retain customers    Focus on costs
Improve business processes      Use technology creatively
Reduce cycle times              Reengineer business processes through new uses of data.
We assume that both business units lack business systems capable of economically meeting these objectives.
2.5.2 Data Processing Functions

In this section, we expand the overview of data processing at the retail enterprise. Specifically, we examine personnel roles in the Merchandising and Operations business units and position them with respect to the network and business systems. We also go into more detail on what happens to data as it is handled by the in-store processor and the large server.

2.5.2.1 Merchandising and Operations Roles

In general, corporate personnel make better use of their LAN systems than do store personnel. This is as much a function of job responsibilities as any other factor. Corporate personnel are closer to information systems department assistance and generally have more time allocated to Information Technology tools usage. The analysis they perform and the decisions and recommendations they make are tied to the data they access through Information Technology tools.

Most of the data of interest to corporate personnel is stored on the large server where it is initially accumulated from the stores. Traditionally, reports against this data are generated using server-based software. Analysts in the Merchandising and Operations units typically use Personal System/2 desktop computers. Access to server applications, such as the report generator, is through terminal emulation products, with the data flows passing over the token-ring LAN. The interface is one that can be easily displayed on a nonprogrammable terminal. The more user-friendly graphical user interface (GUI) is deployed on the spreadsheet, word processor, graphics, and publishing software on the LAN file server or personal workstations.
Marketing Business Unit: One of the more common positions in the marketing business unit is that of the buyer, the position responsible for product lines. We focus on a buyer responsible for purchasing blouses. The retail enterprise carries 200 different blouse lines. Weekly summary reports are run on sales across all outlets. One section of this report summarizes sales by product type. The buyer can select options from the software program to create a hardcopy version of blouse product type sales. This report shows many variations of aggregated information related to blouses:
• Week-ending totals for sales by product line code by region
  A brief description accompanies each product line code; for example: Brand X: Rayon, Polo style
• Week-ending totals for sales by product line code by store number
• Week-ending totals for sales by product line code by region for the last 12 weeks
• The same 12-week summary but by store number
• The top 10 product lines by store number and summarized by region.
Also included with the key information on each product line is quantity sold, average price, number of returns, and amount left in stock. The buyer can obtain more descriptive detail on a particular blouse in the manufacturer′s catalog.
The buyer′s most pressing informational need for detailed sales information at a store level is unfulfilled. Some information about daily totals is available from a server database used by the Distribution and Operations business units. This information is used to monitor on-hand quantities and, hence, is more amenable to inventory management than management information systems functions. Management information systems, or decision support, analysis requires additional massaging of the data to determine detailed information to answer questions such as: What color is selling well? Some of the problems that arise in answering this question include:
• Decoding
  Color is indicated by a code for which a lookup table is scanned to establish a text description of the color. This lookup and translation requirement is addressed in the Information Warehouse architecture as data reconciliation.
• Lack of ad hoc access
  The traditional file and database systems used in the store environment are awkward, at best, for ad hoc access. Informational analysis of this operationally oriented data would require data processing expertise for application programming and a business analyst′s knowledge to navigate through the required tables and produce valid results from the correct source data and the correct analysis.
• Busy analyst
  Traditional approaches to informational analysis depended on information specialists. The information specialists were responsible for acquiring operational data and transforming it into informational data through extraction and enhancement programs. They were then responsible for making the informational data available to business analysts such as the buyer. This approach was embodied in the information center. The analysts are a limited resource and are not always available because of normal business cycles and the peak demands they create.
• Lack of information
  The sales volume of a particular item must take returns into account. The net success of the item is the difference between the volume sold and the volume returned. The buyer must go to other sources to get daily details on why certain blouses were returned and who returned them.
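The decoding and returns problems listed above can be illustrated with a minimal Python sketch. The color codes, the contents of the lookup table, and the record layout are assumptions made for illustration; the actual file formats are not described here.

```python
from collections import defaultdict

# Hypothetical lookup table: color code -> text description.
COLOR_LOOKUP = {"01": "White", "02": "Navy", "03": "Red"}

# Hypothetical daily records: (product_line, color_code, units_sold, units_returned).
DAILY_RECORDS = [
    ("BRAND-X", "01", 40, 5),
    ("BRAND-X", "02", 25, 1),
    ("BRAND-X", "01", 10, 2),
    ("BRAND-X", "99", 3, 0),  # code missing from the lookup table
]


def net_sales_by_color(records, lookup):
    """Decode color codes and total net units (sold minus returned) per color."""
    totals = defaultdict(int)
    for _product_line, color_code, sold, returned in records:
        # Keep unknown codes visible rather than silently dropping them.
        color = lookup.get(color_code, f"UNKNOWN({color_code})")
        totals[color] += sold - returned
    return dict(totals)


net_by_color = net_sales_by_color(DAILY_RECORDS, COLOR_LOOKUP)
best_color = max(net_by_color, key=net_by_color.get)
```

Here `best_color` answers the question "What color is selling well?" on a net basis, which is only possible once the codes are decoded and the returns are available in the same place as the sales.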
Trend information beyond the 12-week summary report is difficult to build. The knowledge worker could manually transfer data from various sources into a single spreadsheet, but the time constraint makes this approach undesirable. Major merchandising decisions, however, occasionally make such exercises mandatory. When the buyers try to identify the product lines to reorder, cancel, or target for marketing campaigns, they normally have as little as one or two weeks′ warning. Therefore, they make decisions using aggregated information that is readily available. As the deadline approaches, the possibility of using the detailed information to verify any assumptions vanishes, and the buyer must often go with instinctive choices.
Analysts must make decisions quickly
The buyer would prefer to make decisions based on dynamically defined summarization and aggregation reports. Each report defined may in turn lead to other summarization and aggregation reports. The retail enterprise requires specialized software, hardware and the access to detailed data to support effective decision making.
Analysts need special software to analyze data
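As a rough illustration of a dynamically defined summarization report, the following Python sketch totals a measure over whatever dimensions the analyst chooses at run time. The field names and sample rows are hypothetical and stand in for detailed sales data.

```python
from collections import defaultdict

# Hypothetical detail rows; field names are illustrative only.
SALES = [
    {"region": "West", "store": 101, "product_line": "BX-POLO", "units": 12},
    {"region": "West", "store": 102, "product_line": "BX-POLO", "units": 7},
    {"region": "West", "store": 101, "product_line": "BY-SILK", "units": 4},
    {"region": "East", "store": 201, "product_line": "BX-POLO", "units": 9},
]


def summarize(rows, dimensions, measure="units"):
    """Total a measure over any combination of dimensions chosen at run time."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def top_n(totals, n):
    """Return the n largest groups, highest total first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


by_region = summarize(SALES, ["region"])
by_store_and_line = summarize(SALES, ["store", "product_line"])
top_lines = top_n(summarize(SALES, ["product_line"]), 1)
```

Each result can suggest the next question: a regional total may prompt a breakdown by store, which is the chain of reports the buyer would like to define without a separate coding effort.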
Operations Business Unit:
One of the more common positions in the operations business unit is that of the inventory specialist. An inventory specialist′s main responsibility is to maintain cost-effective levels of store merchandise by working in concert with department management at individual stores and the distribution unit. A large part of an inventory specialist′s workload is the coordination of marketing campaigns and of major shifts in the merchandise mix at particular stores. An inventory specialist typically has direct responsibility for some number of stores and coordinates with other inventory specialists on national campaigns. Most of the informational data is gleaned using standard reports from the IMS inventory database. Occasionally, when the need arises, additional queries can be created. However, each of these requirements implies a distinct coding effort to build the extract, reconciliation, and aggregation logic to deliver the information. The cost associated with this effort is an inhibitor; business analysts must go through a justification process to get this information. Analysts with experience in creating queries perform more and more work for the marketing business unit. This results in a shift of work within the work group.
2.5.2.2 Retail Enterprise Data Flows
Data flows documented in this section take place at various points in the retail enterprise network configuration seen in Figure 5 on page 22. We examine the movements and processing of data that occur at the three levels of the network: POS network, in-store processor, and the large server.
Figure 6. Retail Enterprise Network. Network elements and operating systems
Although data is present in many different forms at different network levels, many of the applications executing at these levels use a consistent model of the data, depicted in Figure 7 on page 27. This model is important to the definition of informational data within the Information Warehouse.
Figure 7. Business Modeling of Key Retail Entities
POS Network Data Activity: Data is either read or changed in two key data objects: the Item file (for a representation of the Item file structure used in the General Store Application (GSA), see Figure 8) and the transaction log, which records data related to a purchase as a single record in a flat file. Relational technology is gaining acceptance as a database for these two objects.
Figure 8. GSA′s Item File Format
4690 POS terminals scan the item file for prices and register purchases in the transaction log throughout the day, under the control of GSA. The in-store processor contains the two files; the 4690 POS devices access them using file redirection services provided by the 4690 operating system. Figure 9 outlines the services used to enable this file redirection. Both the 4690 operating system and GSA provide safeguards to ensure that transaction data is not lost in the event of a failure.
Figure 9. File Redirection Services
The key point in the 4690 POS operational environment is the efficient storage of data in a minimal number of files on the in-store processor. Data from 4690 POS devices passes through 4690 controllers. These devices can be configured to provide reports on cashier till balances by the day or week.
In-Store Processor Data Activity: The in-store processor at the retail enterprise runs software packages that assist store personnel with their tasks, including labor and inventory management and product audit. Labor management and product audit are not mentioned further in the retail solution thread, because the data they produce is not immediately relevant to solving the solution thread scenarios.
The inventory management software supports the Order and Shipment entities, as follows:
Entity
Definition
Order
The software matches current inventory against predetermined order points for each product and generates orders. These orders are transmitted to the large server as part of end-of-day in-store processing. These orders may be grouped by item supplier. However, store or corporate personnel may decide to source particular items from one of their warehouse depots or from a different store.
Shipment
This entity is used to account for received merchandise through adjustments applied to overall item inventory at both the corporate and store levels. Therefore, all shipments since the last processing period must also be transmitted to the large server.
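The order generation described for the Order entity can be sketched in Python as follows. This is only an illustration: the record fields, the fixed order quantity per item, and the rule that an item is ordered when on-hand stock is at or below its order point are assumptions, not details of the actual inventory management software.

```python
from collections import defaultdict

# Hypothetical item records: on-hand quantity, order point, order quantity, supplier.
ITEMS = [
    {"item": "1001", "on_hand": 4, "order_point": 10, "order_qty": 24, "supplier": "S1"},
    {"item": "1002", "on_hand": 30, "order_point": 10, "order_qty": 12, "supplier": "S1"},
    {"item": "1003", "on_hand": 10, "order_point": 10, "order_qty": 6, "supplier": "S2"},
]


def generate_orders(items):
    """Create an order line for each item at or below its order point, grouped by supplier."""
    orders = defaultdict(list)
    for rec in items:
        # Assumed rule: reorder when stock has fallen to the order point or lower.
        if rec["on_hand"] <= rec["order_point"]:
            orders[rec["supplier"]].append((rec["item"], rec["order_qty"]))
    return dict(orders)


orders_by_supplier = generate_orders(ITEMS)
```

Grouping by supplier mirrors the default described above; a store or corporate override that sources an item from a warehouse depot or another store would replace the supplier key before transmission.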
In addition to these two entities, information from both the current item file and the transaction log must be sent to the large server. Because both the store and corporate staff are interested in changes to these two files from the previous processing period, some programs have been written to extract changed information. Some of these programs run on the in-store processor and create reports for store personnel.
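A changed-information extract of this kind can be sketched as a comparison of two snapshots of the item file. The snapshot layout below (item number mapped to price and on-hand quantity) is hypothetical; the real Item file format is the one shown in Figure 8.

```python
def extract_changes(previous, current):
    """Compare two item-file snapshots keyed by item number.

    Returns added, changed, and deleted items so that only the
    differences need to be reported or transmitted.
    """
    added = {k: v for k, v in current.items() if k not in previous}
    changed = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    deleted = sorted(k for k in previous if k not in current)
    return {"added": added, "changed": changed, "deleted": deleted}


# Hypothetical snapshots: item number -> (price, on-hand quantity).
yesterday = {"1001": (19.99, 40), "1002": (24.50, 12), "1003": (9.99, 7)}
today = {"1001": (19.99, 35), "1002": (24.50, 12), "1004": (14.00, 20)}

delta = extract_changes(yesterday, today)
```

Only three of the four distinct items appear in `delta`; the unchanged item 1002 is left out, which is the point of extracting changes rather than sending whole files.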
Large Server Data Activity: Most of the processing that occurs at night on the large server is vital to the enterprise′s survival. Programs that help maintain cash flow management perform the following functions:
• Transmit checks and debit card information to the bank clearing house for nightly reconciliation
• Process information from bank clearing houses such as payments from customers using a department store credit card
• Transmit credit card purchase information to credit agencies
• Transmit coupon data to coupon agencies to obtain reimbursement
• Process department store credit card transactions to keep the individual statements for customers up to date
• Transmit payments to suppliers based on processed invoices
• Transmit purchase orders to suppliers.
Most of these programs require data from the POS transaction logs originating in many stores. The large amount of data that must be transmitted for just one day′s processing precludes sending the entire transaction log and item file. Extract programs that run on the in-store processor send a subset of both files to the large server. These large server processes require additional processing of both files, as follows:
Time stamping
The data may typically be stored in IMS and VSAM files. An index is used to preserve the uniqueness and integrity of data. A time stamp is a key attribute of the index.
Reconciliation
Interstore transfers, done under the control of personnel from both stores, must be checked against the inventory of both stores (their respective item files). Also, transactions must be checked to ensure that they are not being double processed; that is, a transaction transmitted for the current processing period should not have been processed in a previous processing period.
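The double-processing check can be sketched as a lookup against the keys of transactions that have already been processed. The key used below (store, terminal, sequence number, time stamp) is a hypothetical choice made for illustration; the only detail taken from the text is that the time stamp is part of the key.

```python
def filter_unprocessed(transactions, processed_keys):
    """Split incoming transactions into new ones and duplicates.

    processed_keys is updated in place, so a transaction repeated
    within the same batch is also caught.
    """
    accepted, duplicates = [], []
    for txn in transactions:
        # Hypothetical unique key; the time stamp is one of its attributes.
        key = (txn["store"], txn["terminal"], txn["seq"], txn["timestamp"])
        if key in processed_keys:
            duplicates.append(txn)
        else:
            processed_keys.add(key)
            accepted.append(txn)
    return accepted, duplicates


# Keys already processed in a previous period (illustrative values).
processed = {(101, 3, 5001, "1994-08-01T10:15:00")}

incoming = [
    {"store": 101, "terminal": 3, "seq": 5001, "timestamp": "1994-08-01T10:15:00", "amount": 42.00},
    {"store": 101, "terminal": 3, "seq": 5002, "timestamp": "1994-08-02T09:05:00", "amount": 18.50},
    {"store": 101, "terminal": 3, "seq": 5002, "timestamp": "1994-08-02T09:05:00", "amount": 18.50},
]

accepted, duplicates = filter_unprocessed(incoming, processed)
```

In this sample, one transaction is accepted and two are rejected: one was processed in an earlier period and one is repeated within the current transmission.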
Figure 7 on page 27 includes data entities that correspond to the product and invoice business entities. These entities and their processing require creation of additional tables, as follows:
Product
To simplify the sales reporting process, many items may be categorized under a single product name. For example, a shirt with long sleeves of a given brand and style may have color and size variations, all of which are grouped under one product name and number.
Invoice
This table is used to keep track of purchases made by customers using a department store credit card. Data in this table is used during month-end processing to generate and mail a statement to each customer. Customers pay this statement at their bank or by mail. Invoice is not used to track invoices from suppliers.
Move data to the knowledge worker
The information systems department recognizes specific difficulties of ad hoc access to IMS data. It decided to add some additional processing at the end of the batch window to extract the day′s records to a sequential file. This sequential file serves as input to spreadsheet programs. The extracts are replicated to the corporate LAN file server where a process is initiated to import the records into a new spreadsheet file. Users select the various spreadsheets based on a file naming convention. Periodically, older spreadsheets must be purged to avoid filling up the server′s disk space. Knowledge workers wanting data from the purged files must return to the IMS source. A selected subset of the spreadsheet files is also transmitted to some store LANs using replicator services of the LAN management software.
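The selection and purging of extracts by file name can be sketched as follows. The naming convention used here, a prefix followed by an underscore and a date in YYYYMMDD form, is an assumption for illustration; the actual convention is not specified. The functions only compute which names qualify and do not touch any file system.

```python
from datetime import date, datetime, timedelta


def extract_date(filename):
    """Parse the date from a name like 'sales_19940815.csv'; return None if it does not match."""
    stem = filename.rsplit(".", 1)[0]
    prefix, _, stamp = stem.rpartition("_")
    if not prefix:
        return None
    try:
        return datetime.strptime(stamp, "%Y%m%d").date()
    except ValueError:
        return None


def files_to_purge(filenames, today, keep_days):
    """List files whose embedded date is older than the retention period."""
    cutoff = today - timedelta(days=keep_days)
    purge = []
    for name in filenames:
        file_date = extract_date(name)
        # Files that do not follow the convention are never purged.
        if file_date is not None and file_date < cutoff:
            purge.append(name)
    return purge


FILES = ["sales_19940701.csv", "sales_19940810.csv", "notes.txt"]
purge_list = files_to_purge(FILES, today=date(1994, 8, 15), keep_days=30)
```

Whatever the retention period, data in a purged file is no longer available on the LAN, so a knowledge worker who needs it must go back to the IMS source.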
2.6 Key Systems
POS and large server converged in the retail industry
The retail industry information systems evolved from the bottom up and the top down. That is, the bottom of the technology network has POS registers. POS registers are descendants of mechanical cash registers, which have been a fundamental part of the retail industry since their inception. The top of the technology network in the retail industry contains large servers, which were used for traditional administrative functions. Large server processing was performed independent of evolving POS systems. Because of this dichotomous evolution, the key systems in the retail industry are discussed in terms of store and corporate systems.
2.6.1 Store Systems
Data originates in the POS system
The POS system is the key system in the store environment. The POS system originally managed the traditional functions of product management: recording what was sold, when, and at what price, along with other aspects of direct selling. It is a transactional system, and the results of its activities are simple transaction records. The retail industry at large needed to analyze this information for strategic planning purposes. Therefore, the transaction data was uploaded to the large server for reconciliation and analysis. It should be noted that recent technology and product developments make it possible for the POS systems to perform some of the functions traditionally performed only on the large server.
2.6.2 Corporate Systems
The key corporate systems include the following:
• Sales analysis
• Merchandise analysis
• Inventory management.
Input data for key systems at the corporate level is generated at the store level. The retail enterprise′s infrastructure must provide a vehicle for replicating data from the POS system to the large server running key system software. The replication requirement has both historical and technological implications. The historical implication recognizes the POS machines as specialized; they were manufactured for the sole purpose of recording merchandise sales. At the time, there was no vision of a client-server function, so that POS systems did not even have a communications capability. The economies of scale and the state of (micro) processor technology dictated that these systems perform a minimum function. Client-server technology introduced the communications capability to the POS system to ease movement of data to the large server. Large server processing of POS data is easier with electronic data replication than with manual copying of data.
History has limited the POS role
Advances in microprocessor technology and costs have positioned the POS systems to perform some of the functions reserved for the large server. The 4690 supports the 4690, DOS, and OS/2 operating systems. The result is that the 4690 hardware can run informational applications executable on the DOS and OS/2 operating systems in the store. This flexibility makes the 4690 a platform alternative for some applications in a large retail enterprise and makes it an all-in-one system for small retail enterprises.
Technology advances have widened the POS role
Data volume remains a technology issue. A large retail enterprise will always have data volumes that surpass the capacity of store-level systems. Volumes associated with historical data in even the small retail enterprise may exceed that capacity. For these reasons, the large server will continue to have a role in support of key systems. That role is particularly significant for historical data and trend analysis against historical data or any very large data volume environment. S/390 Parallel Query Server is a solution to this data volume issue.
Sales analysis incorporates reconciliation of data for the purposes of sales tracking and history. The primary objective is the analysis of sales data in different dimensions. A typical use of sales data is support of sales campaigns, wherein the marketing organization defines a specific set of products to be emphasized in a given time period. The emphasis can come in the form of discounts, rebates, or highlighting in a catalog. Analysis of sales data helps the business understand the success of the promotion.
Sales analysis shows past success and future strategy
Merchandise analysis tells what′s gone out the door
Merchandise analysis focuses on the sale and physical movement of product, the volume of product sold, and the price at which the product sold. It uses as input the raw data representing what was sold and feeds analyses that “hypothesize” on why product was sold. It is historical in nature and the fundamental source of data for trend analysis. Retail enterprises benefit from understanding how products sell over a long period of time as well as in short-term campaigns; the range of historical trend analysis is limited only by the data available.
Inventory management is tactical and strategic in scope
Inventory management functions include distribution, warehouse management, stocking levels, reorder points, and quantities to order. This key system clearly sits in a grey area between the operational systems needed to run the day-to-day business and the informational systems that support long-term strategic decisions. The function may be performed at the store level or at the corporate level for chain-wide inventory decisions. This function is an example of the distribution of a traditional retail industry function across the platforms in the enterprise.
Chapter 3. Retail Industry Business Requirements
In this chapter, we use business requirements to describe the needs for data processing solutions to the challenges in the retail industry. These challenges derive from the trends, directions, and pressures of the industry. We assume that the pressures will persist over a relatively long period of time and that they are general in nature, rather than specific to an immediate narrow-focused need. Applications or application systems have been the usual response to specific, line-of-business requirements.
An architecture helps solve strategic problems
In this study, we stipulate that the solution to a strategic pressure must be executed in an architected context. This context assumes that multiple applications will be needed over the lifetime of the business pressure. The architecture in this context presents a standard approach for all applications developed to meet different aspects of the business pressures. The architectures we use are the Retail Application Architecture (RAA) to address industry-specific requirements of the retail enterprise, and the Information Warehouse architecture to address the cross-industry requirements to manage informational data and applications. These architectures are used as a guide for developing or acquiring software solutions to the respective requirements.
Business pressures are strategic, as are architectures
3.1 Value of Information
Retail enterprises have a wealth of operational data that documents their business past. Hidden in these large volumes of data are trends and cycles of consumer buying habits. The resource requirements to uncover these facets of the retail enterprise′s business are a challenge to the information systems department. The retail environment demands that the enterprise use this data to compete. Trend analysis, the complex analysis of large volumes of data, is a key to understanding fad-oriented customers and tuning the product supply process to meet their demands.
Trend analysis is key to competing
Trend analysis yields precision, discovery, and business improvement
The successful implementation of trend analysis systems, often involving large detail-oriented relational databases and powerful complex query processing capability, yields value in three areas:
• Precision
• Discovery
• Business process reengineering.
Precision and discovery result from the analysis of large volumes of data. They help knowledge workers better understand their business and make better strategic decisions for the business. This analysis feeds the reengineering of retail business processes. This, in turn, leads to new operational data for the cycle of data analysis, process evaluation and change, and further data analysis.
3.1.1 Precision
Trend analysis supports intuitive decision making
The operational segment of the business performs its day-to-day activities using transaction-based data. In contrast, the strategic knowledge-based segment of the organization operates on a more intuitive basis. The Information Warehouse architecture and decision support strategies define ways for supporting this intuitive approach, using informational rather than operational data. Large informational data volumes and high-performance query tools bring a higher quality of business precision to the intuitive decision making process.
Trend analysis proves out intuition
Target marketing in the mail-order catalog business offers a good example of precision in the retail industry. We assume that the retail enterprise wants to introduce a new “boutique” catalog of tailored fashions for women. Logic tells management that women who are interested in this more expensive line of products represent less than 20% of their customer base. Lacking the knowledge of who those customers are, the company would send the catalog to their entire mailing list. With some summary knowledge from a modest decision support database, the retail enterprise can refine the mailing to customers interested in specific price range products. A more robust database might identify customers who have previously mail-ordered specific items, improving the likelihood of success even further. This target marketing minimizes the expense of sending the catalog to unqualified customers. In this example, that could be up to 80% of the printing and mailing costs of distributing the catalog to the entire customer base.
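The arithmetic behind the 80% figure can be made explicit with a small Python sketch. The customer count and the cost per catalog are hypothetical; only the 20% target fraction comes from the example above.

```python
def mailing_cost(customers, cost_per_catalog, target_fraction=1.0):
    """Cost of printing and mailing a catalog to a fraction of the customer base."""
    return customers * target_fraction * cost_per_catalog


# Hypothetical figures: 500,000 customers, 2.00 per catalog printed and mailed.
full_cost = mailing_cost(500_000, 2.00)
targeted_cost = mailing_cost(500_000, 2.00, target_fraction=0.20)

# Fraction of the full-mailing cost avoided by targeting.
savings_fraction = (full_cost - targeted_cost) / full_cost
```

Because the cost is assumed to scale linearly with the number of catalogs, the savings fraction is simply one minus the target fraction, whatever the customer count or unit cost. The 80% is an upper bound: it is reached only if the interested customers can be identified exactly.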
Trend analysis can help the manufacture and supply steps
We now assume that the same precision mechanism is available to the designers of the fashions and the company′s inventory buyers. They could use the information to assist them in determining most-favored colors and patterns, demographic tendencies, and seasonal habits. All of these factors can influence both the fashions offered as well as the catalog pictorial layout. The net result is customizing the appeal to a very focused market niche. The value of this precision is increased order performance within the micromarket. The more pleasing the catalog is to the ladies′ tastes and habits, the more likely they are to buy from the catalog and recommend the catalog to others.
Finally, the business unit executive could use the information for evaluating the potential of the strategy before it is executed. This is clearly better than relying on gut feel, staff intuition, or consumer-research sampling. The precision of knowing the actuals, in terms of realistic performance expectations, aids the executive′s management effectiveness throughout the subsequent campaign.
Trend analysis helps predict future success
These retail industry examples of precision value depend on informational data derived from operational detail and historical data. These data assets have been sitting idle in a variety of inaccessible storage media. Data replication products take this source data and replicate it to a platform which supports large data volumes and complex query—for example, S/390 Parallel Query Server. The data asset is now available for use by the business analysts to improve business processes. The preceding examples illustrate how the S/390 Parallel Query Server can bring new value to the business by improving bottom-line margins. The S/390 Parallel Query Server solution also contributes soft-dollar results such as happier customers and more effective leadership. This all leads to a more proactive business philosophy and operation.
Data replication delivers data for the benefit of the decision maker
3.1.2 Discovery
Some business events are unplanned and unexpected; often, management is unaware that the events occurred. Such business phenomena can have a positive or negative impact on the business. In all cases, business analysts desire earlier awareness of their occurrence. Earlier awareness of any business event gives management the opportunity to maximize the benefit from the positive event and minimize or eliminate the impact of the negative.
Analysis of large data volumes reveals the unpredicted
An airline cash-flow phenomenon is a good example of the use of data to discover events. The airline industry accounts for cash when a reservation is booked and paid for by credit card or equivalent means. It recognizes revenue once the reservation converts to a boarding pass and the customer actually takes the flights. In our example, we assume that the cash-flow balance starts to increase beyond the traditional ratio of cash-to-revenue patterns of previous quarters or years. There appears to be about 10% more cash than what would normally be expected.
Discovery helps even in cash-flow management
At first, management looks at this phenomenon with a surprised, but indifferent attitude. As the 10% ratio increase grows to 20% and then 30%, management becomes curious about the business phenomenon that is contradicting their assumptions of how their business works. At this point, the business analysts are only mildly concerned, the answer is not conveniently at hand, and the interest is not at the level to motivate investigation. When the ratio exceeds 40%, the situation is elevated to an all-out audit in order to determine the cause.
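The escalation described above can be expressed as a simple comparison of the current cash-to-revenue ratio with a historical baseline. The following Python sketch is only an illustration: the band boundaries are one reading of the narrative (10%, 20% to 30%, and over 40%), and the level names are invented.

```python
def excess_over_baseline(cash, revenue, baseline_ratio):
    """Fractional excess of the current cash-to-revenue ratio over the baseline."""
    return (cash / revenue) / baseline_ratio - 1.0


def escalation_level(excess):
    """Map the excess to a reaction; thresholds are assumed from the narrative."""
    if excess > 0.40:
        return "audit"      # all-out audit to determine the cause
    if excess >= 0.20:
        return "curious"    # management questions its assumptions
    if excess >= 0.10:
        return "noted"      # surprised but indifferent
    return "normal"


# Hypothetical quarter: baseline ratio 0.5, cash 65, revenue 100.
current_excess = excess_over_baseline(cash=65.0, revenue=100.0, baseline_ratio=0.5)
current_level = escalation_level(current_excess)
```

A check like this flags that something has changed but says nothing about why; finding the cause still requires access to the detailed data.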
Business analysts change their view based on the information
Knowledge workers go from curiosity to denial to confusion
The first inclination might be to suspect some sort of error in the financial system. An expensive and disruptive examination of all cash accounting procedures and programs is carried out. However, the audit shows no irregularities in the financial system. The knowledge workers′ interest level is now elevated as they sense the change in their business operations may be caused by change in customer behavior.
Detailed data helps knowledge workers understand hidden trends
Now the situation escalates to a management-level crisis. A passenger service agent detects an interesting new pattern by analyzing a large detail-level database that tracks all frequent flyer travel activities and itineraries. The agent discovers that more customers are booking their reservations further in advance. Drilling down deeper into the detail, the analyst determines that, on average, frequent flyers are now making reservations 7 to 10 days sooner than they had in previous years. The reason behind this trend is still a mystery.
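The drill-down that exposes the shift in booking behavior amounts to comparing average booking lead times between years. A minimal Python sketch follows; the booking records are invented, and a real analysis would run against the frequent flyer detail database rather than an in-memory list.

```python
from datetime import date
from statistics import mean


def average_lead_days(bookings):
    """Mean number of days between booking and flight, per flight year."""
    by_year = {}
    for booked_on, flight_on in bookings:
        by_year.setdefault(flight_on.year, []).append((flight_on - booked_on).days)
    return {year: mean(days) for year, days in by_year.items()}


# Hypothetical (booking date, flight date) pairs.
BOOKINGS = [
    (date(1993, 3, 1), date(1993, 3, 15)),   # 14 days ahead
    (date(1993, 6, 10), date(1993, 6, 20)),  # 10 days ahead
    (date(1994, 3, 1), date(1994, 3, 22)),   # 21 days ahead
    (date(1994, 6, 1), date(1994, 6, 20)),   # 19 days ahead
]

lead = average_lead_days(BOOKINGS)
shift_in_days = lead[1994] - lead[1993]
```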
Drill down of detailed data reveals the secret
Further exploration reveals that the increased ratio is related to a new program run by the marketing organization. Several months before the cash-flow picture changed, the marketing organization had announced a new free upgrade policy for frequent flyers. In response, customers were booking their itineraries further in advance of their flight, hoping to secure better positions in a space available list. This was a passenger action pattern that marketing had not anticipated.
Awareness of cause supports action
The growing awareness of the upgrade program and subsequent advanced reservation actions cause a swelling of cash on hand. Once executive management understands the cause of this new business circumstance, it feels comfortable capitalizing on it. In this case, management could invest the extra cash in higher-yield financial instruments. However, without the assurance of the reason for the cash increase, it would undoubtedly be much more reserved. If the reason for the increase in cash-to-revenue ratio were different—for example, an increase in double-booking in anticipation of labor unrest, the airline′s actions would be different.
Awareness must be timely to be valuable
The example illustrates the value of gaining an understanding of the unknown. Obviously, the sooner the discovery can be made, the sooner profit-oriented actions can be taken. The accounting department can rally the cash and the marketing department can encourage even more advanced customer reservations—all for improved business gain.
Discovery leads to precision
Discovery often leads to precision; that is, an S/390 Parallel Query Server user might discover a new customer need and then propose a new product or service with targeted promotional materials. For example, retailers with POS transaction capture can quickly spot new regional consumer habits and fads and then target appropriate promotions. A telephone company can discover new calling patterns and offer packaged services to take advantage of them.
3.1.3 Business Process Reengineering
Historically, computer automation has focused on the start-to-finish functions of the business process rather than the individual steps in the process. A business process may encompass multiple activities, each of which has been automated independently. One of the principal goals of business process reengineering is to rethink the process and apply appropriate information technologies in a cross-functional manner. Data and quick information flow become key integration factors.
Use data to change the way business is done
Enterprises intent on business process reengineering are designing architectures as a foundation for the effort. Large Information Warehouse implementations containing the operational and informational detail of their core business form the centerpiece of the system. This resource is surrounded by a network that services a variety of client processors. The combination of MVS, DB2, and S/390 Parallel Query Server provides these emerging data-centered architectures with invaluable support.
Architecture improves the likelihood of success
The reengineering effort of a retail enterprise is an excellent example. At the center of the system is a DBMS with terabyte volumes of item-movement detail. The system is fed each night with the POS transaction files from all stores in the enterprise, with two to three months of history maintained. Every item sold in every store, along with time of day, price, and references to other items in the shopping basket, is recorded.
Item detail is the starting point
Initially, this information offers the value of precision to planners and merchandise managers. Yet, by also allowing their suppliers to access the information, executives are redefining the manner in which their business is conducted. With access to this detailed item-movement database, consumer-goods companies can determine the regional inventory levels and offer automatic resupply, eliminating the old paper-based and burdensome functions of order-writing and order-filling.
Merchandise managers benefit first
The consumer goods supplier now has an accurate view of the movement of particular products sold in quantity, price, and location terms. The retailers can then adjust their payment agreement with the supplier, accordingly. Rather than processing brand or order level invoices, the retailer and supplier can quickly determine the volume of the supplier′s goods sold during a time period, multiply the totals by the wholesale prices, and calculate appropriate payment. This reduces massive and costly accounting functions, including invoicing and Accounts Payable, for both parties.
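The payment calculation described above reduces to multiplying units sold by wholesale price and totaling by supplier. The Python sketch below illustrates it with invented suppliers, items, and prices; `Decimal` is used so that currency amounts are not subject to binary floating-point rounding.

```python
from decimal import Decimal

# Hypothetical item-movement rows for one period: (supplier, item, units_sold).
MOVEMENT = [
    ("ACME", "1001", 120),
    ("ACME", "1002", 30),
    ("BOLT", "2001", 50),
]

# Hypothetical wholesale price per item.
WHOLESALE = {
    "1001": Decimal("4.25"),
    "1002": Decimal("10.00"),
    "2001": Decimal("7.10"),
}


def payments_due(movement, wholesale):
    """Total units sold times wholesale price, per supplier."""
    due = {}
    for supplier, item, units in movement:
        due[supplier] = due.get(supplier, Decimal("0")) + units * wholesale[item]
    return due


due = payments_due(MOVEMENT, WHOLESALE)
```

Because both parties can run the same calculation against the same item-movement data, the result can stand in for order-level invoices.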
Information access changes supplier and retailer processes
Both partners of the business equation benefit from reduced overhead and shortened times to complete the transactions, a win-win situation. The supplier can also use the information to leverage greater precision in item manufacturing forecasts, as well as in competitive and promotional analysis. The retailer is relieved financially from inventory management—it is now the supplier′s responsibility—and can therefore apply greater focus to shelf activity and customer service, another win-win situation.
Both parties benefit
The Information Warehouse enables process improvement
The strategic enabler for this new way of doing business is a large, detail-oriented relational database. Broad value in terms of improved efficiencies and effectiveness is observed throughout the selling chain. Recognizing the value of their information and data resource, visionary business leaders are using information today to change the way they do business and the way they manage. This includes making strategic decisions about their marketplace, pricing, quality programs, and resources. These decisions are dependent on the availability of timely and accurate data. The ability to access information and act on it quickly will become more and more critical to an enterprise′s success in the 1990s.
3.2 Solution Thread Requirements
Requirements drive technology solutions
The following business requirements are the major drivers for selecting technology used in the solution thread:
• View of data everywhere (by store and corporate personnel)
• Access to data anywhere (by corporate personnel)
  The access must be easy for business analysts who may not be comfortable with data processing systems. The solution must support ad hoc query.
• Access to data on the in-store processor by store personnel
• Need to access both summary and detail data
• Must keep four years of trend data
• Rapid dissemination of the products of knowledge worker analysis.
Each business requirement is explained in more detail in the sections that follow. This linkage will be reinforced as each feature of the solution thread is introduced by associating the original, summarized business requirement with the proposed feature. The requirements are listed in priority sequence with respect to their impact on profitability. The priority also relates to the degree of cost justification anticipated from implementing each requirement at the retail enterprise.
Business and data processing requirements apply
The solution chosen to meet the requirements must also address the needs of the information systems department. These needs are oriented more toward data processing and operational concerns (in support of retail operations) than the pure business needs identified above. The constraints are as follows:
• Solution must interoperate across vendors.
• Very large data volumes are involved in current store operations (terabytes (10¹² bytes) of data).
• The current batch window is 60% loaded. The batch shift produces the highest processor demand of any shift on the corporate processing system. As a result of this constraint, discrete (nightly) full refreshes are unacceptable, and any additional workload would have to be offloaded to nonbatch shifts to defer a CPU upgrade.
• Limits on support staff hiring have been imposed across the company. This translates to:
  − No store batch operations
    Remote operation is essential to manage the workload in the store network. Operators in the corporate data processing center need to remotely manage the in-store processor and control the flow of additional data among the corporate host, in-store processors, and POS terminals.
  − Information systems staff is not available for new query development
    Staff in the business unit requiring the queries must absorb this workload. It may be possible to devote a partial information systems headcount who would function as a helpdesk for development of such queries.
  − Limited information systems resource to define new information extracts
    It is also expected that business unit staff will be responsible for identifying new sources of data to incorporate into the retail enterprise Information Warehouse implementation. The information systems staff will then make the necessary changes to share the new information and promote its use.
The requirements for the solution thread are discussed below.
3.2.1 View of Data Everywhere
Information awareness and information access are separate issues
A view of data does not necessarily translate into accessing that data. This is the rationale for separating the awareness of data′s existence from the accessing of data. Accordingly, the knowledge worker works with specific attributes of informational data, as follows:
Business term
Terms follow the business terminology adopted for the retail business. IBM′s Retail Application Architecture is a source for building a business model of the enterprise, for the terms used in the retail business, and for those terms added by refinement of the RAA model.
Business description
Presents more detail about the business term. For example, a business term of Transaction_Type might have the following business description: “Identifies the type of POS transaction. Possible values are cash purchase, credit card purchase, till adjustment, credit card refund, cash refund, and audit totals.”
Type of information
The object is either a column, table, query, report, image, or subject area.
Last Refresh Date
Documents the data′s currency, when it was last updated.
Where Used
Identifies the queries, reports, tables, or other objects in which the data of interest is used.
Source
Identifies the location (for example, a corporate informational database, or an in-store processor at location X) from which the object originates.
Derivation
Describes the enhancements (for example, aggregation, subsetting, and calculation of new values) applied to the data. The data could be an unmodified copy of the original operational data.
Steward of the data
Identifies a person to be contacted to gain access to the data.
Query/report/chart run time
Relates specifically to queries on detailed data where it would help to know how long a particular query, report, or other “query object” would take to run.
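The attributes listed above could be held in a simple record structure. The following is a minimal sketch only; the class name `CatalogEntry`, the field names, the helper `find_by_term`, and the sample values are assumptions made for illustration and do not describe how any catalog product stores its metadata.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Object types named in the text under "Type of information".
OBJECT_TYPES = {"column", "table", "query", "report", "image", "subject area"}


@dataclass
class CatalogEntry:
    """One descriptive record about an information object (hypothetical layout)."""
    business_term: str
    business_description: str
    object_type: str
    last_refresh_date: Optional[date] = None
    where_used: List[str] = field(default_factory=list)
    source: str = ""
    derivation: str = ""
    steward: str = ""
    run_time_seconds: Optional[float] = None  # only meaningful for query objects

    def __post_init__(self):
        if self.object_type not in OBJECT_TYPES:
            raise ValueError(f"unknown object type: {self.object_type}")


def find_by_term(entries, text):
    """Case-insensitive search over business terms and descriptions."""
    needle = text.lower()
    return [
        e for e in entries
        if needle in e.business_term.lower()
        or needle in e.business_description.lower()
    ]


catalog = [
    CatalogEntry(
        business_term="Transaction_Type",
        business_description="Identifies the type of POS transaction.",
        object_type="column",
        source="corporate informational database",   # illustrative value
        steward="merchandising data steward",         # illustrative value
    )
]

matches = find_by_term(catalog, "pos transaction")
```

Note that a search of this kind returns descriptions only. Whether the knowledge worker can then reach the underlying data is a separate question, which is the point of distinguishing awareness from access.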
Just being aware of informational data has value
Note that the decision to allow a query to be started from such a view of enterprise information is a matter of what can be feasibly implemented and supported using current technology. There is still a large benefit to the retail enterprise in being able to know that an information item exists even if it cannot be accessed using conventional means. More discussion of this issue can be found in Chapter 8, “Information Catalog” on page 101.
3.2.2 Access to Data Everywhere
Knowledge workers demand information, regardless of location
Knowledge workers would like to access certain information regardless of where it is stored in the enterprise. The current sources of this data may be stored on different platforms and in different formats. Knowledge administrators must make a decision about how and where the new information is to be located. The location, however, need not be an inhibitor to the knowledge workers′ accessing of the information.
Data location depends on resource demand and capacity
Performance is a major criterion in the data location decision. This criterion is influenced by the data volume: the detailed, real-time data that is the source for trend analysis might involve gigabytes or terabytes of operational data. The information must first be isolated from access by mission-critical transactions supporting the day-to-day business. That is, there should be no direct access to point-of-sale data.
Data replication depends on data volume, refresh, and access frequency
In a corporate-centric retail enterprise, it is advantageous to reuse existing network and systems capability to transform operational data into informational data on the large server. Centrally created informational data could be copied to regional locations (store networks), depending on the volume of the informational data created, its expected refresh activity, and the anticipated frequency of access.
3.2.2.1 Ease of Use
Details of data access on different platforms should be masked by the data access tool. Merging of data from different platforms should also not be a complex task for the knowledge worker. The lowest level of detail required of the knowledge worker should be SQL syntax. When possible, even this level of detail should be hidden using easy-to-use tools.
Minimize the technology knowledge needed
3.2.2.2 Ad Hoc Query Capability
Knowledge workers should be able to take an existing query, identified through the view of all enterprise information, and modify it to suit their analysis needs. They should then be able to execute the new query. Finally, there should be some mechanism for storing useful queries and allowing these queries to be incorporated into the enterprise view of information. Storing the queries is also desirable for follow-on analytical results of the initial query (for example, reports, charts, and graphs).
Manage the Ad Hoc Query environment
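The cycle of finding an existing query, modifying it, running it, and storing the result for reuse can be sketched as follows. This is an illustration under stated assumptions: an in-memory SQLite database stands in for the informational database, the `sales` table and its rows are invented, and the `saved_queries` dictionary is a hypothetical stand-in for the stored-query portion of the enterprise view. A real query tool would modify the query through its own interface rather than by string replacement.

```python
import sqlite3

# In-memory database with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("S1", "hat", 20.0), ("S1", "blouse", 35.0), ("S2", "hat", 15.0)],
)

# Hypothetical store of named queries that knowledge workers can browse.
saved_queries = {
    "sales_by_store":
        "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store",
}

# 1. Take an existing query.
base = saved_queries["sales_by_store"]

# 2. Modify it to suit the analysis (here: restrict to one item).
hats_only = base.replace("FROM sales", "FROM sales WHERE item = 'hat'")

# 3. Execute the new query.
result = conn.execute(hats_only).fetchall()

# 4. Store the useful query so that others can find and reuse it.
saved_queries["hat_sales_by_store"] = hats_only
```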
3.2.3 Access to Local In-Store Processor Data
Store personnel concerned with improving daily operations can use ad hoc query of in-store processor data sources to do so. They rarely need to access enterprise-wide information because that level of information tends to have strategic value, rather than the tactical value they require. For those cases in which enterprise-wide information is of interest, store personnel require specialized software tools to access it. The tools supplying their view of all enterprise information should support the use of existing or creation of new queries.
Consider awareness and access of local AND enterprise data
3.2.4 Access Summary and Detail Data
Summary data is derived or aggregated from detailed informational data, usually by time period (for example, day, month, or quarter) for trend analysis. Detailed data includes data from individual point-of-sale transactions and text descriptions of the products sold. This version of data is called reconciled data in the Information Warehouse architecture. Easy access to these two categories of informational data in a single query increases the value of the data.
Make derived and reconciled data available
Reconciled data can be transformed from existing operational data with a variety of software. The software can be purchased and used as is, purchased with exits used to perform reconciliation, or custom built. The reconciled data often is the source for deriving the summary data. This summarized version of information is called derived data in the Information Warehouse architecture.
Reconciled data is the source for derived data
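The relationship between reconciled and derived data, and the value of reaching both in one query, can be shown with a small sketch. The table names `reconciled_sales` and `derived_monthly_sales`, the columns, and the rows are assumptions chosen for illustration, and an in-memory SQLite database stands in for the informational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reconciled (detailed) data: one row per point-of-sale line item.
conn.execute(
    "CREATE TABLE reconciled_sales "
    "(sale_date TEXT, item TEXT, description TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO reconciled_sales VALUES (?, ?, ?, ?)",
    [
        ("1994-01-10", "B100", "cotton blouse", 30.0),
        ("1994-01-22", "B100", "cotton blouse", 30.0),
        ("1994-02-03", "H200", "felt hat", 45.0),
    ],
)

# Derived (summary) data: monthly totals aggregated from the reconciled data.
conn.execute(
    """
    CREATE TABLE derived_monthly_sales AS
    SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total
    FROM reconciled_sales
    GROUP BY substr(sale_date, 1, 7)
    """
)

# A single query that returns each detail row next to its monthly total.
rows = conn.execute(
    """
    SELECT d.sale_date, d.description, d.amount, s.total
    FROM reconciled_sales AS d
    JOIN derived_monthly_sales AS s
      ON s.month = substr(d.sale_date, 1, 7)
    ORDER BY d.sale_date
    """
).fetchall()
```

Keeping the summary in its own table means trend queries do not have to rescan the detail rows, while the join still lets an analyst see individual transactions in the context of the period total.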
3.2.5 Four Years of Trend Data
Establish a guideline for historical data kept
Four years is used in the solution thread as the minimum history period for derived data to ensure statistically meaningful trend analysis. The same guideline for reconciled data has not yet been determined. Reconciled data is therefore being kept for a two-year period; a decision about its value will be made after evaluating its frequency of access and its perceived usefulness by the knowledge worker community.
3.2.6 Dissemination of Analysis Results
Rapid dissemination of reports is mandatory
The dissemination of informational analysis results is the next key issue in the overall process of information utilization. This dissemination must be rapid to take advantage of the dynamic nature of decision support itself. For example, a decision to run a local campaign must be transmitted from the corporate merchandising business unit using the quickest communications vehicle available, usually electronic mail (E-mail). The campaign blueprint would need to be accompanied by documentation of database changes necessary for the item involved in the campaign. Changes to the promotional item in the database at the store level may also be needed if the campaign involves special discounts based on purchase volume, product brand, or other item attribute. This aspect of the solution thread is not specifically part of the Information Warehouse architecture but must be taken into account when integrating the Information Warehouse solution into the retail cycle.
3.3 Origin of Business Requirements
The following summary of urgent business needs is taken from 2.2, “Challenges” on page 14:
• Reduce the cost of doing business
• Attract and retain customers
• Improve business processes
• Reduce cycle times
• Increase profitability from within by:
  − Leveraging external resources
  − Focusing on costs
  − Emphasizing creative uses of technology
  − Business process reengineering
• Improve merchandise flow management through:
  − Quick response: reacting to trends and getting merchandise to the floor quickly
  − Electronic data interchange
  − Effective inventory management
  − Profit contribution analysis: focus on profit not overall revenue.
The Merchandising and Operations business unit has been targeted for an Information Warehouse solution (see section 2.5.1, “Target Business Units for Solution” on page 22) to focus on the following business needs:
• Attract and retain customers
• Improve business processes
• Improve merchandise flow management through the implementation of a quick response merchandising system.
An Information Warehouse solution thread could meet many of the business needs listed at the beginning of this section. We chose to focus on three of those requirements for several reasons. The value of an Information Warehouse implementation is the ability to take raw operational data and present it to the knowledge worker in finished form. This suggests the need for an end-to-end methodology, and the requirements chosen are simple enough to present such a methodology.
End-to-end methodology is essential for data access
The retail industry is in a saturated and overdeveloped market state. Retailer capacity clearly surpasses customer demand, so the customer drives the business. A variety of strong retail channels exist today, including catalog and sales club channels. For these reasons, competing effectively for customers is an urgent need.
Focus on the more urgent requirements
Business processes will inherently improve as knowledge workers use information from the Information Warehouse implementation in new and creative ways. We also show what activities may need to be added to the Information Warehouse solution to develop business processes that cover the entire retail cycle from supply to sales.
Business process will improve across the board
The need to attract and retain customers focuses on uses of the information derived from credit card and EFT transactions. The need to implement a quick response system focuses on the analysis of customer preferences in a more general way. As an additional function under the solution of this need, we show how Information Warehouse technology improves product line decisions. We apply Information Warehouse technology to a fundamental problem in the retail industry: managing the volume of raw data that flows in through the POS outlets and using it for competitive advantage. Information Warehouse technology is well suited to this task but provides enough generality to be applicable in most environments.
3.3.1 Inhibitors to Business Growth
The traditional approach to applying data processing technology to business requirements is becoming less and less adequate. Typically, requirements for data and function were identified and used to construct applications that met broad business needs. In this way, technology is being used only to increase the retail enterprise′s operational efficiency. New market pressures such as the following are forcing us to reexamine this approach:
• Faster market analysis and reaction
• Rapidly changing criteria for POS data as information
• The increased volume of valuable data.
A new approach to applying DP technology is needed
3.3.1.1 Faster Market Analysis and Reaction
Speed means profit
Competitors who are fast to reach the marketplace with new concept merchandise that meets consumer demand realize larger profit margins than those who follow. While decreasing product cycle time requires speeding up processes at many different points in the retail cycle, analysis of trends and consumer buying patterns has historically been one of the slower processes.
3.3.1.2 Rapidly Changing Criteria for POS Data as Information
Trend analysis drives the gathering and aggregation of data
Trend analysis has shown its merit in increasing profit. This has driven a need for gathering more POS data and aggregating that data in unlimited ways—as information—for informational analysis. Different aggregations are defined by the different fields over which the aggregation is applied (for example, item number, color, or size) and to which level each is aggregated (for example, department, store, region, or country).
Static application design and development is unacceptable
Static applications were designed around static data models, clearly deficient for an informational environment. Keeping these applications synchronized with the new POS file structures requires extensive program maintenance. In cases where the cost of such maintenance is perceived to outweigh the benefits, the maintenance is not performed or the additional POS data collection is not done. Maintenance already consumes up to 80% of most retail information systems department resource, so additional demands on this resource are difficult to service.
Make the information delivery system dynamic
The preferred approach is to develop an information delivery system that can produce different aggregations of the POS data quickly. This approach focuses on the aggregation process. The objective is to make the process as dynamic and reusable as possible. To the extent possible, the system is a dynamic function that takes the aggregation specifications as an input parameter and generates the aggregation without requiring recoding or recompiling. The key is a detailed informational data layer that can service known and unknown aggregation specifications.
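The idea of a function that accepts the aggregation specification as an input parameter, so that a new aggregation needs no recoding, can be sketched as follows. The function name `aggregate`, the record layout, and the sample rows are assumptions for illustration; in practice the detailed informational data would sit in a relational database and the specification would drive a generated query.

```python
from collections import defaultdict


def aggregate(records, group_by, measure):
    """Total `measure` over `records`, grouped by the fields named in `group_by`.

    The aggregation specification (which fields, which measure) is data,
    not code, so a new aggregation is requested by changing the arguments.
    """
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[name] for name in group_by)
        totals[key] += rec[measure]
    return dict(totals)


# Hypothetical detailed informational data derived from POS transactions.
pos_detail = [
    {"region": "West", "store": "S1", "item": "B100", "color": "red", "amount": 30.0},
    {"region": "West", "store": "S1", "item": "B100", "color": "blue", "amount": 20.0},
    {"region": "West", "store": "S2", "item": "H200", "color": "red", "amount": 45.0},
]

# Two different aggregations served by the same function and the same detail data.
by_store = aggregate(pos_detail, ["store"], "amount")
by_region_color = aggregate(pos_detail, ["region", "color"], "amount")
```

Because the detail rows keep every attribute, an aggregation that nobody anticipated (for example, by item and color) can be produced later by passing a different `group_by` list.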
3.3.1.3 Increased Volume of Valuable Data
There is no limit to information or information analysis
The volume of valuable data resident in the data processing organization and available from outside sources is increasing dramatically. A vast amount of market information is being generated by the use of credit cards and bank debit cards to pay for purchases. Knowledge workers see no limit to the variety of ways to slice the information, the only limit being the knowledge workers′ imagination. The retail enterprise needs to integrate information from public databases and electronic newsclipping services. This latter category offers information that is filtered and categorized from worldwide sources. The impact of the volume of data is exacerbated by the speed with which data enters the retail environment.
These three inhibitors to growth place a stronger emphasis on delivering information to the desktop than on filtering it through the lens of a business application. The data delivery approach reinforces the evolving role of the information systems department as a provider of information. The information systems department must deliver the information in a flexible manner, in dynamically changing forms to meet the changing demands and analyses of the knowledge workers.
The key is being flexible in delivering information
Ad hoc analysis is available in a limited form. It depends on the information systems department for providing specific extracts of the source data. Additional assistance may be required to create the IMS reports or spreadsheet. The existing maintenance workload to maintain the operational applications and other essential services creates a conflict for the information systems department. The information systems department is forced to create a data processing environment that can support ad hoc analysis without being inhibited by this resource crunch. Without many of the time constraints imposed by application development, knowledge workers can spend more time on validating analysis results and on creating new analytical approaches.
Ad hoc analysis depends on information systems services
3.3.2 Qualifying an Information Warehouse Approach
This section positions the Information Warehouse framework in context with the larger view of the enterprise′s information technology. The positioning can be used to apply the Information Warehouse framework concepts across industry sectors. Existing retail information systems meet established business requirements (for example, payroll, inventory, and accounts receivable). These systems generate the operational data that is the source of informational data used in informational analysis. Many enterprises apply workstation-based decision support tools to extracted POS data to create spreadsheets, reports, and various graphical presentation style objects (for example, pie charts and line graphs).
Operational data feeds informational analysis
In focusing on the value of an Information Warehouse implementation to your business, one of the first tasks is to evaluate the perceived effectiveness of current decision support. The evaluation should be directed principally at the users of those systems, with an eye toward soliciting their input on where the systems could be improved. The evaluation should also consider how much time is spent gathering and providing the information and how much time is spent utilizing it through informational applications. In some cases, the tools for analysis may be good, but the underlying data is suspect or too sparse. Survey information of this nature would then become a part of the overall requirements list.
Assess current decision support value
Take a strategic approach
The requirements are demanding and diverse. One methodology for providing dynamically changing information and informational analysis is to develop application systems iteratively to extract, enhance, and load information into informational data stores. This approach, based on traditional application development, is clearly unsatisfactory for even this relatively small set of requirements, let alone the requirements of the enterprise at large. The alternative is to adopt an Information Warehouse strategy. This strategy identifies common functions (for example, load) and tools that can perform the function across the scenarios. The strategy also limits the work involved in implementing the information delivery process to minor customization of a common tool for a particular requirement.
Chapter 4. The Retail Solution Thread
The retail solution thread focuses on a retail enterprise (a large department chain in the retail industry) modeled on generic characteristics of the retail industry. The examples are designed to be valid for the industry in general, so that the concepts can be applied to other retail market sectors. Smaller enterprises can apply subsets of the concepts as they fit an organization within the large retail enterprise.
The large enterprise is a model for all retail enterprises
The key objective of the solution thread is to present business concepts that are applicable to segments throughout the retail industry. This objective is consistent with, and makes use of, RAA constructs, which are customized to the individual retail enterprise. This customization is normally performed by systems or data analysts within the enterprise, or by outside consultants. RAA furnishes the data entities required to populate an Information Warehouse implementation. The same entities can be used as a template for reengineering business processes. Reengineering of business processes, however, is beyond the scope of this solution thread. The department chain network presented is based on common elements found in many department store chains today.
RAA appeals across the retail industry
The retail industry solution thread addresses a subset of the issues that pertain to clients and markets. We have chosen representative examples (see 2.1, “Retail Industry Trends” on page 14) that reflect broad industry trends. This study begins with business requirements that are relevant to the retail industry in general. It then proceeds, in progressive steps, to show the relationship of these business requirements to an actual Information Warehouse implementation. Figure 10 on page 48 shows a progressive approach to applying information systems to retail industry requirements. Business analysts use the top level of the model as a communications vehicle. The top level provides business terms for data or process entities easily understood by the analyst and directly translatable to business things and events. This top level is directly mapped to a middle layer, which is more detailed than the top layer and may actually be multiple layers, but it is not as detailed as the bottom layer. The bottom layer is understood by information systems staff just as the top layer is understood by business analysts and executives. The middle layer, then, is the translation between the business-speak of the top layer and the information systems-speak of the bottom layer. Against this background of creating a business system, we discuss the role of the DataGuide/2* and S/390 Parallel Query Server products.
The RAA model is a communications vehicle
Of equal importance to understanding the roles of these new products is a discussion of issues found in all phases of creating a retail Information Warehouse implementation.
Figure 10. Retail Industry Study
4.1.1 Information Warehouse Framework in Retail
Dynamic business means dynamic analysis and dynamic information
Analysis software capability must be as dynamic as current events, the economic climate, fashion trends, and the competition are. That is, the data processing industry has recognized the need for decision support capability to support dynamic analysis. The same dynamics apply to the information presented to analysts, information that is derived from POS operational data. Accordingly, the process of generating information from operational data must be dynamic, necessitating a methodology that provides this capability. The ultimate goal is using data for competitive advantage.
Information has value beyond the enterprise
The information managed in the Information Warehouse implementation is not necessarily for the sole use of the owning enterprise. Just as knowledge workers in an enterprise want to make use of information from bulletin boards or electronic data services, it is likely that the enterprise will make its information available to outsiders. Outside suppliers tapping into the same Information Warehouse implementation as the owning company increase the value of the information and the implementation. The existence of the data begins to drive business process reengineering on its own.
These factors (data volumes, dynamic analysis, changing data gathering) dictate a departure from traditional application development to use POS data effectively. The new approach is an architecture-driven strategy based on the Information Warehouse architecture. This strategy economically and efficiently supports dynamically changing demands for information and analyses of information.
Information Warehouse framework is the right approach
4.1.2 New Scenarios for Profit
The industry trends discussed in 2.1, “Retail Industry Trends” on page 14 lead to a focus on improving the retail enterprise′s relationship with its customers. To effect this improvement, we need information about the buying habits of shoppers. One source of such information is the transaction records created by credit card purchases and EFT. Shoppers′ buying habits can be analyzed by correlation with their credit card or checking account. This analysis enables both the chain and individual stores to channel marketing activities selectively and more efficiently.
The trend: focus on the customer
In the remainder of this section, we outline analysis scenarios whose enablement would cost-justify an Information Warehouse implementation.
Better information analysis justifies the cost
The following scenarios are numbered for reference in the discussion that follows and in later chapters:
1. Differential marketing by preference group
2. Quick market response and follow-up adjustments
3. Sales analysis for promotion negotiations
4. Buying on key product attributes.
These scenarios result in reports that address business requirements which in turn help the enterprise address the trends in the retail industry.
4.1.2.1 Scenario 1: Differential Marketing by Preference Group
The retail enterprise wants stronger loyalty among its clientele. One approach is to use differential marketing campaigns based on customer preferences. Market research has shown that customers are more receptive to mailings personalized to their tastes. Rather than blanketing a particular geographic area with mailings, selected customers found through analysis of credit card and EFT transactions are divided into preference groups. Each preference group would then receive coupons and special promotional literature. The groups would vary in terms of the promotions and coupons they receive. Afterwards, sales to these particular customers would be analyzed to determine the effectiveness of the campaign and chart new campaigns. Figure 11 on page 50 shows an example of a targeted marketing campaign.
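One way to picture the division of customers into preference groups is sketched below. The text does not say how groups are formed, so the rule used here (assign each account to the merchandise category in which it spent the most) is purely an assumption for illustration, as are the function name `preference_groups`, the account identifiers, the categories, and the amounts.

```python
from collections import defaultdict

# Hypothetical purchase records: (account, merchandise category, amount).
# In the scenario these would come from credit card and EFT transactions.
purchases = [
    ("acct-1", "sportswear", 80.0),
    ("acct-1", "housewares", 20.0),
    ("acct-2", "housewares", 60.0),
    ("acct-3", "sportswear", 15.0),
]


def preference_groups(purchases):
    """Group accounts by the category in which each account spent the most."""
    spend = defaultdict(lambda: defaultdict(float))
    for account, category, amount in purchases:
        spend[account][category] += amount

    groups = defaultdict(list)
    for account, by_category in spend.items():
        top_category = max(by_category, key=by_category.get)
        groups[top_category].append(account)
    return {category: sorted(accounts) for category, accounts in groups.items()}


groups = preference_groups(purchases)
```

Each resulting group would then receive its own coupons and promotional literature, and later sales to the same accounts could be compared against the group assignment to judge the campaign.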
Figure 11. Selective Marketing Campaign
4.1.2.2 Scenario 2: Quick Market Response and Follow-on Adjustments
Enterprise analysts and managers need to analyze the product mix in the different stores, comparing private labels for a certain product type such as women′s hats. Products and labels that do not sell must be quickly identified. Drops in sales for certain products can be noticed earlier and corrective measures taken. Possible measures might include a sales campaign with special discounts or cutbacks on reorders. Such actions resulting from the analysis of information can cut warehousing costs. The same analysts also need to be aware of developing trends in sales of certain products. This trend observance might be related to activity at competitive stores or current events. Noting trends and correlating them with causal events must lead to quick supplier orders and distribution to retail outlets. Subsequent analysis must also prove the reliability of the observed trend so that corrective measures can be taken (for example, increase in supplier order points, volume purchases, returns to suppliers, discounts, and other special promotions).
4.1.2.3 Scenario 3: Sales Analysis for Promotion Negotiations
Knowledge gained from the analysis of sales before, during, and after promotions can be used to gain better leverage in negotiating promotional programs with manufacturers.
4.1.2.4 Scenario 4: Buying on Key Product Attributes
Buyers of women′s blouses, for example, need analysis of a wide range of data to assist in buying decisions. At a summary level, they need to see which lines are the best sellers and which lines are the poor performers. They also want to know which lines are the high and low performers at a regional level. However, product lines sometimes disappear or are transformed from year to year; the buyers thus need to understand product performance in terms of product attributes. For example, they may want to understand if the good or bad performers for a region have a common attribute (for example, natural cotton material or a particular style). This awareness contributes to better buying decisions for next year′s lines, even if a popular line was superseded by a new line with unknown characteristics.
They may also want to look more closely at seasonal issues: was there heavier buying activity for one particular line at a certain time of the year? Was this burst of buying activity out of line with other top performers in the blouse lines? Finally, they might like to see what the blouse looks like to get an overall esthetic feel for a particular line. Perhaps a picture of the blouse has been stored by an image scanner or there is an example in a catalog stored at a particular locale; they would like to know where to look.
A common requirement for the blouse buyer′s analysis is the need to look at summary data and then drill down to more detailed data for a particular chronological period. Another requirement is to see information on the blouse product lines that turns out to be inaccessible through normal decision support software. For example, nontextual information such as graphic images of the blouse could be of use to the buyer.
The normal decision support software tool might not support presentation of the image, and the enterprise might choose not to invest in the data processing technology to make presentation of that data possible. In this case, the requirement is to know where to find the image data, since online access has been ruled out. The solution is described by the Information Warehouse architecture as the Information Catalog and is implemented by the DataGuide family of products.
Scenario 4 is a prime example of the need for dynamic information, a requirement that holds for all of the scenarios. The dynamic nature of informational data needs is characterized by a reiterative process made up of the following steps:
1. Informational analysis
2. New information requirement
3. Information generation process.
Information delivery must be flexible
The underlying data must be capable of supporting many different analyses. Many of the decision support analyses and the information required to perform them cannot be predicted at the time the Information Warehouse implementation is initially built. Because informational analysis itself tends to generate new requirements for information, the processes that deliver data as information must be flexible.
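As a small illustration of this flexibility, the same detail rows can be summarized by product line today and by product attribute or region tomorrow, without deciding in advance which grouping will be needed. The Python sketch below assumes a hypothetical record layout (product line, material, region, units sold) and a hypothetical helper named `summarize`; neither comes from the products discussed in this document.

```python
from collections import defaultdict

# Hypothetical detail rows: (product_line, material, region, units_sold)
FIELDS = ("product_line", "material", "region", "units_sold")
SALES = [
    ("Line A", "cotton", "West", 120),
    ("Line B", "cotton", "West", 90),
    ("Line C", "polyester", "West", 30),
    ("Line A", "cotton", "East", 40),
    ("Line C", "polyester", "East", 70),
]


def summarize(rows, *dimensions):
    """Total units_sold grouped by any combination of the named dimensions."""
    positions = [FIELDS.index(name) for name in dimensions]
    units_position = FIELDS.index("units_sold")
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[p] for p in positions)
        totals[key] += row[units_position]
    return dict(totals)


# The same data, viewed by line and then by attribute within region
by_line = summarize(SALES, "product_line")
by_material_region = summarize(SALES, "material", "region")
print(by_line)
print(by_material_region)
```

Because the grouping is chosen at analysis time, a line that is withdrawn next year still contributes to the totals for its attributes, which is the point made in Scenario 4.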
4.1.3 Requirements Mapping: Summarized→Detailed
In this section, we take the business descriptions of the scenarios for profit and express them in detailed terms, suitable for Information Warehouse implementation. The detailed requirements for the scenarios are as follows:
• Scenario 1: Differential marketing by preference group
  The detailed requirements for Scenario 1 are as follows:
  − Access to customer purchase information
    Must include all purchases by purchase line item for the credit card or debit card used. Also, include payments by check based on checking account number.
  − Flexibility to sort purchases by any combination of attributes
  − Summarization of information at varying levels
    The solution must support summarization of information by user-specified ranges over various attributes. In addition to summarization by month and year, the solution should allow aggregation by quarter, season, and start and end date. It should also support summarization by store, geographic region, or chain name if the retail enterprise includes multiple chains.
  − Ability to show historical trend for four years
    Historical data analysis is necessary to identify changes in buying preferences.
  − Facility for sharing interim and final analysis results between Merchandising and Operations units
    Sharing analyses increases synergy between the two groups.
  − Online access to public domain information
    Public domain information includes competitive informational databases, news services, and census data for buyer demographics. The enterprise uses information from these sources in analysis of the retail enterprise's own information.
• Scenario 2: Quick market response and follow-on adjustments
  The detailed requirements for Scenario 2 are as follows:
  − Need to arbitrarily group items in different ways
    Grouping may not always follow the static product categorization.
  − Need to look at item stock levels and purchase activity at the same time
    Item stock levels at both stores and warehouses are needed.
  − Need a summary view of purchase activity
    A standard summary by product and item will be appropriate most of the time, though analysts occasionally need to adjust summarization criteria. An example of different criteria is examining summaries by arbitrary item groupings.
  − Analysis must be capable of generating required stock level changes across all stores
    This information is needed for Operations personnel to synchronize quickly with the Distribution unit.
  − Need to perform a great deal of what-if analysis at all levels (both detailed and summary information)
  − Need to perform financial analysis in different forms
    The different forms focus on particular items or item groups in a proposed campaign to improve sales.
  − Need to share derived logistics for a sales campaign with store managers
  − Need to understand the origins of the baseline data
    Of particular interest is knowing whether adjustments have been applied to historical information and the nature of those adjustments.
• Scenario 3: Sales analysis for promotion negotiations
  The detailed requirements for Scenario 3 are as follows:
  − Must be able to analyze sales quickly by arbitrarily selected time periods
  − Need to separate the seasonal background level of purchases for a product or item from the purchases that can be directly attributed to the sales campaign
  − Must be able to share sales analysis information with store managers
  − Need to perform multiple what-if financial analyses using previous sales as a model
    Selection of type of financial analysis should be open-ended. In this case, the Operations personnel need to determine promotion conditions that will guarantee profit for the company, the manufacturer, and the supplier.
  − Capability to relate analysis results to follow-on actions
    Examples of actions might include orders to suppliers, returns to suppliers, redistribution of goods between store and warehouse, price changes made to store item files, and finally, addition of terms and conditions to store promotion databases.
• Scenario 4: Buying on key product attributes
  The detailed requirements for Scenario 4 are as follows:
  − Exception reporting
    Analysts need to have exception reporting by product lines or other attributes. This could be implemented as highlighting extraordinary results in existing reports or generating separate reports for those results.
  − Summarization at user-specified levels
    For example, the buyers would like to see the detailed sales by size, color, and style for a given region, once they have identified that region as their most profitable one. They are interested in the six hottest selling product lines in March and April over the past three years.
  − Location of nontextual information
    This might be as simple as knowing that the manufacturer's catalog for a certain blouse style is in the local catalog library or the general catalog library. It may be as complicated as giving the execution instructions for a graphical program to view the scanned image of the blouse at the buyer's workstation.
  − Reporting on product line status
    Analysts need to know which product lines have been withdrawn, replaced, obsoleted, or undergone a name or description change.
  − Access to four years of information
    Analysts need access to at least four years of historical information to get a statistically accurate perspective of trends.
  − Access to item and purchase information in a single report
    Would like to join various attributes from each information source; for example, item, purchase transactions, and summary information.
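Two of the Scenario 4 requirements, ranking the hottest selling product lines for selected months over several years and reporting exceptions, can be sketched in a few lines of Python. The purchase records, the function names `top_lines` and `exceptions`, and the threshold value below are all hypothetical and serve only to show the shape of such an analysis.

```python
from collections import Counter
from datetime import date

# Hypothetical purchase detail: (sale_date, product_line, units)
PURCHASES = [
    (date(1992, 3, 14), "Line A", 50),
    (date(1992, 7, 2), "Line A", 500),  # outside March-April, ignored below
    (date(1993, 4, 9), "Line B", 80),
    (date(1993, 3, 21), "Line C", 20),
    (date(1994, 4, 30), "Line A", 60),
    (date(1994, 3, 5), "Line B", 10),
]


def top_lines(purchases, months, years, n):
    """Rank product lines by units sold in the given months and years."""
    totals = Counter()
    for sale_date, line, units in purchases:
        if sale_date.month in months and sale_date.year in years:
            totals[line] += units
    return totals.most_common(n)


def exceptions(ranked, threshold):
    """Exception report: lines whose total falls below the threshold."""
    return [line for line, units in ranked if units < threshold]


ranked = top_lines(PURCHASES, months={3, 4}, years={1992, 1993, 1994}, n=6)
flagged = exceptions(ranked, threshold=50)
print(ranked)
print(flagged)
```

In this sketch the exception report is a separate list; the same test could equally be used to highlight rows in an existing report, which is the other option the requirement mentions.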
Part 3. The Technology View
Chapter 5. Retail Industry Architecture
  5.1 Retail Application Architecture
  5.2 The Store Logical Data Model
    5.2.1 The IBM In-Store Processing Strategy
    5.2.2 The Model
    5.2.3 SLDM Benefits
Chapter 6. Information Warehouse Framework
  6.1 Value of the Information Warehouse Framework
  6.2 Why Data Replication
    6.2.1 Operational Systems
    6.2.2 Database Technology
    6.2.3 Cost of Data Access
    6.2.4 Historical Data
    6.2.5 Ownership
    6.2.6 Point-in-Time Data
    6.2.7 Reconciliation
  6.3 The Information Warehouse Architecture
  6.4 Using the Information Warehouse Architecture
  6.5 Access Enablers
    6.5.1 Embedded SQL
    6.5.2 SQL Call Level Interface
    6.5.3 Distributed Relational Database Architecture
  6.6 The Retail Industry
  6.7 Information Catalog
    6.7.1 Function
    6.7.2 Interfaces
  6.8 Information Warehouse Architecture Products
    6.8.1 The DataGuide Family
    6.8.2 S/390 Parallel Query Server
    6.8.3 DataPropagator Relational
    6.8.4 Personal AS/2
  6.9 Why Use the Information Warehouse Architecture
Chapter 7. Organization Asset Data
  7.1 The Solution
  7.2 S/390 Parallel Query Server
    7.2.1 Software Configuration
    7.2.2 Information Maintenance
    7.2.3 Retail Enterprise Operations
  7.3 Technical Issues
    7.3.1 Types of Parallelism
    7.3.2 S/390 Parallel Query Server Design
    7.3.3 Query Splitting
    7.3.4 Front-End MVS System
Chapter 8. Information Catalog
  8.1 Information Catalog Function
  8.2 DataGuide
    8.2.1 Basic Structure
    8.2.2 Knowledge Worker Functions
    8.2.3 Search
    8.2.4 Launch Applications
    8.2.5 Create Collections
    8.2.6 Display Contact Information
    8.2.7 View Current News
    8.2.8 View Glossary
    8.2.9 Administrator Functions
    8.2.10 Extending DataGuide/2
    8.2.11 Meta-data Management
    8.2.12 DataGuide Data Model
    8.2.13 Interfaces
Chapter 9. Conclusions
Chapter 5. Retail Industry Architecture
In Chapter 3, “Retail Industry Business Requirements” on page 33, we presented a justification for architectures to address strategic industry pressures. In this chapter, we investigate the Retail Application Architecture (RAA) as an example of such an architecture. We also discuss the Store Logical Data Model as an underlying model to the architecture. The model serves as a basis for managing the people, places, things, resources, and activities in the retail industry business that need to be administered by data processing systems. The overall goal of the architecture is to meet the pressures of the industry and minimize the cost of the information systems used to meet those pressures.
RAA is a strategic approach to solving strategic problems
It is doubtful whether the retail industry would ever be able to reduce the cost of its information systems unless a broad set of information models and standards is accepted, by organizations and software vendors alike, as a basis for industry applications. IBM has developed the Retail Application Architecture as a common ground for the retail industry. RAA consists of a set of software architecture guidelines, interfaces, methods, and tools for retail applications.
An architecture helps meet business pressures at minimal cost
5.1 Retail Application Architecture
A retail enterprise's organization structure (see Figure 2 on page 15) does not give a complete understanding of the retail business. Rather, it is the normal day-to-day activities that define the company. IBM's RAA has identified some major activities that are common across many sectors of the retail business. Figure 12 on page 58 shows these business activities. We use constructs from RAA to describe the business activities and common entities used by the retail enterprise. We expect the Information Warehouse solution to address management of the following major business factors:
• Marketing
• Product range
• Customers.
RAA identifies common business processes
RAA's data model is very useful to the Information Warehouse solution planner. The model gives the planner the confidence to proceed with the first phase of the Information Warehouse implementation without being concerned about major rework for follow-on phases. Specifically, the planner targets the Merchandising and Operations business units as an initial phase. In the second phase, the planners could target the distribution and human resources units. The units in the second phase would introduce new business data entities. The RAA model, which is unified across the retail enterprise, contributes to an easier integration of new entities without reengineering the Information Warehouse implementation data entities defined during the initial effort.
Use RAA for reengineering business processes
The RAA can serve as a guide for reengineering business processes. An enterprise can modify its business processes based on informational analysis facilitated by the Information Warehouse implementation. The Information Warehouse implementation is not a solution to one specific line-of-business problem. Rather, it is a generic data processing infrastructure that can be applied across lines of business. The Information Warehouse implementation takes advantage of the definitions in RAA to support the analysis of business operations. The analysis leads to reengineering of the business processes that generated the informational data underlying that analysis. The reengineering generates new informational data for Information Warehouse analysis and the cycle of analysis, reengineering, and informational data generation repeats. Adopting the RAA data model today in conjunction with the Information Warehouse infrastructure positions the enterprise for ongoing process improvement.
Figure 12. RAA Business Activities
The tasks of the RAA business activities, shown in Figure 12 on page 58, are as follows:
Manage marketing
This process covers marketing analysis and campaign formulation.
Manage product range
This process covers the selection of range of products to be carried, inventory control, buying and contracting, ordering and pricing.
Manage customers
Addresses all aspects of building a long-term relationship with customers, for example, creating an interest in the products that the company offers by direct mail and managing credit available to customers.
Manage sales
Covers all aspects of selling goods and services to the customer, from assisting the customer in finding the goods sought to taking payment.
Manage corporate goals and plans
General management; long- and medium-term planning, reporting, analysis, organization, and strategy of the enterprise. Defines instructions that become input to other processes, for example, setting objectives. Receives performance reports and resource requirements from other processes.
Handle products
All aspects of handling goods and preparing them for sale, from initial receipt to time of sale.
Manage personnel
Human resource management including benefits, career, training, and personnel statistics.
Manage finance and legal matters
All aspects of financial, accounting, tax, and legal matters, including physical handling of cash.
Manage fixed assets
Procurement and maintenance of assets owned or used by the enterprise, such as buildings, equipment, and insurance.
P. Stecher (see Building Business and Application Systems with the Retail Application Architecture) groups the entities used by each major business activity into a construct called a resource. Resources in the RAA model are special kinds of entities that are “consumed by the enterprise in the fulfillment of its objectives.” The major retail resources are:
• Market
• Product
• Client
• Personnel
• Finance and legal matters
• Fixed assets.
The retail cycle has a Supply and a Sales phase
The retail cycle is divided into the supply and sales phases. The supply phase consists of selection of vendors and product range, pricing, vendor negotiations, ordering products, and receiving and distributing products. The retail supply phase is covered in RAA by the Manage Product Range and Handle Products business processes. The sales phase is covered in RAA by the Manage Sales and Manage Customers business processes. Manage Marketing relates to both phases.
In the course of drawing on expertise in the retail industry to create RAA, it was found that there was a definite order of importance of resources:
1. Product
2. Client
3. Market.
The Market resource includes suppliers from whom a retailer buys and clients to whom they sell. In the context of the Market resource, clients are treated as an amorphous whole; in the Client resource, you are dealing with a known person.
RAA helps us understand our business
The scenarios outlined in Chapter 4, “The Retail Solution Thread” on page 47 were derived primarily from business requirements driven by retail industry trends. We map those scenarios to RAA so that we can understand the business entities, activities, and processes involved. The scenarios use a subset of the business entities in the enterprise, so this is a focusing exercise with respect to the RAA model. They view the enterprise with respect to business activities rather than business units to emphasize what happens rather than where it happens. Finally, the scenarios identify the overlap between the targeted business processes and the entities required to populate our informational system—the Information Warehouse implementation. Table 4 on page 61 shows the relationship between the scenarios, business activities, and resources used.
Table 4. Relationship between Scenarios and RAA Constructs
Scen.  Scenario Description                             RAA Business Activity   RAA Resource Used
1      Differential marketing by preference group       • Manage customers      • Client
                                                        • Manage marketing      • Product
2      Quick market response and follow-on adjustments  • Manage product range  • Market (both client and supplier subsets)
                                                        • Manage marketing      • Product
3      Sales analysis for promotion negotiations        • Manage product range  • Market (both client and supplier subsets)
                                                        • Manage marketing      • Product
4      Buying on key product attributes                 • Manage product range  • Market (client subset only)
                                                                                • Product
The scenarios are directed at understanding the business systems and Information Warehouse resources that are involved in populating the informational system. The Information Warehouse architecture provides a long-term plan for defining the implementation. The Information Catalog facilitates end-user awareness of Information Warehouse resources. Its workflow management component manages the complexity of business processes as they are implemented in data processing activities. Figure 13 on page 62 reinforces this relationship. The top level of the diagram depicts business systems defined in business terminology; RAA provides the definitions. The bottom of the diagram depicts the business systems in data processing terminology. The Information Catalog bridges these two domains by maintaining a relationship between business terms (meta-data) and their data processing equivalents.
Figure 13. The Information Warehouse Architecture and Information Processing Architecture
RAA defines business systems in business terms
RAA defines business systems in business terms. It therefore serves as a useful discussion vehicle for managers to understand the business they are in. Eventually, a business system documented in business terminology must undergo translation to a more technical language that will define how the underlying data processing applications will be built. The Information Warehouse architecture provides some key mapping of business terms to technology terms in an open approach that is oriented to addressing key Information Warehouse challenges.
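The idea of mapping business terms to their data processing equivalents can be pictured as a simple lookup structure. The Python sketch below is not the DataGuide interface or data model; the `CatalogEntry` class, its fields, the `find` function, and every entry are invented for illustration. It shows a catalog that records where an item lives and whether it can be opened online or only located.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    business_term: str
    description: str
    location: str  # where the data lives, in data processing terms
    online: bool   # False when the item can only be located, not opened


# All entries below are invented for illustration.
CATALOG = {
    entry.business_term.lower(): entry
    for entry in (
        CatalogEntry("Weekly item sales", "Units sold per item per week",
                     "table SALES.ITEM_WEEKLY", True),
        CatalogEntry("Blouse catalog image", "Manufacturer catalog photograph",
                     "local catalog library, shelf 3", False),
    )
}


def find(term):
    """Look up a business term, ignoring case; return None if unknown."""
    return CATALOG.get(term.strip().lower())


entry = find("weekly item sales")
print(entry.location)
print(find("Blouse Catalog Image").online)
```

The second entry corresponds to the Scenario 4 case in which online access to an image has been ruled out and the buyer only needs to know where to look.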
5.2 The Store Logical Data Model
Today's in-store processing environments are complex. POS data is stored in both keyed files and sequential files on an IBM 4690. Employee and customer data is stored in a relational database such as DB2/6000* on an in-store processor such as the IBM RISC System/6000. Applications such as inventory management reside on a Personal System/2 running DOS with data stored in ASCII flat files.
One problem with today's in-store processing environment is data duplication. A store may have multiple applications that require files containing similar data. For example, there may be a POS application from source A and an Electronic Shelf Label application from source B. Each requires an item price file. This creates a problem when the item price data must be modified. Updates to two different files are required to change the price of a single item. Executive decision making is affected if the files are not synchronized. Decisions may be made on incorrect, out-of-date data. Ideally, in this situation, a single database in the store would be shared by the different applications. This eliminates the need to maintain the same data in different places and formats. Executives are thus assured that they are making decisions based on up-to-date, accurate data.
Adding new applications that require additional data can impact existing applications. This is generally true when applications depend on a particular file structure and a new application needs to modify that structure. This problem can be resolved if application designers are given the ability to add new data fields to a common data format without impacting applications that do not utilize the new data field.
Another problem inherent in the in-store processing environment is the lack of consistent business rules used across applications. Default reports generated by the existing systems ignore this inconsistency and produce incorrect information. In many cases, the applications must be modified to meet the requirements of the organization as well as to provide additional data. Knowledge workers need a more flexible way of analyzing the data without impacting the applications that gather it.
Data storage and access are also a problem for some applications. Generally, applications use physical file access methods to store and retrieve data. This presents a problem when differences exist between operating systems under which the applications execute. For example, the 4680 operating system has a keyed file access method that is not available on some other operating systems. This problem could be reduced if a standard method for storing and accessing data were available.
The Store Logical Data Model (SLDM) is a collection of common data objects that support retail applications in a store environment. You can use the SLDM to create a customized database design that will be tailored to the unique data needs of the store information processing environment. The SLDM supports in-store processing applications by helping to:
• Identify the data objects needed by in-store processing applications
• Customize the data types
• Incorporate information that is unique to your applications.
These three steps result in a physical database definition tailored for the environment.
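The benefit of a single shared store database over duplicated item price files can be shown with a small sketch. The example below uses Python's built-in `sqlite3` module purely as a stand-in for an in-store relational database; the table name `item_price`, its columns, and the function `price_for` are assumptions made for illustration. It shows one price update being visible to every reader, and a new column being added without disturbing a query that does not use it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE item_price (item_id TEXT PRIMARY KEY, price_cents INTEGER)"
)
db.execute("INSERT INTO item_price VALUES ('SKU-1', 1999)")


def price_for(item_id):
    """Price lookup as any application sharing the table might issue it."""
    row = db.execute(
        "SELECT price_cents FROM item_price WHERE item_id = ?", (item_id,)
    ).fetchone()
    return row[0]


# One update is visible to every application that reads the shared table,
# so a POS lookup and a shelf label lookup cannot drift apart.
db.execute("UPDATE item_price SET price_cents = 1799 WHERE item_id = 'SKU-1'")
pos_view = price_for("SKU-1")
label_view = price_for("SKU-1")

# A new application adds its own column; queries that name only the
# columns they need are unaffected by the change.
db.execute("ALTER TABLE item_price ADD COLUMN label_format TEXT")
after_change = price_for("SKU-1")
print(pos_view, label_view, after_change)
```

The sketch depends on applications naming the columns they read rather than relying on a fixed record layout, which is the property the flat-file approach described above lacks.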
5.2.1 The IBM In-Store Processing Strategy
Figure 14 illustrates the IBM In-Store Processing strategy for application development. SLDM, represented in the diagram, is a tool used to define and identify common data definitions based on the set of applications that use the data. The resulting database implementation will enable sharing of this common data. To integrate a new application into this environment, the external data is compared to the existing entities in SLDM and the application external interface is modified to use the common data. The sections that follow provide more information on SLDM.
Figure 14. In-Store Processing Application Development Strategy
5.2.2 The Model
SLDM is a collection of common data objects useful in developing store applications. Disparate applications can share data through the use of these data objects in the SLDM. SLDM defines real world objects, the characteristics of those objects, and the relationships between the objects. The SLDM allows application designers to select those objects relevant to application development in their unique store environment. Through use of SLDM and the KnowledgeWare Application Development Workbench** (ADW) tool, a unique database definition is generated based on specific customer-defined application data requirements.
In the past, attempts have been made to create a single physical database definition for use in application development in the retail environment. This approach provided little flexibility in the development of future applications. Also, a single physical database design cannot take into account the differing data requirements of different enterprises. For example, one enterprise may use a social security number to uniquely identify each employee, while another may use a corporate-defined serial number. In addition, any changes necessary to develop new applications require all previously developed applications to change.
These problems are solved with the SLDM. Additional applications are added by identifying new information requirements and incorporating them into the existing definition through the use of SLDM and the ADW tool. Figure 15 on page 66 illustrates the relationship between data modeling and database design.
Figure 15. SLDM Model and Database
In addition, SLDM is used as an implementation methodology by IBM and IBM business partners to assist with the development of a suite of in-store applications that share data. SLDM can be used in conjunction with, or independent of, RAA.
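The step from a shared logical model to an enterprise-specific physical definition can be sketched as follows. This is not how SLDM or the ADW tool works internally; the `EMPLOYEE` entity, the type maps, and the function `physical_ddl` are hypothetical. The sketch only shows one logical entity yielding two different physical table definitions, one keyed by a social security number and one by a corporate serial number.

```python
import sqlite3

# Hypothetical logical entity: attribute name -> logical type
EMPLOYEE = {"employee_id": "identifier", "name": "text", "hire_date": "date"}


def physical_ddl(table, entity, type_map, key):
    """Render a CREATE TABLE statement from a logical entity definition."""
    columns = []
    for attribute, logical_type in entity.items():
        column = f"{attribute} {type_map[logical_type]}"
        if attribute == key:
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {table} ({', '.join(columns)})"


# Enterprise 1 identifies employees by a 9-character social security number;
# enterprise 2 uses a corporate-defined serial number.
ddl_ssn = physical_ddl(
    "employee", EMPLOYEE,
    {"identifier": "CHAR(9)", "text": "VARCHAR(60)", "date": "DATE"},
    key="employee_id",
)
ddl_serial = physical_ddl(
    "employee", EMPLOYEE,
    {"identifier": "INTEGER", "text": "VARCHAR(60)", "date": "DATE"},
    key="employee_id",
)

# Check that both statements are accepted by an in-memory SQLite database.
for ddl in (ddl_ssn, ddl_serial):
    sqlite3.connect(":memory:").execute(ddl)
print(ddl_ssn)
print(ddl_serial)
```

Keeping the entity definition separate from the type map is the design point: the logical model stays common while each enterprise customizes only the physical data types.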
5.2.3 SLDM Benefits
The implementation of the SLDM provides a means of consolidating data. Duplicate and inconsistent data is eliminated. As a result, executive decisions may be made based on consistent data.
It provides a common starting point and communications mechanism for present as well as future application development. It helps retail business professionals and information systems professionals increase their understanding of each other's needs. This leads to a more productive application development process and helps ensure that the applications developed by the information systems professionals meet the needs of the business professionals who require data to make executive decisions.
The SLDM provides a stable development environment, within which applications share consistent, compatible data and may be reused across market segments and heterogeneous hardware platforms. The SLDM provides input into a user-defined database design. Data is customized and tuned to meet the needs of your business both now and in the future. IBM and its business partners are committed to developing applications based on SLDM along with the RAA enterprise model.
Chapter 6. Information Warehouse Framework
Enterprises have long recognized the opportunities that would be available if they could make better use of their data. Data is typically stored in many locations, in different formats, and managed by products from many different vendors. It is usually difficult to access and use the data across locations and vendor products. The Information Warehouse framework is a solution to this long-standing problem. IBM and other vendors are working to make access to data across vendor products and geographic locations easier by enabling their products to work together. The products and rules by which the products can work together constitute a framework. The Information Warehouse framework is designed to provide open access to data across vendor products and hardware platforms.
The framework: better use of enterprise data
IBM and vendor solutions fit into the framework
The framework enables access to all enterprise data
End users spend more time using data, less time finding it
IBM and its business partners have been delivering database, data access, and decision support products which fit into the framework and make it possible for our customers to build effective integrated Information Warehouse solutions.
Enterprises benefit from the framework because they can increase the value of the investment that they have in current databases and files. Enterprises can get to the data that they need to effectively manage their businesses. The Information Warehouse framework of products includes decision support products. These applications can be used by end users to analyze and report business data from many parts of the enterprise. The Information Warehouse framework of products gives the end user access to data from other workstations, LANs, and host databases. The end users spend less time gathering and accessing the data, and more time in analysis and reporting of data.
6.1 Value of the Information Warehouse Framework
The Information Warehouse framework is a comprehensive solution having value beyond the simple collection of software products. The following characteristics differentiate the Information Warehouse framework from other approaches:
• Published architecture
  Products that implement the published Information Warehouse architecture can work together with consistent user interfaces for easier operation.
• Cross-platform coverage
  Database products reside on multiple platforms, and access is enabled to database products from a variety of software vendors on platforms from a variety of hardware vendors.
• Architected connectivity
  Distributed relational database connectivity is architecture-based. IBM has products which support this architecture today.
• Vendor support
  The Information Warehouse framework has the support of other vendors in the industry. These vendors are working with IBM to bring a wide variety of software solutions to market.
• Architected systems management
  DataHub* is a cornerstone of the Information Warehouse framework. It is an architected solution for systems management, with the intent to cooperate with systems management products from other vendors.
• Integrated database tools
  Tools for database design work with tools for application development, saving customers time in application modeling and development.
We can address the data delivery requirement to generate informational data from operational data by either iteratively writing applications to extract, enhance, and load information, or by using the architecture-based Information Warehouse approach.
6.2 Why Data Replication
There are two approaches to accessing data for informational analysis: accessing the operational data directly and accessing a replication of the data created by extraction, enhancement, and loading into the informational database. The consensus favors accessing informational copies of operational data rather than direct access of operational data. Some of the considerations leading to this preference are as follows:
• Operational systems
• Database technology
• Cost of data access
• Historical data
• Ownership
• Point-in-time data
• Reconciliation.
6.2.1 Operational Systems
Operational systems manage the day-to-day business activities. As such, they are critical to the ongoing viability of the enterprise. These systems often perform at the limit of the hardware and software with which they are implemented. They are often a key part of customer service, a major factor in the success of an individual retail enterprise in the very competitive retail industry. Therefore, the ongoing operation of the operational systems is the highest priority. The reasons for using copy-based informational systems with respect to protecting existing operational applications fall primarily into the areas of data accountability and application performance.
Operational systems operate at technology′s limit
Data accountability includes security and audit considerations. Operational systems are designed from the beginning with a specific user community in mind. The data created and manipulated by those applications are carefully managed with respect to who has access to the data. The management is accomplished through operational policies and security function in the software. It includes allowing access of specific sets of data by specific users or by certain user group identifiers and associating specific users with those group identifiers. Specifying every combination of user and data is a considerable effort, so the group identifier approach is more practical. However, this may result in defining group identifiers for broad groups of users with diverse profiles to access operational data. At the very least, allowing informational access by knowledge workers would be an incremental burden on the security administrator. It could possibly be a major burden on the security software and a complication for the security policies of the enterprise. It is easier to have a copy of the data in an isolated informational environment where the access security can be handled independent of existing operational systems policies.
An isolated copy is less complicated to secure
Data location impacts security
Data placement is also an issue in security. Copying data could increase the security exposure to the enterprise; once the copy is made, it is up to the possessor to manage security. However, controlled copying is more secure than allowing broad groups of users with diverse access profiles to access operational data. At the very least, fresh copies of data are controlled.
Performance of operational systems must be protected
Decision support activity is difficult to predict, but certain characteristics of this class of applications are well understood. Decision support queries tend to access large volumes of data; they tend to apply complicated, long-running manipulations against that data; and they may retain claims on that data for extended periods of human think time. The queries, then, can be expected to interfere with transaction systems because of extensive locking, heavy I/O activity, and high demands on CPU and buffer pool resources. Informational analysis against copies of the operational data rather than the operational data itself prevents this interference.
6.2.2 Database Technology
Database software and hardware technology favors copies
Mainframe database technology supports a wide range of operational function in terms of concurrent access and data transfer rate. The Information Warehouse architecture is platform-independent and is compatible with LAN-based operational databases and application strategies. The flexibility of the mainframe platform makes it an ideal location for data copies. The mainframe platform can support a wide range of data volume and user populations. It also leverages data by being a central location for data to be accessed across work groups or lines of business.
The LAN platform has value for certain informational uses
The LAN lags behind the mainframe in I/O interface technology and ability to process data in memory. Recoverability, availability, and security functions on the mainframe tend to have fuller function. This has historical reasons as well as reasons based on the sheer volume of data inherently found on the larger systems. There are, however, many situations where specific subsets of data can be processed effectively in the LAN environment.
6.2.3 Cost of Data Access
Bring the data copy to the knowledge worker
Communication costs are reduced and response time is improved if a copy is created in one or more locations. The general rule is to bring copies of data as close to the knowledge worker as possible. The store structure in retail is an example of this strategy. The network topology includes LAN-based branches and central host databases. Frequently accessed data is accessed most cost-effectively on a LAN server rather than by running queries on the host. Factors that influence placement of data copies on the LAN or the large server include the cost of data processing (host versus LAN server) and the cost of sending the answer set from the host (LAN communication cost is lower).
6.2.4 Historical Data
Operational systems usually do not allow for historical data analysis, yet this is a major concern to business analysts.
6.2.5 Ownership
Legacy system history along with security and availability requirements have placed data remotely from business activity. Copying data allows for data placement closer to knowledge workers responsible for making decisions. Ownership implies identifying an individual who is responsible for the quality and currency of the informational object.
6.2.6 Point-in-Time Data
Operational data tends to change over time. A point-in-time picture of information may be necessary for comparisons and understanding trends. Point-in-time information is part of a historical database strategy.
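The point-in-time idea can be sketched in a few lines. The example below is only an illustration, not part of any Information Warehouse product: it uses Python's built-in sqlite3 module as a stand-in database, and the table names (inventory, inventory_snapshot), the take_snapshot helper, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, on_hand INTEGER);
    CREATE TABLE inventory_snapshot (snapshot_date TEXT, sku TEXT, on_hand INTEGER);
    INSERT INTO inventory VALUES ('A100', 40), ('B200', 15);
""")

def take_snapshot(conn, snapshot_date):
    """Copy the current operational rows into the snapshot table, stamped with a date."""
    conn.execute(
        "INSERT INTO inventory_snapshot SELECT ?, sku, on_hand FROM inventory",
        (snapshot_date,),
    )

take_snapshot(conn, "1994-08-01")
# The operational row changes; the earlier snapshot is unaffected.
conn.execute("UPDATE inventory SET on_hand = 25 WHERE sku = 'A100'")
take_snapshot(conn, "1994-08-02")

# Trend for one item across the two points in time.
trend = conn.execute(
    "SELECT snapshot_date, on_hand FROM inventory_snapshot "
    "WHERE sku = 'A100' ORDER BY snapshot_date"
).fetchall()
print(trend)
```

The operational table holds only the latest value, whereas the dated copies keep both values available for comparison.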
6.2.7 Reconciliation
Reconciliation of operational data is impractical as a real-time operation. Staging of data and data replication techniques are necessary to perform the reconciliation required for informational analysis without impacting the operational environment.
6.3 The Information Warehouse Architecture
The primary goal of the Information Warehouse architecture is to define a basis giving end users and applications easy access to data. The Information Warehouse architecture (see Figure 16 on page 72) defines a structure, formats, protocols, and interfaces as the basis for implementing Information Warehouse solutions. This architected approach creates an environment wherein solutions are leveraged. The leveraging is realized by reusing individual component solutions across implementations and by integrating off-the-shelf components in those implementations. This is not to say that an Information Warehouse solution cannot be implemented without the architecture. Rather, the architecture contributes to the leveraging of effort and resources.
Figure 16. Information Warehouse Architecture
The long-term goal of the Information Warehouse framework is to provide access to data of all types in all stores in any environment, and the architecture is designed to accommodate the goal. The Information Warehouse architecture is open, in that the interfaces are published, and extensible, in that software tools and data volume can be added without regressing the existing implementation. The Information Warehouse architecture defines interfaces, protocols, and formats for accessing information in an Information Warehouse implementation. These interfaces, grouped by focus area, are as follows:
• Informational applications
  − End-user interface to informational applications
• Information Catalog and its access
  − Information Catalog API
  − Import/export interface to the Information Catalog
• Access to data
  − Embedded SQL API
  − Callable SQL API
  − Distributed Relational Database Architecture
• Data replication
  − Interface to the object handler meta-data
  − Tool invocation to tool interface
  − Interface to workflow management
  − Data staging interface.
The interfaces are identified and described in Information Warehouse Architecture I for the public domain. The Information Warehouse architecture approach, using a component structure with interfaces defined between the components, and its openness regarding system platforms make it easier for an enterprise to implement an Information Warehouse solution on its own or with the help of software vendors and service providers.
6.4 Using the Information Warehouse Architecture
Figure 17 on page 74 shows the three fundamental components of the Information Warehouse framework: the architecture, products, and services. The three components work together to build a foundation of an extensible, flexible, and scalable Information Warehouse implementation. The products and services are requirements, whereas the architecture is a recommended participant in the Information Warehouse framework strategy.
Information Warehouse framework for access to data
The products are considered requirements because the Information Warehouse framework is a software solution. The products component encompasses any software solution to the Information Warehouse framework function requirement, not just purchased software. Off-the-shelf software has its advantages in speed of implementation but costs real money and may require some investment in resources to customize and integrate into the enterprise′ s environment.
Off-the-shelf software for productivity
The services component refers to resources expended by people, be they enterprise knowledge administrators or personnel hired from outside the enterprise. In either case, people must do the work of designing, developing, and implementing software solutions.
Services: implementing the solution
The Information Warehouse architecture is a structured approach for building a solution. Information Warehouse implementations built without the architecture will solve a problem, but they may not be easily extended. The architecture-based, leveraged approach allows for reuse of software for similar functions across lines of business or specific requests. It allows the incremental addition of new function without disruption to the existing implementation. It also allows the growth in usage and data volumes without disruption of the existing implementation. The Information Warehouse architecture-based implementation is the recommended approach, though it is not the only approach.
The architecture for a flexible solution
Figure 17. Information Warehouse Framework
6.5 Access Enablers
Access enablers connect applications and data
The access enablers layer sits between the application and the data in the Information Warehouse architecture (see Figure 18 on page 75). The data includes meta-data as well as real-time, changed, reconciled, and derived data. In Information Warehouse Architecture I, the focus is on informational applications using SQL. These applications access relational databases locally with SQL, or remotely with SQL and, for example, DRDA. The value in using SQL and DRDA is that the application uses the same data access language to access local or remote relational databases. Nonrelational databases can be accessed by using SQL mappers in the implementation of the access enablers layer.
Figure 18. Access Enablers
The four focus areas of Information Warehouse Architecture I limit the discussion of the access enablers to the SQL application program interface and the interface to the Information Catalog. The SQL API is discussed in this section; the interface to the Information Catalog is covered in section 8.2.13, “Interfaces” on page 123. The concepts of enabling products and deploying products are key to understanding the use of the access enabler layer APIs and interfaces:
Enabling
  An enabling product is a software program that accepts the commands defined in the API or interface and executes their defined function against a resource. DB2 is an enabling product for the SQL API, and the DataGuide products are enabling products for the interface to the Information Catalog.
Deploying
  A deploying product is a software program that submits requests for data resource in the form of commands defined in the API or interface. The commands are submitted to the enabling product for execution. Visualizer is a deploying product of the SQL API, and the DataGuide knowledge worker end-user interface is a deploying product for the interface to the Information Catalog.
An informational application could include commands defined in the interface to the Information Catalog and would become a deployer of both the interface to the Information Catalog and the SQL API. The advantage of this approach is that the knowledge worker continues to use the familiar environment of the informational application. The informational application would be enhanced by using business terminology stored in DataGuide.
Interfaces are enabled, then deployed
SQL is used to access the informational and operational data categories, and the interface to the Information Catalog is used to access meta-data. SQL is an industry standard data access language for performing relational operations, normally against data in a relational database. The access enablers layer also includes the definition of SQL mappers that allow SQL operations to be executed against nonrelational databases. Three key interfaces are included in the Access to Data focus area in Information Warehouse Architecture I and are related to the SQL data access language:
• Embedded SQL
• Callable SQL (commonly referred to as SQL call level interface)
• Distributed Relational Database Architecture.
The embedded SQL interface is an example of an interface taken from the public domain and is recognized by several standards bodies. The SQL call level interface is not as yet a standard; its prime focus is to enable software vendors to market shrink-wrap informational applications. The Distributed Relational Database Architecture was developed by IBM and is gaining recognition and support from a range of software vendors.
6.5.1 Embedded SQL
Embedded SQL refers to SQL commands included in the source code of informational applications. The informational application must undergo special processing prior to normal compilation. It is this special preprocessing requirement that has fostered the development of the SQL call level interface.
6.5.2 SQL Call Level Interface
The SQL call level interface (CLI) is an alternative mechanism for invoking SQL from programs. The objective of CLI is to provide additional language commands (verbs) to extend the function of SQL. The most desired extension to SQL function is support of “shrink-wrap” applications. Software vendors would like to market applications utilizing SQL but have experienced difficulty with the preprocessing and BIND requirements of embedded SQL. The informational applications are targeted for knowledge workers with little data processing skill. Requiring knowledge workers to go through the precompile, compile, and bind steps would diminish the acceptance of the informational applications by these users. The CLI introduces extensions to the embedded SQL command set which allow run-time precompile and BIND. The informational application can be used out of the box and does not require systems or database administration resource for the SQL portion of the application.
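The contrast with embedded SQL can be illustrated with any call-style database interface. The sketch below is an analogy only: it uses Python's standard DB-API (the sqlite3 module), not the CLI described in this section, and the run_query helper and sales table are hypothetical. What it shows is that the SQL text reaches the database as an ordinary string at run time, so no precompile or bind step is needed before the program is shipped.

```python
import sqlite3

def run_query(conn, sql_text, params=()):
    """Prepare and execute SQL supplied at run time; return rows as dictionaries."""
    cursor = conn.execute(sql_text, params)
    columns = [d[0] for d in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("S1", 100), ("S1", 50), ("S2", 70)],
)

# The statement text could equally come from a configuration file or a query tool.
sql_text = "SELECT store, SUM(amount) AS total FROM sales GROUP BY store ORDER BY store"
rows = run_query(conn, sql_text)
print(rows)
```

Because the statement is just data to the program, a packaged application can ship unchanged to sites whose tables were unknown when it was built.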
6.5.3 Distributed Relational Database Architecture
Distributed Relational Database Architecture (DRDA) is a communications vehicle for issuing SQL statements to a remote relational database and returning the results. The statements are executed against a remote relational database rather than a local database. Though there may be differences in the relational databases, such as the form and content of catalogs, the SQL statements themselves normally should not have to be changed to reflect the new location. DRDA is an evolving architecture: DRDA level 2 introduces distributed two-phase commit protocols. IBM has developed and published the DRDA for use by software developers throughout the industry.
The overall objective of the access enablers component is to insulate the application from the enterprise data format and location. SQL mappers manage the mapping from SQL in the applications to nonrelational enterprise data when necessary. DRDA allows for specification of the location of the relational enterprise data. That is, an application using SQL can execute against a local DB2/2* database on the programmable workstation. That same application can execute against a relational database on the LAN server running DB2/2 by moving the database and causing the application to connect to DB2/2 on the LAN server. That same application can execute against a relational database on a remote server running any DB2 family member by moving the database, using DRDA, and causing the application to connect to the DB2 database on the server.
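The location independence described above can be sketched as follows. This is an illustration under stated assumptions, not DRDA itself: two in-memory sqlite3 databases stand in for a workstation database and a LAN server database, and the names top_stores, make_database, and the sample rows are hypothetical. The point is that the application function and its SQL are identical in both cases; only the connection differs.

```python
import sqlite3

TOP_STORES_SQL = (
    "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY SUM(amount) DESC"
)

def top_stores(conn):
    """Application logic: identical no matter which database the connection points to."""
    return conn.execute(TOP_STORES_SQL).fetchall()

def make_database(rows):
    """Stand-in for a database at some location (workstation, LAN server, host)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

locations = {
    "workstation": make_database([("S1", 10), ("S2", 30)]),
    "lan_server": make_database([("S1", 500), ("S2", 200)]),
}

# Only the connection changes; the SQL and the application code do not.
results = {name: top_stores(conn) for name, conn in locations.items()}
print(results)
```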
6.6 The Retail Industry
We have seen all of the requirements that, if met, will make the retail enterprise more competitive and responsive in its business environment. This section uses specific Information Warehouse architecture components and concepts to address business requirements in the retail industry. Table 5 presents the logical design points for an Information Warehouse implementation in the retail industry. The rationale for each design choice is not explored in great detail within this book, as our intent is to show the applicability of specific Information Warehouse products. However, adequate time and resources should be allowed for considering these choices, because the purchase of products and services must be cost-justified through association with the original business requirements.
Table 5. Logical Design Points for an Information Warehouse
Business or Information Systems Requirement | Logical Design Point
View of data anywhere (store and corporate) | Use a LAN- or large-server-based dictionary product accessed from the workstation.
Access to data anywhere (corporate only) | Use a decision support product that can operate in the workstation or LAN environment. This product will need to be able to trigger large queries on the host or on the LAN database server transparently.
Access to data locally (store only) | Use PWS-based decision support software that can transfer data between dictionary product and itself.
View both summary and detail | See Information Warehouse Architecture I for configuration possibilities.
Trend data for four years | Use a large volume database server (host).
Interoperable solution | Use the open and extensible Information Warehouse architecture.
Terabyte data volumes | Use specialized hardware and software to maintain acceptable response time.
No full refreshes | Propagate source table updates automatically into informational tables.
Spread batch work across the day | Trickle the batch work, reports, and snapshot copy. Use small data volumes, run throughout the day, as system resources permit.
No store batch | Use workflows designed to run under central control. Also ensure that there is remote operator capability so the work on in-store processors can be controlled by operators on the large server.
No support for new queries or extracts | • Use business unit staff to create new queries. • Use information systems staff to catalog and propagate query definitions. • Use business unit staff, working with information systems staff, to define new information sources. • Use information systems staff to justify, implement, and promote the use of new information sources.
The logical design points map to the following approach specifications for our Retail Solution Thread (see Table 6 on page 79). Figure 19 on page 84 shows the new data processing environment created by the addition of an Information Warehouse implementation in the retail enterprise. This diagram is an extension of the diagram in Figure 5 on page 22.
Table 6. Physical Design Points for Information Warehouse Implementation
Business or Information Systems Requirement | Physical Design Point
View of data anywhere (store and corporate) | Use DataGuide.
Access to data anywhere (corporate) | Use Personal AS/2* (PAS) with appropriate access enablers to access LAN databases or S/390 Parallel Query Server on the large server.
Access to data locally (store) | Use Query/6000 to access databases on the AIX in-store processor and Visualizer to access databases on the store LAN database server.
View summary and detail | Use DB2 and expand existing batch work to load the tables required for query work. Create subsets of enhanced information from the DB2 database and use them to refresh databases on the LAN platforms, DB2/2 and DB2/6000.
Trend data for four years | Incorporate S/390 Parallel Query Server into the DB2 for MVS environment.
Terabyte data volumes | Incorporate S/390 Parallel Query Server into the DB2 for MVS environment.
Interoperable solution | Use the open and extensible Information Warehouse architecture.
No full refreshes | Use DataPropagator Relational on DB2 and DB2/2 platforms. Use DataHub/2 as the base for the DataPropagator Relational product and to support occasional full refreshes. Use Fastload** to load data into DB2/6000.
Spread batch work across the day | Because trickle snapshots are medium-sized file transfers with a small CPU consumption profile, some of this work may be moved outside the offshift batch window.
No store batch | Maintain the workflow logic centrally and provide mechanisms and procedures that allow the In-Store Processor workload to be moved to host as a backup contingency.
No support for new extract | Establish dictionary maintenance and data movement procedures that define roles and responsibilities between business unit and information systems staff.
6.7 Information Catalog
Information Catalog: awareness of and access to information
The role of the Information Catalog is to help knowledge workers (end users) find out what informational objects are available to them and what that data means, in terms they understand. Once the informational object has been identified, the Information Catalog facilitates the use of informational applications to retrieve and analyze it. It is oriented toward search and display of meta-data and the informational object to which the meta-data refers.
6.7.1 Function
It accesses meta-data and interacts with DSS tools
To facilitate the access to informational objects through the use of meta-data, the Information Catalog provides a mechanism for performing the following activities:
• Accessing user-oriented meta-data
• Interacting with informational applications
• Distributing meta-data over multiple Information Catalogs
• Collecting meta-data.
The meta-data collection function accommodates definitional sources dispersed across heterogeneous meta-data stores and stored in various formats.
It supports a spectrum of end-user skill levels
Knowledge workers have a broad range of expertise in the use of computer systems. At one end of the spectrum are the users who will simply use the Information Catalog to locate predefined informational objects (for example, queries and charts). At the other end of the spectrum are users who will build their own queries for data analysis and presentation; these users want to locate and understand the data elements they can use as raw material. The Information Catalog supports this range of user requirements.
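The search-by-business-term role described in this section can be sketched as a tiny in-memory catalog. Everything in the example is hypothetical and illustrative: the CatalogEntry fields, the sample entries, and the search function are assumptions made for the sketch and do not represent the DataGuide data model or the Information Catalog API.

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    name: str          # business name shown to the knowledge worker
    kind: str          # for example "table", "query", or "chart"
    description: str   # meaning of the object in business terms
    location: str      # where the informational object lives

CATALOG = [
    CatalogEntry("Weekly Sales by Store", "chart", "Sales totals per store per week", "LAN server"),
    CatalogEntry("Item Movement Detail", "table", "Daily unit sales for every item", "host"),
    CatalogEntry("Markdown Report", "query", "Items with price reductions this month", "LAN server"),
]

def search(catalog, term, kind=None):
    """Return entries whose name or description mentions the business term."""
    term = term.lower()
    return [
        entry for entry in catalog
        if (term in entry.name.lower() or term in entry.description.lower())
        and (kind is None or entry.kind == kind)
    ]

# A casual user looks for anything about sales; an analyst wants only raw tables.
anything_on_sales = [e.name for e in search(CATALOG, "sales")]
sales_tables = [e.name for e in search(CATALOG, "sales", kind="table")]
print(anything_on_sales, sales_tables)
```

The optional kind filter reflects the two ends of the skill spectrum: predefined objects such as charts for one group, underlying data elements for the other.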
6.7.2 Interfaces
Information catalog interfaces help integrate the solution
Information Warehouse Architecture I describes two interfaces that are directly relevant to Information Catalog requirements: the interface to the Information Catalog itself, and the export/import interface. Both are open interfaces, in that their specifications, syntax, formats, and protocols will be published for use by software developers. The most immediate need is for the interface to the meta-data. The Information Catalog′s greatest value is in making knowledge workers aware of meta-data in the enterprise. Use of the Information Catalog interface requires a program on the workstation which executes commands to retrieve meta-data from the Information Catalog. This requirement is addressed as the Information Catalog Browser in the Information Warehouse architecture and is met by the DataGuide end-user interface. Therefore, IBM has provided the tool that uses the interface and meets the requirement.
The export/import interface addresses the issue of moving meta-data from CASE tools or relational database catalogs. The DataGuide family includes extractors for taking meta-data from DB2 for MVS and DB2/2. A requirement exists for tools utilizing the export/import interface for extracting meta-data from other sources. These tools might be written by the information systems department, by tool vendors, or other services providers. The DataGuide family products provide sample C programs for developing extractors for other meta-data sources. The interface is an intermediate data format: DataGuide has a facility to import from this format. The tools for other sources would extract the meta-data and write it to a file in this format.
Export/import interface for existing metadata
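An extractor of the kind described above reads definitions from a source catalog and writes them to an intermediate file for later import. The sketch below shows that shape only. It reads the catalog of a sqlite3 database as a stand-in source and writes JSON, which is an assumed placeholder format; the actual intermediate format used by DataGuide is not reproduced here, and the function names are hypothetical.

```python
import json
import sqlite3

def extract_metadata(conn):
    """Read table and column definitions from the database's own catalog."""
    objects = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
        objects.append({
            "object": table,
            "columns": [{"name": c[1], "type": c[2]} for c in columns],
        })
    return objects

def export_metadata(conn, path):
    """Write the extracted definitions to an intermediate file (placeholder format)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(extract_metadata(conn), f, indent=2)

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (store TEXT, amount INTEGER)")
print(extract_metadata(source))
```

Keeping extraction separate from file writing means the same extractor logic could feed a different intermediate format without change.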
6.8 Information Warehouse Architecture Products
This section presents products of the Information Warehouse framework. Each subsection gives a brief description of a product and the requirements that it addresses, followed by a reference to the chapter that describes the product and its implementation within our Information Warehouse solution in more detail. Before focusing on these products, remember that the configuration of information into detailed and summary databases is as important as the products used to access that information.
6.8.1 The DataGuide Family
DataGuide provides an easy-to-use facility for finding enterprise data using searches based on business terminology. It also provides a vehicle for launching queries, examining existing reports, and launching any number of related applications using located objects as input. It supports the following business requirements:
• View of data (store and corporate)
• Access to data (corporate only)
  Access is limited to queries and modification of queries stored in decision support tools that can be invoked through a command-line interface. A decision support product such as PAS can be invoked for detailed data access and analysis.
• Easy to use for novice data processing staff
  DataGuide provides a GUI; knowledge of SQL is unnecessary.
• Need to access both summary and detail data
  DataGuide can invoke various decision support tools and the informational objects defined to those decision support tools. Knowledge administrators are responsible for creating a portfolio of decision support tools and informational objects (charts and reports) at varying levels of detail and summarization. They are also responsible for organizing the informational objects in a hierarchy for the end user, using DataGuide′s grouping capability.
For more information on DataGuide, see 8.2, “DataGuide” on page 103.
DataGuide helps find enterprise data
6.8.2 S/390 Parallel Query Server
S/390 Parallel Query Server for analysis of large data volumes
The S/390 Parallel Query Server solution is a package of hardware, software, and services that provides a specialized query engine for rapid resolution of medium to complex queries working on large amounts of data. S/390 Parallel Query Server also solves a data access problem within performance guidelines favorable to the knowledge worker. The S/390 Parallel Query Server solution provides an excellent growth path with its wide range of configurations, even considering the fast rate of growth expected for detailed historical information. S/390 Parallel Query Server′s read-only nature is in perfect harmony with the read-only access envisioned for Information Warehouse requirements. S/390 Parallel Query Server supports the following business requirements:
• Access to both summary and detail data
• Keep four years of trend data
• Ability to perform ad hoc query.
S/390 Parallel Query Server is a specialized solution for large queries
S/390 Parallel Query Server is a solution for enterprises wanting to interrogate large amounts of data (for example, an ad hoc query retrieving information from a large database) in an unpredictable manner. Such interrogations help the customer run their business smarter as well as improve their efficiency and effectiveness. The information that is gathered as a result of an inquiry would help knowledge workers realize the following business goals:
• Increase market share
• Generate additional revenue
• Create new demand for existing products and services
• Control operating costs
• Improve customer satisfaction.
When combined, achievement of the above goals should lead to improved profitability for your enterprise.
It is a parallel database server
S/390 Parallel Query Server is a parallel database server that is dedicated to processing queries against large amounts of data (for example, historical) in relational databases. S/390 Parallel Query Server accepts dynamic read-only SQL queries from decision support tools, such as QMF* or Data Interpretation System, or an enterprise′s own applications. S/390 Parallel Query Server uses parallel processing to reduce query time and cost, making it practical to extract data from tables with millions of rows of data, such as sales records, medical records, detailed inventory records, or demographic information. S/390 Parallel Query Server can solve a new dimension of problems and thereby provide new business value. S/390 Parallel Query Server plus informational applications provide a complete Information Warehouse framework solution. That is, S/390 Parallel Query Server provides support for the Organization Asset Data component of the Information Warehouse architecture. For more information on S/390 Parallel Query Server, see Chapter 7, “Organization Asset Data” on page 87.
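The general idea of using parallel processing to shorten a read-only query can be sketched as a partitioned scan: the same work runs on each partition of the data and the partial results are combined. The example below is a conceptual illustration only. It says nothing about how S/390 Parallel Query Server is actually implemented, the function names and sample data are hypothetical, and Python threads are used purely to show the structure rather than to demonstrate a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_total(partition, store):
    """Scan one partition and return the sales total for one store."""
    return sum(amount for s, amount in partition if s == store)

def parallel_total(partitions, store, workers=4):
    """Run the same scan on every partition concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda p: partial_total(p, store), partitions)
        return sum(partials)

# Three partitions of a (store, amount) sales table.
partitions = [
    [("S1", 100), ("S2", 70)],
    [("S1", 50), ("S3", 20)],
    [("S2", 30), ("S1", 25)],
]
total_s1 = parallel_total(partitions, "S1")
print(total_s1)
```

Because the query is read-only, the partitions can be scanned independently without any locking between the workers.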
6.8.3 DataPropagator Relational
DataPropagator Relational is a data replication product that allows changes made in selected database sources to be applied to selected database targets. The database sources of most interest in the retail enterprise are DB2 for MVS and DB2/2. The database targets of interest are DB2/2 databases on the store and corporate LANs. DataPropagator Relational indirectly supports rapid dissemination of informational objects by making views of data anywhere easier to implement. It decreases the load on the batch window by better managing the propagation of operational data changes to corresponding informational databases. DataPropagator Relational supports data currency through update propagation, so that full refreshes are avoided. It allows data propagation activity to be executed during online hours that otherwise would be performed during the batch shift. DataHub, a prerequisite to the DataPropagator Relational product, also has its own tool for refresh propagation between DB2 for MVS and DB2/2 tables. For more information on DataHub and DataPropagator Relational, see Information Warehouse in the Finance Industry and appropriate product documentation.
6.8.4 Personal AS/2
Personal AS/2 (PAS) is an informational application for the LAN and workstation environments. The key features for which PAS was chosen by the retail enterprise are its analysis tools and graphical representation abilities. As such it supports the following business requirements:
• Ability to perform ad hoc query, particularly in the area of more complicated analysis.
• View of data everywhere
  PAS can be launched by DataGuide/2.
• Access to data everywhere (corporate personnel)
  PAS can access remote relational data using DRDA.
• Access to both summary and detailed data.
DataPropagator Relational helps manage distributed data
Figure 19. Future Retail Enterprise Network: Connectivity and Operating Systems. Also included is placement of relational data stores
6.9 Why Use the Information Warehouse Architecture
An Information Warehouse solution maps well against business requirements possessing the following key characteristics:
• Large volumes of data
• Dynamic requirements for informational data.
The Information Warehouse implementation must be open and extensible to meet these requirements. Operational data and software that are not by nature open and extensible must be accommodated by the Information Warehouse solution. The Information Warehouse architecture provides an open and extensible structure that can be used in an Information Warehouse implementation today.
The IW must be open and extensible
Chapter 7. Organization Asset Data
Enterprises consider their data to be a vital asset. For historical and organizational reasons, very few enterprises have a master plan for managing their data. Data stored in databases and files is identified as data objects by their data processing technology name. There is little categorization of data objects and no enterprise-level directory documenting the ownership, business meaning, or relationship to applications of informational objects. This lack of a data plan has contributed to the difficulty in making the data accessible to the business analysts who need it. The data categories of the Information Warehouse architecture provide a foundation for building a master plan for the organization asset data.
The Information Warehouse architecture defines five categories of data with respect to how they are used by applications and specifies four configurations, or collections of the five categories. Categorization and configuration of enterprise data are the first steps toward developing an enterprise management system for data. DataGuide is a catalog for documenting what data means from a business point of view. The Information Warehouse architecture′s Organization Asset Data component incorporates the categorization and configuration methodology. Figure 20 on page 88 highlights organization asset data in the Information Warehouse architecture.
IW architecture brings order to the data chaos
The IW architecture is a plan for enterprise data
Figure 20. Organization Asset Data in the Retail Industry
An industry model presents the business view
The organization asset data component consists of two elements: data and meta-data. The business view of the data is achieved through the modeling process, whether it is informal or tool-based. The discussion in Appendix A, “Models and Modeling” on page 129 lays the groundwork for understanding the meta-data to data relationship. The business analyst understands the enterprise′s business and the business objects and activities that contribute to that business. The model is a translation of that view to the data processing view of the data processing objects and processes that mirror the business objects and activities.
The data categories complement the industry model
The model is a way of managing the categories of data defined in Information Warehouse Architecture I and creating a channel of communication between business analyst and information systems department staff. The model starts from a business perspective and progresses toward the implementation of an equivalent data processing perspective. The data categories in the Information Warehouse architecture are more geared toward how the data is created and used by applications. The Information Warehouse architecture helps to make informational objects available to informational applications by guiding the extraction and enhancement of operational data to create the informational objects. The business model view provides the business semantics of the data. You need both approaches to maintain an understanding of the data and the management control of the data. The Information Warehouse architecture data categories are as follows:
• Real-time
• Changed
• Reconciled
• Derived
• Meta-data.
Real-time data is created and manipulated by operational applications to run the day-to-day business. Changed data represents the changes that transactions make to the operational data. Reconciled data is an informational copy of the operational data with basic conversions of codes into meaningful descriptions. Reconciled data also carries the resolution of inconsistencies between data stored for different lines of business in the enterprise. Derived data contains records that are summarized, aggregated, and transformed from (detailed) reconciled data or from the real-time data. Meta-data is descriptive data about the data in the other categories. It is used by knowledge workers searching for and trying to understand the data in those other categories. The meta-data is maintained in the Information Catalog.
We know from our business requirements and information supplied by the information systems function that we would need to:
• Access trend data for a four-year span
• Access both summary and detail information
• Access large data volumes
• Support a large population of knowledge workers working concurrently
• Avoid impact on the batch window
• Isolate operational data from knowledge workers′ access
• Limit informational data access to be read-only.
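As a rough illustration of how the reconciled and derived categories described above relate to real-time data, the following Python sketch converts codes into descriptions and then summarizes the result. The record layout, the department code table, and the function names are all hypothetical and are not taken from any product discussed here.

```python
from collections import defaultdict

# Hypothetical code table used to turn operational codes into descriptions.
DEPT_CODES = {"07": "Housewares", "12": "Footwear"}

# Hypothetical real-time (operational) records, as a POS system might hold them.
real_time = [
    {"store": 1, "dept": "07", "amount": 20},
    {"store": 1, "dept": "12", "amount": 35},
    {"store": 2, "dept": "07", "amount": 15},
]


def reconcile(records):
    """Informational copy of the data with codes converted to descriptions."""
    return [{**r, "dept": DEPT_CODES[r["dept"]]} for r in records]


def derive(reconciled_records):
    """Summarize reconciled detail into totals per department."""
    totals = defaultdict(int)
    for r in reconciled_records:
        totals[r["dept"]] += r["amount"]
    return dict(totals)


reconciled = reconcile(real_time)
derived = derive(reconciled)
```

The operational records are never modified; the reconciled and derived copies are what knowledge workers would query, which matches the requirement to isolate operational data from their access.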
The volume of trend data necessary to satisfy the new scenarios in 4.1.2, “New Scenarios for Profit” on page 49 is expected to grow over time. The existing and new reports need both detailed and summary informational data (reconciled and derived data). The solution would need to accommodate this growth with existing technology. These divergent requirements underscore the new role of the information systems department, wherein information systems is a provider of information. Information systems must be in a position to provide access to detailed, reconciled data or informational data aggregated at any conceivable level at the request of the knowledge worker groups. It is further responsible for maintaining data currency and ensuring the correlation between information and its business meaning. Choosing a suite of informational applications to present the informational data is left to the business group. The business group is free to select any decision support tool that provides the desired function and that utilizes the SQL API for data access.
The new information systems role: provider of information
A review of the potential queries indicated that most of the queries were of medium complexity, with a significant number of queries fitting into the highly complex category. The majority of medium to complex queries are expected to be unstructured; that is, ad hoc in nature and not tuned for performance. The lack of structure is caused by decision support products that removed the knowledge worker from the formulation of the SQL itself. The decision support GUI makes data access easier for the knowledge worker, but it inhibits fine tuning of the queries. If some means could be found to control query structure, the new performance features of DB2 Version 3 release 1 running on an ES/9000 platform may be an adequate solution. The relational database tables accessed by the retail queries have data in the millions-of-rows range.
Consider the variety of information volume and query complexity
Both query complexity and data volumes will affect the amount of CPU resources expended to deliver the query result and the response time. One of the primary criteria for choosing a dedicated query engine lies in understanding the complexity of the anticipated queries and the volume of data that will be processed by the average and maximum query. Figure 21 indicates conceptually the interaction of both factors when coupled with the expected number of concurrent users.
Figure 21. Query Cost as a Function of Query Complexity and Data Volume
7.1 The Solution
All of the above requirements point to a dedicated query solution. The S/390 Parallel Query Server query data solution matched up well against these requirements on the following points:
• Response time
  S/390 Parallel Query Server is designed to deliver optimal performance for queries against large data volumes. The design accommodates increases in the population of concurrent users.
• Scalable solution
  S/390 Parallel Query Server provides room for growth. Data storage and processor capacity can be added as new query workloads are added.
• Packaged solution
  The S/390 Parallel Query Server is delivered as a total solution of hardware, software, and associated services.
• Complementary technology
  The S/390 Parallel Query Server solution complements existing ESCON-capable processors at retail enterprises. There is no need for extensive training and revisions to operating procedures. Support can be provided by established operating system and database specialists. Moreover, most of the system maintenance is packaged into the S/390 Parallel Query Server solution.
• Fast batch operations
  Loading of the query data is faster when using the high speed ESCON channels and DB2 3.1 features that support I/O parallelism, more efficient utilization of memory, and data compression.
Figure 22 illustrates the tradeoffs between query complexity and data volume versus general and special purpose solutions.
Figure 22. Special and General-Purpose Solutions
7.2 S/390 Parallel Query Server
S/390 Parallel Query Server is a specialized solution
S/390 Parallel Query Server is a specialized hardware and software solution to the large data volume, complex query data access requirement. It provides access to large volumes of informational data while optimizing response time. The initial appearance and subsequent reworking of that information is managed by the informational application used by the knowledge worker.
S/390 Parallel Query Server is ideal for derived and reconciled data
S/390 Parallel Query Server would be used for storing reconciled and derived data. The S/390 Parallel Query Server system is chosen when the data volume is large enough to require partitioning of the DB2 tables. Response time requirements are a key justification for environments where data volume and query complexity would make general purpose solutions inadequate. Query splitting is the major facilitator of the specialized solution, and it is assumed that at least one of the largest tables in the query is a partitioned table. The common database and files found at the level of the in-store processor and POS systems are shown in Figure 23. This figure is included here as a reminder of the source of the operational data that is the source of informational data copied into S/390 Parallel Query Server. Increasingly, relational databases are used for critical data such as ITEM, facilitating the incorporation of this data into an Information Warehouse implementation.
Figure 23. Databases and Files in Retail Enterprise Network
The initial configuration of S/390 Parallel Query Server provides access from mainframe informational application products. We need a mechanism to invoke queries from the programmable workstation and LAN environments. Note that the LAN environment is the hardware platform for informational applications, DataGuide, and administering and launching data replication tools. Today, we can provide such connectivity using Personal AS/2 Version 3 on the programmable workstation, which calls Application System Version 3 on the large server to execute a query. Alternatively, Personal AS/2 can use DDCS/2 and DRDA protocol to access the large server database without the need for host Application System Version 3. The product configuration required to support the Application System Version 3 approach is shown in Figure 24. For the remainder of this section, we elaborate on the configuration of the specialized query solution using S/390 Parallel Query Server.
Figure 24. Connectivity between Personal AS/2 and S/390 Parallel Query Server
7.2.1 Software Configuration
Access to S/390 Parallel Query Server data is supported through a front end MVS system running DB2 2.3 or 3.1 and access enabler software. The access enabler software for DB2 access must be made known to the Application System Version 3 code for Application System Version 3 to pass its SQL requests to S/390 Parallel Query Server. SQL initiated from the large server Application System Version 3 product is passed to the S/390 Parallel Query Server complex depending on the table names used in the query. Data results would be passed directly back to the address space that contains Application System Version 3. At the workstation, knowledge workers using Personal AS/2 on the programmable workstation receive resultant data rows through the Server-Requester Programming Interface (SRPI). The SRPI session between Personal AS/2 and host AS is configured using Communications Manager/2. The answer set for a given query is input to Personal AS/2, then used by the knowledge worker for further analysis. The final result of the SQL request and the Personal AS/2 manipulation of the data is graphical display of the answer set.
7.2.2 Information Maintenance
To keep the information in the S/390 Parallel Query Server data store current, the S/390 Parallel Query Server system is quiesced at night after all the necessary extract and data replication programs have been run to gather data in the form required to load the informational tables. Quiescing allows existing queries to complete and prevents any new queries from being started. All S/390 Parallel Query Server DB2 subsystems then deactivate their access to the DB2 tables in S/390 Parallel Query Server. ESCON channels from the front-end MVS system to the database DASD are varied online and the DB2 subsystem on this front-end would perform the necessary data loads, reorganizations, backups, and updating of statistics. At the end of this maintenance cycle, the environment is reversed and DB2 subsystems in the S/390 Parallel Query Server complex would regain read-only access to the DB2 data.
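The quiesce behavior described above (running queries finish, new queries are refused, and maintenance starts only once the system is idle) can be sketched in Python as follows. This is a minimal model under stated assumptions: the class name QueryGate and its methods are hypothetical and do not correspond to any interface of the product or its automation code.

```python
class QueryGate:
    """Hypothetical model of quiescing a query server before maintenance."""

    def __init__(self):
        self.accepting = True   # new queries are admitted while True
        self.in_flight = set()  # identifiers of queries still running

    def start(self, query_id):
        """Admit a query unless the system is quiesced."""
        if not self.accepting:
            return False
        self.in_flight.add(query_id)
        return True

    def finish(self, query_id):
        """Record that a running query has completed."""
        self.in_flight.discard(query_id)

    def quiesce(self):
        """Stop admitting new queries; running queries may complete."""
        self.accepting = False

    def ready_for_maintenance(self):
        """Loads and reorganizations may begin only when nothing is running."""
        return not self.accepting and not self.in_flight

    def resume(self):
        """Reverse the environment after the maintenance cycle."""
        self.accepting = True
```

In this sketch the nightly cycle would call quiesce(), wait until ready_for_maintenance() returns True, run the loads, and then call resume().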
7.2.3 Retail Enterprise Operations
This mode of maintenance is particularly appropriate to the retail enterprise′s data processing environment. There is a necessary wait period at the end of the business day prior to availability of the POS data for processing. The retail enterprise desires the S/390 Parallel Query Server system to be available to business analysts until the last possible moment before informational refresh. The information systems department devises a method of gathering updated POS data, enhancing it, and cleansing it using staged data. These processes will be performed independent of the S/390 Parallel Query Server during a maintenance period, but before the quiesce and load period. We cannot wait for the S/390 Parallel Query Server to be quiesced before beginning all the various extracts. The output of these various extracts must be available for immediate loading of the S/390 Parallel Query Server databases, to save time in the batch window.
7.3 Technical Issues
S/390 Parallel Query Server is a rack-mounted query processor that uses specialized software to split SQL queries into separate job streams. The query splitter operates above the SQL API level and is designed to introduce parallelism into queries that must scan large amounts of data. The query splitter takes the original query and splits it into component queries. Each of these component queries executes the original query on a subset of the data. This is accomplished by manipulating range predicates for each component query. Answer sets from each component query are then merged into a final answer set.
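The idea of splitting by range predicates and merging the answer sets can be illustrated with a short Python sketch against an in-memory SQLite database. This is only an illustration under stated assumptions: the sales table, the store_id key, the helper names, and the even split of the key range are all hypothetical, and the component queries run one after another here, whereas the product runs them in parallel on DB2.

```python
import sqlite3


def split_ranges(low, high, parts):
    """Return `parts` contiguous, non-overlapping [start, end) ranges covering [low, high)."""
    step, rem = divmod(high - low, parts)
    ranges, start = [], low
    for i in range(parts):
        end = start + step + (1 if i < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges


def run_split(conn, base_sql, key, low, high, parts):
    """Run the original query once per key range and merge the answer sets."""
    merged = []
    for start, end in split_ranges(low, high, parts):
        # Each component query is the original query plus a range predicate.
        component = f"{base_sql} AND {key} >= ? AND {key} < ?"
        merged.extend(conn.execute(component, (start, end)).fetchall())
    return merged


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, item TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(s, f"item{s % 3}", s * 10) for s in range(1, 101)],
)

base_sql = "SELECT store_id, amount FROM sales WHERE amount > 500"
original = conn.execute(base_sql).fetchall()
merged = run_split(conn, base_sql, "store_id", 1, 101, 4)
```

Because the ranges do not overlap and together cover the whole key range, the merged rows contain exactly the rows of the original query, although possibly in a different order.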
S/390 Parallel Query Server is rack-mounted CMOS technology
The objective is to convert a large tablespace scan chosen for the original query into multiple keyed index scans. We use the word keyed because there are rules that include splitting the query if it created full index scans in the original query. S/390 Parallel Query Server uses a form of parallel processing to enable concurrency. We review the basic concepts of parallel processing as a preparation for discussing components of this specialized solution.
7.3.1 Types of Parallelism
We discuss two forms of parallelism: I/O parallelism and processing parallelism. Prior to DB2 Version 3.1, most databases performed I/O operations for a single query in a serial mode. Multiple DASD volumes were scanned, in sequence, to satisfy a particular query. This often left the processor with idle time while it waited for the longer-running I/O to complete. DB2 3.1 introduced query I/O parallelism for queries executing against data in partitioned tables and for certain joins. This allowed I/O to be run concurrently against many different parts of the same table. This results in better utilization of the processor, because the data flow rate to the processor is increased. This can improve elapsed times by a factor of two or three. I/O parallelism in DB2 Version 3.1 offers one approach to effectively handle queries running against large volumes of data.
Parallelism can be applied to I/O and the processor
The CPU parallelism introduced by S/390 Parallel Query Server goes one step further and allows concurrent CPU processing to take place for a single query. This, coupled with I/O parallelism, allows effective handling of complex queries run against large volumes of data. In Figure 22 on page 91, the dotted curve shows the positioning of the S/390 Parallel Query Server solution as a function of the mix of complexity and data volume in the retail environment.
CPU parallelism helps address query complexity
We consider two approaches to implementing CPU parallelism: the partitioned approach and the shared approach (as shown in Figure 25 on page 96). In the partitioned approach, the data is divided across multiple DASD. Each processor in the configuration has active channel connections to only one subset of this DASD. In executing a single query, the dispatching task will need to understand this hard-wired affinity between processor and DASD in allocating work.
Figure 25. Parallel CPU Processing Environments
The partitioned approach has the potential to create bottlenecks if several queries wait for work scheduled on a particular subset of DASD. The partitioned approach also requires a large amount of maintenance when additional capacity is added. Let us assume the additional capacity is mandated by growth in the size of the stored tables. To maintain a similar response time, typically more DASD must be added when additional processors are added. This results in a redistribution of the data across the new complement of DASD. Affinities between CPU and DASD must be redefined which in turn may impose application changes. One variation of this partitioned parallel approach is to spread data from all tables—including the system catalog—randomly across all DASD. This largely solves the DASD bottleneck problem, partially solves the redistribution problem, but causes other problems. For example, increased parallelism is almost always at the expense of greatly reduced concurrency, and a failure of a single DASD or processor may impact the entire system. If more than one index has been defined on the data, processing of such a secondary index can often result in an uneven distribution of work among the storage devices. The secondary index is not aligned with the physical order of the data, and I/O accesses may jump about the DASD volume rather than proceeding stepwise in a single direction. The partitioned parallel architecture is very sensitive to the distribution of workload among the storage
devices. The shared parallel architecture is much less sensitive to this and therefore handles secondary indices better.
In the shared approach to CPU parallelism, a unique sharing technology allows all CPUs to access any DASD volume. There is therefore less potential for a bottleneck on a particular CPU. Adding capacity is also easier, because there is no hard-wired affinity between CPU and DASD. The dispatcher software in S/390 Parallel Query Server can dynamically adjust its allocation of workload to use the new CPU. We now look more closely at the S/390 Parallel Query Server solution.
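The difference between the partitioned and the shared approach can be made concrete with a toy Python model of work allocation. It is a sketch under strong assumptions: work is measured in abstract units, contention on the DASD itself is ignored, and the greedy least-loaded dispatch rule is chosen only for illustration and is not the dispatcher logic of the product.

```python
import heapq


def partitioned_elapsed(work_by_dasd):
    """Each CPU is tied to one DASD subset and must do all the work found there."""
    return max(sum(units) for units in work_by_dasd)


def shared_elapsed(work_by_dasd, cpus):
    """Any CPU can reach any DASD: give each work unit to the least-loaded CPU."""
    loads = [0] * cpus
    heapq.heapify(loads)
    all_units = sorted((u for units in work_by_dasd for u in units), reverse=True)
    for unit in all_units:
        heapq.heappush(loads, heapq.heappop(loads) + unit)
    return max(loads)


# Hypothetical skewed workload: most of the work falls on the first DASD subset.
work = [[4, 4, 4, 4], [1], [1], [2]]
print(partitioned_elapsed(work))  # elapsed units with hard-wired CPU affinity
print(shared_elapsed(work, 4))    # elapsed units with four CPUs sharing all DASD
```

With this skewed workload the partitioned model is bound by the busiest DASD subset, while the shared model can spread the same units across all four CPUs.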
7.3.2 S/390 Parallel Query Server Design
There are several models of the S/390 Parallel Query Server solution. The base model has a central electronic complex (CEC) of six central processors (CPs) running a single image of the MVS operating system. DASD is included in the complete hardware configuration. Subsequent models add more CECs to a maximum of eight. The S/390 Parallel Query Server solution is a package that includes hardware, software, and services. The base software functions in a S/390 Parallel Query Server CEC are as follows:
Query scheduler
  The query scheduler is linked to management of the channel-to-channel (CTC) ESCON connections to the front-end MVS system. It runs on only one of two CPs in each CEC complex. The second CP is backup in the event of primary CP failure. These two CPs have more memory allocated to each.
Query splitter
  The query splitter decides whether a query should be split or sent as is to a single CEC-server combination. It also has primary responsibility for merging answer sets. It can run on only one of the two lead CPs in each complex. However, any complex can run splitter or server tasks.
Server
  The server passes on the query to a DB2 subsystem and passes the individual answer set for split queries back to the splitter.
Figure 24 on page 93 shows multiple instances of the server code as sysplex servers. Together, all the CECs in a S/390 Parallel Query Server configuration function as a sysplex, using the MVS global resource serialization services (GRS) feature to coordinate work among the CECs. The front-end MVS system is not part of the GRS ring used for task scheduling. All DASD is shared in read-only mode by every CEC in the sysplex. This is implemented at the DB2 data level using the Shared Read-Only Data feature. Automation code ships with the product and runs from the front-end DB2 subsystem to perform database updates. The automation code is based on NetView and Automated Operations Control (AOC/MVS).
7.3.3 Query Splitting
Query splitting is the key to S/390 Parallel Query Server′s parallel processing
Query splitting is the process of taking a query whose predicates select some answer set and breaking it into multiple queries whose answer sets are disjoint subsets of the original and reconstruct the original answer set when UNIONed. Splitting is based on a partitioning key and requires that the original query can be resolved into mutually exclusive subset queries. These subset queries access either the constituent physical partitions of a partitioned table or the constituent tables that make up a concatenated table. The concatenated table concept is enabled through access enabler, not DB2 code. It represents logical partitioning of data, rather than physical partitioning of data. The query splitter code contains logic to optimize the splitting process. It makes use of DB2 Explain data and is particularly sensitive to table partitioning and JOINs applied to partitioning indices.
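The concatenated table case can be sketched in Python with an in-memory SQLite database. The sketch assumes three hypothetical constituent tables, one per sales year, with the year acting as the partitioning key; the table names, columns, and the subset_queries helper are illustrative only and do not reflect the access enabler's actual directory or syntax.

```python
import sqlite3

YEARS = (1992, 1993, 1994)  # hypothetical constituent tables, one per year

conn = sqlite3.connect(":memory:")
for year in YEARS:
    conn.execute(
        f"CREATE TABLE sales_{year} (sale_year INTEGER, item TEXT, amount INTEGER)"
    )
    conn.executemany(
        f"INSERT INTO sales_{year} VALUES (?, ?, ?)",
        [(year, f"item{i}", (i * year) % 97) for i in range(5)],
    )


def subset_queries(first_year, last_year, min_amount):
    """Resolve a query on the logical table into one query per constituent table."""
    return [
        (f"SELECT sale_year, item, amount FROM sales_{y} WHERE amount >= ?",
         (min_amount,))
        for y in YEARS
        if first_year <= y <= last_year
    ]


# Each subset query touches a different constituent table, so the answer
# sets are mutually exclusive and can simply be concatenated.
answer_sets = [
    conn.execute(sql, params).fetchall()
    for sql, params in subset_queries(1993, 1994, 10)
]
union = [row for answer in answer_sets for row in answer]
```

Tables whose key range falls outside the query predicate (here, 1992) are never accessed, which is one benefit of resolving the query on the partitioning key before it is run.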
7.3.4 Front-End MVS System
The front-end MVS is used for access and maintenance
The front-end MVS system to the S/390 Parallel Query Server solution (see Figure 24 on page 93) functions as both the sole access point for queries to S/390 Parallel Query Server and the maintenance engine for updating DB2 tables in S/390 Parallel Query Server. We have chosen to use Personal AS/2 communicating with the front-end system as our means for accessing S/390 Parallel Query Server data. LAN decision support that uses TCP/IP connections to the front-end MVS will also work. In this case, VTAM must be correctly configured to translate TCP/IP flows to SNA LU 6.2 flows. LU 6.2 data flows over the ESCON CTC adaptor are the only means by which the front-end system passes data to S/390 Parallel Query Server. Host tools and applications that today submit SQL queries through the Call Attach, TSO Attach or CICS Attach facilities can link edit or load the S/390 Parallel Query Server query receiver to send queries to S/390 Parallel Query Server instead. The query receiver sends dynamic, read-only SQL queries to the S/390 Parallel Query Server for processing. Insert, update, and delete statements and static SQL are sent to the front-end DB2 system. The front-end DB2 system can be either version 2.3 or 3.1. The choice of version will impact the efficiency of S/390 Parallel Query Server, which exploits DB2 3.1 function, as follows:
• I/O parallelism for partitioned table queries
  This promises a two- to three-fold reduction in elapsed time for selected read-only queries.
• Hiperpool exploitation
  This promises a 25% reduction in elapsed time for queries.
• Data compression
  Data compression could potentially improve query response by reducing the I/O required to produce the result set.
• Improved utility operation
  Most of the improvements are for utilities directed against partitioned tables (LOAD, REORG, COPY) that will now use I/O parallelism.
Data compression cannot be used by S/390 Parallel Query Server if the front-end DB2 is Version 2.3 because the front-end system must maintain the databases and therefore be able to read and write compressed data. I/O access times for uncompressed data in DB2 2.3 will be longer than the same accesses against compressed data under 3.1. Additionally, many of the I/O parallelism improvements in the standard DB2 utilities are lost if the front-end system uses DB2 2.3. The DB2 Version 3.1 data compression feature helps in managing a constrained batch window. Use of data compression allows a larger amount of informational data to be stored on the same physical DASD. For example, in the maximum S/390 Parallel Query Server Version 1 configuration, 960GB of DASD can actually store 1.5TB of data. To support data compression, the existing ES/9000 system would be upgraded to an ES/9000 model 511- or 711-based processor. After database maintenance, the DB2 catalogs in each CEC, as well as the access enabler software directory used by the dispatcher and splitter logic, must be synchronized. This directory can be changed when the associated access enabler software is not executing. Changes to the DB2 catalogs are a bit easier and can be handled dynamically by automation tasks without taking down any DB2 subsystems.
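The routing rule stated earlier in this section for the query receiver (dynamic, read-only SQL goes to S/390 Parallel Query Server; insert, update, and delete statements and static SQL go to the front-end DB2 system) can be sketched in Python as follows. This is a simplified illustration: the function name, the destination labels, and the test of the leading SQL verb are assumptions, and the real query receiver also considers other factors, such as the table names used in the query.

```python
def route(sql, dynamic=True):
    """Pick a destination for one SQL statement (hypothetical labels)."""
    words = sql.split()
    if not words:
        raise ValueError("empty SQL statement")
    verb = words[0].upper()
    # Only dynamic, read-only SQL is eligible for the parallel query server.
    if dynamic and verb == "SELECT":
        return "parallel_query_server"
    # INSERT, UPDATE, DELETE, and all static SQL stay on the front-end DB2.
    return "front_end_db2"


print(route("SELECT item, SUM(amount) FROM sales GROUP BY item"))
print(route("UPDATE sales SET amount = 0 WHERE store_id = 7"))
print(route("SELECT * FROM sales", dynamic=False))
```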
Chapter 8. Information Catalog
The Information Catalog is the centerpiece of the Information Warehouse solution at the retail enterprise. It is the knowledge worker′s entry point to the Information Warehouse implementation; that is, the ultimate goal of the knowledge worker is to access informational data for informational analysis. The Information Catalog is the first software tool used by the knowledge workers to access the informational data. Figure 26 highlights the pieces of the Information Catalog as they appear in the Information Warehouse architecture.
Figure 26. Information Catalog
The Information Catalog is the entry point to the IW solution
8.1 Information Catalog Function
A library card catalog is a kind of Information Catalog
Figure 27 shows a popular analogy for understanding the functions of the Information Catalog. The card catalog of a library serves as a vehicle for discussing those functions. Let′s assume the knowledge workers enter a library in search of a specific book. Typically, they do not know the title, but have a specific or general idea of the subject of the book, or they might know the author′s name, or possibly even a keyword from the title. This other information the knowledge worker brings to the card catalog is called meta-data in the context of the Information Warehouse framework. Our knowledge workers use this meta-data to find the book they are seeking. The card catalog might have two different physical sets of drawers: one has cards sorted alphabetically by author, the other by subject keywords. The knowledge workers find the card having the author′s name or the subject, note a number for the book by which they can find the book on a shelf, and then proceed to the shelf to get the book. Today′s libraries contain electronic versions of the old card catalogs, but the search procedure is essentially the same.
Figure 27. A Card Catalog Information Catalog
The Information Catalog works in essentially the same way as a library card catalog. Knowledge workers enter business terms as keywords into the Information Catalog. The Information Catalog uses those business terms to
find entries representing informational objects (for example, reports, queries, and pie charts). The Information Catalog also contains instructions on how to access these objects. In some cases, the objects are directly accessible through informational applications invoked by the Information Catalog. In other cases, the objects are not directly accessible, and the instructions might simply be text on how to get the object. An example might be an industry report purchased as a paper document from an outside vendor. The entry might have the name and phone number of the librarian at the corporate library; it is up to the knowledge worker to go to the librarian to get the document.
8.2 DataGuide
Many of the most important business requirements established in previous chapters are met by IBM′s Information Catalog solution, DataGuide. In this chapter, we discuss some of the new capabilities of the DataGuide family of products. The implementation of DataGuide/2, the programmable workstation member of the DataGuide family, with S/390 Parallel Query Server extends query availability to multiple platforms with a smaller maintenance cost than with currently available products.
DataGuide The DataGuide family of products is the IBM solution to the Information Catalog requirements specified in the Information Warehouse architecture.
Searching is the most important DataGuide/2 function
Knowledge workers desiring access to enterprise data must find it first; DataGuide is the first step in gaining that access. This locate and access sequence makes the search capability DataGuide′s most important function. DataGuide′s searching function is easy to use, powerful, and flexible enough to move knowledge workers past the curiosity stage.
DataGuide/2′s search capability is powerful
Knowledge workers can specify search criteria for multiple properties of a single object type. They can also specify a search criterion for the Name property to search across object types. The requirement for searching across object types is derived from the fact that the Name property is the only property known to the knowledge worker that is common to all object types. If no value is specified for a selected property, the search is unqualified, and DataGuide returns a list of all object instances for the selected object type.
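As an illustration only, the search rules described here can be sketched in Python. This is not DataGuide/2 code: the CATALOG list, the sample objects, the function names, and the case-insensitive substring matching rule are all assumptions made for the sketch.

```python
# Hypothetical in-memory catalog; each entry is one object instance.
CATALOG = [
    {"type": "Chart", "Name": "1993 sales by region", "Owner": "Marketing"},
    {"type": "Chart", "Name": "1993 inventory turns", "Owner": "Finance"},
    {"type": "Report", "Name": "Weekly sales summary", "Owner": "Marketing"},
]


def matches(obj, criteria):
    """True if every criterion value occurs (case-insensitively) in the property."""
    return all(
        value.lower() in str(obj.get(prop, "")).lower()
        for prop, value in criteria.items()
    )


def search(object_type, criteria=None):
    """Search one object type; empty criteria make the search unqualified."""
    criteria = criteria or {}
    return [o for o in CATALOG if o["type"] == object_type and matches(o, criteria)]


def search_by_name(value):
    """Search across all object types using the common Name property."""
    return [o for o in CATALOG if matches(o, {"Name": value})]
```

Calling search("Chart") with no criteria corresponds to the unqualified search, while search_by_name("sales") crosses object types because Name is the one property they all share.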
The search request itself is an asset
DataGuide recognizes that the search request itself is as much an asset as the informational objects it returns; that is, knowledge workers invest time in constructing search requests. They often start with broad search criteria and refine them until they specify the exact informational object of interest. This search criterion may be of value on an ongoing basis to the knowledge worker who built it. Knowledge workers can save search requests in DataGuide, thereby leveraging the knowledge worker′s effort.
Launching applications is the goal
Finding informational objects is just a step in the process of decision making. Informational objects can be either physical things, such as a filed paper report, or electronic objects, such as a report that can be displayed on the workstation. DataGuide offers Contact information for filed paper reports and the launch capability for electronic reports. Knowledge workers obtain Contact information for an informational object or launch an informational application through the pop-up action window activated for the informational object they have located. DataGuide supports the launching of informational applications to display informational objects as a function of the knowledge worker end-user interface.
DataGuide: finding information
We first introduce the structure of DataGuide/2 and some key concepts. These concepts relate directly to the Information Warehouse architecture and the new way of viewing meta-data defined by this architecture. We then work through three specific scenarios from the retail enterprise in the areas of searching for information, administration of DataGuide/2′s structure, and extending DataGuide/2 to support some functions specific to a retail enterprise. Finally, we look at meta-data maintenance using DataGuide/2 and considerations for managing multiple copies of DataGuide/2.
8.2.1 Basic Structure
DataGuide is flexible in platform configuration. Table 7 shows the possible configurations for DataGuide.
DataGuide is flexible in configuration
Table 7. DataGuide Configurations

Configuration    Knowledge worker (DataGuide)                      Meta-data Server   Meta-data Store
Stand-alone      Programmable workstation (DataGuide/2)            DataGuide/2        DB2/2
LAN 1            Programmable workstation (DataGuide/2)            DataGuide/2        DB2/2 on LAN server
LAN 2            Programmable workstation on LAN 1 (DataGuide/2)   DataGuide/2        DB2/2 through LAN bridge
Large server 1   Programmable workstation (DataGuide/2)            DataGuide/2        DB2 via DRDA
Large server 2   Programmable workstation (DataGuide/2)            CDF/MVS            DB2
Large server 3   3270 (DataGuide/MVS)                              CDF/MVS            DB2
DataGuide provides knowledge workers with easy-to-use functions to locate shared enterprise data, including informational objects. DataGuide/2 presents a GUI for interacting with knowledge workers; this interface approach allows the knowledge worker to find informational objects through the use of business terms. Once the object is identified, DataGuide/2 can invoke informational applications to retrieve and process the data. DataGuide/2 can also be used to provide a GUI to DataGuide/MVS, which uses CDF/MVS as a meta-data store.
DataGuide/2 supports the Information Catalog API documented in Information Warehouse Architecture I. This support gives software vendors and in-house application developers a methodology for manipulating meta-data managed by DataGuide/2. The methodology is based on the use of Information Catalog API commands. The DataGuide end-user interface is IBM′s implementation for this function.
The DataGuide/2 administrator uses a separate GUI to manage data and informational object definitions in DataGuide/2, together with a set of utilities to extract meta-data from various sources.
DataGuide helps find data objects
DataGuide/2 supports an open interface
DataGuide/2 uses GUI
There are two distinct uses of the term object: one is defined by the designers of GUIs, the other by object-oriented language designers (for example, C++). Typical examples of GUI objects are container objects; the folder is a specific implementation of a container object. For example, Data Objects is a folder containing the objects of type text, graphics, or image. Typical object-oriented language concepts are objects, object classes, class hierarchies, and the associated behavior of these constructs. GUI terminology is used in the discussion of DataGuide.
Knowledge workers and administrators use DataGuide/2
DataGuide/2 is discussed in terms of two user communities: end users (knowledge workers) who use DataGuide/2 to view meta-data, and (knowledge) administrators who are responsible for populating, maintaining, and extending DataGuide/2 in response to knowledge worker demand. We begin with the knowledge worker′s view of the product and then proceed to the administrative issues.
8.2.2 Knowledge Worker Functions
DataGuide is customizable for the knowledge worker
Knowledge workers use DataGuide to locate enterprise data, using business-oriented terminology in requests submitted from the desktop. DataGuide/2 provides a GUI environment based on objects and actions to maximize the productivity of the knowledge workers. Figure 28 on page 107 shows the initial DataGuide/2 work area with the functions that the knowledge worker can execute under DataGuide/2. This work area is presented to the knowledge worker upon opening of the DataGuide/2 icon from the desktop. It holds the results of various tasks performed by knowledge workers, such as creating collections and saving searches. The initial work area can be customized to contain the more commonly used objects for a particular knowledge worker.
Figure 28. DataGuide/2 Initial Panel
The functions that the knowledge worker can execute with DataGuide/2 are:
• Search
• Launch applications
• Create private collections
• Display contact information
• View current news
• View glossary.
These functions are all geared to the knowledge worker. They address requirements for finding and accessing informational objects and working in an Information Warehouse environment, in general. These functions are now addressed individually.
8.2.3 Search
DataGuide/2 has two search modes
DataGuide/2 supports two search modes: keyword and navigational. Keyword search is used by knowledge workers who have some descriptive word associated with the object of interest. The descriptive word can be applied to the object type, properties of an object, and where an object was used. For example, the knowledge worker may know that the name of a report contains the keyword “sales.” In this case, the object type is Chart, and Name is a property of the Chart object type for this informational object. The panels a knowledge worker would see when creating and saving a search are as follows:
Search specification          Figure 29 on page 108
Search results                Figure 30 on page 109
Search save                   Figure 31 on page 110
New initial work area panel   Figure 32 on page 111
Figure 29. Search Specification
Knowledge workers arrive at the Define Search panel by double clicking on the New Search Icon on the initial work area panel. They then select the Charts entry under Object types. At this point, they can select the Name property and enter a search argument in the Enter value for selected property field. They can also choose to leave the Value blank to execute an unqualified search. The unqualified search returns all object instances for the selected object type.
Figure 30. Search Results
Knowledge workers are then presented with the Search Results - Icon List display. There are only three instances of the object type Charts in this DataGuide/2′s meta-data store; the unqualified search returns those three. To save the search, knowledge workers first click on the Search results option on the menu bar and then select the Save option on the pull-down. The result is shown in Figure 31 on page 110.
Figure 31. Save Search
Knowledge workers can enter a name for the saved search. They have the option of choosing an icon for the saved search by clicking on the Find... button.
Upon completion of the Save Search function, the saved search appears on the initial work area panel as a new icon, as shown in Figure 32. The saved search is now available to be executed whenever the knowledge worker needs to find the set of informational objects identified by the search criteria in the saved search.
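The idea that a saved search keeps the search criteria, so that it can be executed again later, can be sketched as follows. The SavedSearch class, its fields, and the exact-match rule are hypothetical and serve only to illustrate the behavior; they are not part of DataGuide/2.

```python
from dataclasses import dataclass, field


@dataclass
class SavedSearch:
    """A named search request; it stores the criteria, not the results."""
    name: str
    object_type: str
    criteria: dict = field(default_factory=dict)

    def run(self, catalog):
        """Re-execute the saved criteria against the catalog as it is now."""
        return [
            obj for obj in catalog
            if obj["type"] == self.object_type
            and all(obj.get(p) == v for p, v in self.criteria.items())
        ]


catalog = [{"type": "Chart", "Name": "Q1 sales"}]
all_charts = SavedSearch(name="All charts", object_type="Chart")  # unqualified
first_run = all_charts.run(catalog)

# A chart added later is picked up the next time the saved search is run.
catalog.append({"type": "Chart", "Name": "Q2 sales"})
second_run = all_charts.run(catalog)
```

Because only the criteria are kept in this sketch, each execution reflects the current contents of the meta-data store rather than a frozen result list.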
Figure 32. New Initial Work Area Panel
Navigation is a form of searching that does not require the knowledge worker to explicitly enter search criteria. Rather, a graphical tree representation of data, objects, and their relationships is traversed to locate informational objects. This approach allows knowledge workers using DataGuide/2 for the first time to become familiar with the organization of data and objects within the enterprise. Figure 33 on page 112 shows a navigational tree resulting from a search under the Subjects icon.
Navigation makes first-time users productive
Figure 33. Navigation Tree
Navigational search is based on groupings
Descriptive information shows object properties
Navigational search—invoked through the Subjects icon—is used when information is searched for based on its grouping within the enterprise. For example, the knowledge worker might want to know which reports are part of the Annual Reports grouping. In this case, Report is the object type of an informational object; the Annual Reports grouping is defined by an administrator to reflect a business relationship. Knowledge workers can view descriptive information for selected objects in DataGuide/2 by double clicking with the left mouse button on the object. This descriptive information is called meta-data in the Information Warehouse architecture. DataGuide/2 presents a container window having values for each property of the informational object. The panel also offers the opportunity to launch a decision support tool. Single clicking with the right mouse button on the object causes DataGuide/2 to display a pop-up Actions panel. This panel offers the opportunity to launch an informational application against this object.
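Navigational search can be pictured as a walk over a containment tree of groupings. The following sketch assumes a hypothetical CONTAINS mapping and grouping names (only Annual Reports and the two report names come from this chapter); it is not how DataGuide/2 stores its relationships.

```python
# Hypothetical grouping hierarchy: grouping name -> contained members.
# Names that do not appear as keys are elemental objects (leaves).
CONTAINS = {
    "Subjects": ["Annual Reports", "Merchandising"],
    "Annual Reports": ["1992 Division Report", "1993 Corporate Report"],
    "Merchandising": ["Store sales chart"],
}


def walk(node, depth=0):
    """Yield (depth, name) pairs in depth-first order, starting at node."""
    yield depth, node
    for child in CONTAINS.get(node, []):
        yield from walk(child, depth + 1)


def leaves(node):
    """Return the elemental objects reachable from a grouping."""
    return [name for _, name in walk(node) if name not in CONTAINS]


# One indented line per node, similar in spirit to a navigation tree display.
tree_lines = ["  " * depth + name for depth, name in walk("Subjects")]
```

Asking for leaves("Annual Reports") answers the question posed in the text: which reports are part of the Annual Reports grouping.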
8.2.4 Launch Applications
Launching applications involves invocation of a decision support tool using a located query object or a complete decision support report as input. DataGuide/2 can invoke an online document display package, such as BookManager Read*, to display a manual containing a relevant description of the information found in the search. Another possibility is invoking an image display package, such as IBM ImagePlus*, to display an image of a retail product.
Launching applications integrates data location and analysis
Applications can be linked to objects
DataGuide manages the association of informational applications to informational objects. The association between application instances and objects is at the object type level. That is, the process is a three-step process, as follows:
1. Create the object type
2. Associate a program with the object type
3. Create object type instances.
Thus, multiple program instances are associated with the object type and multiple object instances are associated with the object type. DataGuide allows any of these program instances to execute against any of the object type instances, although the informational object may have been generated for a specific program instance. When the knowledge worker chooses the Start Program action against an informational object, DataGuide displays a list box containing the program instances for that object type. It is the responsibility of the administrator to ensure that the knowledge worker is given the proper direction, through an object property, to choose the correct program instance.
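A minimal sketch of this type-level association follows. The Chart object type and the Query Manager and PAS tools are taken from the example later in this chapter; the dictionary layout, the function names, and the "Built with" property are assumptions for illustration.

```python
# Programs are associated with an object type, not with individual instances.
PROGRAMS_BY_TYPE = {"Chart": ["Query Manager", "PAS"]}

# Each instance carries a descriptive property naming the tool that built it;
# the property name "Built with" is an assumption for this sketch.
INSTANCES = [
    {"type": "Chart", "Name": "1993 Corporate Report", "Built with": "Query Manager"},
    {"type": "Chart", "Name": "1992 Division Report", "Built with": "PAS"},
]


def start_program_choices(instance):
    """Return every program associated with the instance's object type."""
    return PROGRAMS_BY_TYPE.get(instance["type"], [])


def recommended_program(instance):
    """Use the administrator-supplied property to pick the right program."""
    hint = instance.get("Built with")
    return hint if hint in start_program_choices(instance) else None
```

Both instances are offered the same list of programs, which is why the descriptive property supplied by the administrator matters when the knowledge worker makes the choice.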
8.2.5 Create Collections
A collection is a graphic grouping of informational objects that knowledge workers logically view as one. Such a collection might include common tables, business groupings, queries, reports, or graphs that the knowledge worker often uses for analysis. Collections are defined by knowledge workers for their private use, whereas DataGuide groupings are defined by administrators for use by multiple knowledge workers. Figure 34 on page 114 shows a collection made up of two chart objects.
Collections mirror the way people think
Figure 34. A Collection
8.2.6 Display Contact Information
A person may be the access point
A contact is the name and computer system ID or telephone number of the steward for an informational object. The contact would be someone who could enable access to the associated data or object. The Contact function is important for nonelectronic objects such as product samples. DataGuide is an excellent facility for cataloging nonelectronic as well as electronic objects.
8.2.7 View Current News
News keeps knowledge workers current with their IW
News might include which new objects or data have been added, which objects have been replaced, or a schedule of upcoming maintenance activity. Information found in this facility can be moved to the search results collection for further work. News items apply to the DataGuide environment and not necessarily to an individual informational object.
8.2.8 View Glossary
The glossary contains definitions of terminology unique to the enterprise, its industry, and its technological environment. For example, certain common business terms and the normal business cycle might be documented here. Items in the glossary apply to the DataGuide environment and not necessarily to an individual informational object.
A glossary makes common terminology possible
8.2.9 Administrator Functions
DataGuide administrators use a set of panels and functions unavailable to the knowledge worker to manage the DataGuide meta-data store itself. They generally manage the environment, the knowledge workers′ access, and the contents of the DataGuide meta-data store.
Administrators manage DataGuide
The functions available to the administrator include the following:
• Adapt DataGuide to the enterprise
  − Create business groups
  − Associate informational application programs with objects
  − Define contact information
  − Create object instances
• Extending DataGuide
  − Extend starter set properties
  − Create custom object types
• Manage the meta-data
  − Import/export between DataGuide instances
  − Run extractors
  − Edit extractor results
  − Import extractor results.
Two simple functions are helpful to adapt DataGuide to the enterprise: creating business groupings and associating informational application programs with informational object types. The business groupings define how informational objects are grouped together, and business groupings vary between enterprises. A simple example of managing groupings uses relational tables as the informational object. There are two possible perspectives on a relational table as an informational object: the relational table could be viewed as the lowest level informational object, or the columns in a table could be viewed as the lowest level. If the columns are considered of interest to the knowledge worker, then the administrator defines the table in the Grouping category and the column in the Elemental category and then must define relationships between the table and column object types. If the columns are not considered of interest to the knowledge worker, the administrator defines the table in the Elemental category and does not define the column as an object type. Groupings require planning and have implications for moving meta-data between DataGuide instances.
Adapt DataGuide to the enterprise
Extensibility is crucial to the Information Catalog
Extensibility is also important to the Information Catalog solution. Enterprises are expected to have their own variation on the structure of the meta-data at the meta-data entity level. That is, different enterprises want to keep different attributes of specific meta-data entities. For example, an enterprise may want to maintain the date of last update for meta-data describing an informational object. A second enterprise may want to maintain that date and an indicator for the status of the operational application at the time the meta-data was updated. The administrator would have to add this indicator as an attribute of the meta-data object.
Meta-data management: copying meta-data
Meta-data management is critical to the effective use of DataGuide on an enterprisewide basis. Because DataGuide/2 is a client-server product targeted for the LAN as the primary server, it could become constrained to a single LAN and the work group associated with that LAN. However, DataGuide/2 manages meta-data that describes informational data created from operational data throughout the enterprise. The meta-data may therefore be of interest to anyone in the enterprise in any work group. Therefore, the locality of a LAN-based Information Catalog must be reconciled with enterprisewide needs.
Associating programs facilitates launching
Associating programs to informational object instances through the object type has an implication for the knowledge worker. The Chart object type might have two report tools (for example, Query Manager and PAS) associated with it. Each Chart object instance (for example, 1993 Corporate Report and 1992 Division Report) has been built using one of these tools. The DataGuide Start Program function gives the knowledge workers the opportunity to choose the program for the informational object instance. Administrators must ensure that a property of the informational object has text telling the knowledge worker which decision support tool to choose.
8.2.10 Extending DataGuide/2
Extend the DataGuide/2 model to suit the enterprise
DataGuide administrators can create new object types or add properties to existing object types. The DataGuide starter set creates an initial set of object types to help administrators build their object type set. The series of panels that follow show the basic steps in creating a new object type. Figure 35 on page 117 shows two panels: the initial DataGuide/2 panel, DataGuide/2 - Icon View, displayed when the administrator double-clicks on the desktop DataGuide/2 icon, and the administrator′s panel, DataGuide/2 Admin Utils - Icon View, displayed when the administrator double-clicks on the DataGuide/2 Admin Utils icon. Knowledge administrators double-click on the DataGuide/2 Knowledge Admin icon to display the DataGuide/2 - Icons panel containing the Object types icon. The Object types icon is available only to the administrator and is used to maintain DataGuide/2 meta-data.
Figure 35. Initial and DataGuide Administrator Panels
Administrators open the Object types icon, shown in Figure 36, to get the icon list of all object types currently defined in this DataGuide/2 and the icons associated with those object types. The icon list is shown in Figure 37 on page 118.
Figure 36. DataGuide Administrator Panel
Administrators open the New Object type icon to create a new object type or add a property to an existing object type.
Figure 37. Object Types
Administrators double-click on the New Object type icon, at which time DataGuide/2 displays the Create Object Type panel (see Figure 38 on page 119). They can now enter the Category, Object type name, and a Short name for the new object type, MKTCHART.
Figure 38. Create Object Type
Administrators use the Create Object Type panel to define the object type and to manage object type properties. Administrators can add, modify, and remove properties and they can define a universal unique identifier (UUI) as being made up of a specific set of properties. The UUI is required and is used by DataGuide to uniquely identify each instance of all object types. Administrators click on the Define UUI... button to perform this definition. The administrators select Add from the Create Object Type panel to display the Add Property panel shown in Figure 39 on page 120.
Figure 39. Add Object Type Property
Administrators define new properties for the object type using the Add Property panel shown in Figure 39. They can specify a Property name, a Short name, the Data type, the Size, and whether entry is required. Any property to be used as part of the UUI must have Entry required specified for it, by clicking on the Entry required check box.
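The relationship between properties, the Entry required setting, and the UUI can be sketched as a small data model. The class and method names below are hypothetical, as are the category, object type name, and sample properties chosen for MKTCHART; the sketch only illustrates the rule that every property in the UUI must have Entry required specified.

```python
from dataclasses import dataclass, field


@dataclass
class Property:
    name: str
    short_name: str
    data_type: str
    size: int
    entry_required: bool = False


@dataclass
class ObjectType:
    category: str
    name: str
    short_name: str
    properties: dict = field(default_factory=dict)
    uui: tuple = ()

    def add_property(self, prop):
        """Add (or replace) a property definition on this object type."""
        self.properties[prop.name] = prop

    def define_uui(self, *property_names):
        """Define the UUI; every part must be an entry-required property."""
        for pname in property_names:
            prop = self.properties.get(pname)
            if prop is None or not prop.entry_required:
                raise ValueError(f"{pname!r} must exist and have Entry required set")
        self.uui = property_names


# Category and object type name are assumed values for the example.
mktchart = ObjectType("Elemental", "Marketing chart", "MKTCHART")
mktchart.add_property(Property("Name", "NAME", "CHARACTER", 80, entry_required=True))
mktchart.add_property(Property("Region", "REGION", "CHARACTER", 20))
mktchart.define_uui("Name")
```

An attempt to call mktchart.define_uui("Region") raises an error in this sketch, because Region was added without Entry required.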
8.2.11 Meta-data Management
Meta-data must be mobile to be useful
Two aspects of the retail enterprise put special demands on the Information Catalog and the management of meta-data: the need to consolidate meta-data from heterogeneous sources, and the need to physically disperse a single logical collection of meta-data across multiple LAN work groups. The export/import interface provides the means for meeting both of these requirements.
Use the export/import strategy to manage meta-data for the enterprise
Most retail enterprises have a need for multiple DataGuide/2 instances, if for no other reason than to maintain test and production Information Catalogs. The enterprise needs to move meta-data between DataGuide/2 instances on LANs. This movement may be from DataGuide/2 to DataGuide/2 or it may be from DataGuide/2 on one LAN up to DataGuide/MVS and then down to another DataGuide/2 on a LAN. The DataGuide/2 import/export interface supports either path.
The Information Warehouse Architecture I defines an intermediate data format for meta-data. This approach breaks the meta-data movement process into three steps: extracting from the source, storing in the intermediate format, and loading into DataGuide. The format of the intermediate storage is published as part of the Information Warehouse framework strategy, and DataGuide provides a tool to import the meta-data in this published format into DataGuide′s meta-data store. The CASE tool vendor or retail enterprise now has only to write tools to extract from the source into the intermediate format.
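The three-step movement of meta-data can be sketched as a small pipeline. The published intermediate format is not reproduced in this chapter, so the sketch substitutes one JSON record per line purely as a stand-in; the function names, record fields, and sample rows are likewise assumptions.

```python
import io
import json


def extract(source_rows):
    """Step 1: turn source-specific rows into neutral records."""
    for table, remarks in source_rows:
        yield {"type": "Table", "Name": table, "Description": remarks}


def store(records, stream):
    """Step 2: write the records to the intermediate medium, one per line."""
    for record in records:
        stream.write(json.dumps(record) + "\n")


def load(stream, catalog):
    """Step 3: read the intermediate file and add each record to the catalog."""
    for line in stream:
        catalog.append(json.loads(line))


# Stand-in for rows read from a source such as a database catalog.
source = [("STORE_SALES", "Daily sales by store"), ("SKU", "Item master")]
intermediate = io.StringIO()   # stands in for a flat file
store(extract(source), intermediate)
intermediate.seek(0)
catalog = []
load(intermediate, catalog)
```

Only the extract step depends on the source; the store and load steps stay the same for every source, which is the point of publishing a single intermediate format.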
DataGuide/2 import/export interface defines an open, intermediate format
The intermediate format also has implications in planning and project management. Enterprises can begin Information Warehouse implementation projects and build meta-data storage and end-user interface systems immediately. The export/import interface allows these projects to be migrated into DataGuide/2 at a later point in time. The only additional investment is the tool to extract from the temporary meta-data store to the intermediate format. Special consideration should be given to the DataGuide/2 model and the relationships between the categories and object types. The meta-data in the intermediate format needs a mechanism to connect meta-data definitions.
The import/export provides flexibility
8.2.12 DataGuide Data Model
In this section, data and objects within DataGuide are described using a classification schema borrowed from the world of GUI programming rather than object-oriented programming. For an explanation of object terminology in the GUI environment and the object-oriented programming environment, see the paragraph on Object terminology on page 106. The meta-data in DataGuide is defined within the hierarchy of Category, Object Type, and Objects. Categories represent a general classification of object types. Figure 40 on page 122 shows the categories that make up the DataGuide model. These categories, in turn, have a distinct set of actions associated with them. For example, application objects found in the Program category can be associated with objects from the Grouping category but not with objects or data from the Contact category. Category-action-category relationships are shown as double-ended arrows in the diagram.
Categories are a way of classifying objects
Figure 40. DataGuide Categories
Categories manage object-action associations in DataGuide
DataGuide functions are constrained with respect to the object types with which they work. For example, the Where Used search capability can only be applied to object types within the Grouping and Elemental categories. Actions associated with any one of the seven categories apply to all objects in that category. The categories, with object types that are allowed in each category, are as follows:
Grouping
Group object types in this category can contain other group types. The Group and Table object types are in the Grouping category. Groups form the highest level object in the DataGuide classification system.
Contact
The Contact category defines people responsible for objects. Contact objects can be associated with group and elemental objects. The Contact object type is in this category.
Elemental
Object types in this category form the bottom nodes of a navigational tree. They can be contained by object types in the Grouping category but cannot contain other object types. Examples of object types in this category include the Column, Query, Report, and Image object types.
Program
The only object type allowed here is Program. New object types cannot be defined to this category.
Dictionary
This category serves to hold definitional support object types such as Glossary. This category is used to define business terminology.
Support
The Support object defines supporting and informational objects.
New object types can be created within any category except the Program category, and new objects can be created within an object type. Object types have properties that apply to all object instances for that object type. These properties are analogous to attributes for entities or columns for tables and have syntactic and semantic definitions. The syntactic definition takes the form of the property′s data type (for example, VARCHAR or CHARACTER(80)). The semantic definition is a business description of the meaning of the property (for example, the property UPDATIME is the system-generated time of the last update of one object instance). Object types are also the target of associations with applications such as decision support products. Only the DataGuide administrator has the authority to create object types.
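The category rules described in this section can be expressed as a few checks. The sketch below covers only the categories and object types named in the text; the dictionary layout and function names are hypothetical and do not represent the DataGuide implementation.

```python
CATEGORIES = {"Grouping", "Contact", "Elemental", "Program", "Dictionary", "Support"}

# Object types named in the text, keyed to their category.
object_types = {
    "Group": "Grouping", "Table": "Grouping",
    "Column": "Elemental", "Query": "Elemental",
    "Report": "Elemental", "Image": "Elemental",
    "Contact": "Contact", "Program": "Program", "Glossary": "Dictionary",
}


def create_object_type(name, category):
    """Register a new object type, enforcing the Program-category rule."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category {category!r}")
    if category == "Program":
        raise ValueError("new object types cannot be defined in the Program category")
    object_types[name] = category


def can_contain(parent_type, child_type):
    """Grouping types may contain Grouping or Elemental types; nothing else contains."""
    return (object_types[parent_type] == "Grouping"
            and object_types[child_type] in {"Grouping", "Elemental"})
```

For example, can_contain("Table", "Column") holds while can_contain("Column", "Table") does not, which matches the role of Elemental object types as the bottom nodes of a navigational tree.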
Categories, object types, object instances manage your meta-data
DataGuide administrators at the retail enterprise want to describe each report as being large, medium, or small in size. This helps knowledge workers decide whether to print the report on a local miniprinter or on the large printer at the main office. The administrators use the Report object type and create a new property called Size. They define this new property as a character data type with a length of 10. Each report is then defined as an instance of the Report object type, with the additional step of populating the Size property.
Customize meta-data
8.2.13 Interfaces
The functions available to the knowledge worker and administrator through the respective end-user interfaces are also available through the DataGuide API. The API is available to any informational application that requires the services of an Information Catalog. Knowledge workers can then interact with the informational application with which they are familiar, and the informational application makes the necessary interactions with DataGuide to obtain the necessary descriptive information. These API commands perform searches for meta-data or administrative functions such as creating objects, getting objects, and providing listings.
A user uses the tool; the tool uses the DataGuide/2 API
The many functions available through the API can be categorized into Object Type, Object Instance, and Function services. Table 8 on page 124 summarizes these categories and their services.
Decision support tools can use DataGuide as a callable service
Services include meta-data search and management
Table 8. DataGuide Services

Service Category   Services
Object type        Create, delete, append, and get for objects
Object instance    Create, delete, update, and get for instances
Function           Search, Search-all, and Navigate for finding meta-data
Both the object type and object instance service categories deal with the direct manipulation of meta-data in DataGuide. The Function services are broader in scope and range from searching for meta-data to more system-oriented services such as export and import for moving meta-data into and out of DataGuide. The Listing service manages contacts, programs, object types, and groupings. Program invocation, commit and rollback, memory management, initiation and termination, and the trace service address the behavior of DataGuide itself.
Services are specific to categories
DataGuide API services are applicable to certain categories (see 8.2.12, “DataGuide Data Model” on page 121 for a discussion of DataGuide categories). For example, the ability to navigate is restricted to object types in the Grouping category.
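The way services apply only to certain categories can be sketched as a lookup. The two restrictions shown (Navigate on the Grouping category, Where Used on the Grouping and Elemental categories) come from this chapter; the service identifiers, the table layout, and the treatment of all other services as unrestricted are assumptions of the sketch.

```python
# Categories of a few object types named in the text.
CATEGORY_OF = {"Group": "Grouping", "Table": "Grouping",
               "Report": "Elemental", "Contact": "Contact"}

# Services whose use is restricted to certain categories.
ALLOWED_CATEGORIES = {
    "navigate": {"Grouping"},
    "where_used": {"Grouping", "Elemental"},
}


def service_allowed(service, object_type):
    """Return True if the service may be applied to this object type."""
    allowed = ALLOWED_CATEGORIES.get(service)
    if allowed is None:          # treated as unrestricted in this sketch
        return True
    return CATEGORY_OF[object_type] in allowed
```

An application calling the API could run such a check before issuing a request, rather than discovering the restriction from an error returned by the service.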
Input/output structures enable the DataGuide API
Input/output structures are key to the use of DataGuide services by both the DataGuide end-user interface and other applications invoking DataGuide services. This DataGuide API data structure is self-defining and has a basic format made up of three parts, as follows:
Export/import interface makes DataGuide an open tool
The export/import interface supports the migration of meta-data from non-DataGuide sources to DataGuide and between DataGuide instances. This interface opens DataGuide to heterogeneous meta-data stores and supports a distributed environment for DataGuide/2 and DataGuide/MVS. Enterprises have meta-data in a variety of sources, including the structured columns and the Remarks column of the DB2 catalogs. Enterprises have invested time and resource in CASE tools as well. They have created and stored meta-data in CASE tool encyclopedias and other forms of meta-data stores. The export/import interface is a way of consolidating the heterogeneous meta-data into DataGuide.
The interface is open to all vendors
The export/import interface takes the form of a documented format for metadata. The DataGuide products are distributed with a tool that reads this documented format and populates the respective DataGuide meta-data store accordingly. The documentation is available to any customer or software vendor. The retail enterprise′s information systems department can write a tool to extract meta-data from the CASE tool encyclopedia used in its shop. This requires that the staff know the internal format of the meta-data in that particular CASE tool, or if provided by the CASE tool, interfaces and commands for extraction of meta-data. The enterprise then uses the DataGuide import tool to load the reformatted meta-data into the DataGuide store.
The three parts of the DataGuide API input/output structure:

Header       Structure identification and scoping
Definition   Object property definitions
Object       Object property values (not always required)
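As an illustration only, the three-part layout (header, definition, object) can be sketched in Python. The class and field names below are hypothetical and do not reproduce the actual DataGuide API layout; the sketch only shows what "self-defining" means: the header and the property definitions carry enough information to check the property values that follow.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Header:
    """Structure identification and scoping (field names are hypothetical)."""
    structure_id: str
    definition_count: int
    object_count: int


@dataclass
class PropertyDefinition:
    """One object property definition: name, data type, and maximum length."""
    name: str
    data_type: str
    length: int


@dataclass
class IOStructure:
    """Header, definition area, and an object area that may be empty."""
    header: Header
    definitions: List[PropertyDefinition]
    values: List[Dict[str, str]] = field(default_factory=list)

    def validate(self) -> bool:
        """Check the object area against the header and the definitions."""
        if self.header.definition_count != len(self.definitions):
            return False
        if self.header.object_count != len(self.values):
            return False
        allowed = {d.name: d.length for d in self.definitions}
        return all(
            set(row) <= set(allowed)
            and all(len(value) <= allowed[key] for key, value in row.items())
            for row in self.values
        )


def build_structure(definitions, values=None) -> IOStructure:
    """Assemble a structure whose header counts match its two areas."""
    values = list(values or [])
    header = Header("EXAMPLE", len(definitions), len(values))
    return IOStructure(header, list(definitions), values)
```

A structure built without values still validates, which mirrors the note that the object property values are not always required.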
The more likely scenario has CASE tool vendors using the documented format to write the extract tool. The tool extracts meta-data from their particular encyclopedia and stores it in the documented format in an intermediate medium, most likely a flat file. The retail enterprise can then use the DataGuide import tool to load the meta-data into the DataGuide meta-data store. This approach is preferable for the vendors, because it avoids disclosure of their internal meta-data format.
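The extract-then-import flow described above can be sketched as follows. This is a minimal sketch under stated assumptions: the OBJECT/PROP/END line format is invented for illustration and is not the documented DataGuide format, and the function names are hypothetical. The point it shows is that the extract tool and the import tool share only the intermediate flat file, so the CASE tool's internal format never has to be disclosed.

```python
import io
from typing import Dict, Iterable, List, Optional, TextIO

Record = Dict[str, str]


def extract_to_flat_file(encyclopedia: Iterable[Record], out: TextIO) -> int:
    """Write each CASE-tool record in the (hypothetical) interchange format.

    Each object becomes an OBJECT line, one PROP line per property, then END.
    """
    count = 0
    for record in encyclopedia:
        out.write(f"OBJECT {record['type']}\n")
        for key in sorted(k for k in record if k != "type"):
            out.write(f"PROP {key}={record[key]}\n")
        out.write("END\n")
        count += 1
    return count


def import_flat_file(source: TextIO, store: List[Record]) -> int:
    """Read the interchange format and append each complete object to the store."""
    current: Optional[Record] = None
    loaded = 0
    for raw in source:
        line = raw.rstrip("\n")
        if line.startswith("OBJECT "):
            current = {"type": line[len("OBJECT "):]}
        elif line.startswith("PROP ") and current is not None:
            key, _, value = line[len("PROP "):].partition("=")
            current[key] = value
        elif line == "END" and current is not None:
            store.append(current)
            current = None
            loaded += 1
    return loaded
```

In this sketch an in-memory `io.StringIO` buffer can stand in for the flat file; property values that contain line breaks are not handled.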
The information systems department can write the extract tool
The export/import interface can also be used to manage multiple instances of DataGuide. This requirement is rooted in the LAN orientation of DataGuide/2 and the nature of most retail enterprises. DataGuide/2 is designed to execute on a LAN server with multiple programmable workstation clients making use of its services. This configuration is very attractive to the work group or small project group.
Use the interface to manage DataGuide instances
Most retail enterprises are larger than this single work group and want to share meta-data across work groups. Large retail enterprises desire a central meta-data collection on a larger server with the capability to move the meta-data down to the work group. DataGuide/MVS can be used for the meta-data collection on the large server. DataGuide supports the movement of meta-data between DataGuide/2 and DataGuide/MVS instances through the export/import interface. Thus, DataGuide/2 combines the flexibility of the LAN environment with support for enterprise-level meta-data and for sharing that meta-data across the enterprise.
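The movement of meta-data from a central collection down to a work group can be pictured with a short sketch. Everything here is an assumption made for illustration: the stores are plain dictionaries, JSON stands in for the documented export/import format, and the function names are hypothetical.

```python
import json
from typing import Dict, List

Store = Dict[str, dict]  # object name -> meta-data properties


def export_objects(store: Store, names: List[str]) -> str:
    """Serialize the selected objects; JSON stands in for the documented format."""
    return json.dumps({n: store[n] for n in names if n in store}, sort_keys=True)


def import_objects(store: Store, payload: str) -> int:
    """Load exported objects into a store, replacing older copies by name."""
    objects = json.loads(payload)
    store.update(objects)
    return len(objects)


def push_down(central: Store, work_group: Store, names: List[str]) -> int:
    """Move selected meta-data from the central store to a work group store."""
    return import_objects(work_group, export_objects(central, names))
```

Only the requested objects travel, so a work group store holds the subset it needs while the central store remains the complete collection.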
DG/MVS supports centralized meta-data
DRDA can also be used to implement a centralized meta-data store. DataGuide/2 can access meta-data stored in a remote DB2 for MVS instance. It uses DRDA as the transport protocol to request meta-data. The flexibility of DRDA allows the meta-data to be moved to any DB2 family member.
DRDA supports centralized meta-data
Chapter 9. Conclusions

Three years have passed since Keen discussed the range and reach model of information technology (see Shaping the Future: Business Design through Information Technology). The basic message was that competitive advantage and economy of operation would accrue to those enterprises that improved their span of information technology communications to include their clients (reach) and enriched the functions offered to the clients (range). Along the path to the client′s door, it was equally important to get connected to suppliers and outside institutions such as banks. The key architectural feature of such an information technology strategy was its openness: the ability to use more than one vendor product for a particular function and not be locked into a proprietary solution that forced other parts of the solution to come from the same vendor.

Today, we are beginning to see many instances of how range is being implemented. In the example of one hypermart, providing information in a warehouse construct with preservation of the historical data allowed two things to happen:

• Better analysis. The enterprise was better able to uncover sales anomalies and understand the driving factor behind each anomaly. This allowed it to refine its sales approach and achieve an immediate result.

• Improved information accessibility. Outside suppliers found value in accessing this information and paid the company back by assuming a part of the company′s normal workload (maintaining stock levels) and eliminating parts of the business process (business reengineering by cutting out a large portion of the order-to-invoice paper trail).
Such was the unknown power of making information accessible that the full benefit of an Information Warehouse solution was not understood on day 1 of the implementation. In the evolving world marketplace, information becomes the new means of extending range and thereby gaining competitive advantage.

As a result of this shift from the traditional application-oriented information technology approach, the role of the information systems specialist is also changing. In the past, information systems emphasis was on the delivery and maintenance of business systems, with 80% of the resources spent on maintenance. With the advent of powerful, independent LANs and workstations, business unit dependency on information systems has shifted from the supply of business systems to the supply of information. In the new environment, the information systems specialist is expected to be a provider of information rather than a provider of systems.

In this new world, the Information Warehouse architecture supplies the plan for automating the provisioning of this information. Products such as DataGuide allow end users to see which information is available. When DataGuide is tied to other workstation products through the key Information Warehouse interfaces and protocols, end users can begin to manipulate this information in new ways. The new ways can often lead to better interpretation of information and reengineering of the business processes that generate this information. In their new roles, information systems specialists maintain the inventory of data and relationships between data; the knowledge worker builds small applications on the fly.

As providers of the raw information, information systems specialists need to send consistent copies of shared information to various points in the enterprise. Given their limited staff resources, such information flows must be automated. This need leads to new products, such as DataPropagator Relational and DataHub/2, that implement the automated flow of full copies and differential updates of source data to multiple informational objects. A key feature of such products is the use of the workstation as the point of control. Use of the workstation takes advantage of an object-oriented approach to mask the complexity required to maintain the new information network.

Finally, in the area of presenting that information to the knowledge worker, the Information Warehouse architecture establishes some important approaches to assist information systems specialists in keeping their information networks open and flexible:

• A reliance on an established data access language (SQL)

• A view of functions as objects that allows new functions to be snapped in over time or replaced. For example, in the DataGuide/2 product, a query or report is merely an icon to the knowledge worker. If the decision support product that ultimately runs the query or allows the end user to view the report is changed, that change and its impact on the knowledge worker can be hidden by the information systems specialist, who reconfigures the underlying DataGuide/2 representation of that object.
The competition is calling and information is answering....
Appendix A. Models and Modeling

The solution presented in this book devotes significant attention to models and modeling. Modeling has been given much publicity over time and has been considered crucial to a well-organized and efficient data processing organization. However, modeling has, in general, failed to deliver on the promised benefits of model-based application generation. Some of this failure is due to the sheer variety of modeling methodologies available, and some is due to the esoteric language and complexity of the concepts inherent in modeling. Perhaps the largest contribution to this failure is the lack of open interfaces and automation between the components and phases of the application development life-cycle.

In this appendix, we present the concepts and benefits of modeling through a simple example and lay the groundwork for the role of the Financial Application Architecture* (FAA), Insurance Application Architecture* (IAA), and Retail Application Architecture* (RAA) in data processing in general and in Information Warehouse implementation in particular.
Modeling organizes the data processing environment
A.1 The Construction Model

The building of a structure (a home, an office building, or other complex structure) serves as a platform for discussing the benefits of a model and modeling. A model is an abstract representation of a real-world environment. Modeling classifies certain aspects of building into things called entities. The concept of entities immediately presents a challenge for relating modeling to a real-world environment.

Entity is a term used to classify people, places, things, ideas, concepts, or events that are relevant to the business. The key here is to understand the motivation for entities. Entities provide a way to group things that have common characteristics or a common role with respect to the business. For example, within a construction company, entities include nails, boards, and windows; carpenters, plumbers, and electricians; contracts; trucks; and other components of the business. As a general categorization, entities is a convenient catch-all for anything with which the construction project leader, the general contractor (GC), has to be concerned. The benefit is that the GC can look at one list of things needed to build the house.
Entity: classifying things
Entities can be grouped to simplify their use
Further reflection reveals that these entities are not all the same, that some subsets of these entities have common characteristics. If it is of benefit to group all entities, then it is of more benefit to group them so that the common aspects of the subgroupings can be utilized.
A.1.1 Entity: Things

Entities make the overall process simpler
The next step is to create entities within the larger-scope Entity, based on these common aspects or ways of being used. For example, nails and boards have attributes in common: they are both things to be ordered, stored, and physically incorporated into the structure. We therefore create a specific type of entity called Materials. Materials is a grouping of things that are handled in a similar manner from a business perspective and typically have common attributes. The attributes describe the nature of the entity. In this case, the attributes are the color, size, and other physical aspects of the thing. The value here is that the GC can use one order sheet for all things that fit into the Entity category Materials. The GC can use another form for all subcontracts for services. The GC′s job is now easier because the different parts of the job can be generalized.
Entities help generalize data processing processes
At this point, we have introduced two perspectives on things essential to the business: the way these things fit into the business—what the business does with the thing—and the attributes or descriptions of these things. The benefit then is that we can think of the things that are part of our business as a general group. We can also generalize the business activities performed against the things.
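The two perspectives just described (what the business does with a thing, and the attributes that describe it) can be made concrete with a small sketch. The class names, attribute names, and the order-line layout are assumptions chosen for illustration; the source does not define them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """Anything the general contractor has to be concerned with."""
    name: str


@dataclass
class Material(Entity):
    """Entities that are ordered, stored, and built into the structure."""
    size: str
    color: str
    quantity: int


@dataclass
class Service(Entity):
    """Entities handled through subcontracts rather than order sheets."""
    trade: str


def order_sheet(entities: List[Entity]) -> List[str]:
    """Produce one order line per Material, whatever the material is."""
    return [
        f"{m.quantity} x {m.name} ({m.size}, {m.color})"
        for m in entities
        if isinstance(m, Material)
    ]
```

Because `order_sheet` depends only on the Materials grouping, nails and boards go through the same code path, while services are left for a separate subcontract form.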
A.1.2 Entity: Agreements

The next set of entities is centered around the contract, or agreement, legal or otherwise, that is part of building the structure. Contracts are written, agreed to, and enforced. Contracts are different from nails and boards, which are ordered, installed, or are part of the physical structure. By separating these two sets, we can treat them appropriately from a business point of view. What is of more interest here is that they can be treated consistently from a data processing point of view. That is, application code can be written to consistently operate on data about things defined as being the same entity. Furthermore, application code written to operate on data about things used in building a house is consistent with application code written to operate on data about those same things used in building an office. This approach suggests that contracts in the construction business, categorized as an entity called agreements, can be viewed from both a business and a data processing perspective as similar to contracts in the insurance industry, where they are policies.
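A brief sketch may help show why treating contracts and policies as the same entity pays off in application code. The classes and the `in_force` rule below are hypothetical and are not taken from FAA, IAA, or RAA; they only illustrate one routine serving two industries.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Agreement:
    """Common view of anything written, agreed to, and enforced."""
    party: str
    start: date
    end: date

    def in_force(self, on: date) -> bool:
        """True if the agreement covers the given day (both ends inclusive)."""
        return self.start <= on <= self.end


@dataclass
class ConstructionContract(Agreement):
    site: str = ""


@dataclass
class InsurancePolicy(Agreement):
    coverage: str = ""


def agreements_in_force(agreements: List[Agreement], on: date) -> List[str]:
    """List the parties whose agreements are active, regardless of industry."""
    return [a.party for a in agreements if a.in_force(on)]
```

The function `agreements_in_force` never inspects the industry-specific fields, so the same code serves a builder's contract register and an insurer's policy book.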
A.2 The Annual Report As a Model

We can then extend these concepts to the corporate statement. The annual report presents the financial status of an enterprise in terms of its assets and liabilities. Both the assets and liabilities can be seen as entities, but this is still rather ambiguous with respect to common experience. A better example is the subheading Plant and Property under assets. This is a financial view of things the enterprise has as an asset. This perspective on the enterprise is similar to the points we stressed as being a benefit for modeling, entities, and the general categorization philosophy of modeling. Regardless of the industry within which the enterprise does business, the enterprise always has some type of plant and property.
The annual report is a model
From an industry perspective, we can treat all plant and property as something that has value, that exists. From a data processing perspective, we can expect to write applications that sum up the present value of that plant and property. Furthermore, we can expect to write applications that depreciate that plant and property over time. The treatment and expectations of these business objects are independent of the enterprise′s industry. We have gained perspective on a component of the business and have gained an opportunity to leverage data processing resources by this generalization, which is wholly compatible with the objectives of modeling.
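As a worked illustration of the two applications mentioned (summing the value of plant and property, and depreciating it over time), consider the sketch below. It assumes straight-line depreciation down to a salvage value and treats the resulting book value as the present value; the text does not prescribe a method, and the names and figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlantAndProperty:
    description: str
    cost: float
    salvage_value: float
    useful_life_years: int

    def book_value(self, years_held: int) -> float:
        """Straight-line depreciation, never falling below the salvage value."""
        years = min(max(years_held, 0), self.useful_life_years)
        annual = (self.cost - self.salvage_value) / self.useful_life_years
        return self.cost - annual * years


def total_book_value(assets: List[PlantAndProperty], years_held: int) -> float:
    """Sum the depreciated value of every asset, whatever the industry."""
    return sum(a.book_value(years_held) for a in assets)
```

Neither function refers to what the asset is used for, which is the industry independence the paragraph above describes: a warehouse and a delivery truck are handled by the same code.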
A.3 Information Warehouse and Modeling

Models contain representations of the enterprise′s business in the form of entities and other modeling constructs. These constructs contain technical and descriptive information about the business objects. The technical information includes data type and length and may in fact be used in limited ways by an application generator. The descriptive information is for reading and understanding purposes only for the model user. This descriptive information is called meta-data and is a crucial part of the Information Warehouse environment.
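The split between technical and descriptive information can be shown with a minimal sketch. The class, its fields, and the two views are hypothetical; they only illustrate that one model element can serve both an application generator and a human reader.

```python
from dataclasses import dataclass


@dataclass
class ModelElement:
    name: str
    data_type: str    # technical information
    length: int       # technical information
    description: str  # descriptive information (meta-data)

    def technical_view(self) -> str:
        """What an application generator could consume."""
        return f"{self.name} {self.data_type}({self.length})"

    def catalog_view(self) -> str:
        """What a knowledge worker would read to understand the object."""
        return f"{self.name}: {self.description}"
```

For a hypothetical element named WEEKLY_SALES, the technical view gives the data type and length, while the catalog view carries the business description.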
Models are a formal representation
Meta-data explains the meaning of the object and the data processing object in business terms. It helps the administrator know whether the business/data processing object is needed for informational analysis. It also helps the knowledge worker understand the meaning of the object, once it is incorporated into the Information Warehouse implementation. This connection between modeling, the model, and the dictionary for the knowledge worker (the Information Catalog) is dependent on a subset of the model information in the Information Catalog.
Meta-data explains object meaning
The descriptive information also reflects the reconciliation and enhancement process. It describes the business/data processing object as it exists. The knowledge administrator and the business analyst know what the informational data needs to look like for use in an informational environment. The processes of decoding, reconciling, and enhancing operational data for use in an informational environment are based on these before and after descriptions.
Meta-data reflects data enhancement
List of Abbreviations

ATM     automated teller machine
CPU     central processing unit
DASD    direct access storage device
DBA     database administrator
DIS     data interpretation system
DRDA    Distributed Relational Database Architecture
EDI     electronic data interchange
EFT     electronic funds transfer
FAA     Financial Application Architecture
GMROI   gross margin return on investments
GOI     generic output interface
GSA     General Store Application
HHT     hand-held terminal
IAA     Insurance Application Architecture
IBM     International Business Machines Corporation
ITSO    International Technical Support Organization
MIS     management information system
PWS     programmable workstation
RAA     Retail Application Architecture
ROA     return on assets
UPC     universal product code