My Mother Thinks I’m a DBA! Cross-Platform, Multi-Vendor, Distributed Relational Data Replication with IBM DB2 DataPropagator and IBM DataJoiner Made Easy! Olivier Bonnet, Simon Harris, Christian Lenke, Li Yan Zhou, Thomas Groh
International Technical Support Organization http://www.redbooks.ibm.com
SG24-5463-00
Take Note! Before using this information and the product it supports, be sure to read the general information in Appendix G, “Special Notices” on page 393.
First Edition (June 1999)

This edition applies to Version 5.1 of IBM DB2 DataPropagator Relational Capture for MVS (5655-A23), Version 5.1 of IBM DB2 DataPropagator Relational Apply for MVS (5655-A22), Version 5.1 of IBM DB2 DataPropagator Relational for AS/400, Version 5.2 of IBM DB2 Universal Database, and Version 2.1.1 of IBM DataJoiner (5801-AAR).

Comments may be addressed to: IBM Corporation, International Technical Support Organization, Dept. QXXE, Building 80-E2, 650 Harry Road, San Jose, California 95120-6099

When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.
© Copyright International Business Machines Corporation 1999. All rights reserved.

Note to U.S. Government Users – Documentation related to restricted rights – Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.
Contents

Figures . . . xi
Tables . . . xv
Preface . . . xvii
  The Team That Wrote This Redbook . . . xviii
  Comments Welcome . . . xx

Part 1. Heterogeneous Data Replication—General Discussion . . . 1

Chapter 1. Introduction . . . 3
  1.1 Why Replication? . . . 3
  1.2 Why Multi-Vendor? . . . 4
  1.3 How to Use this Book? . . . 4
    1.3.1 The Theory . . . 5
    1.3.2 The Practical Examples . . . 7
  1.4 Technical Warm-Up . . . 7
    1.4.1 IBM DataPropagator—Architectural Overview . . . 7
    1.4.2 Extending IBM Replication to a Non-IBM RDBMS . . . 9
  1.5 Summary . . . 11

Chapter 2. Planning . . . 13
  2.1 Organizing Your Project . . . 14
  2.2 Gathering the Detailed Requirements . . . 16
    2.2.1 The Approach . . . 16
    2.2.2 List of Questions . . . 16
  2.3 Determining the Replication Sources and Replication Targets . . . 18
  2.4 Technical Planning Considerations . . . 20
    2.4.1 Estimating the Data Volumes . . . 20
    2.4.2 About CPU, Memory, and Network Sizing . . . 24
  2.5 Summary . . . 25

Chapter 3. System and Replication Design—Architecture . . . 27
  3.1 Principles of Heterogeneous Replication . . . 28
    3.1.1 Extending DProp Replication to Multi-Vendor Replication . . . 30
    3.1.2 Overview of the Most Common Replication Architectures . . . 33
  3.2 System Design Options . . . 39
    3.2.1 Apply Program Placement: Pull or Push . . . 39
    3.2.2 DataJoiner Placement . . . 40
    3.2.3 Control Tables Placement . . . 46
  3.3 Replication Design Options . . . 48
    3.3.1 Target Table Types . . . 49
    3.3.2 Replication Timing . . . 51
    3.3.3 Using Blocking Factor . . . 54
  3.4 Performance Considerations for Capture Triggers . . . 55
    3.4.1 Description of the Performance Test Setup . . . 56
    3.4.2 Test Results . . . 57
    3.4.3 Conclusion of the Performance Test . . . 58
  3.5 Summary . . . 59

Chapter 4. General Implementation Guidelines . . . 61
  4.1 Overview . . . 61
  4.2 Components of a Heterogeneous Replication System . . . 63
  4.3 Setting up a Heterogeneous Replication System . . . 63
    4.3.1 The Implementation Checklist . . . 63
    4.3.2 How to Use the Implementation Checklist . . . 65
  4.4 Detailed Description of the Implementation Tasks . . . 65
    4.4.1 Set Up the Database Middleware Server . . . 65
    4.4.2 Implement the Replication Subcomponents (Capture, Apply) . . . 71
    4.4.3 Set Up the Replication Administration Workstation . . . 71
    4.4.4 Create the Replication Control Tables . . . 74
    4.4.5 Bind DProp Capture and DProp Apply . . . 74
    4.4.6 Status After Implementing the System Design . . . 76
  4.5 Next Steps—Implementing the Replication Design . . . 77
    4.5.1 Replication Design for Multi-Vendor Target Servers . . . 77
    4.5.2 Replication Design for Multi-Vendor Source Servers . . . 79
  4.6 Summary . . . 82

Chapter 5. Replication Operation, Monitoring and Tuning . . . 83
  5.1 Overview . . . 83
  5.2 Operating and Maintaining DProp Replication—Initial Tasks . . . 84
    5.2.1 Initialization and Operation of the Capture Task . . . 85
    5.2.2 Initialization of Replication Subscriptions . . . 86
  5.3 Operating and Maintaining DProp Replication—Repetitive Tasks . . . 91
    5.3.1 Database Related Housekeeping . . . 91
    5.3.2 Pruning . . . 91
    5.3.3 Utility Operations . . . 95
  5.4 Monitoring a Distributed Heterogeneous Replication System . . . 98
    5.4.1 Components That Need Monitoring . . . 98
    5.4.2 DProp’s Open Monitoring Interface . . . 99
    5.4.3 Monitoring Capture . . . 101
    5.4.4 Monitoring Apply . . . 106
    5.4.5 Monitoring the Database Middleware Server . . . 116
  5.5 Tuning Replication Performance . . . 117
    5.5.1 Running Capture with the Appropriate Priority . . . 118
    5.5.2 Adjusting Capture Tuning Parameters . . . 118
    5.5.3 Using Separate Tablespaces . . . 120
    5.5.4 Choosing Appropriate Lock Rules . . . 121
    5.5.5 Using the Proposed Change Data Indexes . . . 121
    5.5.6 Updating Database Statistics . . . 122
    5.5.7 Making Use of Subscription Sets . . . 122
    5.5.8 Using Pull Rather Than Push Replication . . . 125
    5.5.9 Using Multiple Apply Processes in Parallel . . . 125
    5.5.10 Using High Performance Full Refresh Techniques . . . 125
    5.5.11 Using Memory Rather Than Disk for the Spill File . . . 126
    5.5.12 Enabling Block Fetch for Apply . . . 126
    5.5.13 Tuning Pruning . . . 127
    5.5.14 Optimizing Network Performance . . . 128
    5.5.15 DB2 for OS/390 Data Sharing Remarks . . . 128
  5.6 Other Useful Techniques . . . 129
    5.6.1 Deactivating Subscription Sets . . . 129
    5.6.2 Selectively Preventing Automatic Full Refreshes . . . 129
    5.6.3 Full Refresh on Demand . . . 132
    5.6.4 Dropping Unnecessary Capture Triggers for Non-IBM Sources . . . 133
    5.6.5 Modifying Triggers for Non-IBM Sources . . . 134
    5.6.6 Changing Apply Qualifier or Set Name for a Subscription Set . . . 134
  5.7 Summary . . . 136

Part 2. Heterogeneous Data Replication—Case Studies . . . 137

Chapter 6. Case Study 1—Point of Sale Data Consolidation, Retail . . . 139
  6.1 The Business Problem . . . 140
  6.2 Architecting the Replication Solution . . . 142
    6.2.1 Data Consolidation—System Design . . . 142
    6.2.2 Data Consolidation—Replication Design . . . 145
  6.3 Setting Up the System Environment . . . 149
    6.3.1 The System Topology . . . 150
    6.3.2 Configuration Tasks . . . 151
  6.4 Nice Side Effect: Using SPUFI to Access Multi-Vendor Data . . . 158
  6.5 Implementing the Replication Design . . . 159
    6.5.1 Registering the Replication Sources . . . 159
    6.5.2 Preparation of the Target Site Union . . . 160
    6.5.3 Defining Replication Subscriptions . . . 161
    6.5.4 Starting Apply . . . 162
    6.5.5 Some Performance Remarks . . . 163
  6.6 Moving from Test to Production . . . 164
  6.7 Some Background on Replicating from Multi-Vendor Sources . . . 166
    6.7.1 Using Triggers to Emulate Capture Functions . . . 166
    6.7.2 The Change Data Table for a Non-IBM Replication Source . . . 169
    6.7.3 How Apply Replicates the Changes from Non-IBM Sources . . . 169
  6.8 Summary . . . 171

Chapter 7. Case Study 2—Product Data Distribution, Retail . . . 173
  7.1 The Business Problem . . . 173
    7.1.1 Source Data Model . . . 175
    7.1.2 Target Data Model . . . 176
  7.2 Architecting the Replication Solution . . . 177
    7.2.1 Data Distribution—System Design . . . 177
    7.2.2 Data Distribution—Replication Design . . . 181
  7.3 Setting Up the System Environment . . . 186
    7.3.1 The System Topology . . . 186
    7.3.2 Configuration Tasks . . . 188
  7.4 Implementing the Replication Design . . . 192
    7.4.1 Define DB2 for OS/390 as Replication Source . . . 192
    7.4.2 Define Empty Subscription Sets . . . 194
    7.4.3 Create a Password File . . . 196
    7.4.4 Add Members to the Subscription Sets . . . 196
    7.4.5 Add Statements or Stored Procedures to Subscription Sets . . . 199
    7.4.6 Start DProp Capture and Apply on the Host . . . 200
    7.4.7 Start DProp Apply on the DataJoiner Server . . . 200
  7.5 Summary . . . 201

Chapter 8. Case Study 3—Feeding a Data Warehouse . . . 203
  8.1 The Business Problem . . . 203
    8.1.1 Source Data Model . . . 205
    8.1.2 Target Data Model . . . 206
  8.2 Architecting the Replication Solution . . . 209
    8.2.1 Feeding a Data Warehouse—System Design . . . 209
    8.2.2 Feeding a Data Warehouse—Replication Design . . . 210
  8.3 Setting Up the System Environment . . . 213
    8.3.1 The System Topology . . . 213
    8.3.2 Configuration Tasks . . . 214
  8.4 Implementing the Replication Design . . . 217
    8.4.1 Defining the Subscription Set . . . 219
    8.4.2 Maintaining a Change History for Suppliers . . . 220
    8.4.3 Using Target Site Views to Denormalize Outlet Information . . . 228
    8.4.4 Using Source Site Joins to Denormalize Product Information . . . 237
    8.4.5 Using a CCD Target Table to Manage the Sales Facts . . . 245
    8.4.6 Adding Temporal History Information to Target Tables . . . 250
    8.4.7 Maintaining Aggregate Information . . . 256
    8.4.8 Pushing Down the Replication Status to Oracle . . . 259
    8.4.9 Initial Load of Data into the Data Warehouse . . . 261
  8.5 A Star Join Example Against the Data Warehouse Target Tables . . . 270
  8.6 Summary . . . 270

Chapter 9. Case Study 4—Sales Force Automation, Insurance . . . 271
  9.1 The Business Problem . . . 271
    9.1.1 Data Model . . . 272
    9.1.2 Comments about the Table Structures . . . 273
  9.2 Update-Anywhere Replication versus Multi-Site Update Programming . . . 274
  9.3 Architecting the Replication Solution . . . 275
    9.3.1 MS Jet Update-Anywhere Scenario—System Design . . . 276
    9.3.2 MS Jet Update Anywhere—Replication Design . . . 277
  9.4 Setting Up the System Environment . . . 277
    9.4.1 The System Topology . . . 277
    9.4.2 Configuration Tasks . . . 279
  9.5 Implementing the Replication Design . . . 287
    9.5.1 Creating Source Views to Enable Subsetting . . . 287
    9.5.2 Registering the Replication Sources . . . 289
    9.5.3 Defining the Replication Subscriptions . . . 292
    9.5.4 Focus on Major Pitfalls . . . 300
  9.6 Replication Results for Sales Representative 1 . . . 301
    9.6.1 Contents of the Source Tables at the Beginning . . . 302
    9.6.2 Contents of the Main Control Tables at the Beginning . . . 303
    9.6.3 Start ASNJET to Perform the Initial Full-Refresh . . . 306
    9.6.4 Results of the Initial Full-Refresh . . . 306
    9.6.5 Replicating Updates . . . 311
  9.7 Operational Aspects . . . 314
    9.7.1 Operational Implications . . . 314
    9.7.2 Monitoring and Problem Determination . . . 316
  9.8 Benefits of this Solution . . . 318
    9.8.1 Other Configuration Options . . . 319
  9.9 Summary . . . 319

Appendix A. Index to Data Replication Tips, Tricks, Techniques . . . 321

Appendix B. Non-IBM Database Stuff . . . 325
  B.1 Oracle Stuff . . . 325
    B.1.1 Configuring Oracle Connectivity . . . 325
    B.1.2 Using Oracle’s SQL*Plus . . . 325
    B.1.3 The Oracle Data Dictionary . . . 326
    B.1.4 Oracle Error Messages . . . 327
    B.1.5 Oracle Server . . . 327
    B.1.6 Oracle Listener . . . 328
    B.1.7 Other Useful Oracle Tools . . . 328
    B.1.8 More Information . . . 328
  B.2 Informix Stuff . . . 329
    B.2.1 Configuring Informix Connectivity . . . 329
    B.2.2 Using Informix’s dbaccess . . . 329
    B.2.3 Informix Error Messages . . . 330
    B.2.4 More Information . . . 330
  B.3 Microsoft SQL Server Stuff . . . 330
    B.3.1 Configuring Microsoft SQL Server Connectivity . . . 330
    B.3.2 Using the Microsoft Client OSQL . . . 331
    B.3.3 Microsoft SQL Server Data Dictionary . . . 331
    B.3.4 Helpful SQL Server Stored Procedures . . . 332
    B.3.5 Microsoft SQL Server Error Messages . . . 332
    B.3.6 Microsoft SQL Server Administration . . . 332
    B.3.7 ODBCPing . . . 333
    B.3.8 More Information . . . 333
  B.4 Sybase SQL Server Stuff . . . 333
    B.4.1 Configuring Sybase SQL Server Connectivity . . . 333
    B.4.2 Using the Sybase Client isql . . . 334
    B.4.3 Sybase SQL Server Data Dictionary . . . 334
    B.4.4 Helpful SQL Server Stored Procedures . . . 335
    B.4.5 Sybase SQL Server Error Messages . . . 335
    B.4.6 More Information . . . 335

Appendix C. General Implementation Checklist . . . 337

Appendix D. DJRA Generated SQL for Case Study 2 . . . 339
  D.1 Define Replication Sources . . . 339
  D.2 Create Empty Subscription Sets . . . 341
  D.3 Add a Member to Subscription Sets . . . 342
  D.4 Add Stored Procedure to Subscription Sets . . . 345

Appendix E. DJRA Generated SQL for Case Study 3 . . . 347
  E.1 Output from Define the SALES_SET Subscription Set . . . 347
  E.2 Output from Register the Supplier Table . . . 348
  E.3 Output from Subscribe to the Supplier Table . . . 349
  E.4 Output from Register the Store and Region Tables . . . 351
  E.5 Output from Subscribe to the Region Table . . . 353
  E.6 Output from Subscribe to the Store Table . . . 355
  E.7 Output from Register the Items, ProdLine, and Brand Tables . . . 357
  E.8 Output from Register the Products View . . . 361
  E.9 Output from Subscribe to the Products View . . . 362
  E.10 Output from Register the Sales Table . . . 365
  E.11 Output from Subscribe to the Sales Table . . . 366
  E.12 SQL After to Support Temporal Histories for Supplier Table . . . 369
  E.13 Maintain Base Aggregate Table from Change Aggregate Subscription . . . 370

Appendix F. DJRA Generated SQL for Case Study 4 . . . 381
  F.1 Structures of the Tables . . . 381
  F.2 SQL Script to Define the CONTRACTS Table as a Replication Source . . . 383
  F.3 SQL Script to Define the VCONTRACTS View as a Replication Source . . . 385
  F.4 SQL Script to Create the CUST0001 Empty Subscription Set . . . 386
  F.5 SQL Script to Add a Member to the CONT0001 Empty Subscription Set . . . 387

Appendix G. Special Notices . . . 393

Appendix H. Related Publications . . . 397
  H.1 International Technical Support Organization Publications . . . 397
  H.2 Redbooks on CD-ROMs . . . 397
  H.3 Other Publications . . . 397
  H.4 Hot Web Sites . . . 398

How to Get ITSO Redbooks . . . 401
  How IBM Employees Can Get ITSO Redbooks . . . 401
  How Customers Can Get ITSO Redbooks . . . 402
  IBM Redbook Order Form . . . 403

List of Abbreviations . . . 405

Index . . . 407

ITSO REDBOOK EVALUATION . . . 411
Figures

1. Part 1 - Structural Overview . . . 6
2. Components of a Replication System Using IBM DProp . . . 9
3. Extending DProp Replication Through Nickname Technology . . . 10
4. Reposition Yourself: Chapter 2—Overview . . . 13
5. Reposition Yourself: Chapter 3—Overview . . . 27
6. Replication to a Non-IBM Target (Components and Placement) . . . 34
7. Replication from a Non-IBM Source (Components and Placement) . . . 35
8. Replication Both Ways Between DB2 and a Non-IBM Database . . . 37
9. DJRA and DataJoiner Database Connectivity . . . 39
10. DataJoiner Placement—Data Distribution to Non-IBM Targets . . . 42
11. DataJoiner Placement—Data Consolidation from Non-IBM Sources . . . 45
12. Why One DataJoiner Database for Each Non-IBM Source Server? . . . 46
13. Apply and Control Server Placement . . . 47
14. Performance Analysis for Trigger Based Change Capture . . . 58
15. Reposition Yourself: Chapter 4—Overview . . . 62
16. Define a Non-IBM Table as a Replication Target . . . 78
17. Define a Non-IBM Table as a Replication Source . . . 80
18. Reposition Yourself: Chapter 5—Overview . . . 83
19. Initial Handshake between Capture and Apply . . . 88
20. Apply Cycle at a Glance . . . 123
21. Case Study 1—High Level System Architecture . . . 140
22. Major Information Flows . . . 141
23. Target Site UNION Example—Basic Idea . . . 146
24. Add a Computed Column to a Subscription . . . 148
25. Case Study 1—Test System Topology . . . 151
26. Replication Performance Remarks . . . 164
27. Example of an Oracle Change Capture Trigger (Insert Trigger) . . . 168
28. Replication from a Multi-Vendor Source Table . . . 170
29. Case Study 2—High Level System Architecture . . . 174
30. Partial Data Model for the Retail Company Headquarters . . . 176
31. Partial Data Model for a Branch of the Retail Company . . . 177
32. Data Distribution with Read-Only Target Tables . . . 178
33. Replicating to Non-IBM Target Tables . . . 179
34. One DataJoiner Connected to Multiple Store Servers . . . 180
35. One DataJoiner for Each Branch Office . . . 180
36. Replication of the Product Information . . . 182
37. Three-Tier Replication Architecture . . . 184
38. Case Study 2—System Topology . . . 187
39. Configure Microsoft SQL Server Client Connectivity . . . 189
40. Creating Replication Control Tables with DJRA . . . 191
41. Register Table ITEMS as a Replication Source . . . 193
42. Register Views as Replication Sources . . . 194
43. Use DJRA to Create Empty Subscription Set . . . 195
44. Create Empty Subscription Sets for CCDs . . . 196
45. Add a Member to Subscription Sets . . . 197
46. Data Subsetting . . . 198
47. Add Stored Procedure to Subscription Sets . . . 200
48. The Business Environment . . . 204
49. Data Model Diagram of Source Data . . . 205
50. Data Model Diagram of Target Data . . . 207
51. Case Study 3—System Topology . . . 214
52. Create the SALES_SET Subscription Set . . . 219
53. Transformation of Supplier Table . . . 221
54. Define Supplier Table as a Replication Source . . . 222
55. Subscription Definition for Supplier Table . . . 224
56. CCD Table Attributes for Supplier . . . 225
57. Transformation of Store and Region into Outlets . . . 230
58. Registration of Store and Region Tables . . . 232
59. Defining the Region Subscription Member . . . 233
60. Subscription Definition for Store Table . . . 234
61. Transformation of Items, ProdLine and Brand . . . 238
62. Defining Multiple Base Tables as Replication Sources . . . 239
63. Defining a DB2 View as a Replication Source . . . 241
64. Add Products View Subscription Definition . . . 243
65. Transformation of Sales . . . 246
66. Registration Definition for Sales Table . . . 247
67. Subscription Definition for Sales Table . . . 249
68. Adding the SQL After to Support Temporal Histories . . . 253
69. Maintain Base Aggregate Table from Change Aggregate Subscription . . . 258
70. Data Model . . . 273
71. Case Study 4—System Topology . . . 278
72. General Implementation Steps . . . 280
73. DB2 DataJoiner Replication Administration Main Panel . . . 285
74. Create Replication Control Tables . . . 286
75. Define One Table as a Replication Source . . . 289
76. Define the CONTRACTS Table as a Replication Source . . . 290
77. Define DB2 Views as Replication Sources . . . 291
78. Define DB2 Views as Replication Sources - Continued... . . . 292
79. Create Empty Subscription Sets . . . 294
80. Create Empty Subscription Sets - Continued... . . . 295
81. Add a Member in Subscription Sets . . . 296
82. Add a Member in Subscription Sets - Continued... . . . 297
83. Microsoft Access Databases Created by ASNJET . . . 309
84. Tables in the Target Database DBSR0001 . . . 310
85. Content of the CONTRACTS Table . . . 311
86. Content of the Conflict Table for Contracts . . . 314
Tables

1. Available Replication Features in a Heterogeneous Environment . . . 31
2. Determining the Status of Subscription Sets . . . 107
3. Timestamp Information Available from the Subscription Set Table . . . 108
4. Number of Connections Needed to Fulfill Replication Task . . . 124
5. Informix Instances used in this Case Study . . . 152
6. Subscription Set Characteristics for the Data Consolidation Approach . . . 162
7. Source Data Tables . . . 206
8. Target Data Tables . . . 208
9. Attributes of Supplier Target Table . . . 220
10. Attributes of Store and Region Target Tables . . . 229
11. Replication Attributes of Items, ProdLine and Brand Tables . . . 237
12. Replication Attributes of Sales Table . . . 245
13. Index to Data Replication Tips, Tricks, and Techniques . . . 321
14. Useful Oracle Data Dictionary Tables . . . 327
15. Invocation Parameters for OSQL . . . 331
16. Useful SQL Server Data Dictionary Tables . . . 332
17. Microsoft SQL Server Stored Procedures . . . 332
18. Invocation Parameters for isql . . . 334
19. Useful SQL Server Data Dictionary Tables . . . 334
20. Microsoft SQL Server Stored Procedures . . . 335
Preface

DB2 DataPropagator (DProp) is IBM’s strategic data replication solution. As a tightly integrated component of IBM’s DB2 Universal Database (UDB) products, DProp enables cross-platform data replication among all members of the DB2 UDB family. DProp is a separately orderable feature for DB2 for OS/390, DB2 for AS/400, and DB2 for VM or VSE. In combination with other IBM products, such as IBM DataPropagator NonRelational (DPropNR), DataRefresher, or IBM DataJoiner, DProp easily integrates non-relational data, as well as data stored in non-IBM relational database systems, into an enterprise-wide, distributed, multi-platform replication scenario.

In this redbook we focus on how IBM’s data replication solution is extended to non-IBM relational database systems. IBM DataPropagator and IBM DataJoiner are used to integrate non-IBM relational databases, such as Informix, Oracle, Sybase Open Server, or Microsoft SQL Server, into IBM’s enterprise-wide replication solution. Additionally, we demonstrate how mobile clients that use Microsoft Access or other Microsoft Jet databases as a data store can be supplied with data maintained on central DB2 database servers.

To make the redbook most useful in supporting both the design and the implementation phases of a heterogeneous replication project, the book covers general guidelines and specific case studies separately.

First, we discuss the general implementation options that are available to exploit the flexibility of IBM’s data replication solution. We present guidelines to help you specify your business-driven requirements, and to match the available replication options and advanced techniques to those requirements. General setup and maintenance tasks that apply to essentially all heterogeneous data replication projects are covered as well.

Next, we introduce practical case studies, showing how the previously discussed options relate to exemplary business requirements. The case studies are meant to be a reference for technically oriented project members whose duty it is to set up distributed, cross-platform, multi-vendor relational database systems. Implementation checklists are provided, as well as recommendations for using some of the advanced replication options DProp offers.

To show the great variety of available options, each case study deals with a different business problem. Furthermore, each case study focuses on a different non-IBM database system, used either as a source or as a target for data replication. In the case studies, the DB2 databases are either DB2 for OS/390 or DB2 UDB for Windows NT databases, but the guidelines provided are equally applicable to any other member of the DB2 family: DB2 for AS/400, DB2 UDB on UNIX platforms, DB2 UDB for OS/2, and DB2 for VM/VSE.

Replication between relational databases and non-relational databases (Lotus Notes databases, IMS hierarchical databases, VSAM files) is not a topic of this redbook. But solutions exist:
• Lotus Notes Pump (used in conjunction with DProp), for Lotus Notes databases
• DPropNR (used in conjunction with DProp), for IMS hierarchical databases
• DataRefresher, possibly in conjunction with the Data Difference Utility (DDU), for IMS databases and VSAM files
The Team That Wrote This Redbook

This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, San Jose Center.

Thomas Groh is a Data Management and Business Intelligence Specialist at the International Technical Support Organization, San Jose Center. He writes extensively and teaches IBM classes worldwide on all areas of Data Management and Data Warehousing. Before joining the ITSO in 1998, Thomas worked for IBM Professional Services in Vienna, Austria, as an IT Architect. His technical background includes end-to-end design for traditional online transaction processing systems in mainframe and client/server environments, as well as designing, building, and managing data warehouse solutions ranging from small data marts to very large database implementations using massively parallel processing platforms. Thomas Groh managed the project that delivered this redbook.

Olivier Bonnet is a Data Replication Specialist in France. He provides both professional services and defect support for DProp and DataJoiner. He has three years of experience in the data replication field and has been working at IBM for thirteen years. He gained his DProp and DataJoiner experience in numerous customer projects on all DB2 platforms. Before that, he was a project specialist in the AS/400 area for four years, and an application developer in the OS/390 environment for six years. He holds an engineering diploma from the Institut Industriel du Nord.

Simon Harris is a Data Management pre-sales technical support specialist in Europe. He has nine years of experience in the Data Management field and holds a degree in Computing Science from Aston University, UK. His areas of expertise include DataJoiner, DB2 UDB, and now DProp. He is an IBM Certified Advanced Technical Expert in DB2 UDB.

Christian Lenke is a Data Management Services Professional in Germany. He started working on cross-platform data replication four years ago, in the early days of DProp V1.2.1. He holds a Dipl. Wirtschaftsinformatiker degree from the University of Essen, Germany. He gained his experience with multi-vendor relational database applications in numerous customer projects, integrating DB2, Oracle, Informix, and Sybase databases. Christian has really enjoyed being a member of this team, and writing on this hot topic.

Li Yan Zhou is a Data Management Specialist in Beijing, China. She has three years of experience in supporting Data Management products in the IBM Software Solution Division.

Thanks to the following people for their invaluable contributions to this project:

Rob Goldring, IBM Santa Teresa Lab
Madhu Kochar, IBM Santa Teresa Lab
Micks Purnell, IBM Santa Teresa Lab
Kathy Kwong, IBM Santa Teresa Lab
Bob Haimovits, International Technical Support Organization, Poughkeepsie
Vasilis Karras, International Technical Support Organization, Poughkeepsie
Comments Welcome

Your comments are important to us! We want our redbooks to be as helpful as possible. Please send us your comments about this or other redbooks in one of the following ways:
• Fax the evaluation form found in “ITSO REDBOOK EVALUATION” on page 411 to the fax number shown on the form.
• Use the electronic evaluation form found on the Redbooks Web sites:
  For Internet users: http://www.redbooks.ibm.com
  For IBM Intranet users: http://w3.itso.ibm.com
• Send us a note at the following address: [email protected]
Part 1. Heterogeneous Data Replication—General Discussion
Chapter 1. Introduction

Congratulations on choosing IBM’s replication solution for your enterprise-wide relational data replication system. Good choice! To make your first approach to enterprise-wide data replication a real success, this redbook will help you leap right into the world of cross-platform, multi-vendor, high-performance relational data replication, using IBM’s database middleware products IBM DB2 DataPropagator (DProp) and IBM DataJoiner.

Remark: Throughout this book, the term "multi-vendor" is used as a synonym for "non-IBM".
1.1 Why Replication?

Today’s businesses are dependent on information, and on applications to manipulate this information. Providing the necessary data to the users is therefore a constant and major requirement. Generally, the data that makes up this information exists in the enterprise’s databases. These databases are either centralized (central host-based databases) or decentralized (either department-level databases or LAN-based databases). The major issue is to give the users access to the right data, at the right time, and at an optimized cost.

Two different approaches to data access can be chosen:
• Remote access to the data
• Moving the data close to the user, which means replicating the data

Each approach has advantages and drawbacks, and corresponds to different uses:
• Remote access fits better in environments where no latency is allowed: the users need the data to be absolutely current. This approach requires a highly available, high-performance network.
• Replication enables the users to have local copies of the data, so that they can use the data with local applications without needing a permanent connection to the enterprise’s databases.

Whether you choose remote access or replication will be determined primarily by the acceptable or desired latency, and by the ability of the network to accommodate the data transfer requirements. Sometimes, the best solution may also be a combination of the two approaches.

Basically, the most common uses of data replication are the following:
• Data distribution from one source database towards many target databases
• Feeding a data warehouse from a production database, utilizing the data manipulation functions provided by the replication product (DProp); the replicated data can, for example, be enhanced or aggregated, and/or histories can be built
• Data consolidation from several source databases towards one target database
1.2 Why Multi-Vendor?

Today’s IT environments have become very complex and diverse. Legacy applications exist along with client/server applications, packaged applications that solve special business problems, and Web-based applications—each of them managing or accessing data organized in different databases on different platforms. Mergers and acquisitions of companies have also contributed to the diversity of the typical computing environments we face today. So, databases and applications from multiple vendors on different hardware platforms have to co-exist and communicate. Their integration into a consolidated, enterprise-wide database architecture is one of the biggest challenges relational database technology has to face.

With DataJoiner, both remote access to heterogeneous data sources and (in conjunction with DProp) replication between heterogeneous databases can be implemented.
1.3 How to Use this Book?

Designing, implementing, and operating a cross-platform, multi-vendor data replication system is not a trivial task. You need a great variety of skills:
• Replication skills (good knowledge of DProp and DataJoiner, as well as of all the replication issues and techniques)
• Heterogeneous connectivity skills
• Database skills, of course, for your different types of database servers
• Systems skills on the various platforms
• Network skills
• Knowledge of the applications and how they manipulate the data

The DProp reference documentation provides a lot of useful information about how to implement a replication system in a DB2-only environment. The DataJoiner reference documentation explains how DB2 client applications can access non-IBM databases such as Oracle, Informix, Microsoft SQL Server, and Sybase. But this redbook is the first publication that fully explains how you can combine DProp and DataJoiner to implement a heterogeneous data replication system.

Of course, this book does not cover all the areas listed above. It provides the guidelines and recommendations that you should follow during your heterogeneous data replication project. All the steps are detailed, and the book also gives you detailed examples of the setup of the most frequently used replication configurations.

After you have read this book, you will be on your way to becoming a replication specialist, ready for practical hands-on experience. You will know how to handle your project, what you can expect from your replication system, how you can implement a test system, and which steps you should follow. Then you will need to get familiar with DProp and DataJoiner, and probably re-read some parts of this book, before you move to production.
1.3.1 The Theory

The architecture of the first part of this book mirrors the phases of your heterogeneous data replication project: replication planning; replication design; general implementation guidelines; and operation, monitoring, and tuning. This is illustrated by Figure 1, which represents the structure of both the first part of the book and your project. To keep the figure simple, iterations, which are of course possible, are not displayed.
[Figure content: Approaching Data Replication → Replication Planning → Replication Design → General Implementation Guidelines → Operation, Monitoring & Tuning]

Figure 1. Part 1 - Structural Overview
Each phase is fully described in a separate chapter in the first part of the book. Each chapter gives all the guidelines and recommendations that should be followed to successfully achieve the objectives of the corresponding phase. To help you re-position yourself, the figure above is reproduced and detailed at the beginning of each chapter.

Replication Planning (Chapter 2): The planning phase of a replication project is intended to gather all the business requirements that the replication system will have to fulfill, and to determine precisely how the target tables will be derived from the source tables.

Replication Design (Chapter 3): You will then define the technical design of the replication system, choosing the placement of the middleware components. In this chapter we provide the background information you need to choose between the many implementation options offered by the IBM replication solution, so that the system will fulfill your business requirements.

General Implementation Guidelines (Chapter 4): To assist you with the actual implementation of the replication system, we developed a generally applicable implementation checklist by identifying, sequencing, and describing all the setup activities that are necessary before the actual data replication subsystems can be started.

Operation, Monitoring, and Tuning (Chapter 5): This chapter contains all the detailed operational information that you should take into account before moving the multi-platform replication solution to your production environment.
1.3.2 The Practical Examples

To deliver the biggest practical advantage, the guidelines and recommendations elaborated in Part 1 of this book are detailed in Part 2 in four separate case studies. To demonstrate the great variety of available solutions, each case study deals with a different business scenario:
1. Consolidating sales data (Chapter 6)
2. Distributing product data (Chapter 7)
3. Feeding a data warehouse using advanced replication techniques (Chapter 8)
4. Using update-anywhere in a mobile computing environment (Chapter 9)

Additionally, each case study integrates a different multi-vendor database system into IBM’s replication solution: Informix, Oracle, Microsoft SQL Server, or Microsoft Access.

Before we jump into the phases of a replication project, let us have a look at the technical warm-up below. It explains the basic DProp and DataJoiner concepts that you will need to fully understand the contents of the next chapters.
1.4 Technical Warm-Up

IBM DB2 DataPropagator (DProp) and IBM DataJoiner are the central components of IBM’s cross-platform replication solution. We will use these introductory sections to achieve a common understanding of broadly used technical expressions, and to name and understand DProp’s distributed components.
1.4.1 IBM DataPropagator—Architectural Overview

From a technical point of view, the three main activities involved when replicating database changes from a set of source tables to a set of target tables are:
• Setting up the replication system
• Capturing changes at the source database and storing them in staging tables
• Applying database changes from the staging tables to the target databases

IBM DProp provides components to implement these main activities:
• The Capture component asynchronously captures changes to database tables by reading the database log or journal. It places the captured changes into change data tables, also referred to as staging tables.
• The Apply component reads the staging tables and applies the changes to the target tables.
• The Administration component generates Data Definition Language (DDL) and Data Manipulation Language (DML) statements to configure both Capture and Apply. Its two main tasks are to define replication sources (also referred to as registrations) and to create replication subscriptions. Replication sources are defined to limit change capture activity to only those tables that are going to be replicated. Replication subscriptions contain all the settings the Apply program uses when replicating the change data to the target tables.

To set up homogeneous replication between DB2 database systems, either the DB2 Control Center or the DataJoiner Replication Administration can be used. The setup of multi-vendor replication configurations (replication configurations where non-IBM relational databases are involved, either as sources or as targets) is only supported by the DataJoiner Replication Administration (DJRA) tool, which is included with DataJoiner.

Basically, the three components operate independently and asynchronously, to minimize the impact of replication on your applications and online transaction processing (OLTP) systems. The only interface between the different components of the replication system is a set of relational tables, the DProp control tables. The Administration component populates these control tables when you define the replication sources and the replication targets. The runtime components (Capture and Apply) read the control tables to find out what they have to do. They also update the control tables to report progress and to synchronize their activities. The basic principles of DProp are illustrated by Figure 2 on page 9.
[Figure content: at the SOURCE server, Capture reads the LOG of the BASE tables and fills the UNIT OF WORK and CHANGE DATA staging tables, driven by its CONTROL tables; at the TARGET server, Apply reads the staging tables and maintains the TARGET tables, driven by its CONTROL tables; the ADMINISTRATION component feeds the control tables.]

DProp Capture captures change data from the DB2 log. DProp Apply replicates change data to the target tables. Control tables are used to store the replication configuration. The Administration component is used to feed the control tables.

Figure 2. Components of a Replication System Using IBM DProp
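To make the staging tables more concrete, here is a minimal sketch of the change data table that could back a registered source table DEPT(DEPTNO, DEPTNAME). The IBMSNAP_* control columns follow DProp naming conventions, but the exact DDL is generated by the Administration component, so take this as illustrative only:

   -- Hypothetical change data (staging) table for source table DEPT.
   -- The Administration component generates the real DDL.
   CREATE TABLE DPROPR.DEPT_CD (
     IBMSNAP_UOWID     CHAR(10) FOR BIT DATA NOT NULL, -- unit-of-work (transaction) identifier
     IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, -- log sequence number, orders the changes
     IBMSNAP_OPERATION CHAR(1) NOT NULL,               -- 'I' (insert), 'U' (update), or 'D' (delete)
     DEPTNO            CHAR(3) NOT NULL,               -- after-image of the source columns
     DEPTNAME          VARCHAR(36)
   );

Because the control tables are also ordinary relational tables, the replication configuration itself can be inspected with plain SQL; for example, a query such as SELECT SOURCE_OWNER, SOURCE_TABLE, CD_OWNER, CD_TABLE FROM ASN.IBMSNAP_REGISTER lists the registered replication sources (column names can differ slightly between DProp versions).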
1.4.2 Extending IBM Replication to a Non-IBM RDBMS

IBM’s strategic solution for enabling transparent access to non-IBM relational data is IBM DataJoiner, IBM’s database middleware gateway. The DataJoiner server enables all kinds of DB2 clients (either DRDA clients, such as DB2 for OS/390 systems, or DB2 LAN clients, such as UNIX or Windows applications) to transparently access back-end data sources, whether DB2 or multi-vendor databases.

Using DataJoiner in conjunction with DProp, the IBM replication solution is extended to multi-vendor relational database systems, such as Oracle, Informix Dynamic Server, Sybase SQL Server, or Microsoft SQL Server. Since DB2 applications can transparently access multi-vendor databases through DataJoiner, DProp can also access those database systems, either to apply changes to multi-vendor platforms or to use those non-IBM systems as sources for replication.

To achieve this transparent access to non-IBM relational data, DataJoiner uses nickname technology to reference the physical database objects (such as tables or stored procedures) that are stored in the remote data sources.
The nicknames are created within the DataJoiner database. Once nicknames are in place, every DB2 client application, such as DProp Apply, can transparently access (read, write, or execute) the referenced database objects by simply accessing the nicknames. Client applications cannot tell whether the accessed or invoked database object is stored locally in the DataJoiner database, or is just referenced by a nickname and physically stored within a back-end database. Figure 3 on page 10 illustrates the nickname technology.
[Figure content: a DB2 client application accesses Nickname 1 ... Nickname n and a stored procedure nickname in the DataJoiner database; each nickname references the corresponding table or stored procedure in the multi-vendor database.]

Figure 3. Extending DProp Replication Through Nickname Technology
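As an illustration of how a remote table becomes visible through DataJoiner, the following statements sketch the definition of a server mapping, a user mapping, and a nickname for an Oracle source. All server, node, user, and table names are invented, and the exact DDL options depend on the DataJoiner version and the data source, so treat this as a sketch rather than a ready-to-run script:

   -- 1. Tell DataJoiner where the remote Oracle server is and how to reach it
   --    ("orclnode" would be a SQL*Net alias known to the Oracle client):
   CREATE SERVER MAPPING FROM ORA1 TO NODE "orclnode"
     TYPE ORACLE VERSION 7.3 PROTOCOL "sqlnet";

   -- 2. Map the DataJoiner authorization ID to a remote user and password:
   CREATE USER MAPPING FROM DJADMIN TO SERVER ORA1
     AUTHID "scott" PASSWORD "tiger";

   -- 3. Create a nickname for a remote table:
   CREATE NICKNAME DJADMIN.ORA_ITEMS FOR ORA1.SCOTT.ITEMS;

After the nickname exists, SELECT * FROM DJADMIN.ORA_ITEMS behaves like a query against a local table. This transparency is exactly what allows DProp Apply to use a non-IBM table as an ordinary replication source or target.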
To connect to non-IBM database systems, DataJoiner uses the native client software provided by those databases. That means an Oracle server is accessed using SQL*Net or Net8; an Informix server is accessed using ESQL/C; a Sybase server is accessed using dblib or ctlib; and a Microsoft SQL Server is accessed using ODBC. Once connectivity to the back-end data source has been established, nicknames can be created to reference database objects, such as tables, stored procedures, user-defined types, or user-defined functions.

Change capture, which for DB2 systems is achieved by reading the DB2 log, is achieved by using native triggers for all the supported non-IBM data sources. When a non-IBM table is registered for change replication, all the necessary triggers or stored procedures are automatically generated by the replication administration component (DataJoiner Replication Administration).
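To give a feeling for the principle of trigger-based change capture (Chapter 6 discusses a real, DJRA-generated example), a much simplified insert trigger for an Oracle source table might look like the following sketch. All table, column, and trigger names are invented, and the generated triggers additionally maintain the sequencing and pruning information that is omitted here:

   -- Hypothetical sketch only; DJRA generates the real capture triggers.
   CREATE TRIGGER items_capture_insert
   AFTER INSERT ON scott.items
   FOR EACH ROW
   BEGIN
     -- Record the after-image of the new row, plus an operation code,
     -- in the change data table so that Apply can replicate it later.
     INSERT INTO scott.items_ccd (ibmsnap_operation, item_no, item_name, price)
     VALUES ('I', :new.item_no, :new.item_name, :new.price);
   END;

Corresponding update and delete triggers record 'U' and 'D' operations in the same way.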
1.5 Summary
The purpose of this introduction was to briefly explain how the book is organized and how it should be used. It also gave you a technical warm-up, introducing the basic concepts of heterogeneous replication using DProp and DataJoiner.
Chapter 2. Planning
The first thing you have to do when you begin studying your heterogeneous replication system is... think! You have to prepare yourself to find out the details of the business requirements that make you consider replication, and the details of your data. Do not go into the technical details too soon.

After you have organized your project (just as any other project: staff the project, train the people, define the project plan), you must first clearly determine what the business requirements are, that is, what your users really need (which kind of data is needed, when is it needed, and for which purposes?).

In this chapter, of course, we will not answer these questions. They depend on your specific business. But we will help you determine which questions you should ask the users to gather the business requirements, and focus on the general topics you should study before you move to the implementation phase of the project. This is illustrated by Figure 4:
Figure 4. Reposition Yourself: Chapter 2—Overview
At the end of this phase, you will have drawn a high-level global picture of your replication sources and replication targets, showing how the replication system will provide answers to the business requirements. Technical considerations have only minor importance during this phase. You will study the technical replication considerations later. At the end of this phase, you will also have written a document detailing the list of all the target tables, with all the columns, and the correlation between the target and the source data. Table structures, column name mappings, and data types will have to be described.

Remark: You certainly already have a general idea of what you want to build and why (for example, a data warehouse for decision support, or a data distribution system). So perhaps you already started the business requirements gathering and analysis before you started reading this book (after all, you did not decide to implement a heterogeneous replication system just for the pleasure of having one, and the need for such a replication system is probably not recent). If this is the case, use the present chapter to verify that you did not forget anything important.
2.1 Organizing Your Project
The implementation of a heterogeneous replication system is a project in itself. As for any other project, the planning phase includes organizational activities:
• Identify the project sponsor
• Define the project scope
• Develop the work plan and procedures (do not forget to plan for reviews and testing)
• Assign resources and assets to the project
• Evaluate and schedule the activities
• Train the project team

These activities are not detailed in this book, because they are not specific to this kind of project, but we wanted to remind you not to underestimate their importance and the associated workload.

Recommendations for the project staffing: You will need at least the following skills:
• A data replication professional with the following set of skills:
  • DProp and DataJoiner
  • Relational database skills, especially DB2
  • Basic knowledge of the multi-vendor databases (for example, Oracle, Informix, Microsoft SQL Server)
  • Knowledge of the corresponding client access software of the involved databases
  • Platform skills (UNIX, Windows NT, OS/390, AS/400, depending on your IT environment)
  • Network and connectivity skills (TCP/IP, SNA)
  • Project management skills
• Database administrator skills for each database product involved, with all the necessary access authorizations. The database administrator(s) must have a good knowledge of:
  • The table structures of the source tables and of the target tables
  • The applications that update the source tables and the applications that use the target tables
  • The SQL and utilities provided by the database servers on both the source and the target side
• Application specialists for applications on both the source and target side
• System specialists for the different platforms involved
• Network specialists

Depending on the size of your company and the project scope, each role described above can be covered by one or more persons, or several roles can be covered by a single person. You will also have to involve the users in the project, as early as possible (you will need them during the business requirements definition, and then during the test phase).
2.2 Gathering the Detailed Requirements
You must determine with the users, in detail, what their data needs are, and how the data is going to be used. Be sure to get information on the uses and business needs for the replicated data from all the people who are important to the success of the project (that is, the users of the replicated data, the management of the department that needs the replicated data, and other staff and management who have any interest in the data being replicated).
2.2.1 The Approach
To gather the detailed requirements, you will of course have to organize meetings with the users, and it is important that you get them involved in the project as early as possible. Do not forget that you will need the users to actively test the new system once you have built it, and that it is the users' commitment to the project that will determine its success. The list of questions below can help you prepare for the user interviews.

You must also have a deep knowledge of the current data and applications, because you will need to determine how the future tables will relate to the existing ones. Perhaps some new tables will have to be created, or existing ones reorganized, or joined together. So you will need to review the application documentation and/or interview the programmers or software providers.
2.2.2 List of Questions
The first user interviews, and your own knowledge of the existing data and applications, should help you answer the following questions:
• Will the users need current data, or will they be able to work with data that is not current up to the second?
  • This is a very important question, because DProp is an asynchronous replication product. If the users need to always use the most recent data, perhaps you should consider other approaches, such as remote access or distributed transaction programming.
  • In particular, if the overall purpose of your project is to build a hot site recovery system, then asynchronous replication is probably not the correct approach.
• What is an acceptable level of latency for the replicated data? One hour, one day, one week? (This should be identified for each of the target tables individually.)
• Is all the data already present in the existing tables?
  • If not, are you able to create additional data, or to derive the new data from existing columns?
  • If yes, perhaps the data exists but is spread over several separate tables, and these tables are perhaps even located in different databases.
• Will the users need to be able to update the replicated data, or will they only need to read the data?
  • If the users must be able to update the replicated data (to create new customer orders, for example), is it a problem if someone else updates another copy of the data (same row), elsewhere, at the same time? If it is a problem, how can you prevent such a conflict from occurring, or how can you deal with update conflicts if you cannot prevent them? In case a conflict occurs, who should win the conflict?
• Do you intend to have referential integrity constraints between the target tables?
• Will all the users need the same data, or will they need different subsets of the data?
  • If they need different subsets of rows, do you have a subsetting criterion present in all the tables (department number, for example)? Is the subsetting criterion contained in a column that can be updated in place by an existing application? Will some users need different subsets of columns?
• Where will the users be located? How many different geographical locations will there be?
• How will the users access the data?
  • Will they use laptops with local applications?
  • How will they connect to the corporate network (communications links)? How often?
• How much data has to be replicated? What is the level of volatility (how many updates, inserts, deletes per hour or per day)?
  • Are there periods when insert/update/delete activity on source tables is more frequent, such as during batch processing?
• Is the distributed database application homogeneous (that is, all your database servers are of the same type), or are you dealing with a multi-vendor environment, replicating data between databases of different vendors and across different system platforms?
• Will the users need history information that is not present in the corporate data?
• Are there special auditing needs?
• Is there a need to retain the values of columns as they were before a record was changed, in the tables that the users will use (before-images of columns)?
• Are there some complex data manipulations that must be done on the data before the users use the data?
• Are the existing tables normalized, and do you always follow the relational model recommendations (no update of primary keys, in particular)?
• Will the headquarters need consolidated data from geographically dispersed data?
• Are there special filtering needs, such as: propagate the inserts and updates, but not the deletes?

Remark: In this book you will find implementation examples for nearly all the data replication requirements listed above. Table 1 on page 31 provides a list of the most important replication features and tells you where you can find the examples.

Once you have answers to these questions, you know the business requirements, and you have a more precise idea of what you will easily be able to provide (when you have all the requested data already available) and what will be more difficult to provide (when you do not have the requested data available!). You must, of course, consolidate and sort all the information that you have collected, and probably resolve some conflicts (some users will have contradictory requirements). Then you can move on to the next step and begin drawing the global picture that we were talking about in the introductory part of this chapter.
2.3 Determining the Replication Sources and Replication Targets
Draw a high-level picture of all your replication sources and replication targets.

Reminder: At this point, the focus is on building the overall architecture of the replication system. Try to avoid DProp or DataJoiner specific details. For example:
• The middleware should not appear.
• The DProp control tables should not appear.
• The technical names of the application tables should not appear, unless they are explicit from a business point of view, or they are known by everybody in your organization.

But you should name the available communication links (with no technical details) between the sources and the targets, and you should name the source and target platforms. The picture should, in fact, be considered as a communications medium between you and the users, to show users how the replication system will provide them with new facilities and new functions. The picture should show which are the replication source tables and the replication target tables, where they are located, how and when the users will have access to the replicated data, and what they will be able to do with the data.

But you must go farther in your analysis than just drawing this picture. You also have to develop a document detailing the list of all the target tables, with all the columns and their meaning, and explain how each column will be derived from the source data (source table and column name, or calculation formula). The document has to provide answers for the following questions:
• How does the structure of the target tables relate to the structure of the source tables?
• In which cases will data for a target table come from columns of multiple source tables?
• In which cases will different columns of one source table be sent to different target tables?
• In which cases is the column length or data type of the target column different from the source column?
• In which cases does source data need to be translated?
• In which cases does data for a target column need to come from multiple source columns?

You will probably need the users' help again to complete this document. So, in fact, the two steps (2.2, "Gathering the Detailed Requirements" on page 16, and 2.3, "Determining the Replication Sources and Replication Targets" on page 18) are iterative. You will need several iterations to stabilize the requirements analysis documentation.

So far, you have taken the users' requirements into consideration, but you must also establish capacity planning requirements. The next section helps you do this.
2.4 Technical Planning Considerations
In this section, we discuss the technical planning aspects that you should start thinking about before you actually move on to the implementation of the replication solution. You might want to revisit this section after you have gained a more detailed understanding of DProp, because the content relies on DProp specifics.
2.4.1 Estimating the Data Volumes
The introduction of a replication system (heterogeneous or not) will have a significant impact on disk space utilization, because:
• The changes to the source tables are captured and stored in staging tables for as long as they have not been replicated to the target (there is, of course, a mechanism to avoid unlimited growth of the staging tables).
• The change capture activity requires that additional information be logged into the log files.
• The target tables will probably be new tables.
• The replication system will use additional work files called spill files.

The next sections help you estimate this additional disk space utilization.

Advice: Estimating the future volume of the staging tables is often a difficult task, because most database administrators do not know how many updates, inserts, and deletes are performed on the source tables. So, some are tempted to just "forget" this essential task. But you will not do this. You really will spend some time trying to estimate, even roughly, how often your source tables are updated. If it is really too difficult, you can choose the following approach: Install the Capture component of DProp on your source production system (or the capture emulation triggers if your source is a non-IBM database) well before you are ready to actually move the whole replication system to production, then simulate a full refresh so that Capture really starts capturing the updates (the way to do this is explained in 8.4.9, "Initial Load of Data into the Data Warehouse" on page 261), and let Capture run for a few days. Then you simply have to stop Capture (or drop the capture emulation triggers) and count how many rows you have in each staging table.

2.4.1.1 Change Data Tables and Unit-Of-Work Table Sizing
The updates that are done to the source tables are captured and stored in staging tables: change data (CD) tables for DB2 sources, consistent change data (CCD) tables for non-IBM sources. When you try to estimate the disk space the staging tables will use, it is not enough to know the size of the source tables. You must also know how many insert/delete/update operations will be made to the source tables, not on average but at the maximum.

For example, imagine you have a source table that contains 1 million rows, and your daily applications only update one percent of the rows. You will have 10,000 new rows each day in the staging table. If the table is replicated regularly (several times a day, for example), the staging table pruning mechanism will be able to remove rows regularly from the staging table, and so the staging table will never contain more than 10,000 rows. But now, imagine that for this table you have a new monthly application that updates all the rows. When the changes are captured, the staging table will contain 1 million rows. This illustrates the fact that you need to know both the size of your source tables and the maximum percentage of rows that are updated during one replication cycle.

For some small tables (1000 rows or less; of course, it also depends on the length of each row!) that are globally updated by batch programs each day and also propagated once a day, you might even consider that capturing and replicating the updates is not the best approach. You can configure DProp so that it replicates such a table in 'full-refresh' mode only. The updates will not be captured or stored in staging tables, and the Apply component of DProp will simply copy the whole content of the source table to the target tables. This run mode should of course only be used in exceptional cases, because the main advantage of DProp is to provide change replication.

When the source is a DB2 table, the Capture component of DProp also inserts rows into a table called the unit-of-work table. A row is inserted into the unit-of-work table each time an application transaction issues a COMMIT, if the transaction executed an SQL insert/delete/update statement against a registered replication source table.
Unless you are using the technique explained in the advice above, precisely estimating the size of the staging tables can only be done after you have chosen all the replication parameters. But for the moment you only need a rough estimate, using some simplified formulas (see below).

To size a CD staging table, use the following simplified formula (the result is in bytes), then add a 50% safety margin:

(21 bytes + sum(length(registered columns))) x (estimated number of inserts, updates, and deletes to be captured during max(pruning interval, replication interval)), for a busy day, such as a month-end
If the source is a non-IBM database (the staging table is a CCD table), use the same formula, but replace 21 with 57.
To size the unit-of-work table, use the following simplified formula (the result is in bytes), then add a 50% safety margin:

79 bytes x (estimated number of commits to be captured during max(pruning interval, replication interval)), for a busy day, such as a month-end
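To make the formulas concrete, here is a worked example with hypothetical numbers. Suppose a registered table carries 230 bytes of registered columns and, on the busiest day, 500,000 changes are captured per max(pruning interval, replication interval), arriving in 100,000 committed transactions. The CD table then needs about (21 + 230) x 500,000 = 125.5 MB, roughly 190 MB with the 50% margin, and the unit-of-work table needs about 79 x 100,000 = 7.9 MB, roughly 12 MB with the margin.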
Remark: The formulas above assume that all the Apply processes will replicate with the same frequency. If you are in a configuration where one Apply could run very infrequently (as is the case in mobile replication environments, for example), the effective pruning of the staging tables will be governed by another parameter, called the retention limit, and the size of the staging tables will probably be larger.

Remark: With IBM's replication solution, the definition of new replication targets does not cause any additional rows to be inserted into the staging tables. Capture inserts a source table change into the staging table once for all targets, so no additional disk space or log space is needed. But please keep in mind that Capture will not prune a row from a staging table until it has been replicated to all targets. If one of the targets is replicated less frequently, or stops replicating, the change data table will grow larger with records that are retained for replication to this target.

2.4.1.2 Log Impact
On most platforms, only the log space of the source system will increase. If the target platform is an AS/400, you will also need to consider the log space (journal receivers space) of the target tables. AS/400 customers often do not journal their AS/400 physical files; but DProp requires that both the source and the target AS/400 tables be journaled (logged). (On other platforms you do not have this choice; all DB2 tables are logged.)
Log Impact for DB2 Source Systems
DB2 source tables must have the DATA CAPTURE CHANGES attribute. Without this attribute, only the before/after values of changed columns are logged. With this attribute, the before/after values of all the columns are logged. This is why defining a DB2 table as a replication source might increase the log space.

The increase in log space needed for your replication source tables will depend on the number of replication sources defined, the row length of the replication sources, the number of changes to those tables, and the number of columns updated by the applications. As a rule of thumb, you can estimate that the log space needed for the replication source tables, after you have set the DATA CAPTURE CHANGES attribute, will be three times larger than the original log space needed for these tables. You also need to consider the increase in log space needed because the staging tables and the unit-of-work table are DB2 tables, and so they are also logged.
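Setting the attribute is a one-line ALTER statement; the table name below is hypothetical:

-- Enable full row logging, so that Capture can read the complete
-- before/after images of this table's changes from the DB2 log.
ALTER TABLE SALES.CUSTOMER DATA CAPTURE CHANGES;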
Log Impact for Non-IBM Source Systems
For non-IBM source systems, there is no Capture component; the capture functions are emulated by triggers attached to the source tables. So there is no additional logging for the source tables. There is also no unit-of-work table. So the only log space increase is due to the logging of the staging tables.

Log Impact for a Target AS/400
Since journal receivers for target AS/400 tables can be created with the MNGRCV(*SYSTEM) and DLTRCV(*YES) parameters, and since you only need to journal the after-image columns, use the following formula to estimate the volume of the journal receivers for the AS/400 target tables:

Target table row length x Journal receiver threshold
2.4.1.3 Target Tables
The target tables will probably be newly created tables, so you must estimate their volume. The estimation will mostly depend on:
• The target tables' type (complete or non-complete, condensed or non-condensed)
• The number of rows of the source tables, and the row subsetting criteria that you will define
• The length of each row
2.4.1.4 Spill Files
The Apply component of DProp uses work files, called spill files, when it fetches the data from either the source tables or the staging tables. Spill files can be large when there are many updates to replicate, or when the initial full refresh of a target table is performed. Refer to 5.5.11, "Using Memory Rather Than Disk for the Spill File" on page 126, and to the DPROPR Planning and Design Guide, SG24-4771, for more complete details about spill file sizing.
2.4.2 About CPU, Memory, and Network Sizing
The CPU, memory, and network utilization will depend so much on the architecture and characteristics of your heterogeneous replication system (placement of components, number of sources and targets, speed of communication links, frequency of replication, number of updates) that we cannot give a rule-of-thumb sizing formula for these elements.

Generally, the Capture component of DProp is not a CPU-intensive process, except during the pruning activity (that is why we recommend deferring pruning to off-peak hours), and it can be run with a low priority. Performance considerations for the capture emulation triggers, for a non-IBM replication source, are given in 3.4, "Performance Considerations for Capture Triggers" on page 55.

The Apply component of DProp is a more CPU-consuming process than the Capture component. Apply is an SQL application that runs partly on the source server, to fetch the updates from the staging tables, and partly on the target server, to apply the updates to the target tables. The CPU consumption can be high on both sides. The easiest way to reduce the CPU consumption is to increase the replication interval, and you can also schedule the replication to run at off-peak hours. Apart from that, since Apply is an SQL application, tuning Apply is identical to tuning any other SQL application. You should check in particular that you have the correct indexes defined on the staging tables, on the unit-of-work table, and on the target tables. See 5.5, "Tuning Replication Performance" on page 117 for more details.

Most of all, the network capacity has a very significant impact on the overall performance of any replication system. Do not neglect it! The network is often the bottleneck of the whole system.

These recommendations will help you design and implement the most efficient architecture with respect to your business and organizational requirements. Then, the best thing you can do to optimize the CPU, memory, and network utilization is to measure them with the appropriate tools, and have specialists study the results and tune the different performance parameters.

Remarks:
• You can also find useful CPU, memory, and network sizing information in the DPROPR Planning and Design Guide, SG24-4771. Although it was written for DProp Version 1, most of the guidelines it contains remain true for DProp Version 5.
• Testing the performance of your heterogeneous replication system on your test system will not necessarily be meaningful unless you have:
  • A real pre-production environment with characteristics similar to the production environment
  • An automation tool to reproduce the workload of the production environment on the pre-production environment
2.5 Summary
In this chapter we focused on the planning topics that you should study before you really start designing and implementing your heterogeneous replication system:
• Organize your project like any other IT project. Do not forget to involve users and application specialists as early as possible.
• Gather the detailed business requirements and determine the list of targets and the corresponding sources. To help you achieve this task, we provided a checklist of the questions you should ask the users, and yourself. After that you should be able to:
  • Draw a business-oriented picture of the future replication system
  • Write a document describing the future target tables, and the origin of the data for all the columns
  • Estimate the impact of data replication on the IT environment

You are now ready to go to the next phase and design the architecture of your heterogeneous replication system. See Chapter 3, "System and Replication Design—Architecture" on page 27.
Chapter 3. System and Replication Design—Architecture
After you have been through the planning phase of your heterogeneous replication project (see Chapter 2, "Planning" on page 13), and before you really start implementing the components of the technical solution (see Chapter 4, "General Implementation Guidelines" on page 61), you should spend some time thinking about the architectural aspects of your heterogeneous replication system. Within this chapter we will provide you with enough information to help you choose between the different options that are available when you build this architecture. Figure 5 shows where we are in the sequence of planning, designing, implementing, and operating heterogeneous replication.
Figure 5. Reposition Yourself: Chapter 3—Overview
We will focus on the following topics:
• Principles of heterogeneous replication:
  • What you can and cannot do, in regard to your business requirements
  • Overview of common replication architectures
• System design options: This section deals with the different possibilities for the placement of the software components (DataJoiner, DProp Capture, DProp Apply) and of the DProp control tables.
• Replication design options: This section deals with the replication options that you still have to choose once you have positioned your software components and chosen the control tables location. The most important options are: which target table types you will choose, and how often you will propagate.
3.1 Principles of Heterogeneous Replication
In 1.4, "Technical Warm-Up" on page 7, we provided an overview of the DProp and DataJoiner products:
• DProp is the central product of IBM's replication solution. It enables you to create and run powerful homogeneous replication systems. This means that you can propagate data from any DB2 database located on any platform (such as OS/390, VM, VSE, AS/400, AIX, OS/2, Windows NT, Windows 95, or Windows 98) to any other DB2 database.
• DataJoiner is IBM's strategic gateway to enable transparent access to non-IBM relational databases. With DataJoiner, the DProp Apply component can transparently access non-IBM relational databases as if they were DB2 databases. That is why the combination of DProp and DataJoiner can propagate data from any DB2 database to non-IBM relational databases. Basically, DataJoiner can be considered a DB2 engine (so you can create databases and tables in DataJoiner) with an enhanced catalog that enables DataJoiner to play the role of a gateway to other database systems. In the catalog you define nicknames that refer to objects (such as tables or procedures) that are located in the non-IBM database. You then use the nicknames each time you want to access the non-IBM objects.
• DataJoiner Replication Administration (DJRA), the administration component that is included in DataJoiner, compensates for the absence of the Capture component on the non-IBM databases by generating triggers that emulate the capture functions. That way, it is possible to propagate from non-IBM databases to any DB2 database.

The IBM replication solution enables you to replicate data from (nearly) any relational database to (nearly) any other relational database. Since DProp and DataJoiner enable you to propagate from DB2 to non-IBM databases, and also from non-IBM databases to DB2, you can of course also set up a replication system from one non-IBM database to another non-IBM database. For example, you can propagate from Oracle to Informix, from Informix to Sybase, from Microsoft SQL Server to Oracle, or from Informix to Informix—just be creative! In the case studies (in Part 2 of this book) we only describe how to set up data replication between DB2 and non-IBM databases, but you can simply combine the various examples (non-IBM source, non-IBM target) to create other scenarios.

There is only one case where you do not need the full functions of DataJoiner to replicate between DB2 and a non-IBM database: Microsoft Access. See Chapter 9, "Case Study 4—Sales Force Automation, Insurance" on page 271, where such a replication scenario is explained in detail.

Data replication between a relational database and non-relational databases is not within the scope of this book, but solutions exist. For example:
• You can use DProp and Lotus Notes Pump to propagate data between any DB2 database and Lotus Notes databases. You just need to create complete Consistent Change Data (CCD) tables in the DB2 database and feed these CCDs using the Apply component of DProp. Lotus Notes Pump will retrieve the updates from the CCDs and propagate them to the Lotus Notes databases. Refer to the IBM Redbook Lotus Solutions for the Enterprise, Volume 5, NotesPump: The Enterprise Data Mover, SG24-5255, for details.
• You can use DataPropagator NonRelational or the Classic Connect feature of DataJoiner to access data from an IMS hierarchical database. DataPropagator NonRelational can feed a CCD table in DB2 for OS/390, which is then used by the DProp Apply component to propagate IMS database changes to DB2, or, via DataJoiner, to non-IBM relational databases. For instance, to replicate IMS changes to Oracle, use DataPropagator NonRelational to replicate the changes into a CCD table in DB2 for OS/390, and the Apply component of DataJoiner to replicate from the CCD table to Oracle.

DataJoiner is not only used in heterogeneous replication systems. It can also be used in environments where there is a need for heterogeneous distributed data access (read and write). DataJoiner enables users and applications to perform distributed SQL queries over tables that are located in separate, multi-vendor databases. For example, you can execute a single SQL statement that joins a table located in an Oracle database with a table located in a Sybase database and a third table located in a DB2 for AS/400 database, and all this with tremendous performance (thanks to DataJoiner's global optimizer)! Could you easily do that without DataJoiner?

Let us forget all the other marvellous capabilities of DataJoiner for a while and go back to our heterogeneous replication topic.
3.1.1 Extending DProp Replication to Multi-Vendor Replication
When used in a DB2-only environment, DProp has many powerful features that enable you to fulfill lots of business-driven replication requirements. They include:
• Update-anywhere replication, with conflict detection and referential integrity constraints between the tables
• Denormalization (including replication from join views)
• Data transformation (column subsetting, use of calculated columns)
• Data aggregation and temporal histories
• Data distribution (one source to many targets), with the following possibilities:
  • Subsetting of rows using where-clauses
  • Use of updatable partitioning keys (updates are captured as delete+insert pairs)
• Data consolidation (many sources to one target)
• Additional logic, by invoking SQL statements or stored procedures during replication
• Multi-tier replication, using Consistent Change Data (CCD) staging tables
• Support for occasionally connected, mobile environments
• Regulation of large answer sets (use of a blocking factor)
• Flexibility in the placement of the control tables
• Flexibility in the placement of the Apply program (pull or push replication)

Please refer to the DB2 Replication Guide and Reference, SR5H-0999, for more details on all these capabilities. To illustrate one of them, a minimal sketch of replication from a join view follows.
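This is a hedged sketch, assuming two hypothetical source tables. The view itself is ordinary SQL; what makes the denormalized rows replicable is registering the view as a replication source with the administration component:

-- A join view over two hypothetical source tables. Registered as a
-- replication source, it lets Apply deliver denormalized rows to a target.
CREATE VIEW SALES.CUSTORDER AS
  SELECT C.CUSTNO, C.CUSTNAME, O.ORDERNO, O.ORDERDATE, O.AMOUNT
  FROM SALES.CUSTOMER C, SALES.ORDERS O
  WHERE C.CUSTNO = O.CUSTNO;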
Now you are probably wondering which of these DProp features can also be used when you add DataJoiner into the picture to propagate between DB2 and a non-IBM database.

Table 1 shows which of the various features are available in a heterogeneous replication environment, and which are not. As you can see, nearly all the features are supported. Update-anywhere is only available for Microsoft Access. Some other features are available in all cases except Microsoft Access, because the replication between DB2 and Microsoft Access must always be configured as an update-anywhere replication, and these other features are not compatible with update-anywhere processing. The last column of this table indicates where you can find examples, in this book, to implement these features.

Table 1. Available Replication Features in a Heterogeneous Environment

Replication Feature | Available if Source is non-IBM | Available if Target is non-IBM | Example Reference
Update-anywhere + conflict detection + referential integrity constraints support | Only with MS Access | Only with MS Access | Chapter 9 (page 271)
Denormalization (use of join views as sources) | N | Y | Chapters 7 (page 173), 8 (page 203), 9 (page 271)
Data transformation (column subsetting, calculated columns) | Y (*) | Y (*) | Chapter 8 (page 203)
Data aggregation | Y (*) | Y (*) | Chapter 8 (page 203)
Temporal histories | Y | Y | Chapter 8 (page 203)
Replication from non-IBM database | Y | Y | Chapter 6 (page 139)
Row subsetting | Y | Y | Chapters 7 (page 173), 9 (page 271)
Capture of updates as delete + insert pairs | Y | Y | Chapter 9 (page 271)
Data consolidation | Y | Y | Chapter 6 (page 139)
Run SQL statements or stored procedures | Y (*) | Y (*) | Chapter 7 (page 173)
Use of CCDs | Y | Y | Chapters 7 (page 173), 8 (page 203)
Mobile environment | MS Access only | MS Access only | Chapter 9 (page 271)
Large answer sets regulation | Not recommended | Y | None
Flexibility in the location of the control tables | Y (*) | Y (*) | None
Flexibility in the location of the Apply program (pull, push) | Y (*) | Y (*) | None

Note: The (*) in the table above means 'except with MS Access'.

Now that we have seen what you can do (and what you cannot do) according to your source database systems and your target database systems, let us have a look at some of the most common replication environments.
3.1.2 Overview of the Most Common Replication Architectures
The purpose of this section is to show how DProp and DataJoiner components can be used in different replication scenarios. The figures in this section detail the components of the most common heterogeneous replication environments. It is important to understand that many more configurations are possible.

3.1.2.1 Replication from DB2 to a Non-IBM Target
If you want to propagate from a DB2 database to a non-IBM target database, you will need to have DataJoiner installed as a middleware component (unless the non-IBM database is Microsoft Access; see Chapter 9, "Case Study 4—Sales Force Automation, Insurance" on page 271). The most common configuration in this case is:
• The Capture component of DProp is co-located with the DB2 source.
• The Apply component of DProp is co-located with DataJoiner.
• The control tables used by Apply are stored in the DataJoiner database.

Figure 6 on page 34 shows an example with a non-IBM target database and a DB2 source database. Of course there are possible alternatives, but let us keep it simple for the moment.
Figure 6. Replication to a Non-IBM Target (Components and Placement)
Apply will access the target table through a nickname that is defined in the DataJoiner database.

3.1.2.2 Replication from a Non-IBM Source Towards DB2
If you want to propagate from a non-IBM source database to a DB2 database, you will also need to have DataJoiner installed (unless the non-IBM database is Microsoft Access; see Chapter 9, "Case Study 4—Sales Force Automation, Insurance" on page 271). The most common configuration in this case is:
• There is no Capture component. The capture functions are emulated by triggers at the source. All the necessary triggers are automatically generated by DataJoiner Replication Administration (DJRA).
• The Apply component of DProp is located at the DB2 target server.
• The control tables used by Apply are stored in the target database.

This configuration is shown in Figure 7.
Figure 7. Replication from a Non-IBM Source (Components and Placement)
3.1.2.3 Two-Way Replication Between DB2 and a Non-IBM Database
As we have said before, it is not possible to set up an update-anywhere configuration when either the source or the target is not a DB2 database (except with Microsoft Access). This means that you cannot propagate both ways to and from the same tables. But you can easily propagate some tables one way, and other tables the other way.
For example, let us imagine you want to propagate data between an AS/400 and Microsoft SQL Server on Windows NT by replicating:
• Some tables from the AS/400 to Microsoft SQL Server
• Some other tables from Microsoft SQL Server to the AS/400

To do this, you just have to combine the environments discussed in the two previous sections. You can simplify the setup, since the Apply program is able to run in both pull and push modes. Therefore, you only need a single Apply instance running on the DataJoiner server, pulling the data from the AS/400 to Microsoft SQL Server, and pushing the data from Microsoft SQL Server to the AS/400. To avoid confusion when you define the replication sources and targets, it is better to define two DataJoiner databases, one for replicating data in each direction. The global picture would look like Figure 8:
Figure 8. Replication Both Ways Between DB2 and a Non-IBM Database
3.1.2.4 The Administration Component
The previous pictures show the runtime components, but not the administration component that is used to define the replication sources and the replication subscriptions. In fact, the administration component, DataJoiner Replication Administration (DJRA), is a Graphical User Interface (GUI) that does not need to be present when the replication is running, which is why it does not appear in the previous examples. It can be installed on a Windows 95 or Windows 98 workstation, or on a Windows NT system.

DJRA must be configured in such a way that it can access both the non-IBM databases and the DB2 databases:
• To access the non-IBM databases, DJRA connects to the DataJoiner database, and DataJoiner acts as a gateway to the non-IBM databases using the defined server mappings and user mappings (see Figure 9, A).
• To access the DB2 databases, depending on the type of DB2 database, DJRA will either:
  • Connect directly to the DB2 database: This is the case if the DB2 database is DB2 UDB or DB2 Common Server on Intel or RISC platforms (see Figure 9, C).
  • Connect to the DB2 database through DataJoiner: This is the case if the DB2 database is DB2 for OS/390, DB2 for AS/400, or DB2 for VSE. No server mapping is necessary; only the Distributed Database Connection Services (DDCS) function of DataJoiner is used (see Figure 9, B).

Remark: It is also possible to create server mappings and nicknames for DB2 databases and tables. This DataJoiner feature is used, for example, when a DB2 table and an Oracle table must be joined together. DB2 access through nicknames is not recommended when you use DB2 objects for replication only.

See Figure 9 for an overview of how the administration component relates to the DataJoiner server. For a more detailed discussion of the setup tasks necessary for the administration component, refer to Chapter 4, "General Implementation Guidelines" on page 61.
Figure 9. DJRA and DataJoiner Database Connectivity
Remark: In a DB2-only environment, you have the choice of configuring your replication system using either:
• DJRA
• The Control Center of DB2 UDB

In a multi-vendor database environment, you must use DJRA.
3.2 System Design Options
This section describes the following system design options:
• Apply program placement: pull or push
• DataJoiner placement: centralized or decentralized
• Control tables placement: centralized or decentralized
3.2.1 Apply Program Placement: Pull or Push
When Apply is running on the same machine as the target database, it pulls the data from the source. When Apply is running remotely from the target database, it pushes the data to the target. A pull configuration is typically between 10 and 30 times more efficient than a push configuration.

The push versus pull performance difference is due to the design of the underlying database connectivity protocol (it is not a consequence of the DProp or DataJoiner internal designs). It is a trade-off between Apply being able to block-fetch an answer set over the network (pull mode), and Apply performing single insert/update/delete operations over the network (push mode). Single insert/update/delete operations across the network are far more expensive than fetching an answer set in blocks (that is, multiple records can be retrieved by Apply in one network I/O operation). Therefore, the goal for good performance is to perform the insert/update/delete operations locally. This means co-locating both Apply and DataJoiner with the target.

In most cases we recommend the use of a pull replication scenario. Only when you have source tables that are infrequently updated should you consider using a push scenario. This means that you should place the Apply program close to the target database. Of course, it is also possible to have a scenario in which DataJoiner and Apply are located on a machine separate from both source and target. In this case, the performance benefit of block-fetching the data from the source will be negated by the performance degradation of applying the insert/update/delete operations over the network.
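Operationally, a pull setup simply means starting Apply on the machine that hosts DataJoiner, closest to the target. As a minimal sketch (the apply qualifier and control server database names are hypothetical, and the command options available vary by platform and DProp version, so check the Apply documentation for your environment):

# Start Apply on the DataJoiner/target machine: changes are block-fetched
# from the remote source and applied locally (pull mode).
# DJAPPLY1 = apply qualifier, DJDB = control server database.
asnapply DJAPPLY1 DJDB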
3.2.2 DataJoiner Placement
A very important design consideration when implementing a heterogeneous replication system is the placement of the DataJoiner middleware server. An additional task is to evaluate how many DataJoiner instances and how many DataJoiner databases are necessary to implement the design. Let us spend a minute discussing what generally influences the decision of where to place the DataJoiner middleware. We definitely want to achieve the following goals:
• Good performance: If you want to propagate to a non-IBM replication target, for example, you will use the Apply code that is included in DataJoiner, so it is better to place DataJoiner as close as possible to the non-IBM replication target system, or even on the same platform when that is possible. If you have many different target databases, this means you will also have several DataJoiner instances and databases. Since Apply is a DB2 application, it needs to connect to non-IBM relational databases through DataJoiner which, with its nicknames, makes the target tables appear to Apply to be in a DB2 database. Apply can be no closer to the target tables than DataJoiner is. If DataJoiner is installed on the same server as the non-IBM database that holds the target table, then the Apply insert/update/delete operations will be done locally. If DataJoiner is installed on a different server than the non-IBM database, then the Apply insert/update/delete operations will have to be executed over the network. If DataJoiner needs to be on a different server than the target tables, you should look for ways to tune the network hardware, the network software, and the data source and its network client software (for example, Oracle SQL*Net, or Sybase Open Client on the DataJoiner server) for best performance.
• Ease of administration: The more database middleware instances involved, the more administration overhead you will have to manage. That means that if your concern is to minimize the administration overhead, you should minimize the number of DataJoiner instances and databases.

The two goals conflict, and you will have to find a good compromise. Furthermore, the best solution is not necessarily the same whether you intend to propagate from a non-IBM database or to a non-IBM database. The following two sections provide you with the background information to help you decide where to place the DataJoiner middleware server(s), and how many DataJoiner databases you should use. We will consider data distribution to non-IBM databases and data consolidation from non-IBM database systems separately.

3.2.2.1 Data Distribution to Non-IBM Targets
Considering a data distribution scenario with non-IBM target systems, the two main rules for placing the database middleware seem to be contradictory:
• Good performance would be achieved by adding one DataJoiner instance to every remote non-IBM database server. DProp Apply would be running along with every DataJoiner instance. The DProp control tables could be decentralized or centralized.
• The lowest administration and monitoring costs would be achieved by setting up one dedicated middleware server, providing nicknames for all the target tables and additionally containing the DProp control information for all the replication targets.

Figure 10 visualizes the two different approaches:
Figure 10. DataJoiner Placement—Data Distribution to Non-IBM Targets
One DataJoiner Instance for Each Non-IBM Target
Placing the DataJoiner instances as close as possible to the non-IBM target gives you the choice between two major options:
• Option 1: Install DataJoiner on the same workstation as the non-IBM database.
• Option 2: Install DataJoiner on a separate machine, most likely placed within the same LAN as the non-IBM database server.

We recommend Option 1 for those system platforms that DataJoiner natively supports (at the time this book was written, DataJoiner Version 2 was available for AIX and Windows NT). For example, if your target system is Microsoft SQL Server on Windows NT, install DataJoiner for Windows NT on the same machine. If your target system is Oracle on AIX, install DataJoiner for AIX on the same machine.

Option 2 should only be used if the non-IBM target databases are located on operating system platforms that DataJoiner does not yet natively support. Example: To access Oracle on SUN Solaris, use a separate machine (either AIX or Windows NT) and place this machine in the same LAN as the SUN Solaris machine.
For mobile systems, having one DataJoiner instance per target database would not be a good idea! (But who would put MS SQL Server, for example, onto a mobile computer?) Please refer to Chapter 9, "Case Study 4—Sales Force Automation, Insurance" on page 271, to see what IBM currently recommends for infrequently connected mobile replication target systems. Additionally, watch out for solutions that focus on lightweight DB2 systems, to be introduced with DB2 UDB Version 6.

If you choose to have one DataJoiner instance per non-IBM target, you will only need one DataJoiner database in each DataJoiner instance.
One Central DataJoiner Instance
One central database middleware server in a data distribution scenario has the disadvantage of sub-optimal replication performance, because remote SQL operations (inserts, updates, deletes) over a network are much slower than remote fetches. On the other hand, a single point of administration will reduce systems management overhead. If you choose to have one central DataJoiner instance, you can choose to have either:
• Only one DataJoiner database, common to all the non-IBM target databases
• Several DataJoiner databases

The only reason why you would want to create several DataJoiner databases is if you want the nicknames to be stored separately, for security reasons. As a first approach, just consider that a single DataJoiner database for all the non-IBM databases is a good solution.
Final Recommendations for Data Distribution
Which option to choose for a data distribution scenario will definitely depend on the kind of replication system you are planning to introduce. Factors that will influence your decision are:
• The amount of data to be replicated per replication interval
• The length of one replication interval (for example, replication every minute or once a day)
• The number of non-IBM target systems
• The batch window available for replication
Generally, based on performance testing, we recommend one DataJoiner instance close to every non-IBM target system when the volume of data to be replicated is high. Automated systems management components can be used to assist you in monitoring large numbers of database middleware servers. Only in situations where the data flows are small and the replication cycles are long do we recommend the use of a central middleware server, to reduce complexity. In both cases, we recommend using only one DataJoiner database per instance.

Refer to Chapter 7, "Case Study 2—Product Data Distribution, Retail" on page 173 for a deeper insight into a data distribution scenario with Microsoft SQL Server replication targets.

3.2.2.2 Data Consolidation from Non-IBM Sources
Topology recommendations for data consolidation scenarios, which replicate data from non-IBM source databases, are much simpler. Again, we are looking for a system setup that provides:
• Good performance
• Easy administration

Figure 11 shows that a central database middleware instance is sufficient to deliver the best performance, as well as ease of administration, for data consolidation scenarios.
Figure 11. DataJoiner Placement—Data Consolidation from Non-IBM Sources
Number of DataJoiner Instances
Placing one DataJoiner instance beside every non-IBM replication source would have no performance advantage compared to a central database middleware server. Therefore, we will not discuss this setup in detail. One central DataJoiner instance enables both best performance and ease of administration: DProp Apply will most likely be running at the replication target system, and connect to all the non-IBM replication source systems through the DataJoiner middleware instance. So, Apply will run in pull mode, which is the best approach for performance. Additionally, the centralized middleware approach will minimize administration and monitoring costs.

Refer to Chapter 6, "Case Study 1—Point of Sale Data Consolidation, Retail" on page 139 for a deeper insight into a data consolidation scenario with Informix Dynamic Server Version 7.3 replication source systems.
Number of DataJoiner Databases
The number of DataJoiner databases that you have to create in the DataJoiner instance is also very easy to determine: you must create as many DataJoiner databases as there are non-IBM source servers. For example, if you want to propagate data from three Informix systems, you will have to create three DataJoiner databases. This is because, for non-IBM source servers, DJRA creates the REGISTER, PRUNCNTL, and REG_SYNCH tables in the source servers, and creates nicknames called ASN.IBMSNAP_REGISTER, ASN.IBMSNAP_PRUNCNTL, and ASN.IBMSNAP_REG_SYNCH in the DataJoiner databases for these tables. This is illustrated in Figure 12:
Figure 12. Why One DataJoiner Database for Each Non-IBM Source Server? (diagram: on the "wrong" side, a single DataJoiner database DJDB1 would need one set of nicknames ASN.IBMSNAP_REGISTER, ASN.IBMSNAP_PRUNCNTL, and ASN.IBMSNAP_REG_SYNCH to point to the control tables of two different multi-vendor source databases at once; on the "right" side, DJRA creates the replication control tables in each multi-vendor source database and one DataJoiner database per source (DJDB1 and DJDB2), each holding its own control table nicknames and source table nicknames)
3.2.3 Control Tables Placement
The purpose of this section is to study the different possible placement options for the DProp control tables. When a source database is a DB2 database, the Capture program needs to have its control tables within the replication source database. This is also true for the triggers that emulate the capture functions. The REGISTER, PRUNCNTL, and REG_SYNCH tables must be physically stored in the non-IBM source, even if nicknames are created for them in the DataJoiner database.
So in fact when we are discussing the placement options for the control tables, we are only interested in the control tables of the Apply program. The placement of Capture’s control tables (or the capture-emulation triggers’ control tables) is predetermined. The Apply program can have its control tables (for example, SUBS_SET, SUBS_MEMBR) located locally or remotely. The location chosen to hold the control tables is known as the Control Server. As we already explained, it is also possible to run Apply in push mode or in pull mode. This means that you have several possible configurations (see Figure 13):
Figure 13. Apply and Control Server Placement (diagram of four configurations for replicating to a non-IBM target through DataJoiner: Configurations 1 and 2 run Apply in pull mode, Configurations 3 and 4 in push mode; in each mode, the control server is either co-located with Apply and DataJoiner or placed at the source system)
And since you probably will have several Apply instances, you can even have combinations of the above configurations. But remember that if you want to keep your configuration manageable, you had better keep it simple!
Configuration 1: This is the solution we would recommend if you have either:
• Few target servers and one DataJoiner instance for each target server
• One central DataJoiner instance
Configuration 2: This is the solution we would recommend if you have either:
• Many target servers and one DataJoiner instance for each target server; the performance penalty of having the control server remote from the Apply program is largely offset by the ease of administration.
• A production environment (mostly limited by your staff skills and the monitoring tools available) that makes it really preferable to have the control server co-located with the source server.
Configuration 3: This configuration is possible, but:
• It would be a push configuration (with poor data transfer performance).
• You should only consider this kind of configuration if you have only a few updates to propagate and if you run the Apply program infrequently.
Configuration 4: This configuration should not be used, because it would have poor performance and awkward administration.
So far, we have been discussing the placement of the software components and of the control tables in a heterogeneous replication environment, but we have not discussed some options that you will have to study before you begin the real implementation. The next section helps you think about the additional design options to be considered.
3.3 Replication Design Options
We will discuss the following topics in this section:
• The target table types you can use
• The replication frequency (replication timing)
• The blocking factor
In fact, most of these replication design options are directly driven by your business requirements (refer to Chapter 2, “Planning” on page 13, to see which tasks you need to go through to assess these business requirements). You will have to check that these business requirements can really be achieved, considering the comments and restrictions that are explained here.
3.3.1 Target Table Types
In a DB2-only replication environment, the following target table types are available:
• Read-only target tables: usercopy, point-in-time (PIT), consistent change data (CCD), base aggregate, change aggregate
• Updatable target tables: replica, user tables (source tables in an update-anywhere scenario)
In a heterogeneous data replication environment, not all of these target table types are available. Read-only target table types are supported, except for CCDs. DJRA does not currently allow the creation of a CCD table in a non-IBM target database, but this restriction will probably soon be removed, and there is a workaround anyway (see Chapter 8, “Case Study 3—Feeding a Data Warehouse” on page 203, for more details about this workaround). Update-anywhere is currently supported for DB2 target tables only (target type Replica), or for Microsoft Access target tables (target type Row-Replica). This means that you cannot have an update-anywhere scenario if the source database is not DB2 and if the target databases are not DB2 or Microsoft Access.
3.3.1.1 Referential Integrity Considerations
Even in a DB2-only replication environment, you cannot define referential integrity (RI) constraints between your target tables if they are read-only target tables (usercopy, point-in-time, base aggregate, change aggregate, or CCD). In fact, constraints are only needed when there are application updates. It is useless to define constraints on read-only targets, because the updates made against the source tables already satisfied the RI constraints defined at the source, and Apply will logically maintain these original RI constraints on the target. For better performance, DProp Apply takes some liberties with read-only copies. Updates to read-only copies within a subscription set cycle will be performed one member at a time, with all updates related to one subscription member issued before updating the target associated with the next member. Still, all members within a subscription set are replicated within one unit-of-work. By taking a global upper transaction boundary (global SYNCHPOINT) for the complete set into account, all the target tables are
transactionally and referentially consistent at the end of the replication cycle. This is what is called transaction consistent replication.
In a DB2-only replication environment, if you really want to define RI constraints between your target tables, you must define the target tables as replicas. In this case you can implement the same RI constraints on the target tables as the ones that you defined at the source (master). Apply will no longer replicate the captured changes on a table-by-table basis (as above), because that may violate the RI constraints. Instead, Apply will issue the updates to the target tables in their original interleave sequence, across all the subscription members concurrently. If the subscription set is defined large enough, that is, if the set includes all the tables that are referentially related, the RI constraints defined on the targets will be respected, because Apply will have applied the updates in their original execution order. This is known as transaction based replication.
3.3.1.2 Read-only Copies with Referential Integrity
If your target tables are non-IBM (or if they are DB2 tables but you do not want to define them as replicas) and you want to define RI constraints between your target tables, use the following advanced technique:
1. Define your replication source tables at the source server.
2. At the source server, disable conflict detection for all the replication sources:
UPDATE ASN.IBMSNAP_REGISTER
SET CONFLICT_LEVEL = 0
WHERE SOURCE_OWNER = '<source_owner>'
AND SOURCE_TABLE = '<source_table>';
3. Define a subscription set and add members using the target table type usercopy, just as you would do for any read-only replication target.
4. Then change the target table type under the covers, changing it from usercopy (TARGET_STRUCTURE = 8) to replica (TARGET_STRUCTURE = 7). To do this, change the target table type within the pruning control table at the source server for all the members of the subscription set:
UPDATE ASN.IBMSNAP_PRUNCNTL
SET TARGET_STRUCTURE = 7
WHERE APPLY_QUAL = '<apply_qual>'
AND SET_NAME = '<set_name>'
AND TARGET_OWNER = '<target_owner>'
AND TARGET_TABLE = '<target_table>';
5. At the control server, change the target table type within the subscription target member table for all the members of the subscription set:
UPDATE ASN.IBMSNAP_SUBS_MEMBR
SET TARGET_STRUCTURE = 7
WHERE APPLY_QUAL = '<apply_qual>'
AND SET_NAME = '<set_name>'
AND TARGET_OWNER = '<target_owner>'
AND TARGET_TABLE = '<target_table>';
6. Start Capture and Apply, and implement the RI constraints between the target tables after Apply has full-refreshed all the tables. This is because no interleave sequence can be detected when performing the initial full-refresh. If you want to be able to define the RI constraints before the full-refresh, then you must:
• Disable full-refresh before starting Capture and Apply
• Then use an adapted technique to load the tables and preserve the RI constraints (for example, use the ASNLOAD exit with EXPORT/LOAD, followed by a CHECK CONSTRAINTS command, if the target is DB2).
Refer to 8.4.9, “Initial Load of Data into the Data Warehouse” on page 261 for examples of the use of such techniques.
3.3.2 Replication Timing
When you design your replication system, an important aspect is to determine when the subscription sets will actually be processed. You can choose to have the subscription sets processed at regular time intervals (this is called relative timing), or according to external events that can be of any nature (this is called event based scheduling), or both. In fact, the replication timing definition is not peculiar to a heterogeneous replication system; there is nothing particular about replication timing in a heterogeneous replication system compared to a DB2-only replication system. This is because the replication timing exclusively deals with the Apply program (the Capture program or the change capture triggers are not involved here), and it is always the same Apply program that you use, whether you propagate towards a DB2 database or towards a non-IBM database through DataJoiner. So you could simply refer to the DB2 Replication Guide and Reference, S95H-0999. Nevertheless, we want to recall some important notions here.
3.3.2.1 Relative Timing
When you define a subscription set, you can indicate a replication interval, in minutes, hours, days, or even weeks. You can also indicate that you want the subscription set to be processed continuously, meaning that just after Apply has finished processing the subscription set, it will process it again. This does not mean, of course, that you have transformed DProp into a synchronous replication system. There is still a delay between the time the update is done at the source and the time the update is applied to the target. But this delay will be short.
Remark: The timing information is defined at the subscription set level. So, all the members in a set will be processed with the same frequency.
Now you must be aware that if you have many tables to propagate, and you choose a very short interval (1 minute, for example), Apply will do its best to meet your requirement, but the actual interval will probably be higher than what you indicated. This depends mainly on the available system resources (for example, CPU power and network capacity).
3.3.2.2 Event Based Timing
Event based scheduling was introduced in DProp V5. It enables you to schedule the processing of a subscription set according to external events. An event can be, for example, the end of a batch job, or a precise date and time. In fact, when you want a subscription set to be scheduled by events, you simply indicate an event name (any character string is fine; EVENTXX, for example) when you define the subscription set. You have probably noticed that there is a control table called ASN.IBMSNAP_SUBS_EVENT. To create an event that will trigger the processing of your subscription set, you simply need to insert a row into the ASN.IBMSNAP_SUBS_EVENT table, with the following column values:
• The name of the event (EVENTXX in our example).
• A timestamp that indicates when the subscription set must be processed.
There is an optional third column when you add a row into the events table; refer to the DProp documentation to find more details about its use. Several subscription sets can share the same event name. This means that if you wish to trigger several subscription sets together from the same event, you only need to indicate the same event name in the subscription sets' definitions. But if you intend to do this, perhaps you should consider grouping
all the members into a single subscription set, rather than having separate subscription sets.
Remark: For a subscription set, you can indicate both a replication interval and an event name, but in general we recommend not mixing these two processing modes. Remember: try to keep things simple!
Example: If you want to propagate only once a day (for example, in the evening at 8 pm), you have several possibilities:
• Use relative timing, with a 1-day frequency.
• You can even indicate a smaller interval (15 minutes, for example) so that you can start additional replications during the day if needed. You can either stop Apply once all the subscription sets have been processed at least once, or deactivate all subscription sets processed by this Apply instance by updating the control tables with the following statement:
UPDATE ASN.IBMSNAP_SUBS_SET
SET ACTIVATE = 0
WHERE APPLY_QUAL = '<apply_qual>';
• Use events: Insert as many events as you wish in the ASN.IBMSNAP_SUBS_EVENT table, one for each day that you want the subscription sets to be processed.
• Or use the advanced event based scheduling technique that is described in the next section.
3.3.2.3 Advanced Event Based Scheduling
The technique that is explained here enables you to automatically schedule the processing of a subscription set once a day. First, create your subscription sets with the event name 'WEEKDAY' (or any other name you like). You will need to drop the existing ASN.IBMSNAP_SUBS_EVENT table and recreate it with another name. Then create the following view:
CREATE VIEW ASN.IBMSNAP_SUBS_EVENT
(EVENT_NAME, EVENT_TIME, END_OF_PERIOD) AS
SELECT SUBSTR('WEEKDAY ', 1),
TIMESTAMP(DATE(CURRENT TIMESTAMP), '00.00.00'),
NULLIF(1, 1)
FROM SYSIBM.SYSDUMMY1
WHERE ((DAYS(CURRENT TIMESTAMP) - ((DAYS(CURRENT TIMESTAMP)/7)*7)) + 1)
BETWEEN 2 AND 6
SYSIBM.SYSDUMMY1 can be substituted with the name of any 1-row table. WEEKDAY is the name of the event. If you created the subscription sets with another event name, just replace 'WEEKDAY' in the view definition above with the actual event name you chose. The view above will generate a transparent event (in fact, no event will actually be inserted in any real table; the view will just generate a temporary event for Apply, when Apply accesses what it thinks is the ASN.IBMSNAP_SUBS_EVENT table). The event will be visible each day of the week from Monday through Friday, at midnight. On Saturdays and Sundays nothing will happen (of course, you can change this: you simply need to change the BETWEEN 2 AND 6 clause). When Apply runs on Monday, for example, it will access the view and believe that there is an event for that day at midnight, and so it will process the subscription set. The next time Apply evaluates the view, the view will generate the same transparent event, but since the LASTSUCCESS column in the SUBS_SET table indicates that the subscription set has already been processed that day, it will of course not process the subscription set again. The event generator (the ASN.IBMSNAP_SUBS_EVENT view) is a good idea, but you could go further. You could, for example, modify the view so that it performs a union with the original (renamed) event table. This would allow you to continue creating traditional events, and still benefit from the automatic event generator.
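For example, a minimal sketch of such a combined view, assuming the original event table was renamed to ASN.SAVED_SUBS_EVENT (a hypothetical name chosen for this illustration) and assuming your DB2 platform permits UNION ALL in a view definition:
CREATE VIEW ASN.IBMSNAP_SUBS_EVENT
(EVENT_NAME, EVENT_TIME, END_OF_PERIOD) AS
-- Transparent WEEKDAY event, visible Monday through Friday:
SELECT SUBSTR('WEEKDAY ', 1),
TIMESTAMP(DATE(CURRENT TIMESTAMP), '00.00.00'),
NULLIF(1, 1)
FROM SYSIBM.SYSDUMMY1
WHERE ((DAYS(CURRENT TIMESTAMP) - ((DAYS(CURRENT TIMESTAMP)/7)*7)) + 1)
BETWEEN 2 AND 6
UNION ALL
-- Traditional events, inserted manually into the renamed event table:
SELECT EVENT_NAME, EVENT_TIME, END_OF_PERIOD
FROM ASN.SAVED_SUBS_EVENT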
3.3.3 Using Blocking Factor
In a DB2-only replication environment, you can indicate a blocking factor when you define subscription sets. This feature is there to regulate large answer sets. We will explain what this means below.
The blocking factor limits the number of rows that will be propagated by Apply. It determines a maximum number of minutes of changes that Apply can process when it reads the change data tables. If the rows that are present in the change data tables (and that have not yet been processed) represent a number of minutes of changes that is above the blocking factor value, Apply will automatically split the fetched answer set into smaller pieces, and it will process the subscription set as several mini-subscriptions, in several mini-cycles.
This is an important feature. If you have defined a blocking factor value, then when Apply encounters an environment problem (logs full in the target database, for example), Apply will automatically try to split the answer set so that the subscription is reprocessed in several mini-subscriptions. If you do
not define any blocking factor value (that is, if you left the default value of 0), Apply will not be able to use this mini-subscription technique in case of environment problems. In a heterogeneous replication environment:
• If the target is a non-IBM database and the source is a DB2 database, the blocking factor feature can be used. Use a blocking factor that is large enough that, in most cases, Apply will be able to process all the rows in the change data tables in only one replication cycle.
• If the source is a non-IBM database, choosing a blocking factor for a subscription set to limit the amount of data to be replicated within one subscription cycle (ASN.IBMSNAP_SUBS_SET.MAX_SYNCH_MINUTES set to a non-null value) does not take transaction boundaries into account (as it does for DB2 replication sources). That means that target database consistency could possibly be affected while Apply is replicating all change data using mini-cycle processing. After Apply has finished processing, target database consistency is, of course, reestablished.
Recommendation: We recommend not using a subscription set blocking factor when replicating from a non-IBM replication source if some applications can query the replication target tables while Apply is running. If you are replicating during off-hours only, then a blocking factor can be used.
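The blocking factor is simply the MAX_SYNCH_MINUTES value of the subscription set. DJRA sets it when you define the set, but as an illustration it can also be updated directly in the control tables (the qualifier and set name are placeholders):
-- Allow Apply to process at most 60 minutes of changes per mini-cycle:
UPDATE ASN.IBMSNAP_SUBS_SET
SET MAX_SYNCH_MINUTES = 60
WHERE APPLY_QUAL = '<apply_qual>'
AND SET_NAME = '<set_name>';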
3.4 Performance Considerations for Capture Triggers
In contrast to the asynchronous, log-based change capture mechanism that IBM DProp provides for all DB2 replication sources, trigger based change capture is a synchronous solution. That means that all change data is committed into the change data tables (CCD tables, in fact) in the same transaction as the source application's changes. Planning ahead for trigger based change capture therefore has to take the following considerations into account:
• Source application transactions will slow down, because the transaction's workload is actually doubled.
• Source applications will need more log space, because writing change data into the change data tables will happen within the same commit scope as changing the application tables.
• If a Capture trigger cannot insert a change record into the CCD table, for example because there is no more space for the new record, then the
transaction attempted by the user or application program cannot complete successfully. The impact on large batch-style applications will also need to be studied. To better evaluate the impact of synchronous trigger based change capture, we set up a performance examination for two non-IBM replication source systems, namely Informix Dynamic Server 7.3 and Oracle 8.
3.4.1 Description of the Performance Test Setup
To make different test setups comparable, we installed both Oracle 8 and Informix Dynamic Server 7.3 on the same RS/6000 machine, model J50, running IBM AIX Version 4.3. In both systems, we natively created the same table (same number of columns, same data types). Both tables contained a column comparable to the DB2 data type TIMESTAMP (Informix data type DATETIME, Oracle data type DATE). A unique index was created for both tables. We created two test jobs, both containing 27340 SQL INSERT statements, grouped together into 100 INSERTs per transaction (1 COMMIT after 100 INSERTs). The TIMESTAMP column was populated in both cases using an SQL expression comparable to DB2's CURRENT TIMESTAMP. The test jobs basically contained the same statements. The only differences were the syntactical representation of the CURRENT TIMESTAMP expression (Informix: CURRENT YEAR TO FRACTION (5), Oracle: CURRENT DATE) and the method used to execute the SQL script:
• Informix: The Informix client program dbaccess was used to execute the SQL script. The following syntax shows the invocation of the test script. All output was redirected to /dev/null to prevent any slowdown by unnecessary screen output.
dbaccess ifxdb1 insertsifx.sql > /dev/null
• Oracle: The Oracle client program SQL*Plus was used to execute the SQL script. The following syntax shows the invocation of the test script. All output was redirected to /dev/null to prevent any slowdown by unnecessary screen output.
sqlplus user1/pwd1@oradb1 @insertora.sql > /dev/null
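To make the workload concrete, the test jobs followed the pattern sketched below (a reconstruction for illustration only; the table and column names are invented, not taken from the original scripts):
-- Informix flavor; this block was repeated until 27340 INSERTs were issued:
INSERT INTO testtab (id, payload, ts)
VALUES (1, 'row 1', CURRENT YEAR TO FRACTION(5));
-- ... 99 more INSERTs ...
COMMIT WORK;
-- The Oracle script was identical except for the timestamp expression.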
To measure the impact of the synchronous triggers, the insert job was executed before the tables were registered as replication sources (that is, before the capture triggers were created) and again after the tables were registered as replication sources.
To eliminate unreasonable deviations, each separate test setup was executed three times. Before each test was repeated, the test table was dropped and recreated to guarantee identical test conditions.
First test setup: The test script was executed without any triggers defined.
Second test setup: The second test run only applied to Informix. The reason is that the Informix capture triggers generated by DJRA require the Informix system variable USEOSTIME to be set to 1. We wanted to find out whether this setting had any negative impact on Informix performance.
Third test setup: The test script was executed with capture triggers defined.
3.4.2 Test Results
The purpose of this test was not to determine which database system was faster. We only wanted to find out the impact that synchronous trigger-based change capture has on both of the examined database systems. Please do not interpret the results in any other way. When interpreting the test results, please take into account the fact that the batch-style insert script we used represents an extreme INSERT workload. Figure 14 represents the test results: the INSERT performance ratio for the different test setups, with the performance without any triggers and without any replication based changes to the system settings taken as 100%. Please note that the ratio does not compare the absolute INSERT performance of Informix and Oracle.
Figure 14. Performance Analysis for Trigger Based Change Capture (bar chart reconstructed as a table; INSERT performance ratio, no-trigger baseline = 100)

Test Setup                                            Insert Ratio
1  Informix: USEOSTIME = 0, No Capture Triggers       100.0
2  Informix: USEOSTIME = 1, No Capture Triggers        96.6
3  Informix: USEOSTIME = 1, With Capture Triggers      60.6
4  Oracle: No Capture Triggers                         100.0
5  Oracle: With Capture Triggers                        69.7
3.4.3 Conclusion of the Performance Test
Synchronous trigger-based change capture will always have an impact on transaction throughput rates, especially for transactions that heavily update application source tables. That is the price for integrating non-IBM database systems into a multi-vendor change replication solution. What you gain is, of course, a change capture mechanism that enables out-of-the-box change replication for non-IBM database systems without having to change any application logic and without having to copy complete database tables when synchronizing source and target (which other vendors call snapshot replication).
3.5 Summary
In this chapter we first explained which components (DProp, DataJoiner, DJRA, control tables) are required to build a heterogeneous replication system, and how these components work together. We also provided you with a list of the replication features and techniques that you can use in a heterogeneous replication system, and some references to examples in this book that show how you can implement these techniques. Then we discussed the different options that are available when you build the system's architecture. We divided these options into two separate categories:
• The system design options, which essentially deal with the placement of the DataJoiner middleware, the placement of the control tables, and the Apply program.
• The replication design options, which essentially deal with the types of target tables and the replication timing.
We also discussed the impact that the new replication system will have on your current production database(s) and illustrated this with some examples. Now, you have to correlate the information provided in this chapter with the preliminary information (the list of data sources, list of targets, and volumes, for example) that you gathered during the planning phase of your project, and build a picture of your future heterogeneous replication system. When you draw this picture, you must precisely indicate how many DataJoiner servers you will use, how many DataJoiner databases you will create in each DataJoiner server, where the Apply control tables will be located, and where the Apply programs will run. You must also indicate on the picture the types of target tables you will use, and the timing options that will be used. You should also indicate the naming conventions that you will use for:
• The database names, including the DataJoiner databases
• The Apply qualifier names
• The subscription set names
• The userids that will be used to access the non-IBM databases
• The owner (high-level qualifier) of the target tables and of the nicknames
Once all this information appears in the picture that you have drawn, and the picture has been approved, you can move to the next step: Implementing your heterogeneous replication system. The next chapter will tell you how you can efficiently achieve this implementation task.
Chapter 4. General Implementation Guidelines
This chapter introduces general implementation guidelines to set up a heterogeneous replication system. It is assumed that the replication system design is complete.
4.1 Overview
Building on recommendations given in Chapter 3, the following decisions are made before implementing the solution:
• Replication source server platform(s), either DB2 or non-IBM
• Replication target server platform(s), either DB2 or non-IBM
• DataJoiner platform(s) and DataJoiner placement
• Placement of DProp Apply, either push or pull configuration
• Control table location, either centralized or decentralized
To assist you in implementing the replication system design, we developed a general implementation checklist which names and sequences all activities that are necessary before the first replication definition can be made. We used the checklist presented in this chapter throughout the implementation of all case studies that are detailed in the second part of the book—showing its general usability for different scenarios. Before we start, please reposition yourself by having a look at Figure 15:
Figure 15. Reposition Yourself: Chapter 4—Overview (diagram: Approaching Data Replication; Replication Planning and Replication Design lead into the implementation stages covered in this chapter, namely Setup Database Middleware Server, Implementation of Replication Subcomponents, Setup Replication Administration Workstation, Create Replication Control Tables, and Bind DProp Capture & DProp Apply, followed by Replication Operation & Maintenance)
Following the work breakdown approach that guides us through the complete book, we start to implement the replication solution after planning the project and after deciding about the overall replication design. Going into the details, the implementation of the replication solution has to deal with five major activities:
1. Set up the Database Middleware Server
2. Implement the Replication Subcomponents
3. Set up the Replication Administration Workstation
4. Create the Replication Control Tables
5. Bind DProp Capture and DProp Apply
We will work along these main activities when naming and describing all setup steps that have to be successfully executed before heterogeneous replication sources can be registered and before subscriptions replicating towards IBM and non-IBM target tables can be defined.
4.2 Components of a Heterogeneous Replication System
A distributed heterogeneous replication system consists of several autonomous components:
• The replication source system, either DB2 or a non-IBM database
• The replication target system, either DB2 or a non-IBM database
• The database middleware gateway
• The replication administration workstation, which is used to set up and monitor the replication environment
We call a distributed multi-platform replication scenario "heterogeneous" when either the replication source or the replication target is a non-IBM database system. No limits apply. Using IBM DataPropagator and IBM DB2 DataJoiner, it is even possible to replicate change data from one non-IBM database system, such as Oracle, to another non-IBM database system, such as Sybase.
4.3 Setting up a Heterogeneous Replication System
The main tasks when setting up a heterogeneous distributed replication system are:
• Installation and configuration of all necessary software components
• Setting up connectivity between all components of the replication system
To guide you through these activities, the following sections provide a checklist that includes the end-to-end view on how to set up the replication and database middleware subcomponents.
4.3.1 The Implementation Checklist
All setup activities named in the following implementation checklist are numbered. The numbers are used to refer to certain setup activities throughout the book:
Set Up the Database Middleware Server
Step 1—Install the non-IBM client code to access non-IBM data sources
Step 2—Prepare and check native access to the remote data sources
Step 3—Install the DataJoiner software
Step 4—Prepare DataJoiner to access the remote data sources
Step 5—Create a DataJoiner instance
Step 6—Create the DataJoiner databases
Step 7—Connect DataJoiner to other DB2 or DataJoiner databases
Step 8—Enable DB2 clients to connect to the DataJoiner databases
Step 9—Create Server Mappings for all non-IBM database systems
Step 10—Create the Server Options
Step 11—Create the User Mappings
Implement the Replication Subcomponents (Capture, Apply)
Step 12—Install and set up DProp Capture (if required)
Step 13—Install and set up DProp Apply (if required)
Set Up the Replication Administration Workstation
Step 14—Install DB2 Client Application Enabler (DB2 CAE)
Step 15—Establish DB2 connectivity
Step 16—Install DJRA, the DataJoiner Replication Administration software
Step 17—Set up DJRA to access the source and target databases
Step 18—Modify the DJRA user exits (optional)
Create the Replication Control Tables
Step 19—Set up the DProp control tables at the replication source servers
Step 20—Set up the DProp control tables at the replication control servers
Bind DProp Capture and DProp Apply
Step 21—Bind DProp Capture (if required)
Step 22—Bind DProp Apply
4.3.2 How to Use the Implementation Checklist
We recommend that you follow the given sequence of setup activities, because most of the steps build on previously achieved results. After all steps named in the general implementation checklist are successfully completed, you are ready to define replication source tables and replication subscriptions. Once these are defined, you can start the replication subsystems DProp Capture and DProp Apply.
4.4 Detailed Description of the Implementation Tasks
The setup activities provided in the previous checklist will now be expanded and explained in detail. Furthermore, we will refer to several supplementary IBM publications that provide additional information.
4.4.1 Set Up the Database Middleware Server
The main task when setting up the database middleware server will be to install and configure a DataJoiner instance. Because DataJoiner's role is to enable database access to a variety of remote database systems, setting up the database connectivity will be key. After the following steps are completed, you will be able to transparently access heterogeneous databases through DataJoiner:
Step 1: Install Non-IBM Client to Access Non-IBM Data Sources
DataJoiner uses data access modules to access remote databases such as Oracle, Sybase, Informix, Microsoft SQL Server, or other ODBC data sources. These data access modules require native non-IBM client libraries to be installed onto the DataJoiner server. A basic step when setting up a DataJoiner middleware server is therefore to install the remote database's client code, such as SQL*Net, ESQL/C, Sybase Open Client, or an ODBC driver, onto the DataJoiner server. Please refer to the DataJoiner Planning, Installation and Configuration Guide, SC26-9150 (Chapter 3: Software Requirements) for more information about software prerequisites regarding non-IBM client software. Details about setting up client connectivity for all involved non-IBM database systems can be obtained from documentation provided by the particular database vendors. It is also always a good idea to contact a local DBA for assistance during this phase.
Step 2: Prepare and Verify Native Access to Remote Data Sources
Check the connectivity between the database client and the remote database server natively, before trying to access the database through DataJoiner. This eliminates potential setup errors as early as possible.
Oracle: Set up an SQL*Net or Net8 client (for Oracle Version 8) to access an Oracle server and check that native connections work fine.
Informix: Set up an ESQL/C client to access an Informix server and check that native connections work fine.
Sybase: Set up a Sybase Open Client, using dblib or ctlib (ctlib is recommended), to access a Sybase server and check that native connections work fine.
Microsoft: Set up an ODBC client to access a Microsoft SQL Server instance and check that native connections work fine.
Check Appendix B, “Non-IBM Database Stuff” on page 325 for many useful details about non-IBM client software, including hints on how to set up non-IBM database clients and how to natively test connectivity.
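As a quick illustration, native sanity checks from the future DataJoiner server might look like this (database names, userids, and passwords are placeholders):
# Oracle, through SQL*Net/Net8:
sqlplus user1/pwd1@oradb1
# Informix, through the ESQL/C client:
dbaccess ifxdb1 -
# Sybase, through Open Client:
isql -Uuser1 -Ppwd1 -Ssybsrv1
If each client connects and can issue a simple query, the native layer is ready for DataJoiner.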
Step 3: Install the IBM DataJoiner Software
After you have successfully tested native connectivity between the non-IBM clients and the non-IBM servers, you can start to set up the DataJoiner server. Transfer the DataJoiner software from the installation media to the destination devices and consult the DataJoiner Planning, Installation and Configuration Guide, SC26-9150 for further details. As discussed in 3.2.2, “DataJoiner Placement” on page 40, DataJoiner can be installed on the same machine as the non-IBM database, or on a separate server. Refer to Case Study 1, section 6.3.2, “Configuration Tasks” on page 151 for a practical example.
Step 4: Prepare DataJoiner to Access the Remote Data Sources
DataJoiner uses data access modules to access remote data sources. On UNIX platforms, these data access modules have to be built by link-editing DataJoiner libraries with the client libraries provided by the non-IBM client software. DataJoiner for Windows NT already provides data access modules for all natively supported non-IBM database systems; therefore, no specific action is required on NT.
UNIX platforms: It is a good idea to build the data access modules before the first DataJoiner instances are created. All subsequently created instances will automatically contain those data access modules. However, it is possible to create new data access modules and add them to existing DataJoiner instances using the db2iupdt command (db2iupdt updates existing DataJoiner instances). The creation of the data access modules as well as the update of the DataJoiner instances has to be executed as root user. To create the data access modules for the remote databases, use the following guidelines (a consolidated command sketch follows this list):
1. Log on as root.
2. Set the remote client's environment variables accordingly (for example, set the SYBASE variable when link-editing a dblib or ctlib data access module).
3. Change to the /usr/lpp/djx*/lib directory (the actual DataJoiner path depends on the DataJoiner version you are using, for example djx_02_01).
4. Execute the shell script djxlink.sh to automatically create the necessary data access modules.
5. If the execution of djxlink.sh is not successful, build the data access modules manually by:
   a. Editing djxlink.makefile
   b. Creating the access modules you need by executing make -f djxlink.makefile
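Put together, the sequence might look like this on AIX (the paths and the SYBASE value are examples, not definitive settings):
# As root:
export SYBASE=/opt/sybase        # client environment, if linking Sybase modules
cd /usr/lpp/djx_02_01/lib        # DataJoiner library directory (version-dependent)
./djxlink.sh                     # build all data access modules
# If djxlink.sh fails, edit djxlink.makefile and run:
make -f djxlink.makefile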
Please refer to the DataJoiner Planning, Installation and Configuration Guide for AIX, SC26-9145 (Chapter 6: Configuring Access to Data Sources) for more details. There you will find guidelines to create data access modules for all natively supported DataJoiner data sources. Additionally, check the DataJoiner homepage on the World Wide Web, especially http://www.software.ibm.com/data/datajoiner/faqv2.html, for hot updates regarding the support of the latest non-IBM client releases (Appendix H, “Related Publications” on page 397, has the details where to find Web sites referenced in this book).
Step 5: Create the DataJoiner Instance
Basically, one DataJoiner instance will be sufficient, regardless of how many DataJoiner databases you intend to create. Create multiple instances only if you want to use different start-up parameters for different instances.
Windows NT: One DataJoiner instance is automatically created when the DataJoiner code is installed. The instance, by default, is called DB2.
UNIX platforms: Some setup tasks have to be completed before the instance can be successfully created:
1. Create a DataJoiner instance owner group.
2. Create a DataJoiner instance owner.
3. Change to the /usr/lpp/djx*/instance directory (the actual DataJoiner path depends on the DataJoiner version you are using, for example djx_02_01_01).
4. Create the instance using the db2icrt command.
Further details can be found in the DataJoiner Planning, Installation and Configuration Guide for AIX, SC26-9145.
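On AIX, for example, the sequence might look like this (the group, user, and instance names are examples only):
# As root:
mkgroup djadm                # 1. create the instance owner group
mkuser pgrp=djadm djinst1    # 2. create the instance owner
cd /usr/lpp/djx_02_01_01/instance
./db2icrt djinst1            # 4. create the instance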
Step 6: Create a DataJoiner Database
One single DataJoiner database can be used to access multiple backend database systems. Using DataJoiner as a gateway within a replication scenario implies some additional considerations. Generally, one DataJoiner database is sufficient to support multiple heterogeneous replication targets (multiple DProp target servers). All replication control tables are stored within this DataJoiner database, and all the remote target tables are referenced by nicknames. If replicating to multiple non-IBM target databases, create one subscription set for each target database, to avoid distributed transactions. If multiple non-IBM databases are used as replication sources, one DataJoiner database has to be created for each non-IBM replication source database. Refer to 3.2.2, “DataJoiner Placement” on page 40 for background information.
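Creating the database itself is a standard DB2-style command, issued as the instance owner (the database name is a placeholder):
db2 create database djdb1
db2 connect to djdb1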
Step 7: Connect DataJoiner to Other DB2 or DataJoiner Databases
Set up the connectivity to other DB2 or DataJoiner databases (DB2 / DataJoiner on UNIX/Intel as well as DRDA servers such as DB2 for OS/390, DB2 for AS/400, DB2 for VM, or DB2 for VSE) by cataloging DB2 nodes and DB2 databases. Refer to the DB2 and DataJoiner documentation for more details.
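For a TCP/IP connection to a remote DB2 or DataJoiner database, the cataloging commands might look like this (the node, host, port, and database names are examples only):
db2 catalog tcpip node djnode1 remote djhost1 server 50000
db2 catalog database remdb1 at node djnode1
db2 terminate
db2 connect to remdb1 user djuser1 using pwd1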
Step 8: Enable DB2 Clients to Connect to the DataJoiner Databases
The activities necessary to enable remote DB2 clients to connect to the new DataJoiner databases depend upon the communication protocols that the DB2 clients will use. For each of the incoming communication protocols you wish to use (DataJoiner supports TCP/IP, APPC, and NetBIOS) you have to:
1. Set up the underlying communication subsystem (for example, SNA Server, TCP/IP).
2. Update the DataJoiner instance with the appropriate communication settings. For example, use:
update database manager configuration using SVCENAME <service_name>
when setting up the instance to support TCP/IP clients.
3. Set the DB2COMM environment variable to the appropriate value.
Windows NT: Set the environment variable DB2COMM to all the communication protocols that the DataJoiner instance will support. For example:
SET DB2COMM=TCPIP,APPC
UNIX platforms: Make use of the ./sqllib/db2profile file and set the DB2COMM setting there. Execute db2profile within the instance owner's .profile.
Remark: These settings only take effect when you recycle the instance (db2stop, then db2start).
Step 9: Create Server Mappings for All the Non-IBM Databases
Each non-IBM database that you want to access as either a replication source or a replication target has to be referenced through a DataJoiner server mapping. A server mapping specifies how DataJoiner will subsequently access a remote data source and is defined using the CREATE SERVER MAPPING command. Refer to the DataJoiner Planning, Installation and Configuration Guide, SC26-9150 (Chapter 6: Configuring Access to Data Sources) for further details about how to set up server mappings for remote data sources, or to the DataJoiner Application Programming and SQL Reference Supplement for details about the CREATE SERVER MAPPING DDL statement. Use the following SQL statement to query all successfully defined server mappings within a DataJoiner database:
SELECT SERVER, NODE, DBNAME, SERVER_TYPE, SERVER_VERSION, SERVER_PROTOCOL
FROM SYSCAT.SERVERS;
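As an illustration only (the exact clauses depend on the data source type and your client configuration; the SQL Reference Supplement has the authoritative syntax), a server mapping for an Oracle 8 source might look like this:
CREATE SERVER MAPPING FROM ora1
TO NODE "oranode1"          -- SQL*Net/Net8 alias; example name
TYPE oracle VERSION 8.0 PROTOCOL "net8";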
Step 10: Create the Server Options
Server options help optimize performance and control certain interactions between DataJoiner and the remote data sources. We recommend always setting the server option TWO_PHASE_COMMIT to NO when using DataJoiner as a gateway for DProp replication. Example (the server name is a placeholder):
create server option TWO_PHASE_COMMIT for server <server_name> setting 'N';
Refer to the DataJoiner Application Programming and SQL Reference Supplement, SC26-9148 for more details on the server options which might be required for the specific release of the non-IBM system you are using.
Step 11: Create User Mappings
In cases where your DataJoiner userids and your non-IBM userids are different, user mappings are used to map DataJoiner userids and passwords to non-IBM database userids and passwords. If the userids and passwords are identical, no user mappings are necessary to enable communication between DataJoiner and the non-IBM data sources. When using DataJoiner Replication Administration, DataJoiner user mappings have an additional value, as they are also used to determine the table schema (table creator) for the tables that are created in the non-IBM databases.
Recommendation (1): Create at least one user mapping before creating the replication control tables for a non-IBM replication source. The DataJoiner Replication Administration program determines the remote schema for control tables that are created within the remote data source from the REMOTE_AUTHID defined for the DataJoiner user that DJRA uses to connect to the DataJoiner database.
Recommendation (2): Define user mappings for all the schemas that you are planning to use as table qualifiers for non-IBM replication target tables.
Use the following SQL statement to query all successfully defined user mappings within a DataJoiner database:
SELECT AUTHID, SERVER, REMOTE_AUTHID FROM SYSCAT.REMOTEUSERS;
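A sketch of a user mapping definition (all names are placeholders; check the SQL Reference Supplement for the exact clause names):
CREATE USER MAPPING FROM djuser1 TO SERVER ora1
AUTHID "orauser1" PASSWORD "orapwd1";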
4.4.2 Implement the Replication Subcomponents (Capture, Apply)
This section refers to the installation of DProp Capture and DProp Apply. To decide whether either component is required for your replication setup, we have to distinguish replication source and target systems.
Step 12: Install and Set Up DProp Capture, if Required
The installation of DProp Capture is only required for DB2 replication sources. For non-IBM replication sources, DProp Capture is emulated by triggers.
Non-IBM Source: The change capture activity will be achieved using OEM triggers. There is no need to install additional software.
IBM Source: Install IBM DProp Capture (on UNIX and Intel platforms, Capture is already pre-installed when setting up DB2 UDB). If the replication source servers are DB2 UDB databases on Intel or UNIX platforms, change the LOGRETAIN parameter of the source server's database configuration to YES, as sketched after Step 13.
Step 13: Install and Set Up DProp Apply, if Required
DProp Apply is used to replicate to both DB2 and non-IBM replication targets.
Non-IBM Target: IBM DProp Apply will be used to replicate data to the non-IBM target databases. As an integrated component of DataJoiner for Windows NT, Apply is already pre-installed when setting up DataJoiner. On UNIX platforms, make sure you install DataJoiner's Apply component when installing the DataJoiner software.
IBM Target: Install IBM DProp Apply (on UNIX and Intel platforms, Apply is already pre-installed when setting up DB2 UDB).
Refer to the DB2 Replication Guide and Reference, S95H-0999 for platform-specific issues on how to install DProp Capture or DProp Apply.
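The LOGRETAIN change mentioned under Step 12 might look like this (the database name is a placeholder; note that the database then enters backup pending state and must be backed up before new connections are accepted):
db2 update database configuration for srcdb1 using LOGRETAIN ON
db2 backup database srcdb1 to /backup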
4.4.3 Set Up the Replication Administration Workstation
The complete heterogeneous multi-platform replication system can be configured from one single PC (if necessary, you can use a couple of administration workstations). All information on replication source tables and replication subscriptions is stored within the replication control tables at either the replication source or the replication target database. Therefore, the replication administration
workstation can be added or removed without any effect on the runtime components of your replication system. DataJoiner Replication Administration (DJRA) is the GUI (Graphical User Interface) you will use to define and maintain replication sources and subscriptions. At the time this book was edited, DataJoiner Replication Administration was available for 32-bit Windows systems (Windows NT, ’95/’98) only. Therefore, a Windows workstation will be used as the replication administration workstation. Administering your replication setup from the administration workstation consists of the following tasks:
• Create the replication control tables
• Define and maintain replication source tables (or views)
• Define and maintain replication subscriptions
• Promote replication sources, replication subscriptions, and change data tables from the test systems to the production systems
• Monitor the status of your replication subsystems
We will now name and describe all the setup tasks that are necessary to configure the administration workstation. You can then start to define replication source tables and replication subscriptions from this replication administration workstation.
Step 14: Install DB2 Client Application Enabler (DB2 CAE)
Install the DB2 UDB Client Application Enabler (CAE) for Windows on the PC that you chose as the administration workstation. Refer to the IBM DB2 UDB manual Installing and Configuring DB2 Clients on the World Wide Web (http://www.software.ibm.com/data/db2/performance/dprperf.htm) for further details.
Step 15: Establish DB2 Connectivity
Basically, the administration PC has to be able to connect to all DB2 or DataJoiner databases that are used as replication source servers or replication target servers. For DRDA connections (for example, to access a DB2 for OS/390 replication source system), the DataJoiner instance (or any other DB2 CONNECT server) can be used as a DRDA gateway.
Step 16: Install DataJoiner Replication Administration Software
Install DJRA onto the administration workstation. As more and more functions are added to the replication administration software, it is a good idea to download the DJRA package from time to time from the Web. The DJRA
package is available from both the IBM DataPropagator and the IBM DataJoiner homepages on the World Wide Web. Both homepages are listed in Appendix H, “Related Publications” on page 397.
Note: Some older documentation refers to the Replication Administration software as DPRTools.
Step 17: Set Up DJRA Connectivity to DB2 / DataJoiner Databases
Set up DJRA to access all DB2 and DataJoiner databases contributing to your replication scenario. The main task here is to create a password file that will contain userids and passwords for all DB2/DataJoiner replication sources and targets. DJRA will use the password file when connecting to replication source and target databases. Use DJRA's Preference menu (Option: Connectivity) to populate the password file. The password file will be stored in the DJRA working directory. Please note that you have to restart DJRA after cataloging additional databases into the administration workstation's database directory; DJRA picks up all databases from the database directory at startup time.
Step 18: Modify the DJRA User Exits (Optional)
The DJRA user exits can be modified to customize DJRA. Separate user exits are available to define default logic for replication definitions:
• TBLSPACE.REX can be modified to influence tablespace definitions when the replication control tables are defined.
• SRCESVR.REX can be modified to change the default attributes and placement of CD and CCD tables when defining tables as replication sources.
• TARGSVR.REX and CNTLSVR.REX can be modified to change the default attributes and placement of target tables when defining replication subscriptions.
Customizing the user exits provided is a useful option, especially when the database objects generated during replication setup have to fulfill strict naming conventions, but it is not a requirement. The standard user exits named above include several examples that explain how to modify the defaults.
4.4.4 Create the Replication Control Tables
Basically, you use DataJoiner Replication Administration to generate and run SQL scripts. These SQL scripts will generate and populate the DProp control tables at either the replication sources or the replication target systems. Therefore, the first action after installing DJRA is to create the replication control tables at all the replication source servers and all the replication control servers.
Step 19: Create the Control Tables at Replication Source Servers
Use the DJRA function "Create Replication Control Tables" to create all the DProp control tables at the replication source servers. Select a DataJoiner database in combination with a non-IBM data source name to create the control tables for a non-IBM replication source (in fact, you select the DataJoiner database, and then DJRA provides you with the list of non-IBM data sources that are accessible from this DataJoiner database). DJRA automatically generates CREATE NICKNAME statements for all control tables that are created at a non-IBM source server. Use the following SQL statement to query all successfully defined nicknames within a DataJoiner database:
SELECT TABSCHEMA, TABNAME, REMOTE_TABSCHEMA, REMOTE_TABNAME, REMOTE_SERVER
FROM SYSCAT.TABLES
WHERE REMOTE_SERVER IS NOT NULL;
Advice: Use this select-statement to create a view called NICKNAMES.
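Following that advice, the view definition simply wraps the query above:
CREATE VIEW NICKNAMES AS
SELECT TABSCHEMA, TABNAME, REMOTE_TABSCHEMA, REMOTE_TABNAME, REMOTE_SERVER
FROM SYSCAT.TABLES
WHERE REMOTE_SERVER IS NOT NULL;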
Step 20: Create the Control Tables at Replication Control Servers
Use the DJRA function "Create Replication Control Tables" to create all the DProp control tables at the replication control servers.
4.4.5 Bind DProp Capture and DProp Apply
After the control tables are successfully created, you are able to bind the DProp Capture and DProp Apply programs to the source and target databases.
Step 21: Bind DProp Capture (if Required)
Generally, you will want to bind DProp Capture against DB2 replication source servers. If all your source servers are non-IBM databases, there is, of course, no need to bind DProp Capture.
Step 22: Bind DProp Apply
Each DProp Apply instance has to be bound against all replication source servers, all replication target servers, and all replication control servers that will be accessed by that Apply instance. On the OS/390 platform, for example, considering that source server, control server, and target server are not identical, all Apply packages have to be bound against all locations that Apply will connect to during replication:
BIND PACKAGE(<location>.<collection-id>) MEMBER(<packagename>)
Finally, the bind job has to include a BIND PLAN statement, including all the different locations Apply is bound against. (Note that the following example is applicable to DB2 for OS/390 V5 only. Examples referring to other DB2 releases are included within the product documentation.)
BIND PLAN(ASNAP510) PKLIST(loc1.collection-id.*, loc2.collection-id.*, ...)
Later on, if you are adding a new location to the replication scenario, just bind Apply's packages to the new location and rebind the plan after adding the new location to the PKLIST.
OS/390 Remark: A sample bind job is included within the DProp documentation as well as on the DProp installation media.
Performance Advice:
• Do not change the recommended isolation levels provided in the DProp documentation or in the sample bind jobs.
• Refer to the DB2 Replication Guide and Reference, S95H-0999 for the syntax of the bind command appropriate to your platform.
• Make use of the BLOCKING ALL bind parameter on UNIX and Intel platforms. This will enable Apply to use block fetch when fetching change data from the source systems.
• Be aware that the default for the CURRENTDATA bind option valid for DB2 for OS/390 has changed from DB2 version 4 to version 5. With DB2 for OS/390 version 5, CURRENTDATA(YES) was introduced as the default bind option (until DB2 version 4, CURRENTDATA(NO) was the default). To enable block fetch for DB2 for OS/390, add the CURRENTDATA(NO) bind parameter to Apply's bind job, if it is not already present.
Refer to the DB2 Replication Guide and Reference, S95H-0999 for platform-specific issues.
4.4.6 Status After Implementing the System Design
After performing the setup steps above in the given sequence, your replication system is ready for use! The next steps will be to:
1. Use DJRA to define your replication source tables (that is, register the replication sources).
2. Use DJRA to define your replication subscriptions.
3. If you are operating Apply on UNIX or Intel platforms, do not forget to create a password file for each of your Apply qualifiers before you start Apply. No password file is required for Capture.
4. If your replication source is DB2, start DProp Capture. (Refer to the DB2 Replication Guide and Reference, S95H-0999 for operational issues on how to start Capture on the system platform you are using.) You will need to start Capture with at least the DProp source server parameter (telling Capture which DB2 database to monitor).
5. Start DProp Apply. (Refer to the DB2 Replication Guide and Reference, S95H-0999 for operational issues on how to start Apply on the system platform you are using.) You will need to start Apply with at least the following parameters:
• Apply Qualifier (telling Apply which subscriptions are to be serviced by this Apply instance)
• DProp Control Server (telling Apply in which DB2/DataJoiner database the replication control tables are located)
A minimal start sequence is sketched at the end of this section.
Advice: Be aware that the Apply Qualifier is a case-sensitive parameter. Pass it to Apply in upper case if it is stored in upper case within the DProp control tables.
Refer to the case studies detailed in the second part of the book to see how the checklist can be practically used during the implementation phase of a replication project. If you want to learn more about replicating from non-IBM source databases, refer to "Case Study 1—Point of Sale Data Consolidation, Retail" on page 139 (Informix replication sources). To get a deeper insight into replication examples replicating to multi-vendor databases, have a look at "Case Study 2—Product Data Distribution, Retail" on page 173 (Microsoft SQL Server) or "Case Study 3—Feeding a Data Warehouse" on page 203 (Oracle).
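Pulling steps 4 and 5 together: on a UNIX platform, for example, the start commands might look like the following sketch (the database names SRCDB and CNTLDB and the qualifier APPLYQUAL are placeholders; refer to the Replication Guide for the options valid on your platform):

asnccp SRCDB WARMNS
asnapply APPLYQUAL CNTLDB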
4.5 Next Steps—Implementing the Replication Design
The replication design is implemented by defining replication source tables and replication subscriptions. Following the general implementation checklist, the replication administration workstations are already configured, so that the DataJoiner Replication Administration can be used to set up the required definitions. Although we will not go into too much detail, we want to use the remaining sections of this chapter to give you an overview of the next setup steps. We will be dealing separately with:
• Implementing the Replication Design for Multi-Vendor Target Servers
• Implementing the Replication Design for Multi-Vendor Source Servers
For all further details about using DJRA, please refer to the DB2 Replication Guide and Reference, S95H-0999 and to the DataJoiner Planning, Installation and Configuration Guide, SC26-9150 (Starting and Using DJRA).
4.5.1 Replication Design for Multi-Vendor Target Servers
The following steps are necessary to implement a replication design towards non-IBM target databases.
4.5.1.1 Register the Replication Sources
Define your DB2 replication sources first.
4.5.1.2 Define an Empty Subscription Set
When you choose to replicate to a non-IBM database, you define the DataJoiner database as the DProp target server. This DataJoiner database needs to have a server mapping and at least one user mapping to the non-IBM database that will finally contain the replication target tables.
4.5.1.3 Add Subscription Members to the Set
If you want to replicate to a non-IBM database, you need to specify the server mapping of the non-IBM target database when you add a member to a subscription set (DJRA lists all the non-IBM databases that have a server mapping defined within the DataJoiner database).
Remark: To avoid distributed transactions, all the non-IBM target tables grouped together within one subscription set must be located in the same non-IBM target database. All the changes to all the target tables of one subscription set will be applied within the same unit-of-work.
Figure 16 shows the replication control information and database objects which are created when adding a member to a subscription set, if the target table is a non-IBM database table:
Figure 16. Define a Non-IBM Table as a Replication Target (diagram: from the DataJoiner database, DJRA processes "Add a Member to Subscription Sets" by inserting into SUBS_MEMBR and SUBS_COLS, creating the target table in the multi-vendor database, and creating a target nickname for it)
If there is no nickname defined within the DataJoiner database that corresponds to the target table qualifier and target table name specified when adding the member to the subscription set, DJRA will generate DDL to natively create the target table within the non-IBM target database. Additionally, one DataJoiner nickname is automatically created to enable transparent access to the new target table. It is also possible to replicate into an existing non-IBM target table. To achieve this, all you have to do is manually create a nickname on the existing table, before using DJRA to add the member to the subscription set.
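For the existing-table case, the manually created nickname is an ordinary DataJoiner nickname, for example (the server mapping ORASRV and the schema and table names are illustrative):

CREATE NICKNAME DJADM.TGTORDERS FOR ORASRV.SCOTT.ORDERS;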
Data Type Compatibility
DJRA creates all the non-IBM replication target tables using DataJoiner's PASSTHRU mode. DataJoiner therefore automatically uses forward type mappings to map all heterogeneous data types into DB2 data types when creating the nickname. Because DJRA has knowledge of all the data types of all the table columns that are going to be replicated, DJRA automatically creates data type fixups for the target table nickname when the source server's data types and the type mappings chosen by DataJoiner are not completely compatible. Type fixups can be necessary for DATE, TIME, and TIMESTAMP columns, for example. To create these data type fixups, DJRA generates ALTER NICKNAME statements.
It is a good idea to let DJRA create a target table (including possible data type fixups), even when the non-IBM target table already exists (use a different table name, for example). Compare the created data types, including any fixups, with the data types of the already existing table.
Advice: Alternatively, you can use the transparent DDL feature of DataJoiner V2.1.1 to direct a CREATE TABLE statement to the non-IBM database. Using this method, no data type fixups are necessary. For more information refer to the DataJoiner SQL and Application Programming Reference Supplement, SC26-9148.
4.5.2 Replication Design for Multi-Vendor Source Servers
The following steps are necessary to implement a replication design from non-IBM source databases. As we already introduced, change capture for non-IBM databases, such as Oracle, Informix, Microsoft SQL Server and Sybase SQL Server, is achieved by creating capture triggers upon the multi-vendor replication source tables. This section will help you understand how to set up non-IBM tables as sources for replication.
4.5.2.1 Create a Nickname for the Non-IBM Source Table
Before you register a non-IBM database table, create a nickname for the non-IBM source table.
4.5.2.2 Register the Nickname as a Replication Source
Once the nickname is created, you can register it by using the DJRA functions Define a Table as Replication Source or Define Multiple Tables as Replication Sources. When you register a nickname that corresponds to a non-IBM source table, DJRA automatically creates a script including native CREATE TRIGGER statements for the following triggers:
• Capture changes triggers
• Pruning trigger
Figure 17 shows the replication control information and database objects that are created when defining a non-IBM database table as a replication source:
Figure 17. Define a Non-IBM Table as a Replication Source (diagram: in the multi-vendor database, DJRA creates the CCD table and the insert, update, and delete capture triggers on the source table, and drops and re-creates the pruning trigger on the pruning control table; in the DataJoiner database, it creates a CCD nickname and inserts into the register table through the register nickname; the pruning control, register, and reg_synch tables are accessed through their existing nicknames)
As you can see, DJRA generates DDL to create a Change Data table in the non-IBM database for every replication source table—to be more precise, it is a Consistent Change Data (CCD) table—and a nickname for the CCD table in the DataJoiner database. Additionally, DDL statements to create the capture triggers are generated.
Remarks:
• Notice that the REGISTER, PRUNCNTL, and REG_SYNCH tables are already present in the non-IBM database, and that there is a nickname for each of those tables in the DataJoiner database. These tables and the corresponding nicknames are created when you create the control tables.
• Some database systems, such as Informix Dynamic Server Version 7 or Microsoft SQL Server (without setting sp_dbcmptlevel to 70), support only one trigger per SQL operation on a database table. This means that, for one source table, you can create only:
  • One trigger for INSERT
  • One trigger for UPDATE
  • One trigger for DELETE
Some of those systems do not even issue a warning (Informix, to their honor, does) when you create a new trigger (say, for INSERT) on a table that already has a trigger defined for this SQL operation. Therefore, the database administrator must be careful not to overwrite existing triggers. In this case, all new trigger function required for change replication has to be manually integrated into the existing triggers.
As you can imagine, DataJoiner Replication Administration will not compensate for missing database functionality in those non-IBM database systems. But DJRA is smart enough to help the database administrator by issuing a WARNING whenever non-IBM native triggers are created or removed. DJRA does this for all supported non-IBM databases, regardless of whether multiple triggers per SQL operation are supported or not. You will have to decide whether the capture triggers can be created as generated, or whether the generated trigger logic has to be integrated into an existing trigger when a non-IBM table is defined as a replication source. The same is also true when you remove the definition of a replication source: either remove the triggers or adapt the existing ones.
4.6 Summary
In this chapter, we focused on all activities that are necessary to set up a multi-vendor, cross-platform replication system. We provided an implementation checklist that we recommend for use whenever a heterogeneous replication system is to be implemented (a one-page copy of the implementation checklist can be found in Appendix C, "General Implementation Checklist" on page 337). If you are setting up a multi-vendor replication system yourself, mark the checkboxes of all completed tasks until you are finished.
After all setup tasks named in the general implementation checklist are completed, and after the replication design has been defined and tested, your replication system is ready to use. The next step would be to carry over all system components and all tested replication definitions to your production system environments. Before doing so, we will use Chapter 5, "Replication Operation, Monitoring and Tuning" on page 83 to discuss the major operation, monitoring, and maintenance topics of a distributed replication system using IBM DProp and IBM DataJoiner.
Chapter 5. Replication Operation, Monitoring and Tuning
Before carrying the tested replication system over into your production environment, we will spend some time discussing operational tasks of a heterogeneous replication system.
5.1 Overview
But before we start, we want you to reposition yourself again by having a look at the introductory diagram shown in Figure 18:
Figure 18. Reposition Yourself: Chapter 5—Overview (diagram: the book's work breakdown—Approaching Data Replication, Replication Planning, Replication Design, General Implementation Guidelines—leading to Operation, Monitoring & Tuning, which covers DProp replication initial and repetitive tasks, monitoring a distributed replication system, tuning replication performance, and other useful techniques)
Following the work breakdown approach that guides us through the complete book, we will now discuss operational issues. In the previous chapter we already gained first experience with a multi-vendor replication system while designing and implementing a first solution. That means we can already assume some expertise in working with distributed replication systems.
This chapter contains a lot of detailed information. It is natural that you will not follow every thought while browsing through the different parts of this chapter for the first time. But the more time you spend on tuning and
enhancing your replication solution, the more questions you will raise. Therefore, take this chapter as a reference. If you do not get the point at once, come back to those sections when appropriate.
Figure 18 on page 83 displays the major sections described throughout this chapter:
• Initial tasks: The most important initial tasks will be to start the capture process (considering a DB2 replication source system) and to initialize your replication subscriptions. We will provide some advanced techniques, especially on how to initialize the replication subscriptions outside of IBM DProp.
• Repetitive tasks: Some of the housekeeping tasks we are going to introduce are automatically performed by IBM DProp; others will result in the extension of existing housekeeping tasks for your replication source and target databases.
• Monitoring: Before a tested data replication application is carried over to a production system, the replication system has to fulfill operational requirements. The ideal case is, as always, an unattended, self-repairing operational mode. Due to its importance, we have devoted a separate section in this chapter to monitoring aspects.
• Tuning: A multi-vendor, cross-platform relational data replication system has many autonomous components, all connected by network links. All together, this is reason enough to devote a separate section within this chapter to the discussion of tuning considerations.
• Other useful techniques: Looking at the billboards of our database competition along the highways, smart techniques are becoming more and more important these days. We can reassure you that we have lots of those. Some are revealed in the latter portion of this chapter.
5.2 Operating and Maintaining DProp Replication—Initial Tasks
Before changes can be replicated from your source databases to your target databases, a consistent starting point for the replication system has to be established. The contents of the target databases have to be synchronized with the source databases, and the starting conditions have to be established in order for the Capture and Apply components to work properly. In this section we describe the necessary tasks and give you some background information on how the components work together.
5.2.1 Initialization and Operation of the Capture Task
The Capture process should be treated as a key system task. If DB2 is active, Capture should also be active. If Capture is not active, there will be no new change data available for Apply, regardless of how often Apply connects to the source system.
OS/390 Remark: In most known installations, Capture for OS/390 is operated as a started task.
AS/400 Remark: Capture can be started at each IPL. The best way to do this is to include the start of the QZSNDPR sub-system and of Capture in the QSTRUP program.
Nevertheless, it is possible to stop Capture, for example, to perform certain housekeeping tasks that need exclusive access to database objects maintained by Capture (refer to 5.3.1, "Database Related Housekeeping" on page 91 for examples). When Capture stops, it writes the log sequence number of the last successfully captured DB2 log record (or AS/400 journal entry) into one of the DProp control tables (ASN.IBMSNAP_WARM_START), so that it can easily determine a safe restart point. Even in those cases where it was impossible for Capture to write the WARM start information when shutting down (for example, after a DB2 shutdown using MODE FORCE), Capture is able to determine a safe restart point by evaluating SYNCHPOINT values stored in other replication control tables, such as the register table. (The only assumption is that Capture was successfully capturing changes before the hard shutdown.)
To avoid a Capture COLD start in a production environment, always start Capture using start option WARMNS, which means that Capture either resumes capturing at the position of the DB2 log (or AS/400 journal) where it previously stopped, or terminates if this is not possible. Without start option WARMNS, Capture will automatically switch to a COLD start if a WARM start, for whatever reason, is impossible.
Mobile or Satellite Remark: If IBM DProp is used on mobile systems, both Capture and Apply can be started on demand. For mobile application platforms, which synchronize with central servers at certain times only, Capture can be invoked in mobile mode. In this mode, change capture is started at the last successfully captured log record, and stopped automatically when the end of the DB2 log is reached.
5.2.2 Initialization of Replication Subscriptions
When DProp Apply services the subscription sets that you previously defined, it always checks for replication consistency first. When Apply detects that it is servicing a subscription set for the first time ever, it automatically initializes the replication target tables by copying all required data directly from the source tables to the target tables. Using DProp terminology, we call this a full refresh (we are assuming here that the target tables are complete). For huge source tables, a full refresh can be quite an expensive activity. Also take into account that Apply has to access the replication source tables (your application tables) directly during a full refresh.
Remark: Apply always inserts the data that it fetched from the replication source server into the target tables within a single transaction, to guarantee target site transaction consistency at subscription set level. This also applies to the full refresh.
To let you control the replication initialization, DProp offers a lot of freedom and flexibility in how the initial full refresh task is performed.
5.2.2.1 Initial Refresh Maintained by Apply
If you decide to let Apply perform the initial refresh automatically (this is the default), you can skip over the following paragraphs regarding the initial refresh. Nonetheless, if you are interested, you will find a lot of useful information to better understand how IBM DProp guarantees data consistency.
Before skipping over the following paragraphs, let us just recall that an automatically maintained initial refresh consists of the following two steps:
1. Handshake between Capture and Apply
2. Moving data from the source tables to the target tables
OK, now it is safe to go ahead and jump to 5.2.2.3, "Manual Refresh / Off-line Load" on page 89!
5.2.2.2 Initial Refresh - A Look Behind the Curtain
DProp maintains copy integrity by an information exchange protocol that lets Capture and Apply communicate with each other. This protocol assures that Apply can detect when copy integrity is compromised, and that Capture knows when to start capturing, and when and to what extent it is safe to prune rows that are no longer needed from the change data tables. The information interchange is facilitated by independent, asynchronous updates to the DProp control tables.
We will reveal some details about DProp’s full refresh protocol here, because this background knowledge will be helpful when you decide to perform the initialization of your target tables yourself.
How Apply Initiates the Handshake
Consider that Apply has already decided that a subscription set needs to be full-refreshed. As already mentioned, moving data is not the only activity during the full refresh. Even more interesting is how Capture and Apply perform the initial handshake, because this handshake has to be replayed when initializing the replication target tables manually.
Before fetching data from the replication source tables, Apply lets Capture know that it is starting to perform the initial refresh. Apply actually initiates the handshake by setting the SYNCHPOINT value of the corresponding row of the pruning control table ASN.IBMSNAP_PRUNCNTL to hexadecimal zero (x'00000000000000000000'), which is used as a trigger value. Figure 19 references this step as step (1). Apply updates the pruning control table for every member belonging to the subscription set that is being initialized. DB2 logs these updates, as usual, which is visualized as step (2).
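Conceptually, the handshake request is an ordinary SQL update of the pruning control rows, similar to the following sketch (the predicate values are illustrative; the real update is issued internally by Apply, one row per subscription member):

UPDATE ASN.IBMSNAP_PRUNCNTL
SET SYNCHPOINT = X'00000000000000000000'
WHERE APPLY_QUAL = 'APPLYQUAL' AND SET_NAME = 'SET1';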
How Capture Detects a Handshake Request
Capture detects Apply's handshake request while monitoring the DB2 log (or AS/400 journal). The pruning control table is created with the DATA CAPTURE CHANGES attribute, so that Capture can read changes to the pruning control table from the DB2 log. Refer to step (3) in Figure 19.
Remark: On an AS/400, starting journaling for a physical file is equivalent to setting the Data Capture Changes attribute on the other platforms.
Upon seeing a DB2 log record indicating that the SYNCHPOINT column has been updated to hexadecimal zero for a subscription member, Capture immediately translates the hex zero synchpoint into the actual log sequence number of the log record read. The log sequence number value is retrieved from the header of the log record that contains the update to x'00000000000000000000'. See step (4) in Figure 19.
Figure 19. Initial Handshake between Capture and Apply (diagram: (1) Apply, or a user, sets the SYNCHPOINT column of the pruning control table to x'0000...0000'; (2) DB2 logs the update; (3) Capture reads the DB2 log record; (4) Capture replaces the hex zero synchpoint with the LSN of that log record, for example x'0000...1234')
Apply will take the translated synchpoints into account when it performs the next replication cycle (for that subscription set). The translated synchpoint tells Apply exactly at which point in the log the initial refresh was initiated. Apply now knows that all CD table records with a higher log sequence number (LSN) are awaiting replication, and that all updates with a lower log sequence number have already been included within the initial refresh.
If Capture is not already capturing changes to a replication source, it immediately starts capturing all the changes to that source table that were logged after Apply's handshake request. That, of course, includes all the updates to a source table that were made while Apply was fetching data during the full refresh. (This is OK—Apply will make use of its rework capability if it replicates data that has already been included in the initial refresh.) Capture acknowledges every successful handshake by issuing the following message in the Capture trace table (ASN.IBMSNAP_TRACE) and to the log file (or job log):
ASN0104I: Change capture started for owner "<source_owner>"; table name is "<source_table>" ...
This message is also known as Capture’s GOCAPT message.
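To verify that the handshake completed for your source tables, you can look for this message in the trace table, for example:

SELECT TRACE_TIME, DESCRIPTION
FROM ASN.IBMSNAP_TRACE
WHERE DESCRIPTION LIKE '%ASN0104I%'
ORDER BY TRACE_TIME DESC;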
Remark: Apply's automatic full refresh capability is available for all supported replication source and replication target databases, and that implies, of course, cross-platform, multi-vendor full refresh capability.
5.2.2.3 Manual Refresh / Off-line Load
After understanding how Apply initiates a full refresh for a target table, we can now better understand what is necessary to perform the full refresh manually. Basically, if you decide to perform the initial load of your replication targets yourself, your responsibilities will be:
• To guarantee that replication source and replication target are synchronized (by loading the target tables), and
• To let DProp know about it, by updating the replication control tables as explained in 5.2.2.2, "Initial Refresh - A Look Behind the Curtain" on page 86.
The DataJoiner Replication Administration component will guide you through this process and will generate all SQL statements necessary to update the replication control tables for you. The only things you have to do are to execute the generated SQL statements and to unload and load the data. DJRA will guide you through the correct sequence in which the above steps have to be executed.
Loading the Target Tables Manually
The sequence of steps that DJRA will generate for you to manually load your replication target tables is:
1. Disable automatic refreshes from the source table.
2. Deactivate the subscription set and update the replication control tables to synchronize Capture and Apply (handshake request).
3. Unload the replication source tables.
4. Load the replication target tables.
5. Reactivate the subscription set.
Some Background Information: The Capture-Apply synchronization should occur before the unload/export step. That way, any updates that occurred while the unload/export was in progress will be captured and will propagate with the next differential refresh, in the subscription cycle that follows the full refresh. If the Capture-Apply synchronization occurs after the unload/export, then there is a chance that some updates will never propagate to the target.
Capture interprets Apply's handshake request as a starting signal to begin the change capture activity for a certain source table. If you want to update the replication control tables to simulate Apply's handshake request after unloading the source data, you will need to quiesce the source applications during the refresh (or exclusively lock all replication source tables) to prevent any changes.
Summarizing: if you follow the guidelines provided by DJRA and you synchronize Capture and Apply before you unload the source data, you can perform the manual full refresh while your source applications are running. Data that was changed during the unload phase could be included in the unload data set (or file) as well as in the change data table. Apply will replicate those records again during the first replication cycle following the initial refresh. Due to its rework capability for condensed target tables, Apply will successfully re-replicate those changes: inserts will automatically be reworked into updates, updates will simply be reapplied as updates, and deletes of rows that are not present have no effect at all.
What if You Consider Your Copies to Be Already Initialized?
If you are sure that your replication source and your replication target tables are already synchronized, there is no obvious need to load your target tables again. Nevertheless, DProp Capture and DProp Apply will not start their job unless you tell them that the full refresh has been done (by initiating their initial handshake as explained in "Loading the Target Tables Manually" on page 89).
Note: Without any manual interaction, Apply always tries to perform the full refresh automatically itself. Disabling automatic refresh by setting ASN.IBMSNAP_REGISTER.DISABLE_REFRESH = 1, as described in 5.6.2, "Selectively Preventing Automatic Full Refreshes" on page 129, will not be sufficient. Apply will issue the following error message instead:
ASN1016I: Refresh copying has been disabled ...
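For reference, the disable-refresh setting named above is a one-row update of the register table (source owner and table name are placeholders):

UPDATE ASN.IBMSNAP_REGISTER
SET DISABLE_REFRESH = 1
WHERE SOURCE_OWNER = 'SRCUSER' AND SOURCE_TABLE = 'ORDERS';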
Use the DataJoiner Replication Administration feature Off-Line Load as described in the previous sections, but omit the steps to unload and load the data. Just execute the generated SQL scripts to initialize Capture and Apply.
Remarks for Heterogeneous Environments: The above mechanisms apply to DB2 replication sources as well as to multi-vendor sources. As there is no Capture process at non-IBM replication sources, triggers emulate Capture's role during the initialization of a replication subscription.
5.3 Operating and Maintaining DProp Replication—Repetitive Tasks
As repetitive tasks, we identified housekeeping activities to manage the space allocated by change data tables and replication control tables. After discussing standard database-related housekeeping tasks for change data tables, we will focus on how records that have already been successfully replicated are periodically removed from the change data tables.
5.3.1 Database Related Housekeeping
We generally recommend that you reorganize all volatile and reasonably sized database tables on a regular basis in order to reclaim space. In a DProp replication environment, this especially applies to the change data tables, the unit-of-work table, and some of the replication control tables.
5.3.1.1 Reorganizing CD Tables and the Unit-of-Work Table
The change data tables and the unit-of-work table receive heavy INSERT activity during change capture and heavy DELETE activity during pruning. Those tables are never updated. The necessity of reorganizing the change data tables and the unit-of-work table (ASN.IBMSNAP_UOW) depends, of course, on the update rates against the replication source tables. As a rule of thumb, reorganize the change data tables and the unit-of-work table about once a week.
On OS/390: If using DB2 for OS/390 Version 5 or higher, specify the PREFORMAT REORG option. Preformatting the tablespace will speed up Capture's insert processing.
On AS/400: Run the RGZPFM command on all the change data tables and on the unit-of-work table, once a week.
5.3.1.2 Reorganizing the DProp Control Tables
Basically the same rule also applies to some of the DProp control tables, especially the Capture trace table (ASN.IBMSNAP_TRACE) and the Apply trail table (ASN.IBMSNAP_APPLYTRAIL).
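As a sketch, the weekly housekeeping jobs named above might contain statements like the following (database, tablespace, library, and file names are placeholders):

REORG TABLESPACE DPRDB.CDTS PREFORMAT     (DB2 for OS/390 V5 or higher)
RGZPFM FILE(DPRLIB/CD00001)               (AS/400)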
5.3.2 Pruning
DProp terminology uses the term pruning for the process of removing records from the change data tables that have already been replicated to all targets. Capture performs the pruning for the change data tables, the unit-of-work table, and the Capture trace table. Manual pruning has to be established for
certain types of consistent change data (CCD) tables and for the Apply trail table.
5.3.2.1 Automatic Pruning of CD Tables and the Unit-of-Work Table
Change data tables (and the unit-of-work table for DB2 replication source servers) have the potential for unlimited growth. Therefore, DProp has to provide a mechanism to prevent change data tables from running out of space. It is Capture's duty to perform the pruning for the change data tables and the unit-of-work table. For multi-vendor replication source systems, the pruning mechanism is provided by a trigger defined on the pruning control table (refer to 6.7.1, "Using Triggers to Emulate Capture Functions" on page 166, for a deeper insight into how triggers are used to achieve change capture).
Without going into the details, pruning can be considered a very CPU-consuming process. Therefore, we recommend that you defer pruning to off-hours by starting the Capture program using the NOPRUNE start option, then launch pruning when appropriate by using Capture's PRUNE command. The same result can be achieved for multi-vendor replication sources by temporarily disabling (or dropping) the pruning trigger.
AS/400 Remark: On AS/400, there is no NOPRUNE parameter, but you can achieve the same result by specifying *NO for the Start Clean-Up parameter of the STRDPRCAP command. Since there is no PRUNE command, you will have to stop Capture and restart it with Start Clean-Up *IMMED when you want to start the pruning.
5.3.2.2 Pruning of CCD Tables
Capture performs pruning operations only for those tables that it maintains itself (which are, as explained above, the change data tables and the unit-of-work table). CCD tables (consistent change data tables) are maintained by Apply. They are not automatically pruned by Capture. For some types of CCD tables, by replication design, pruning is not required:
• Complete condensed CCD tables are updated in place, so that they do not grow without bound. The only records that could be removed from these CCD tables are those with IBMSNAP_OPERATION equal to 'D' (Delete) that have already been propagated to the dependent targets.
• Non-condensed CCD tables that contain history, with the assumption that you wish all the data to be retained.
On the other hand, CCD pruning is an issue for internal CCD tables. This type of table will grow if there is large update activity, and it could reach the size of a complete CCD table. Yet, there is no value in letting this table grow, as only the most recent changes will be fetched from it. To enable pruning for internal CCD tables, you might want to add an SQL-After statement to the internal CCD table's subscription to prune change data that has already been applied to all dependent targets. Instead of letting Apply launch the pruning statement via SQL-After processing, you could also add the pruning statement to any other automatic scheduling facility. A crude, but effective, statement for internal CCD table pruning would be:
DELETE FROM <ccd_owner>.<ccd_table>
WHERE IBMSNAP_COMMITSEQ <=
  (SELECT MIN(SYNCHPOINT) FROM ASN.IBMSNAP_PRUNCNTL);
This will prune behind the slowest of all the subscriptions, not just those subscriptions that refer to the source table associated with the internal CCD. You might want to improve the pruning precision by adding the replication source's change data table to the subselect:
DELETE FROM <ccd_owner>.<ccd_table>
WHERE IBMSNAP_COMMITSEQ <=
  (SELECT MIN(SYNCHPOINT) FROM ASN.IBMSNAP_PRUNCNTL
   WHERE PHYS_CHG_OWNER = '<chg_owner>'
   AND PHYS_CHG_TABLE = '<chg_table>');
To find out all internal CCD tables together with their source and change data tables that are defined within your replication system, run the following query at the source server: SELECT SOURCE_OWNER, SOURCE_TABLE, PHYS_CHG_OWNER, PHYS_CHG_TABLE, CCD_OWNER, CCD_TABLE FROM ASN.IBMSNAP_REGISTER WHERE CCD_OWNER IS NOT NULL;
5.3.2.3 Pruning of the APPLYTRAIL Table
At the end of each subscription cycle, DProp Apply reports subscription statistics in the Apply trail table (ASN.IBMSNAP_APPLYTRAIL). The Apply
trail table is located at the replication control server. For each subscription set and cycle, Apply will insert one single row into the Apply trail table. To keep the table from growing too large, these rows need to be deleted from time to time. When to delete these rows is entirely up to you. Apply writes to the ASN.IBMSNAP_APPLYTRAIL table, but never reads from it again. An easy way to manage the growth of this table is to add an SQL-After statement to one of your subscription sets. Alternatively, you could add the pruning statement to any other automatic scheduling facility:
DELETE FROM ASN.IBMSNAP_APPLYTRAIL
WHERE LASTRUN < (CURRENT TIMESTAMP - 7 DAYS);
If you are one of the more sophisticated types of DBAs, your SQL statement could look like the following example instead:
DELETE FROM ASN.IBMSNAP_APPLYTRAIL
WHERE (STATUS = 0 AND EFFECTIVE_MEMBERS = 0
       AND LASTRUN < (CURRENT TIMESTAMP - 1 DAYS))
   OR (STATUS = 0 AND EFFECTIVE_MEMBERS > 0
       AND LASTRUN < (CURRENT TIMESTAMP - 7 DAYS))
   OR (LASTRUN < (CURRENT TIMESTAMP - 14 DAYS));
The statement shown above will prune the Apply trail table in stages:
• All Apply status messages reporting that nothing was replicated (EFFECTIVE_MEMBERS = 0) and that no error occurred during replication (STATUS = 0) will be removed first (after 1 day).
• All Apply status messages that report some replication action will stay in the table a little longer (for example, 7 days). We detect that data was actually replicated within one subscription cycle by specifying EFFECTIVE_MEMBERS > 0.
• All other messages, possibly those reporting replication problems, will stay longer. We prevent error messages from being pruned earlier by restricting the first two predicates to STATUS = 0.
Feel free to adjust the time period that the statistics records stay in your Apply trail table. For example, if you are replicating continuously, you will
want to prune the Apply trail table more frequently than if you are replicating just once a week.
Remark: You can even occasionally delete everything from the Apply trail table. However, you had better not do that to any of the other replication control tables. So be careful when typing in the SQL statement!
5.3.2.4 Journals Management on AS/400
On the AS/400, the journal receivers used by the Capture program must be regularly deleted to reduce the disk space used, but you must not remove journal receivers that are still required by the Capture program. If you are running OS/400 V4R2 or a later version, a system exit prevents you from removing receivers that are still needed by the Capture program. We recommend that you specify MNGRCV(*SYSTEM) when you create the journals, and that you specify a threshold when you create the journal receivers. If you are running OS/400 V4R1, you must use the ANZDPRJRN command to safely remove the receivers that are no longer needed, and we recommend that you create the journals with MNGRCV(*USER).
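As a sketch, the recommended journal setup on OS/400 V4R2 or later could be created with CL commands like these (library, journal, and receiver names as well as the threshold value are placeholders):

CRTJRNRCV JRNRCV(DPRLIB/CDRCV0001) THRESHOLD(100000)
CRTJRN JRN(DPRLIB/CDJRN) JRNRCV(DPRLIB/CDRCV0001) MNGRCV(*SYSTEM)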
5.3.3 Utility Operations
Some DB2-related utility operations bypass DB2 logging when they manipulate data stored within replication source tables. Without logging, Capture is not able to recognize those changes. That means the changes made to replication source tables by these utilities never make their way into the change data tables (and will never replicate). In the following sections we give an overview of utility operations that need special care when executed against database tables that are registered as replication sources.
Remark: On the AS/400, the following operations on a source table will be detected by the Capture program, and will cause a full refresh the next time the Apply program runs: restore (RSTOBJ, RSTLIB), end of journaling (ENDJRNPF), and clear file (CLRPFM).
5.3.3.1 Table Loads
Performing LOAD operations against your DB2 replication source tables causes them to be out of synch with their copies. Depending on your application, this may or may not be a concern to you.
DB2 LOAD utilities do not perform change data capture logging. For better performance, LOAD does as little work as possible, leaving little indication of its presence. If you are periodically performing LOAD (REPLACE) or LOAD (RESUME) operations against your source tables, outside of replication control, you might want to continue doing so and also perform these utility operations against your copies. Or, you may want to drive the replication software to re-initialize those subscription sets that refer to source tables that have recently had a LOAD operation. In any case, be aware that your existing operations may potentially cause data consistency errors with respect to the copies of the tables you load, and you may need to modify or expand your LOAD procedures. To see how to issue a re-synch request for your replication target tables, refer to 5.6.3, "Full Refresh on Demand" on page 132.
5.3.3.2 Recovery
As with load processing, your RECOVER procedures should consider the effect on data consistency of copies derived from source tables that needed a RECOVER. You may want to expand your procedures to perform a coordinated recovery of a source table and all its copies, or you may want to drive the replication software to re-initialize those subscription sets that refer to source tables for which there was a recent RECOVER operation, or you may decide to tolerate any data consistency errors resulting from your RECOVER procedures.
5.3.3.3 Pseudo-ALTER Tools
Third-party "Pseudo-ALTER TABLE" tools appear to add a number of functions to DB2's ALTER TABLE statement, among them:
• Rename a column
• Change the length of a column
• Change the data type of a column
• Delete a column
These tools do not actually alter a table, but rather unload, drop, re-create, and load a new table in place of your existing table. Keep in mind that DB2 logs updates to a table based on internal identifiers, not by the names of the tables. From Capture's perspective (and DB2's), it is merely coincidental that this new table has a name matching the name of your old table. If you wish to continue using such tools, you will need to carefully coordinate the pseudo-alter processing:
1. Stop all updating applications.
2. Let Capture process all the log records written for the old table.
3. Stop Capture.
4. Run the pseudo-alter utility.
5. Start Capture and restart your updating applications.
5.3.3.4 Tablespace Reorganizations
Customers using DB2 for OS/390 need to make only a slight, but important, change in their REORG procedures. Generally, DProp is not affected by tablespace reorganizations:
• Capture is not affected, as the internal object identifiers do not change.
• Apply is not affected, as key values, not internal row identifiers, are used to associate source and target rows.
If you are using tablespace compression, it is very important that you specify the KEEPDICTIONARY REORG utility option, which is not the default. DB2 for OS/390 can keep at most one compression dictionary in memory per tablespace. Once a new compression dictionary is generated, such as during a REORG, it replaces the previous dictionary. The DB2 log interface cannot return log records written using an old compression dictionary. If Capture has already processed all log records written prior to the REORG, there is no problem. If Capture requests a range of log records written before the REORG, and at least one of the log records within the requested range was written using a compression dictionary that has changed as a result of a REORG, then DB2 will not return the requested range of log records through the log interface.
Advice: If you plan on rebuilding your compression dictionaries, coordinate the REORG and Capture processing:
1. Stop all applications that usually update any table that has the change data capture attribute and was created within a compressed tablespace (not just those tables registered as replication sources).
2. Let Capture process all the log records written using the 'old' compression dictionary.
3. Stop Capture.
4. REORG, without specifying KEEPDICTIONARY.
5. Start Capture and restart your updating applications. Better yet, learn to live with the compression dictionaries you now have, resisting the urge to rebuild them.
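For your routine reorganizations of compressed source tablespaces, the dictionary-preserving utility statement is then simply (database and tablespace names are placeholders):

REORG TABLESPACE DPRDB.SRCTS KEEPDICTIONARY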
5.4 Monitoring a Distributed Heterogeneous Replication System
From an operator's perspective, a new system application should run unattended and fix all unforeseen error situations automatically. The effort to deliver such a system (and we do not speak only of replication systems here) would be quite high. Before a replication application is carried over to the production environment, it should at least be able to detect error situations automatically. If errors cannot be resolved autonomously, a message has to be sent out to ask for support.
This section gives you an overview of replication monitoring issues and techniques. Before we go into details and focus on the several separate components of a distributed replication system, we will name all the components that are subject to monitoring.
5.4.1 Components That Need Monitoring
The following system components of a distributed replication system have to be monitored to guarantee non-disruptive operation and the promised target table latency:
Source Database System: We generally assume that your replication source databases, either DB2 or multi-vendor, are already integrated into existing monitoring and systems management environments.
Capture Process: If Capture is not running, no changes to the registered source tables will be captured at the replication source system. The monitoring system at least must be able to guarantee that Capture is running continuously. Some error situations could even be resolved automatically by the monitoring system itself. Section 5.4.3, "Monitoring Capture" on page 101 will demonstrate some advanced monitoring and troubleshooting techniques for the DProp Capture program.
Tablespaces: Change data tables are stored in tablespaces. For some database systems or on some system platforms, those tablespaces can be created with a fixed size only. If the size limit is reached, Capture will not be able to capture any additional changes.
Target Database System: We generally assume that your replication target databases, either DB2 or multi-vendor, are already integrated into existing monitoring and systems management environments.
Apply Process: Especially in large data distribution environments, all Apply processes have to run mostly unattended. To make you aware of replication problems, the Apply processes have to be monitored. Section 5.4.4, "Monitoring Apply" on page 106 reveals common monitoring techniques for Apply.
Middleware Server (IBM DataJoiner): A DataJoiner middleware server is introduced when replicating to or from non-IBM database systems. When the DataJoiner instance is stopped, non-IBM sources or targets cannot be accessed by Apply. Refer to 5.4.5, "Monitoring the Database Middleware Server" on page 116 for appropriate monitoring techniques.
Database Gateways: Assuming that a DRDA server is involved in a replication system (DB2 for OS/390, DB2 for VM/VSE or DB2 for AS/400), a DRDA gateway is required to enable DB2 connectivity to those servers. DataJoiner's built-in DRDA requester can be used, or a separate DRDA gateway can be installed, to be used by several database applications on several servers. If a separate DRDA gateway is used, it is of course also subject to monitoring.
Use the following sections to gain knowledge about recommended techniques for monitoring the above system components.
5.4.2 DProp's Open Monitoring Interface
To fit into all the different kinds of systems management and systems monitoring applications available on the marketplace, even considering that we are dealing with a distributed (multi-platform), heterogeneous (multi-vendor) environment, IBM DProp provides an open interface to all status information and error messages it produces. You may want to consider customizing the monitoring tools you are using for other applications to make use of DProp's open interface. Most of the replication statistics are available within the replication control tables. We will use the following sections to introduce examples of how to use the information within the replication control tables to fulfill replication monitoring tasks.
For a detailed description of all DataPropagator control tables, including DDL for each of the different tables, refer to the DB2 Replication Guide and Reference, S95H-0999, chapter "Table Structures". Additionally, the Replication Guide contains a listing of all the replication status and error messages that DProp might issue.
5.4.2.1 What Does the Interface Look Like?
As already introduced, the DProp control tables contain replication status information. Additionally, and this statement is true for all operating system platforms, Capture and Apply write some trace messages and all error messages into a log file or into the job log.
DProp Control Tables
The easiest way to determine the current status of the replication system is to select the information available in the replication control tables. Some of the control tables can be used to determine the status of the change capture process; others are available to get an overview of the subscription status and the subscription latency, or to evaluate statistics about the data volume replicated within the most recent subscription cycles.
Log Files and Console Messages
All error messages that are recorded in the replication control tables are also written into log files or into the job log, which makes the messages available to those monitoring environments that are not capable of using database queries. Using automated operators to analyze the job log is quite a common technique on host-based operating system platforms, for example. In the following sections, we provide queries against the DProp control tables that extract useful monitoring information, as well as techniques to work around replication problems.
Trace
Finally, start Capture or Apply in trace mode if the problem that is causing an error is not obvious:
• Capture's start option to enable trace mode is TRACE. Remark: This option is not available on AS/400, because the Capture/400 program already writes a lot of information into the ASN.IBMSNAP_TRACE table.
• Apply's start option to enable trace mode is TRCFLOW.
The generated traces contain a large amount of text output, including dumps and SQLCA information, that can be used to determine the cause of the
problem if the messages inserted into the replication control tables did not contain enough information.
Advice: Only start Capture and Apply in trace mode if you are investigating problems. Trace mode obviously slows the replication processes down and also generates a lot of output.
5.4.3 Monitoring Capture
The Capture process is a very important task. If Capture is not running at all, no changes to the registered source tables will make their way into the change data tables. Additionally, Capture reads through the DB2 log sequentially, so a problem with one replication source table can delay change capture for the complete replication system. Considering this, the main monitoring tasks regarding the Capture program fall into the following categories:
• See if Capture is up and running
• Detect and solve Capture problems as soon as possible
5.4.3.1 Monitoring the Capture Process
Checking to see if Capture is up and running should be a task your monitoring system performs on a regular basis. Depending on the execution platform you are using, the facilities available to detect whether a program is running will differ. The following example shows how to determine if Capture is running on UNIX operating systems:
#!/bin/ksh
ps -ef | grep 'asnccp <source_server>' | grep -v grep | wc -l
The above command will return 1 if the Capture instance is active. On AS/400, you can check whether Capture is running by issuing the WRKSBS command, choosing option 8 in front of the QZSNDPR sub-system, and checking that job QDPRCTL5 is running. After the initial full refresh has been done, you should also see several jobs having the same names as the journals.
Remark: The QDPRCTL5 job is sometimes in MSGW status, but that does not necessarily mean it is waiting for an answer to an error message. If it is waiting for an answer, there will also be an error message in the QSYSOPR message queue.
5.4.3.2 Detecting Capture Errors
Capture errors can be detected by querying the Capture trace table ASN.IBMSNAP_TRACE. All available Capture messages can be retrieved by the following query (use descending order to see the most recent messages first):
SELECT OPERATION, TRACE_TIME, DESCRIPTION
FROM ASN.IBMSNAP_TRACE
ORDER BY TRACE_TIME DESC;
Error messages only are retrieved by adding the following where-clause:
SELECT OPERATION, TRACE_TIME, DESCRIPTION
FROM ASN.IBMSNAP_TRACE
WHERE OPERATION = 'MESSAGE'
AND SUBSTR(DESCRIPTION, 8, 1) = 'E'
ORDER BY TRACE_TIME DESC;
Possible Capture errors could be caused by incorrect replication source definitions (for example, a replication source table was created without the DATA CAPTURE CHANGES attribute, a change data table has been accidentally dropped, the column types of the change data table and the source table do not match, or Capture does not have sufficient privileges to write into the change data table). All error messages dealing with incorrect replication definitions clearly identify the source table that is causing the problem. Correct the error or remove the failing registration to resume change capture for your source system.
Capture records error messages, including a specific DProp message number, into the Capture trace table. A more detailed problem description, including possible solutions for all replication-related problems, can be obtained from the DB2 Replication Guide and Reference, S95H-0999.
5.4.3.3 Capture Lag
Log-based change capture is an asynchronous process. Additionally, the Capture process can be scheduled to run in certain time windows only, or to use a lower system priority so that it does not interfere with any source applications. Therefore, it is possible that Capture temporarily does not keep up with processing all log records as quickly as DB2 writes them. We call the time difference between the current timestamp and the timestamp of the last log record processed by Capture the Capture lag.
With DProp V5, it has become very easy to determine the Capture lag, because Capture maintains a heartbeat within the register table. The register table contains a so-called global record, which is updated by Capture every time Capture commits its change capture activity. (Refer to 5.5.2, "Adjusting Capture Tuning Parameters" on page 118 to see how to customize Capture's COMMIT interval.) Every time Capture commits, it updates the global record with the log sequence number (SYNCHPOINT) and the timestamp associated with the last processed log sequence number (SYNCHTIME). The Capture lag therefore can be calculated by comparing the CURRENT TIMESTAMP with the SYNCHTIME of the last processed log record:
SELECT SECOND(CURRENT TIMESTAMP - SYNCHTIME) +
       ((MINUTE(CURRENT TIMESTAMP) - MINUTE(SYNCHTIME)) * 60) +
       ((HOUR(CURRENT TIMESTAMP) - HOUR(SYNCHTIME)) * 3600) +
       ((DAYS(CURRENT TIMESTAMP) - DAYS(SYNCHTIME)) * 86400)
       AS CAPTURE_LAG
FROM ASN.IBMSNAP_REGISTER
WHERE GLOBAL_RECORD = 'Y';
According to this query, the Capture lag is displayed in seconds. To see the actual timestamp of the log record most recently processed by Capture, just select the global record from the register table:
SELECT SYNCHPOINT, SYNCHTIME
FROM ASN.IBMSNAP_REGISTER
WHERE GLOBAL_RECORD = 'Y';
Remark for Heterogeneous Environments: Capture triggers are synchronous. They commit within the same transaction as the source application. Therefore, there is no need to monitor a Capture lag for multi-vendor replication sources.
5.4.3.4 Resolving a Gap
We can consider Capture to be a very robust process. Nevertheless, Capture has a severe problem when DB2 cannot offer the log records that Capture is requesting. Simply skipping those log records would, of course, compromise replication consistency. So what does Capture do if the DB2 log interface cannot deliver the log records requested by Capture? Right: Capture stops and issues an error message. In DProp terminology, we call this status a gap (that is, some piece of the DB2 log is missing).
If the error persists after restarting Capture, there are two options available to resume replication. Both options are a little tricky:
1. Capture COLD start: This option is tricky because it will automatically cause a full refresh of all replication target tables.
2. Manually help Capture over the gap: This option is tricky because it requires very sensitive manual interaction. But, if performed successfully, the consequences are much smaller than with option 1.
Regarding unavailable log records, please consider that DB2 itself might also be in trouble if certain log records are no longer available. Also consider that, since Capture is able to process archived log records, a Capture gap because of unavailable log records should never happen. But to be prepared, we nevertheless want to go into more detail.
Resolving the Gap with a Capture COLD Start
A Capture COLD start is what you probably want to avoid in a productive and highly tuned distributed replication system. A COLD start will certainly resolve the gap, but it will also force all replication target tables to perform a new initial refresh (because replication consistency is compromised):
• Performing a COLD start, Capture ignores all information about previously captured log records. Capture resumes change capture with the most current log record DB2 can provide.
• During a COLD start, Capture removes all previously captured log records from all change data tables and the unit-of-work table. A COLD start can be considered a big cleanup task.
Some production environments can tolerate a Capture COLD start with the subsequent automatically performed full refreshes. Others, especially those creating history tables by means of data replication (refer to Chapter 8, "Case Study 3—Feeding a Data Warehouse" on page 203 for several examples), cannot.
Remark: On AS/400, the way to start Capture in COLD mode is to specify the RESTART(*NO) parameter on the STRDPRCAP command.
Resolving the Gap Manually
If a Capture COLD start cannot be tolerated, Capture has to be provided with a log sequence number from which it can successfully resume capturing.
Warning: Using this advanced technique could cause some source transactions not to replicate! That is what you have to be aware of. In certain
scenarios, this might be less of a pain than the refresh of all replication targets.
If you started Capture using the WARMNS start option (which means WARM start or no start), Capture will terminate when a requested log record cannot be provided by the DB2 log interface. Capture will issue an error message into the Capture trace table (ASN.IBMSNAP_TRACE), and Capture will write at least one WARM start record into the WARM start table (ASN.IBMSNAP_WARM_START). The WARM start table could look like the following example:

SEQ                       AUTHTKN   AUTHID    CAPTURED  UOWTIME
----------------------    -------   -------   --------  -----------
x'00000000485E57D60000'                                 0
x'00000000485E82F60000'   APPLY01   DB2RES5   N         -1307724304
x'00000000485107C80000'   APPLY01   DB2RES5   N         -1307736417
x'00000000480135D20000'                                 0
If a restart attempt fails again with the same error message, you need to provide Capture with a different restart point (a different WARM start log sequence). Do so by following the guidelines below:
1. Delete all records from the existing WARM start table (back up the table before doing this).
2. Determine a valid log sequence number, using utilities available for the DB2 platform you are using.
3. Insert one record into the WARM start table, using the following values:

   SEQ        the log sequence number determined in step 2
   AUTHTKN    leave this column NULL
   AUTHID     leave this column NULL
   CAPTURED   leave this column NULL
   UOWTIME    set this column to 0 (zero)
4. Restart Capture.

OS/390 Remark: A valid log sequence number can be derived by first using the DSNJU004 utility to find an active log range, and then by running the DSN1LOGP utility with this given log range (or a smaller subset) as an input. The DSN1LOGP utility will show the actual log record numbers within the given range. Choose a BEGIN UR or COMMIT log record; avoid UNDO or REDO records.
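As a sketch, steps 1 and 3 could be performed with SQL like the following. The hexadecimal literal is a hypothetical log sequence number; substitute the value determined with DSNJU004/DSN1LOGP (or the equivalent utility for your platform), and remember to back up the table contents before issuing the DELETE:

-- step 1: clear the WARM start table (back it up first!)
DELETE FROM ASN.IBMSNAP_WARM_START;

-- step 3: re-seed the WARM start table with the new restart point
INSERT INTO ASN.IBMSNAP_WARM_START
       (SEQ, AUTHTKN, AUTHID, CAPTURED, UOWTIME)
VALUES (X'00000000485E82F60000', NULL, NULL, NULL, 0);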
Again, Capture errors while processing the DB2 log are not something you will be dealing with every day. The procedure given above could be treated as an emergency guideline, but it is best to be prepared.
5.4.4 Monitoring Apply
Basically, the same monitoring tasks that we recommended for Capture are also recommended for Apply. That means the main monitoring tasks for Apply will also fall into the following categories:
• Check if the Apply process is up and running.
• Detect and solve Apply problems as soon as possible.

5.4.4.1 Monitoring Apply Processes
If your Apply process is assumed to be running continuously, then checking if Apply is up and running will be a task your monitoring system has to perform on a regular basis. In contrast to Capture (which is assumed to be a continuous process), it is possible to start Apply only when replication actions are supposed to happen (for example, if Apply is scheduled to run once a day). In those cases, of course, regular monitoring is unnecessary. Depending on the execution platform you are using, the facilities available to check if a program is running will be different. The following example shows how to determine if Apply is running for UNIX operating systems:

#!/bin/ksh
ps -ef | grep 'asnapply' | grep -v grep | wc -l
The above command will return 1 if the Apply instance is active. If you are using several Apply processes on the same machine, each using a different Apply qualifier or a different control server, the command to check if Apply is running could even distinguish between the processes by including the Apply qualifier in the search pattern:

#!/bin/ksh
ps -ef | grep 'asnapply <apply_qual>' | grep -v grep | wc -l
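Building on this, a small wrapper could do the actual alerting. The following is only a sketch, assuming Apply was started as "asnapply <apply_qual> <cntl_server>" and that a mail command is available on the system:

#!/bin/ksh
# Alert the operator if the Apply instance for a given qualifier is down.
APPLY_QUAL=$1
COUNT=$(ps -ef | grep "asnapply ${APPLY_QUAL}" | grep -v grep | wc -l)
if [ ${COUNT} -eq 0 ]; then
  echo "Apply for qualifier ${APPLY_QUAL} is not running!" | \
    mail -s "DProp Apply alert" operator
fi

Schedule the script through cron to get a regular heartbeat check.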
If you are running Apply on an AS/400, you can check whether it is running by issuing the WRKSBS command and choosing option 8 in front of the QZSNDPR subsystem; you should see a job having the name of the Apply qualifier.

5.4.4.2 Monitoring the Subscription Status
Apply reports the status of each subscription cycle in the subscription set table by updating the status column. As the final step of each subscription cycle, Apply inserts one record into the Apply trail table, which includes additional statistics about previous replication cycles.
A look at the subscription set table (ASN.IBMSNAP_SUBS_SET), located at the replication control server, is sufficient to check whether the status of a subscription set is as expected, or whether problems prevented the most recent replication attempt. Table 2 displays all possible states of a subscription set, taking the ACTIVATE column and the STATUS column of the subscription set table into account. Table 2. Determining the Status of Subscription Sets
Activate  Status  Description
--------  ------  ------------------------------------------------------------
0         0       The subscription set has never been run (initial setting
                  after defining an empty set), or the subscription set has
                  been manually deactivated.
1         0       The subscription set is active and the subscription status
                  is fine. Have a look at the timestamp columns provided
                  within the subscription set table (SUBS_SET) to see if the
                  subscription has been run.
1         -1      The execution of this subscription set ended in error the
                  last time the subscription was processed. Refer to the
                  Apply trail table (ASN.IBMSNAP_APPLYTRAIL) to determine the
                  reason for the failure. Note that Apply retries failing
                  subscription sets every 5 minutes.
0         -1      Apply reported an error when processing the set before the
                  subscription was deactivated.
1         1       This subscription set is currently being serviced by Apply.
1         2       This subscription set is currently being serviced by Apply.
                  Apply uses mini-cycles (see blocking factor) to service the
                  set.
5.4.4.3 Detecting Failing Subscription Sets
Considering the above explanations, subscription sets currently in error can be determined by using the following query:

SELECT ACTIVATE, STATUS, APPLY_QUAL, SET_NAME, WHOS_ON_FIRST
FROM ASN.IBMSNAP_SUBS_SET
WHERE STATUS = -1;
Keep in mind that whenever a subscription set fails, no data is replicated to any of the replication target tables of that set. If a subscription error occurs after changes have already been inserted into some of the target tables, those changes are rolled back immediately.
5.4.4.4 Monitoring Subscription Set Latency
The status of a subscription set is not the only means to detect if the replication system is working properly. The next level of collecting subscription status information is to compare the timestamp of when the subscription set should have been run with the timestamp the subscription set was actually processed. The timestamp columns that we will use for this comparison are all available from the subscription set table, ASN.IBMSNAP_SUBS_SET (see Table 3).

Table 3. Timestamp Information Available from the Subscription Set Table
Column Name   Meaning
-----------   ----------------------------------------------------------------
LASTRUN       Control server timestamp, revealing the most recent time a
              subscription set was processed. Apply advances LASTRUN for
              successful and for unsuccessful subscription attempts.
LASTSUCCESS   Control server timestamp, revealing the most recent time a
              subscription set was successfully processed.
SYNCHTIME     All replication target tables belonging to a subscription set
              contain all changes, captured at the replication source server,
              that were committed before SYNCHTIME. SYNCHTIME is the timestamp
              associated with the log sequence number of a captured log record
              (the log sequence number is stored within the SYNCHPOINT
              column). Apply obtains SYNCHPOINT and SYNCHTIME from the
              REGISTER table (global record) for each subscription cycle.
              Apply uses the global SYNCHPOINT as upper limit when fetching
              data from the change data tables when it processes a
              subscription cycle (SYNCHTIME therefore contains the source
              server timestamp associated with this upper limit).
Use the following query to compare the three subscription timestamp columns with the current timestamp and thus calculate the subscription lag (or subscription latency). Issue the query while connected to the replication control server.

SELECT ACTIVATE, STATUS, APPLY_QUAL, SET_NAME, WHOS_ON_FIRST,
       SECOND(CURRENT TIMESTAMP - LASTRUN)
       + ((MINUTE(CURRENT TIMESTAMP) - MINUTE(LASTRUN)) * 60)
       + ((HOUR  (CURRENT TIMESTAMP) - HOUR  (LASTRUN)) * 3600)
       + ((DAYS  (CURRENT TIMESTAMP) - DAYS  (LASTRUN)) * 86400)
       AS SET_RUN_LAG,
       SECOND(CURRENT TIMESTAMP - LASTSUCCESS)
       + ((MINUTE(CURRENT TIMESTAMP) - MINUTE(LASTSUCCESS)) * 60)
       + ((HOUR  (CURRENT TIMESTAMP) - HOUR  (LASTSUCCESS)) * 3600)
       + ((DAYS  (CURRENT TIMESTAMP) - DAYS  (LASTSUCCESS)) * 86400)
       AS SET_SUCCESS_LAG,
       SECOND(CURRENT TIMESTAMP - SYNCHTIME)
       + ((MINUTE(CURRENT TIMESTAMP) - MINUTE(SYNCHTIME)) * 60)
       + ((HOUR  (CURRENT TIMESTAMP) - HOUR  (SYNCHTIME)) * 3600)
       + ((DAYS  (CURRENT TIMESTAMP) - DAYS  (SYNCHTIME)) * 86400)
       AS SET_LATENCY
FROM ASN.IBMSNAP_SUBS_SET
WHERE APPLY_QUAL = '<apply_qual>'
  AND SET_NAME = '<set_name>';
Remark: Please notice that you are comparing a control server timestamp (CURRENT TIMESTAMP) with a source server timestamp (SYNCHTIME). This query could deliver unexpected results if, for example, the control server and the source server are placed within different time zones.

5.4.4.5 Identifying Members for a Given Subscription Set
Which members are defined for a subscription set can easily be determined by joining the subscription set and the subscription members table:

SELECT A.STATUS, A.APPLY_QUAL, A.SET_NAME, A.WHOS_ON_FIRST,
       B.SOURCE_OWNER, B.SOURCE_TABLE, B.SOURCE_VIEW_QUAL,
       B.TARGET_OWNER, B.TARGET_TABLE, B.TARGET_COMPLETE, B.TARGET_CONDENSED
FROM ASN.IBMSNAP_SUBS_SET A, ASN.IBMSNAP_SUBS_MEMBR B
WHERE A.APPLY_QUAL = B.APPLY_QUAL
  AND A.SET_NAME = B.SET_NAME
  AND A.WHOS_ON_FIRST = B.WHOS_ON_FIRST
  AND A.APPLY_QUAL = '<apply_qual>'
  AND A.SET_NAME = '<set_name>';
5.4.4.6 Looking for Details in the Apply Trail Table
Apply records the details of each subscription cycle in the DProp control table ASN.IBMSNAP_APPLYTRAIL. The table content can be used:
• To determine the reason for replication errors (if the status of a subscription set switched to -1)
• To collect replication statistics, for example, to check how many records are replicated during one day
Use the following query example to select data from the Apply trail table:

SELECT APPLY_QUAL, SET_NAME, WHOS_ON_FIRST, STATUS,
       LASTRUN, LASTSUCCESS, SYNCHTIME, MASS_DELETE, EFFECTIVE_MEMBERS,
       SET_INSERTED, SET_DELETED, SET_UPDATED, SET_REWORKED,
       SET_REJECTED_TRXS, SQLCODE,
       SUBSTR(APPERRM, 1, 8) AS ASNMSG, APPERRM
FROM ASN.IBMSNAP_APPLYTRAIL
WHERE APPLY_QUAL = '<apply_qual>'
  AND SET_NAME = '<set_name>'
ORDER BY LASTRUN;
Modify the statement to determine the most recent Apply trail record for a subscription set which was not successful:

SELECT SQLCODE, APPERRM
FROM ASN.IBMSNAP_APPLYTRAIL
WHERE APPLY_QUAL = '<apply_qual>'
  AND SET_NAME = '<set_name>'
  AND WHOS_ON_FIRST = '<whos_on_first>'
  AND STATUS = -1
  AND LASTRUN = (SELECT LASTRUN
                 FROM ASN.IBMSNAP_SUBS_SET
                 WHERE APPLY_QUAL = '<apply_qual>'
                   AND SET_NAME = '<set_name>'
                   AND WHOS_ON_FIRST = '<whos_on_first>');
As an additional idea, you could easily define a trigger on the Apply trail table that would always execute (and perhaps send a message) whenever a failing subscription is reported into the Apply trail table (STATUS = -1).
Remark: For all known error situations, a more detailed description and possible solutions can be obtained from the DB2 Replication Guide and Reference, S95H-0999.
Remark for Heterogeneous Environments: DataJoiner attempts to map all SQL codes or SQL states reported from remote non-IBM database systems into known DB2 SQL codes and SQL states. If such a mapping is not possible, DataJoiner issues SQL code -1822. The SQLMSG (SQL message) contains the error message as obtained from the remote data source.

DataJoiner SQLCODE -1822
SQL1822N Unexpected error code "<error code>" received from data source "<data source name>". Associated text and tokens are "<tokens>".
Cause: While referencing a data source, DataJoiner received an unexpected error code from the data source that does not map to a DB2 equivalent.
If the Apply trail table (or the Apply trace) reveals that a problem had originally occurred at a non-IBM database system (by showing SQLCODE -1822), you have to refer to column APPERRM of the Apply trail table for more details. This column will contain the complete SQL message (at least as much as DataJoiner could get) as issued by the remote data source. Then use the techniques described in Appendix B, “Non-IBM Database Stuff” on page 325 to obtain vendor-specific error information.

5.4.4.7 Utilizing Apply’s ASNDONE User Exit
With IBM DProp V5, a lot of freedom and flexibility was introduced to add customized logic during or after a subscription cycle.
Apply’s Interfaces for Customized Logic
The most important interfaces for customized logic that Apply offers are:
• SQL statements or stored procedures (statement type "G"), executed at the replication source server, before Apply reads the register table to determine which change data table corresponds to which replication source table.
• SQL statements or stored procedures (statement type "S"), executed at the replication source before Apply fetches the answer set from the change data tables (for an SQL statement example, refer to 8.4.7.2, “Maintaining a Base Aggregate Table from a Change Aggregate Subscription” on page 257).
• SQL statements or stored procedures (statement type "B"), executed at the replication target server, before Apply applies the answer set to the target tables (for a stored procedure example, executed at Microsoft SQL Server, refer to 7.2.2.3, “Invoking Stored Procedures at the Target Database” on page 185).
• SQL statements or stored procedures (statement type "A"), executed at the replication target server after Apply has applied the answer set to the target tables (for an SQL statement example, refer to 8.4.8, “Pushing Down the Replication Status to Oracle” on page 259).
• Last but not least, a user exit, called ASNDONE, can be called after Apply has completed a replication cycle. The C source code of a sample program is included within the Apply package.
In this section, we will provide you with some guidelines on how to customize and use the sample program ASNDONE.
The ASNDONE User Exit
Apply is capable of invoking a program, called ASNDONE, after it has finished a replication cycle. Apply does so if it is started with the NOTIFY start option (if the NOTIFY start parameter is omitted, the ASNDONE program will not be invoked).
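As a sketch, on UNIX or Intel platforms the invocation could look like the following. The qualifier and control server names are illustrative, and the exact start option syntax should be checked for your platform and release:

asnapply APPLYQ1 CNTLDB NOTIFY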
Parameters Passed to ASNDONE
Apply invokes the ASNDONE program with several parameters to let the ASNDONE program know which subscription it was servicing before invoking the user exit. The ASNDONE program can use the passed parameters, but does not necessarily have to. The parameters passed to the ASNDONE user exit are:
• Set_Name
• Apply_Qualifier
• Whos_On_First
• Control Server
• Trace Option
• Status
Useful ASNDONE Logic
ASNDONE is invoked by the Apply program after the subscription set processing has completed, regardless of success or failure. You can modify ASNDONE to meet the requirements of your installation. The following are some very useful examples where ASNDONE could be used:
• If a subscription cycle has not completed successfully (which can be determined from the STATUS value passed to the ASNDONE program), an automated monitoring system could be notified.
• If a subscription cycle has not completed successfully, an e-mail could automatically be sent to the replication operator.
• Depending on the reason causing a problem, ASNDONE could deactivate the subscription set causing the problem.
• In Update-Anywhere scenarios (updates to the replication target tables are replicated back to the replication source table), ASNDONE can be used to react to compensated transactions. Capture marks every transaction that was compensated at the replica site by adding a compensation code to the unit-of-work record that was captured into the unit-of-work table (and rejected transactions will only be pruned by retention limit pruning, so that they are available for additional processing). ASNDONE could make use of the rejection code provided by Capture to notify users or to automatically reinsert compensated transactions.
After Changing ASNDONE
After you have made changes to the sample ASNDONE C source code, you will need to recompile, link, and bind the program. If the modified source will run on any platform but MVS and includes SQL statements, then you must install the DB2 UDB Software Developer’s Toolkit (DB2 UDB SDK) on the system where the code is compiled.
Keep in mind that ASNDONE (and this also applies to stored procedures) is called from the Apply program. Therefore, if the user exit uses compiled SQL statements, the user exit must fulfill the following requirements:
• Use CONNECT TYPE 1 only.
• If executed on OS/390, link with the DB2 CAF.
• Static SQL packages must be bound against the databases/locations where the SQL will execute.
• If called from OS/390, the packages must be included in the Apply plan PKLIST.
Are Programming Languages Other Than C Supported?
The compiled sample user exit, compiled from C source code, can be substituted with any other compiled program named ASNDONE. Even REXX user exits can be used to substitute the compiled versions. On OS/2, substitute the ASNDONE program in %DB2PATH%\bin with your REXX exec code. On Windows NT/95, a REXX exec is called by issuing "REXX execname parameters". To implement a Windows REXX ASNDONE exit, simply provide an NT batch file within %DB2PATH%\bin, named ASNDONE.BAT, that invokes the REXX exec, using the above syntax:
1. Create a batch file named ASNDONE.BAT with one line:
   @REXX ASNDONE.REX %1 %2 %3 %4 %5 %6
2. Create a REXX exec named ASNDONE.REX with your logic.
3. Rename the provided ASNDONE sample program.
4. Place ASNDONE.BAT and ASNDONE.REX in path %DB2PATH%\bin.
REXX Sample for the ASNDONE User Exit
We are going to show you a small ASNDONE program, coded in REXX, to give you an idea of how ASNDONE could be used. The simple example below just switches off (deactivates) failing subscriptions. In a more sophisticated approach, some more logic could be added to automatically fix certain problems or to notify an administrator or a monitoring system.
Remark: The feature in which subscriptions can modify some of their own attributes (like ACTIVATE in our example) was introduced within the first quarter of 1999. Be sure to install an Apply release that supports this feature if you want to use the following example. We tested this feature using Apply for NT, build 0074 (you can check the Apply build by invoking Apply with option TRCFLOW—the build number will be given in the trace output).

/**********************************************************************/
/*                                                                    */
/* ASNDONE SAMPLE REXX EXEC                                           */
/*                                                                    */
/* This sample program just deactivates a subscription set after a    */
/* failing subscription cycle (it is just an example, feel free to    */
/* include more useful logic ...).                                    */
/*                                                                    */
/* The parameters passed to ASNDONE are as follows:                   */
/* ------------------------------------------------                   */
/* - set name                                                         */
/* - apply qualifier                                                  */
/* - whos_on_first value                                              */
/* - control server                                                   */
/* - trace option                                                     */
/* - status value                                                     */
/*                                                                    */
/**********************************************************************/

/* get parameters passed */
PARSE ARG SET_NAME APPLY_QUAL WHOS_ON_FIRST CNTL_SERVER TRACEON STATUS;

/* init return code */
RC = 0;

/* print parameter info if trace is on */
if traceon = "yes" then do
   say ' DONE: The following parameters were passed by APPLY ...';
   say ' DONE: APPLY_QUAL    = ' apply_qual;
   say ' DONE: SET_NAME      = ' set_name;
   say ' DONE: WHOS_ON_FIRST = ' whos_on_first;
   say ' DONE: CNTL_SERVER   = ' cntl_server;
   say ' DONE: STATUS        = ' status;
end;

if status = 0 then SIGNAL GO_EXIT

/*******************************/
/* Load Rexx DB2 functions     */
/*******************************/
STMT = 'LOADING DB2 REXX FUNCTION'
if Rxfuncquery('sqlexec') \= 0 then do
   rcy = RxFuncAdd('SQLEXEC', 'DB2AR', 'SQLEXEC');
   if rcy \= 0 then do
      if TRACEON = 'yes' then say ' DONE:' STMT ' FAILED';
      RC = RCY;
      SIGNAL GO_EXIT;
   end;
   if TRACEON = 'yes' then say ' DONE:' STMT ' SUCCESSFUL'
end;

/*************************/
/* CONNECT TO CNTLSERVER */
/*************************/
STMT = 'CONNECT TO' CNTL_SERVER;
call SQLEXEC 'CONNECT TO' CNTL_SERVER;
if SQLCA.SQLCODE \= 0 then SIGNAL SQL_ERROR;
if TRACEON = 'yes' then say ' DONE:' STMT ' SUCCESSFUL';

/*---------------------------------------------------------------------*/
/* INVESTIGATE ON THE REASON FOR THE PROBLEM                            */
/*---------------------------------------------------------------------*/
/* Use the following query to determine the reason why the              */
/* subscription was failing ...                                         */

/* trail_stmt = "SELECT SQLCODE, APPERRM FROM ASN.IBMSNAP_APPLYTRAIL",  */
/*              "WHERE APPLY_QUAL = '"apply_qual"'",                    */
/*              "AND SET_NAME = '"set_name"'",                          */
/*              "AND WHOS_ON_FIRST = '"whos_on_first"'",                */
/*              "AND STATUS = -1",                                      */
/*              "AND LASTRUN = (SELECT LASTRUN",                        */
/*              "FROM ASN.IBMSNAP_SUBS_SET",                            */
/*              "WHERE APPLY_QUAL = '"apply_qual"'",                    */
/*              "AND SET_NAME = '"set_name"'",                          */
/*              "AND WHOS_ON_FIRST = '"whos_on_first"')"                */

/*---------------------------------------------------------------------*/
/* TRY TO AUTOMATICALLY FIX THE PROBLEM                                 */
/*---------------------------------------------------------------------*/
/* ... */

/*---------------------------------------------------------------------*/
/* DEACTIVATE SUBSCRIPTION, IF PROBLEM CANNOT BE FIXED                  */
/*---------------------------------------------------------------------*/
if status = -1 then do
   sql_stmt = "UPDATE ASN.IBMSNAP_SUBS_SET",
              "   SET ACTIVATE = 0",
              " WHERE SET_NAME = '"set_name"'",
              "   AND APPLY_QUAL = '"apply_qual"'",
              "   AND WHOS_ON_FIRST = '"whos_on_first"'";

   /*********************/
   /* EXECUTE IMMEDIATE */
   /*********************/
   STMT = 'UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE = 0 for',
          apply_qual'/'set_name'/'whos_on_first;
   call SQLEXEC 'EXECUTE IMMEDIATE :sql_stmt';
   if SQLCA.SQLCODE \= 0 then
      SIGNAL SQL_ERROR;
   call SQLEXEC 'COMMIT';
   if TRACEON = 'yes' then say ' DONE:' STMT ' SUCCESSFUL';
end;

/*---------------------------------------------------------------------*/
/* SEND AN EMAIL TO THE REPLICATION OPERATOR                            */
/*---------------------------------------------------------------------*/
/* ... */

SIGNAL GO_EXIT /* END OF PROGRAM LOGIC */

/*********************/
/* SQL ERROR HANDLER */
/*********************/
SQL_ERROR:
if TRACEON = 'yes' then do
   SAY ' DONE:' STMT ' FAILED'
   SAY ' DONE: WITH A SQLCODE OF ' SQLCA.SQLCODE
   SAY ' DONE: SQLERRMC = ' SQLCA.SQLERRMC
   SAY ' DONE: SQLSTATE = ' SQLCA.SQLSTATE
end
RC = SQLCA.SQLCODE

go_exit:
return RC
Start the Apply instance with the TRCFLOW start option to find all trace messages issued by ASNDONE in Apply’s trace output.
ASNDONE on the AS/400
If you are using Apply on an AS/400, you can have as many ASNDONE programs as you like, and you can even change the name of the ASNDONE program. In fact, when you start one Apply instance using the STRDPRAPY command, you can indicate the name and the library of the user exit.
5.4.5 Monitoring the Database Middleware Server
Once DataJoiner is started, the most important monitoring task is to check whether the main DataJoiner engine is still running. Basically, all monitoring options that apply to DB2 Version 2.1.2 are also available for DataJoiner.
5.4.5.1 Monitoring DataJoiner Processes
Depending upon whether you are running the DataJoiner gateway on Windows NT or on a UNIX platform, the facilities available to check if the DataJoiner engine is running will be different. Considering DataJoiner for UNIX operating systems, the DataJoiner process model consists of several autonomous subcomponents. The most important process is the process of the DataJoiner engine, called db2sysc. The following example shows how to determine if the DataJoiner engine is running for UNIX operating systems:

#!/bin/ksh
ps -ef | grep '<instance_name>' | grep db2sysc | grep -v grep | wc -l
The above command will return a value greater than or equal to 1 if the DataJoiner instance is active.
5.5 Tuning Replication Performance
Basically, apart from some techniques that are unique to data replication with IBM DProp, we can consider Capture, Apply, and also the heterogeneous change capture triggers, to be database applications or database tasks. So, most of the performance techniques introduced within this chapter are common database tuning techniques, applied to change data tables, database logs, or static and dynamic SQL. The following dedicated DProp tuning techniques will be introduced within this section:
• Running Capture with the appropriate priority
• Adjusting the Capture tuning parameters
• Using separate tablespaces
• Choosing appropriate lock rules
• Using the proposed change data indexes
• Updating database statistics
• Making use of subscription sets
• Using pull rather than push replication
• Using multiple Apply processes in parallel
• Using high performance full refresh techniques
• Using memory rather than disk for the spill file
• Enabling block fetch for Apply
• Tuning pruning
• Optimizing network performance
Other useful performance recommendations can be found in the DataPropagator Relational Performance Measurement Series, available on the World Wide Web (http://www.software.ibm.com/data/db2/performance/dprperf.htm, subsection "library").
5.5.1 Running Capture with the Appropriate Priority
Although the Capture process is the most critical task within a replication system, it does not have to be run as a high priority task. Depending on the latency you are planning to allow for the replication target tables, adjust the priority of the Capture process accordingly. For example, if application performance in the source system is more important than replicating the changes as soon as possible, just lower the priority of the Capture process (on those platforms where such scheduling is possible). During periods of high system load (for example, during heavy batch processing), Capture will simply lag behind. During periods with a lower system load, Capture will catch up again.
Remark: The above scheduling recommendation is only applicable to DB2 replication source systems using DProp Capture, and not to non-IBM replication sources, where change capture is established by synchronous database triggers.
5.5.2 Adjusting Capture Tuning Parameters
Four tuning parameters are available to customize the change capture process (for DB2 replication sources). The tuning parameters are stored within the DProp control table ASN.IBMSNAP_CCPPARMS. This table contains only one row. The tuning parameters are:
• Capture Commit Interval
• Capture Lag Limit
• Capture Pruning Interval
• Capture Retention Limit
Tuning parameter adjustments take effect after recycling (stopping/starting) Capture or re-initializing Capture using the REINIT command. A simple example follows below.
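As an illustration, the parameters can be adjusted with a simple UPDATE against the one-row tuning parameters table. This is only a sketch; the column names below follow the usual DProp naming conventions, but verify them against the actual layout of ASN.IBMSNAP_CCPPARMS on your platform:

UPDATE ASN.IBMSNAP_CCPPARMS
   SET COMMIT_INTERVAL = 60,       -- seconds
       PRUNE_INTERVAL  = 3600,     -- seconds
       LAG_LIMIT       = 10080,    -- minutes (7 days)
       RETENTION_LIMIT = 10080;    -- minutes (7 days)

Remember that the new values only take effect after Capture is recycled or reinitialized.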
5.5.2.1 Capture Commit Interval
Capture continuously monitors the DB2 log. When Capture reads a change for one of the registered replication source tables, it inserts a row into a change data table (but Capture does not commit every insert at once). The Capture commit interval determines how often Capture commits while capturing changes to the change data tables. The default for the Capture commit interval is 30 seconds. A higher commit interval reduces the cost of change capture, but also might increase the latency of very frequently running subscriptions (for example, continuously running subscriptions). The commit interval is specified in seconds.
Recommendation: On platforms where dynamic SQL caching is available, the low threshold is 10 seconds. Under 10 seconds, the commit overhead impacts throughput. Where dynamic SQL caching is not available, the low threshold is 20 seconds.

5.5.2.2 Capture Lag Limit
The Capture lag limit is used to prevent Capture from acquiring very old DB2 archived log datasets (files). For production environments, the lag limit will probably never be reached. But consider test systems that are used from time to time to check out new replication techniques. Capture might have been stopped for some time (say, weeks). When it is restarted with the WARM start option (which is the default), Capture would request all DB2 log datasets from the time it was stopped. If those datasets are still available, they would be mounted. You probably do not want this to happen.
The general advice is to start Capture in COLD mode in test environments if Capture has not been running for a while. If it is accidentally started in WARM mode, the lag limit will let Capture switch to a COLD start (or stop, if WARMNS is used) if the log records that Capture would require to perform a WARM start are older than the lag limit. The lag limit parameter is specified in minutes.

5.5.2.3 Capture Pruning Interval
Pruning is an expensive process, and therefore control over that process should be in your hands. The general advice is to start Capture specifying the NOPRUNE start option and to launch pruning (by issuing the Capture PRUNE command) when system resources are available. Refer to 5.3.2, “Pruning” on page 91 for the full story about pruning.
If you choose not to follow this advice and you are starting Capture without the NOPRUNE option, pruning will automatically interrupt the change capture activity on a regular basis. To control how often this will happen, make use of the pruning interval parameter. We strongly recommend a value higher than the default of 300 seconds. The pruning interval parameter is specified in seconds.

5.5.2.4 Capture Retention Limit
The most infrequently replicating subscription determines which records can be pruned from the change data tables and which must stay there. The retention limit prevents change data records from staying there forever if one of the replication targets no longer connects to the replication source system.
Remark: Retention limit pruning can destroy replication consistency for those subscriptions that did not connect for a long time. Those subscriptions will automatically do a full refresh when starting the next subscription cycle.
The retention limit is used during pruning, to prune all transactions from the change data tables that are older than CURRENT TIMESTAMP - RETENTION_LIMIT minutes. Replication environments with very infrequently connected mobile systems will probably need a longer retention limit than systems with frequently connecting Apply processes. The retention limit parameter is specified in minutes.
5.5.3 Using Separate Tablespaces
The following recommendations apply to tablespaces containing DProp control tables or change data tables for DB2 for OS/390 replication source servers:
• Use one single tablespace for each of the change data tables. These tables might become quite big and may need some special housekeeping treatment.
• Use one single tablespace for the unit-of-work table. This table also might become quite big and need some special housekeeping treatment. Additionally, you will avoid locking contention problems on the UOW and REGISTER tables if those tables are stored in separate tablespaces.
• It is OK to put all other replication control tables into one tablespace.
Remark (DB2 UDB for Intel and UNIX): Excellent performance can be achieved by placing multiple change data tables into one single tablespace (for example, using disk striping across multiple disks for that tablespace).
Therefore, the above advice to use separate tablespaces does not necessarily apply to DB2 on Intel and RISC platforms.
5.5.4 Choosing Appropriate Lock Rules
When using DB2 for OS/390 as a replication source server, we recommend setting the LOCKSIZE for the UOW tablespace and every CD tablespace to TABLE. A finer lock granularity would only create unnecessary resource utilization and overhead. Since DProp V5 introduced isolation level uncommitted read (UR), Capture holds an exclusive lock on the change data tables it is using, but Apply is nevertheless able to read previously committed data using the new isolation level. (Capture tells Apply up to which log sequence number the changes have been committed.)
The LOCKSIZE of the tablespace containing the replication control tables (such as the register table) should be defined with a finer granularity (for example, LOCKSIZE ROW) if multiple Apply processes are accessing the control tables in parallel.
5.5.5 Using the Proposed Change Data Indexes
Be sure to create the index generated for the change data table as a UNIQUE INDEX (do not customize this index attribute). To guarantee optimal performance, the one and only change data table index should look like the following example (DB2 for OS/390 syntax):

CREATE TYPE 2 UNIQUE INDEX <index_name>
  ON <cd_owner>.<cd_table>
     (IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC)
  USING STOGROUP <stogroup> PRIQTY <nn> SECQTY <mm>
  FREEPAGE 0 PCTFREE 10;
The unit-of-work table index should look like the following example (DB2 for OS/390 syntax):

CREATE TYPE 2 UNIQUE INDEX <index_name>
  ON ASN.IBMSNAP_UOW
     (IBMSNAP_COMMITSEQ ASC, IBMSNAP_UOWID ASC, IBMSNAP_LOGMARKER ASC)
  USING STOGROUP <stogroup> PRIQTY <nn> SECQTY <mm>
  FREEPAGE 0 PCTFREE 0;
OS/390 Remark: Make sure that all indexes on change data tables, the unit-of-work table, and all other replication control tables are defined as TYPE 2 indexes. DB2 ignores TYPE 1 indexes when using isolation UR.
Note for all DPRTools V1 users and all DJRA early users: Make sure that there is only one unique CD index and one unique UOW index after migrating to DProp V5. Previously proposed indexes are still supported, but have a negative performance impact.
AS/400 Remark: In DPropR/400 V1, the indexes were different from those of the other platforms. With DPropR/400 V5, use the same indexes as those described above (except the TYPE 2, FREEPAGE, and PCTFREE parameters, which do not exist on the AS/400).
5.5.6 Updating Database Statistics
The contents of the change data tables and the unit-of-work table will vary in size, from the initial zero rows up to the maximum size right before Capture prunes the change data tables. This means that the timing of RUNSTATS is critical. RUNSTATS must be run at a time when the change data tables contain sufficient data, so that the carefully chosen indexes on the change data tables and on the unit-of-work table will be used by Apply and Capture. It is not necessary to update the statistics again once the catalog tables show that there is an advantage to using the indexes.
The SQL against the change data tables is dynamic, using parameter marker values, and therefore default filter factors will be used. The cardinality of the tables will affect the default filter factor values, but the fact that the high and low values are old will not have any effect.
Rebind the Capture and Apply packages after the RUNSTATS has been performed, so that the static SQL contained in these packages can benefit from the updated statistics.
Remark: There is no equivalent to the RUNSTATS command on the AS/400.
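As a sketch for DB2 UDB on UNIX or Intel platforms, the sequence could look like the following. The database name, change data table name, and package names are illustrative; run it when the CD and UOW tables are well populated:

#!/bin/ksh
db2 connect to SRCDB
# collect statistics on the UOW table and one CD table, including indexes
db2 "runstats on table ASN.IBMSNAP_UOW and indexes all"
db2 "runstats on table DPROP.CD_CUSTOMER and indexes all"
# rebind the Capture and Apply packages so that their static SQL
# benefits from the new statistics
db2 "rebind package <capture_package>"
db2 "rebind package <apply_package>"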
5.5.7 Making Use of Subscription Sets
With DProp V5, subscription sets were introduced in order to group multiple subscription members together. Subscription sets have several advantages compared to single subscriptions (subscription sets containing only one table). For example, all target tables refreshed within the same subscription set are transactionally consistent, meaning that all the members of a subscription set contain all the changes captured at the replication source database up to the same log sequence number (up to the same source site transactions).
Additionally, subscription sets provide an immense performance advantage compared to replicating every table separately. To show this, let us take a closer look at what happens when Apply services a subscription set. To keep things simple, we will consider read-only target tables (no update-anywhere). Figure 20 shows that Apply performs several single tasks during a subscription cycle. Some tasks are executed at the replication control server, others are executed at the replication source server, and finally others at the replication target server. Of course, Apply needs database connections in order to perform the tasks at the different databases.
Figure 20. Apply Cycle at a Glance. The figure shows Apply connecting, one server at a time, to the control server (1. look for work; 4. update subscription status), to the source server (2. pick up recent changes; 5. report subscription progress for pruning), and to the target server (3. apply foreign changes to the target tables). By never acquiring locks in two databases at the same time, Apply avoids the possibility of participating in a distributed deadlock; a nasty possibility when working with distributed databases.
Looking at Figure 20, we can identify at least the following Apply tasks, which execute in the following sequence:
1. Control Server: Look for work and determine subscription set details
2. Source Server:  Fetch changes from change data tables into the spill file
3. Target Server:  Apply changes from the spill file to target tables
4. Control Server: Update subscription statistics
5. Source Server:  Advance pruning control synchpoint to enable pruning
All these tasks need database connections, and all are executed at subscription set level. The number of database connections (established one at a time) that are needed to replicate a given number of tables can therefore be dramatically reduced by grouping the target tables together in subscription sets. Table 4, showing alternatives for replicating 100 tables, stresses this observation:

Table 4. Number of Connections Needed to Fulfill Replication Task
Number of Subscription Sets        Number of Connections
-------------------------------    ---------------------
100 Sets / 1 Member per Set        500 Connections
50 Sets / 2 Members per Set        250 Connections
10 Sets / 10 Members per Set       50 Connections
2 Sets / 50 Members per Set        10 Connections
1 Set / 100 Members in the Set     5 Connections
To see the performance boost that can be achieved by grouping subscription members together, no further comment is required! The only impact of having big subscription sets is that the transactions needed to replicate data into the target tables can become quite large (all changes within one subscription set are applied within one transaction). Be sure to allocate enough log space and enough space for the spill file. To prevent database log and spill file overflow, DProp offers another technique to keep target transactions small. To use this technique, you have to add a blocking factor (also referred to as the MAX_SYNCH_MINUTES feature) to the subscription set. This also guarantees transaction consistency at set level, but lets Apply replicate changes in multiple smaller mini-cycles rather than in one big transaction. Refer to 3.3.3, “Using Blocking Factor” on page 54 for the details. As you see, we are dealing with a classic trade-off situation here. We generally recommend that multiple subscription members be grouped into one subscription set. On the other hand, we also recommend use of the blocking factor, to set some kind of upper transaction boundary. Recommendation: Customize your system to be generally capable of replicating one subscription cycle within one target site transaction. Choose a blocking factor that takes effect in extraordinary situations, for example, during days with extremely high change rates, or after Apply comes up for the first time after a long maintenance window.
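As a sketch, the blocking factor is simply a column of the subscription set table, so it can be set with an UPDATE like the following (the 30-minute value is an arbitrary example):

UPDATE ASN.IBMSNAP_SUBS_SET
   SET MAX_SYNCH_MINUTES = 30
 WHERE APPLY_QUAL = '<apply_qual>'
   AND SET_NAME = '<set_name>'
   AND WHOS_ON_FIRST = '<whos_on_first>';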
5.5.8 Using Pull Rather Than Push Replication
In Chapter 3, “System and Replication Design—Architecture”, 3.2.1, “Apply Program Placement: Pull or Push” on page 39, we explained the differences between setting up Apply in what is called push mode and pull mode. As a reminder, pull means that DProp Apply is running at the target server, fetching data from the replication source server, usually over a network, and inserting all the fetched changes into the target tables locally. In push mode, DProp Apply is running at a site other than the target server (probably at the source server), inserting all the changes into the target tables remotely over the network.
From a performance perspective, it is better to design and configure your replication system so that it uses pull mode, because Apply will be able to make use of DB2/DRDA block fetch in these cases. Selecting data from a remote site over a network using block fetch capability is much (!) faster than inserting data over a network (without the possibility of blocking multiple inserts together).
5.5.9 Using Multiple Apply Processes in Parallel
With DProp V5, the terms Apply qualifier and subscription set were introduced:
• One Apply instance is started for each Apply qualifier.
• Multiple subscription sets can be defined for each Apply qualifier.
When Apply is started for one Apply qualifier, it immediately calculates, based on the subscription timing that you defined, whether subscription sets need to be serviced. If several subscription sets are awaiting replication, Apply always services the most overdue one first. That means a single Apply process always services subscription sets sequentially.
If you want to have subscription sets serviced in parallel, choose to have multiple Apply qualifiers. A separate Apply process can then be started for each Apply qualifier. Multiple Apply processes obviously can be advantageous from a performance perspective, because the work is done in parallel.
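For example, on a UNIX target server, two Apply instances servicing different qualifiers could be started side by side. This is only a sketch; the qualifier and control server names are illustrative, and the exact invocation syntax varies by platform and release:

asnapply APPLYQ1 CNTLDB &
asnapply APPLYQ2 CNTLDB &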
5.5.10 Using High Performance Full Refresh Techniques
To speed up the initial full refresh process for huge source tables, use one of the following advanced techniques:
1. Customize and use Apply’s ASNLOAD user exit to speed up the automatic full refresh using EXPORT and LOAD.
2. Initialize your replication target tables manually:
   • Disable full refresh from your source tables. Refer to 5.6.2, “Selectively Preventing Automatic Full Refreshes” on page 129 for the details.
   • Use the DBA unload and load utilities you are familiar with and that deliver the best performance.
Refer to 5.2.2.3, “Manual Refresh / Off-line Load” on page 89 for the full details about performing initial refreshes manually.
5.5.11 Using Memory Rather Than Disk for the Spill File
When using Apply for MVS, Apply provides an option to create the spill file in memory rather than on disk. There is an obvious advantage in using memory for the spill file rather than using disk storage (refer to 5.5.7, “Making Use of Subscription Sets” on page 122 to see when and where the spill file is created). If your replication cycles are short, the amount of data to be replicated may be appropriate for creating spill files in memory.
5.5.12 Enabling Block Fetch for Apply
In order to enable the performance boost that we want to achieve using a replication pull design, we have to ensure that Apply really does use block fetch when selecting data from a remote source server. Whether Apply will actually use DB2/DRDA block fetch depends on the bind options that were used when binding the Apply packages against the replication source server, either DB2 or DataJoiner. For details about binding Apply, refer to the general implementation checklist, “Step 22—Bind DProp Apply” on page 64.

5.5.12.1 UNIX and Intel Platforms
Specify the bind option BLOCKING ALL when binding Apply for Intel or UNIX platforms against the remote replication source server. Refer to the DProp installation documentation (DB2 Replication Guide and Reference, S95H-0999) for further details.

5.5.12.2 OS/390
Specify the bind option CURRENTDATA(NO) when binding the packages of Apply for OS/390 against the remote replication source server.
Remark: Be aware that the default for the CURRENTDATA bind option changed from DB2 Version 4 to Version 5. With DB2 for OS/390 Version 5, CURRENTDATA(YES) was introduced as the default bind option (until DB2 Version 4, CURRENTDATA(NO) was the default). To enable block fetch for DB2 for OS/390, it is necessary to add the CURRENTDATA(NO) bind parameter to Apply’s BIND job, if not already present.

5.5.12.3 AS/400
Nothing special needs to be done for the AS/400.
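As a sketch for the UNIX/Intel case, binding the Apply bind lists against the source server with blocking enabled could look like the following. The database name is illustrative, and the bind-list file names may differ by release; check the files shipped in the sqllib/bnd directory of your installation:

#!/bin/ksh
cd ~/sqllib/bnd
db2 connect to SRCDB
# bind the Apply packages with row blocking enabled
db2 "bind @applycs.lst blocking all isolation cs"
db2 "bind @applyur.lst blocking all isolation ur"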
5.5.13 Tuning Pruning
Without going into the details, pruning can be assumed to be quite a CPU-consuming process. On the other hand, the change data tables and the unit-of-work table have to be pruned on a regular basis, so consider deferring pruning to off-peak hours. For details about pruning, please refer to 5.3.2, “Pruning” on page 91.

5.5.13.1 Start DProp Capture with the NOPRUNE Start Option
Starting Capture with the NOPRUNE start option will dramatically reduce Capture’s CPU utilization, because no pruning will actually happen until Capture is requested to initiate the pruning operation. To request pruning on demand, DProp offers the possibility to send Capture a pruning command. Refer to the DB2 Replication Guide and Reference, S95H-0999 for the syntax of the Capture commands, depending on the operating system platform you are using.

5.5.13.2 How to Defer Pruning for Multi-Vendor Sources
For all non-IBM replication source platforms, pruning is initiated by a trigger defined on the pruning control table. The trigger executes every time the pruning control table is updated by Apply after successfully performing a replication cycle. Especially for those systems using short Apply cycles, this could be a costly activity. As you can imagine, there is no PRUNE command available for the trigger-based solution. Nonetheless, you may also want to defer the pruning action to off-peak hours, or at least have the pruning occur less frequently. The solution here is to disable the pruning trigger during peak hours and to enable it when appropriate.
Some of the supported multi-vendor database systems provide the option to simply deactivate triggers. We show two examples here:
Oracle Syntax Example:
-- temporarily disable pruning control trigger
ALTER TRIGGER <schema>.PRUNCNTL_TRIGGER DISABLE;
-- enable pruning control trigger
ALTER TRIGGER <schema>.PRUNCNTL_TRIGGER ENABLE;
Informix Syntax Example:
-- temporarily disable pruning control trigger
SET TRIGGERS <schema>.PRUNCNTL_TRIGGER DISABLED;
-- enable pruning control trigger
SET TRIGGERS <schema>.PRUNCNTL_TRIGGER ENABLED;
For all other database systems, check the documentation for the database system you are using as a replication source to see if triggers can be temporarily disabled. If disabling of triggers is not supported, use the DROP TRIGGER and CREATE TRIGGER statements instead:

-- temporarily drop pruning control trigger
DROP TRIGGER <schema>.PRUNCNTL_TRIGGER;
-- recreate pruning control trigger
CREATE TRIGGER <schema>.PRUNCNTL_TRIGGER ...
Important: Copy the DDL to create the pruning control trigger from the SQL script generated by DJRA. Be sure to copy the CREATE TRIGGER statement from the SQL script of the source registration that you created last, because the trigger body of the pruning control trigger changes with every registered source table.
5.5.14 Optimizing Network Performance
Well, you might already be wondering why we are not focusing on network performance in the distributed, cross-platform world of data replication. Right, network setup and performance could easily fill another book. Please refer to existing SNA, TCP/IP, DRDA, and multi-vendor literature.
5.5.15 DB2 for OS/390 Data Sharing Remarks
For remarks and very valuable guidelines on how to tune IBM DProp in a DB2 data sharing environment, please refer to the DataPropagator Relational Performance Measurement Series, available on the World Wide Web (http://www.software.ibm.com/data/db2/performance/dprperf.htm).
5.6 Other Useful Techniques
During the rest of this chapter, we will introduce some techniques that again make use of the open interface that DProp provides by storing all control information in DB2 or DataJoiner database tables.
5.6.1 Deactivating Subscription Sets
While testing different replication setup alternatives, but also in production environments (for example, in case of certain error situations), it can be advisable to temporarily deactivate subscription sets. To achieve this, connect to the replication control server and issue the following statement:

UPDATE ASN.IBMSNAP_SUBS_SET
   SET ACTIVATE = 0
 WHERE APPLY_QUAL = '<apply_qual>'
   AND SET_NAME = '<set_name>'
   AND WHOS_ON_FIRST = '<whos_on_first>';
To reactivate disabled subscription sets, just reset the ACTIVATE column to 1 again:

UPDATE ASN.IBMSNAP_SUBS_SET
   SET ACTIVATE = 1
 WHERE APPLY_QUAL = '<apply_qual>'
   AND SET_NAME = '<set_name>'
   AND WHOS_ON_FIRST = '<whos_on_first>';
5.6.2 Selectively Preventing Automatic Full Refreshes
As we have already discussed in 5.2.2, “Initialization of Replication Subscriptions” on page 86, a full refresh for all tables of a subscription set can be an expensive task. Additionally, consider non-condensed target tables (histories, for example), which would be destroyed if the history were to be replaced with a copy of the source table at a certain point in time. To give you control, DProp allows you to disable any automatic full refresh for certain source tables.

5.6.2.1 Disable Full Refresh for All Subscriptions
Full refresh is always disabled (or enabled) at the replication source server. That means you can allow full refresh for all replication subscriptions replicating from a certain source table, or for none.
Use the following SQL statement to disable any automatic full refresh for a certain source table. Issue the statement while you are connected to the replication source server. The statement sets the DISABLE_REFRESH column of the register table to 1:

UPDATE ASN.IBMSNAP_REGISTER
   SET DISABLE_REFRESH = 1
 WHERE SOURCE_OWNER = '<source_owner>'
   AND SOURCE_TABLE = '<source_table>';
Use the following statement to enable automatic full refreshes for a replication source table:

UPDATE ASN.IBMSNAP_REGISTER
   SET DISABLE_REFRESH = 0
 WHERE SOURCE_OWNER = '<source_owner>'
   AND SOURCE_TABLE = '<source_table>';
5.6.2.2 Allow Full Refresh for Certain Subscriptions
Imagine several subscriptions replicating from the same source table. Some might copy the complete huge table; others might be restricted to copying small segments only, by using restrictive subscription predicates. Using the technique described above, full refresh can only be disabled (or enabled) for all the subscriptions that use the source table, because the disable refresh attribute is set at the replication source table level.
Use the following technique to generally disable full refresh from a replication source, but to open the door for certain subscriptions only. We will make use of Apply’s capability to issue SQL statements while performing a replication cycle.
Using SQL Before Statements
Two different types of SQL Before statements are available to execute at the replication source server:
• Statement type G: The statement is executed before Apply reads the register table (ASN.IBMSNAP_REGISTER).
• Statement type S: The statement is executed after Apply has read the register table and before Apply prepares the cursors to read data from the change data tables (or from the source tables, when performing the initial refresh).
The only thing that Apply does between executing SQL Before statements of type G and type S is reading the register table. Therefore, this time window, and with it the chance that other subscriptions (which should not perform the refresh automatically) are reading the register table in parallel, is acceptably small.
You are already anticipating what we are going to do? Right, we will use the SQL Before statements (that execute on set level) to let Apply update the DISABLE_REFRESH column of the register table to 0, which means automatic refreshes are temporarily enabled, and we will let Apply reset the same DISABLE_REFRESH value to 1 again right after it has read through the register table.
Add a Statement to Your Set to Enable Full Refresh
Add an SQL Before statement similar to the following example to a subscription set to temporarily enable full refresh from replication source tables. Because SQL Before statements execute at set level, the statements switching the DISABLE_REFRESH value must include all the source tables that correspond to the subscription set members.
Note: The statement enabling full refresh has to be of statement type 'G'.

-- TEMPORARILY ENABLE FULL REFRESH
-- Statement to temporarily enable full refresh for all members
-- of the subscription set - BEFORE_OR_AFTER = G
INSERT INTO ASN.IBMSNAP_SUBS_STMTS
  (APPLY_QUAL, SET_NAME, WHOS_ON_FIRST, BEFORE_OR_AFTER,
   STMT_NUMBER, EI_OR_CALL, SQL_STMT, ACCEPT_SQLSTATES)
VALUES ('<apply_qual>', '<set_name>', '<whos_on_first>', 'G', 1, 'E',
  'UPDATE ASN.IBMSNAP_REGISTER SET DISABLE_REFRESH=0
   WHERE (SOURCE_OWNER=''<source_owner_1>''
   AND SOURCE_TABLE=''<source_table_1>'')
   OR (SOURCE_OWNER=''<source_owner_n>''
   AND SOURCE_TABLE=''<source_table_n>'')',
  '0000002000');

-- increment the AUX_STMTS counter in IBMSNAP_SUBS_SET
UPDATE ASN.IBMSNAP_SUBS_SET
   SET AUX_STMTS = AUX_STMTS + 1
 WHERE APPLY_QUAL = '<apply_qual>'
   AND SET_NAME = '<set_name>'
   AND WHOS_ON_FIRST = '<whos_on_first>';
Add a Statement to Your Set to Disable Full Refresh Again
Add an SQL Before statement similar to the following example to the subscription set to reset DISABLE_REFRESH to 1 again. Include the same source tables in the where-clause of the statement as the ones you specified for the first statement.
Note: The statement disabling full refresh again has to be of statement type 'S'.

-- DISABLE FULL REFRESH AGAIN
-- Statement to disable full refresh again for all members
-- of the subscription set - BEFORE_OR_AFTER = S
INSERT INTO ASN.IBMSNAP_SUBS_STMTS
  (APPLY_QUAL, SET_NAME, WHOS_ON_FIRST, BEFORE_OR_AFTER,
   STMT_NUMBER, EI_OR_CALL, SQL_STMT, ACCEPT_SQLSTATES)
VALUES ('<apply_qual>', '<set_name>', '<whos_on_first>', 'S', 2, 'E',
  'UPDATE ASN.IBMSNAP_REGISTER SET DISABLE_REFRESH=1
   WHERE (SOURCE_OWNER=''<source_owner_1>''
   AND SOURCE_TABLE=''<source_table_1>'')
   OR (SOURCE_OWNER=''<source_owner_n>''
   AND SOURCE_TABLE=''<source_table_n>'')',
  '0000002000');

-- increment the AUX_STMTS counter in IBMSNAP_SUBS_SET
UPDATE ASN.IBMSNAP_SUBS_SET
   SET AUX_STMTS = AUX_STMTS + 1
 WHERE APPLY_QUAL = '<apply_qual>'
   AND SET_NAME = '<set_name>'
   AND WHOS_ON_FIRST = '<whos_on_first>';
5.6.3 Full Refresh on Demand
Apply will automatically perform an initial refresh for a subscription set in the following cases:
• If such a refresh has never occurred before
• If Apply has detected a gap for at least one of the members of the subscription set
If you, for whatever reason, want to persuade Apply to perform a full refresh the next time it processes the set, the following three techniques are available. Please notice the different scopes of each technique, and select the technique that is most suitable for your needs.

5.6.3.1 Forcing a Full Refresh for a Certain Subscription Set
The following very simple statement can be used to force a full refresh for a single subscription set. Connect to the replication control server in order to execute the statement:

UPDATE ASN.IBMSNAP_SUBS_SET
   SET LASTSUCCESS = NULL, SYNCHPOINT = NULL, SYNCHTIME = NULL
 WHERE APPLY_QUAL = '<apply_qual>'
   AND SET_NAME = '<set_name>'
   AND WHOS_ON_FIRST = '<whos_on_first>';
The statement resets certain columns of the subscription set table to their initial values.
5.6.3.2 Forcing a Refresh for All Sets Reading from a Source Table
The following statement can be used to force a full refresh for all subscription sets reading from a certain replication source table. Connect to the replication source server in order to execute the statement:

UPDATE ASN.IBMSNAP_PRUNCNTL
   SET SYNCHPOINT = NULL, SYNCHTIME = NULL
 WHERE SOURCE_OWNER = '<source_owner>'
   AND SOURCE_TABLE = '<source_table>';
The statement will reset the SYNCHPOINT and SYNCHTIME columns, for all subscriptions replicating from the source table, to NULL. It has the same effect as a Capture COLD start, but is limited to only one replication source table.
Advice: Doing so could cause a lot of network traffic. Also, replication targets maintaining histories might lose data. Think twice!

5.6.3.3 Forcing a Refresh for All Sets Reading from a Source Server
Start Capture in COLD mode. This is the 'brute force' method. We strongly recommend never to COLD start Capture within a production environment. Capture performs an overall cleanup when starting in COLD mode; for example, Capture removes all the records from all the change data tables. Refer to the DB2 Replication Guide and Reference, S95H-0999 for more details about Capture COLD starts.
5.6.4 Dropping Unnecessary Capture Triggers for Non-IBM Sources Change capture triggers are always automatically generated for the three possible operations (insert, update, delete). Therefore, the definition of a non-IBM table as a replication source will always result in the creation of three change capture triggers: • 1 trigger for INSERT • 1 trigger for UPDATE • 1 trigger for DELETE There may be replication requirements—perhaps data warehouse requirements—that make it necessary to exclude DELETES, for example, from replication. For DB2 replication definitions, we do this by defining a subscription predicate like IBMSNAP_OPERATION IN (’I’, ’U’), because DProp Capture always captures all changes to a DB2 table into the change data table.
For non-IBM replication sources, we could just drop (or disable) the DELETE trigger to achieve the same result. As an additional advantage, the workload of regularly executed delete jobs would be reduced, because the deletes would not be unnecessarily captured by the triggers. Refer to Chapter 8, “Case Study 3—Feeding a Data Warehouse”, especially to 8.4.5.2, “Subscribe to the Sales Table” on page 248 for a useful business example on how to limit the data to be replicated to INSERTS only.
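For non-IBM sources, the DROP can be issued natively (for example, through dbaccess) or through DataJoiner's PASSTHRU mode. A minimal sketch, assuming the DELETE capture trigger on the Informix SALES table is named sjcomp.dsalescd (trigger names depend on the naming scheme in your SRCESVR.REX user exit, so check the DJRA-generated script for the actual name):

-- Route the statements natively to the Informix source server
SET PASSTHRU SJ_BRANCH01;
-- Remove the DELETE capture trigger; deletes against the source
-- table will no longer be written to the change data table
DROP TRIGGER sjcomp.dsalescd;
SET PASSTHRU RESET;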
5.6.5 Modifying Triggers for Non-IBM Sources Modify triggers to include more logic, if required. For example, you can modify the triggers to capture certain changes only. Make use of what the non-IBM database delivers.
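As an illustration only (the trigger name and the reduced column list are assumptions; the real DJRA-generated trigger populates all source columns plus the IBMSNAP_* housekeeping columns, usually through a stored procedure, as described in 6.7), an Informix insert trigger could be restricted with a WHEN clause so that only relevant rows are captured:

-- Hypothetical reworked INSERT capture trigger: only sales with at
-- least one piece sold are captured; zero-piece rows never reach the
-- change data table and are therefore never replicated
CREATE TRIGGER sjcomp.isalescd INSERT ON sjcomp.sales
  REFERENCING NEW AS post
  FOR EACH ROW WHEN (post.pieces > 0)
    (INSERT INTO sjcomp.salescd (salestxno, ibmsnap_operation, ibmsnap_logmarker)
      VALUES (post.salestxno, 'I', CURRENT));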
5.6.6 Changing Apply Qualifier or Set Name for a Subscription Set
If you want to move existing subscription sets to another Apply Qualifier, or if you simply want to rename a subscription set, you can either:
• Drop and re-create the subscription set (and initialize it again with a full refresh), or
• Simply change the Apply Qualifier and the subscription set name without dropping and redefining all subscription members, and without the need to initialize the existing target tables again.
To change either the Apply Qualifier or the subscription set name, follow the procedure below:
1. Stop the Apply process servicing the Apply Qualifier that you want to change.
2. Update all tables at the control server to change the Apply Qualifier and set name.

-- Change APPLY_QUAL / SET_NAME within the Subscription Set Table
UPDATE ASN.IBMSNAP_SUBS_SET
  SET APPLY_QUAL = '<new_apply_qual>', SET_NAME = '<new_set_name>'
  WHERE APPLY_QUAL = '<apply_qual>'
    AND SET_NAME = '<set_name>'
    AND WHOS_ON_FIRST = '<whos_on_first>';

-- Change APPLY_QUAL / SET_NAME within the Subscription Member Table
UPDATE ASN.IBMSNAP_SUBS_MEMBR
  SET APPLY_QUAL = '<new_apply_qual>', SET_NAME = '<new_set_name>'
  WHERE APPLY_QUAL = '<apply_qual>'
    AND SET_NAME = '<set_name>'
    AND WHOS_ON_FIRST = '<whos_on_first>';

-- Change APPLY_QUAL / SET_NAME within the Subscription Columns Table
UPDATE ASN.IBMSNAP_SUBS_COLS
  SET APPLY_QUAL = '<new_apply_qual>', SET_NAME = '<new_set_name>'
  WHERE APPLY_QUAL = '<apply_qual>'
    AND SET_NAME = '<set_name>'
    AND WHOS_ON_FIRST = '<whos_on_first>';

-- Change APPLY_QUAL / SET_NAME within the Subscription Statements Table
UPDATE ASN.IBMSNAP_SUBS_STMTS
  SET APPLY_QUAL = '<new_apply_qual>', SET_NAME = '<new_set_name>'
  WHERE APPLY_QUAL = '<apply_qual>'
    AND SET_NAME = '<set_name>'
    AND WHOS_ON_FIRST = '<whos_on_first>';
3. Update the pruning control table at the replication source server to change the Apply Qualifier and set name.

-- Change APPLY_QUAL / SET_NAME within the Pruning Control Table
UPDATE ASN.IBMSNAP_PRUNCNTL
  SET APPLY_QUAL = '<new_apply_qual>', SET_NAME = '<new_set_name>'
  WHERE APPLY_QUAL = '<apply_qual>'
    AND SET_NAME = '<set_name>'
    AND CNTL_ALIAS = '<cntl_alias>'
    AND TARGET_SERVER = '<target_server>';
4. Restart Apply. Remember to provide a new password file if Apply is running on Windows or UNIX platforms, because the Apply Qualifier is part of the name of the password file.
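As a sketch of what such a password file might contain (hedged: the exact keyword syntax and file naming rules are documented in the DB2 Replication Guide and Reference; the server names and user IDs below simply reuse this book's examples), with one line per server the Apply Qualifier connects to:

SERVER=DJDB01 USER=djinst3 PWD=pwd
SERVER=DB2I USER=db2res5 PWD=pwd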
5.7 Summary
Wow, that was a lot of stuff. You, as a replication administrator, do not have to know everything mentioned in this chapter right from the beginning, but the more you know, the more comfortable you will feel. As a reminder, we will summarize all the activities that the replication administrator has to deal with in a cross-platform, multi-vendor, high-performance distributed relational data replication system:
• Initialize the replication system
• Perform repetitive tasks, such as database and replication housekeeping
• Guarantee optimal distributed performance
• Monitor replication
• React to replication setup change requests
In Part 2 of this redbook, we will use several of the discussed design alternatives and setup strategies in four case studies. Each case study deals with a different business problem, and each integrates a different non-IBM database system, used either as replication source or as replication target.
And now: Happy replication!
Part 2. Heterogeneous Data Replication—Case Studies
Chapter 6. Case Study 1—Point of Sale Data Consolidation, Retail
The case study introduced in this chapter gives an implementation example for projects that use multi-vendor databases as a source for replication. For the specific business application in this example, we have chosen Informix Dynamic Server (V7.3) as the replication source database, but the techniques that we are going to use are applicable to other non-IBM source databases, such as Oracle, Microsoft SQL Server, or Sybase SQL Server, as well.
As a business example we chose a retail application that aims to consolidate sales data, which originates at several distributed branch offices, into a central data store. The central replication target platform is DB2 for OS/390.
The overall objectives of this case study are to:
• Show an example for replication from multi-vendor source databases.
• Provide a general solution for all related business requirements that have to consolidate distributed data to a central site.
• Use Apply for OS/390 to replicate Informix source data.
In accordance with the guidelines presented in Part 1 of this book, this chapter is structured into the following phases:
Planning: The planning section contains the description of the business requirements and an overview of the solution we want to demonstrate.
Design: This section highlights the design options that are most appropriate to implement this data consolidation application. We will give additional recommendations on how to scale the application to a large number of replication source servers.
Implementation: Working along the general implementation checklist that was provided in Chapter 4, we demonstrate how we set up the test system to prove the chosen design. In particular, the setup steps that are unique to this case study are described in detail. For example, we provide many interesting details about how we connected DataJoiner to the Informix instances that we used as replication source servers, and how we set up the replication subscriptions to consolidate the distributed data into one central table.
Finally, we reveal some details about how the capture triggers are used to emulate all functions that, for DB2 replication sources, are provided by DProp Capture. This general section is applicable to all non-IBM replication source databases.
6.1 The Business Problem A retail company with a number of branches throughout the country uses a central DB2 for OS/390 Data Sharing Group at the company’s head office and Informix Servers on AIX in the remote branches. The major business applications are maintained on OS/390, whereas electronic point-of-sales (EPOS) systems are maintained on AIX servers at each of the branch offices. Figure 21 displays the high level system architecture used by the "retail" company. So far, no database connectivity exists between the Informix EPOS systems and the mainframe DB2 data sharing group. Until today, data has only been exchanged through FTP using the existing TCP/IP network that connects all branch offices to the company’s headquarters.
[Figure 21. Case Study 1—High Level System Architecture: the DB2 Data Sharing Group DB2I at the company headquarters is connected through a TCP/IP network to the Informix servers at Branch 01 through Branch nn.]
Considering a work flow approach, the business system we are architecting has to deal with two major information flows:
• One information flow has to supply all branches with product and price information. The product set and also the prices can vary from branch to branch. From a data replication perspective we call this a data distribution scenario, most likely using advanced subsetting techniques to deliver to a branch only the information that belongs to that branch's data scope.
• The second information flow consolidates data, which is autonomously created at the branches, back to the company's head office. In this example, we consider this data to be SALES data, generated by electronic point-of-sales (EPOS) applications. From a data replication perspective we call this a data consolidation scenario, most likely using advanced aggregation techniques to condense the amount of data that has to be replicated back to the head office each day.
Figure 22 visualizes both information flows. In this chapter, we will focus on data consolidation techniques, with the additional consideration that the branch offices are using Informix Dynamic Server as their database management system. To gain knowledge about data distribution techniques, please refer to Chapter 7, which explicitly focuses on a data distribution scenario.
[Figure 22. Major Information Flows: product and price data flows from the business applications on DB2 for OS/390 V5 down to the EPOS systems at the branches; sales details flow from Informix IDS V7.3 back up to the host.]
Related Business Problems
The replication techniques subsequently introduced in this case study generally apply to all scenarios consolidating multiple identically structured source tables into one single target table. In addition to the retail case study introduced in this chapter, the following examples are candidates for data consolidation through replication:
• System management applications, heterogeneous or not, that store configuration data regarding all client stations at distributed LAN servers. To enable central User Help Desk (UHD) applications, it is required to consolidate the configuration data into a central data store.
• Banks use replication techniques to consolidate customer data, recorded at branch offices, into the central customer information system.
• Manufacturing companies, which operate several independent production plants, consolidate process information recorded at plant level into central production planning and control systems.
Remark: All data replication and consolidation techniques introduced in this chapter are of course applicable to all DB2 replication source systems as well as to all other non-IBM replication source platforms.
6.2 Architecting the Replication Solution
Following the task guidelines that we recommended in Part 1 of the book, we will discuss architectural options before actually starting to implement the solution. As explained in Chapter 3, "System and Replication Design—Architecture", the discussion of basic architectural options and recommendations can be divided into the following topics:
• Architecting the system design
• Architecting the replication design
We will follow this approach while designing the replication solution for this case study.
6.2.1 Data Consolidation—System Design
In Chapter 3 we discussed all system design alternatives and design options in detail. For replication applications used to consolidate data from several distributed system components to a central location, we derived the following recommendations with regard to the placement of the different system components.
6.2.1.1 Placement of the System Components
Design decisions have to be made for the location of DProp Apply, the DProp control tables, and the DataJoiner middleware server.
DProp Apply: In the data consolidation application that we are going to implement, we are dealing with one central replication target server. The number of source servers may be large. Both administration cost and performance are optimized by locating Apply centrally at the replication target server.
Control Table Placement: The control tables that coordinate change capture always have to be created at the replication source server. Apply's control tables, the control server tables, can be placed anywhere in the network. As we decided to locate Apply centrally at the replication target server, we will also create Apply's control tables within the replication target server database. All subscription information can then be retrieved using only one local database connection. Performance and manageability could not be better.
DataJoiner Placement: One central DataJoiner instance enables best performance as well as ease of administration for the data consolidation approach that we want to implement.
Refer back to 3.2, "System Design Options" on page 39 for a deeper discussion of the system design options available for replication systems using IBM DProp and IBM DataJoiner.
6.2.1.2 DataJoiner Databases
As discussed in detail in Chapter 3, one DataJoiner database is required for each heterogeneous replication source database. Each DataJoiner database will contain one server mapping for one Informix replication source database (other server mappings are optional, but they will not be used for replication).
One DataJoiner Database for Each Non-IBM Replication Source
Creating one DataJoiner database for each non-IBM replication source database is a technical requirement of the DataJoiner / DProp replication solution. Let us see how we can deal with this requirement.
The most interesting question here is whether there will be any volatile data stored in these DataJoiner databases that would make housekeeping for the DataJoiner databases necessary. And the answer is definitely NO! In a replication environment, the DataJoiner databases are only used to store nicknames. After the nicknames are defined, the data within these DataJoiner databases will never change again. All control tables that are needed to configure the capture triggers are placed directly within the Informix source database. They are only referenced by nicknames. All other control tables that are created by default within the DataJoiner databases will not contain
any data (they are just necessary to successfully bind Apply’s static SQL packages against the DataJoiner databases). In summary, no regular maintenance is required for the possibly large number of DataJoiner databases.
Disk Space Requirements
The only concern you might still have is that a large number of databases will occupy some disk space. Well, that is right. In the test setup described in this chapter, each DataJoiner database occupied about 20 MB of disk space. More than half of the space (12 MB) was used by log files. That means the occupied disk space could be slightly reduced by reducing the log space allocated for each database (knowing that we will not have any transactions writing into DataJoiner objects anyway).
As a rule of thumb to estimate how much disk space will finally be required for all DataJoiner databases, multiply the number of non-IBM source servers by 20 MB:

Number of Non-IBM Source Servers * 20 MB = Required DJ DB Disk Space
Consolidating data from 50 Informix servers, for instance, will require about 1 GB of disk space at the DataJoiner server (the data volumes to be replicated do not influence the above formula).
6.2.1.3 How Many Apply Instances?
When designing the solution, we finally have to decide how many Apply jobs will be collecting the data from the remote sources. Generally speaking, one Apply job is a good first approach. This Apply job, identified by a single Apply Qualifier, would service all remote sources, one after the other.
Optionally, to collect data from several branches in parallel, it would be a good idea to set up multiple Apply instances, one instance for each Apply Qualifier. Each Apply job would then service a different subset of all available subscriptions. If you realize, when your replication system is growing, that it takes too much time to collect data from all branches sequentially, you can always re-distribute the already running subscriptions over all available Apply Qualifiers. To do so, follow the instructions given in 5.6.6, "Changing Apply Qualifier or Set Name for a Subscription Set" on page 134.
A good example for distributing subscription sets over multiple Apply Qualifiers would be:
• One Apply job for one or a small number of branches with a large replication volume
• One Apply job for several branches with a smaller replication volume
OS/390 Remark: On OS/390, the size of the spill file that Apply allocates when it fetches data from a change data table is defined within the Apply start job. If you specify a huge file size, because the biggest shop requires it, a huge spill file is allocated for every set that this Apply job (this Apply Qualifier) services. This remark does not apply to Apply for UNIX or Intel platforms.
6.2.2 Data Consolidation—Replication Design
After deciding about the system design, and before implementing the solution, we will decide the major replication design issues. Most interesting, because we have not discussed this before, will be the introduction of the target site UNION approach to achieve the automated consolidation of several distributed tables into one single target table.
6.2.2.1 Are Before Images Required?
The central sales application the retail company is deploying needs the data recorded at the branches without any structural change. Before images are not required.
6.2.2.2 Are Primary Keys or Partitioning Keys Ever Updated?
Sales data is always inserted into the sales tables at the branch offices. The data is never updated. Therefore, we do not need to take special care of how to capture updates. We will not change the DProp standard setting to capture updates as updates.
6.2.2.3 Data Consolidation: Target Site Union
The data consolidation approach we are going to use in this case study can easily be compared to a materialized UNION of all identically structured distributed replication source tables. Consider that the same identically structured table, in our example the SALES table, is created at each of the distributed locations. Additionally, consider that all the distributed tables will be defined as sources for data replication. To consolidate the content of all the SALES tables into one large company-wide SALES table that contains the data of all distribution sites, the technique we are introducing here basically requires creating multiple
subscriptions that all point to the same target table. What we have to be concerned with is that an initial refresh from one location does not replace the data already available at the target site, but "appends" the data from the new location to what is already available. In DProp terminology, we call this approach a Target Site UNION approach.
Implementing a Target Site Union Design
In fact we will let Apply replicate all changes from all locations into the same target table. But to prevent Apply from replacing the target table content with the data from one location only in case of an automatic full refresh, views are defined over the target table. Each subscription, coming from a different source, will be defined to replicate through a different view. Figure 23 graphically represents this consolidation technique:
[Figure 23. Target Site UNION Example—Basic Idea: SOURCE 1 through SOURCE n each replicate through their own view (VIEW 1 through VIEW n), all of which are defined over the single UNION TABLE at the target.]
The following task list describes how to set up a Target Site UNION replication system:
1. Create the replication target table manually. Use the same DDL (same structure) as used at the distributed locations.
2. Create as many views over the target table as there are distributed locations (create as many views as the number of subscriptions that you expect). Each view should be created as follows:
CREATE VIEW OWNER.<view_name> AS
  SELECT * FROM OWNER.REPLTARGET
  WHERE CONSTANT_COLUMN = '<unique_value>';
Usually, if data is distributed over several locations, the data contains a unique identifier (like a branch number, for example) that shows where the data originated. The unique identifier can be a single column or the combination of several columns. Use all the columns that uniquely identify the source data when creating the different views. If the source data does not contain any unique identifier, refer to "What if the Source Data Contains No Unique Identifier" on page 147 to see how such a unique identifier can be created during replication (with DProp!).
3. After defining all target views, define one subscription set for each replication source server. Add a member to each set, and select the appropriate view (containing the WHERE clause that identifies the source location) as target table for the subscription member.
In case of a full refresh, this technique lets Apply automatically append the content coming from one location, instead of deleting the complete table and inserting the data from the one location that is currently being refreshed. Apply does so by deleting everything from the view, letting the WHERE clause of the view limit the effect of the delete. Apply has no knowledge of the UNION table: Apply knows only the view.
Background on Refresh
When Apply performs an initial refresh, Apply deletes the complete target table before inserting the content selected from the replication source table (Apply replaces the target content with the source content to initialize the replication subscription). With the view defined as the replication target table, Apply's delete (we call that a mass delete) is restricted to the values that fulfill the where-clause of the view.
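To make the mechanism concrete, here is a sketch of what happens during a full refresh through such a view (OWNER.REPLVIEW1 and the constant are illustrative placeholders following the example above):

-- Apply's mass delete is issued against the view ...
DELETE FROM OWNER.REPLVIEW1;
-- ... which DB2 resolves against the base table, effectively:
--   DELETE FROM OWNER.REPLTARGET WHERE CONSTANT_COLUMN = '<unique_value>';
-- The rows belonging to all other locations survive the refresh.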
In the next section, we describe what to do if the existing replication source tables do not contain any attribute that uniquely identifies where the data was originally created.
What if the Source Data Contains No Unique Identifier
Regardless of whether your replication source tables, located at several distributed source databases, contain an attribute that is unique at each source site, we have to create a single view over the target table for every single source table. This technique is needed, as described above, to let
Apply perform the full refresh without always deleting the complete target table before inserting the records coming from another source. If your source tables do not have an attribute that is unique at every source site that could be used in the where-clause of the target site views, we have two options to generate such a uniqueness attribute:
1. Create a new column at every source site.
2. Create a uniqueness attribute automatically during replication, without the need to change the source data model.
Obviously we could create another column, but that is perhaps not what we want. More easily, we could use one of DProp's advanced features and create the uniqueness attribute on the fly (while replicating the data up to the consolidated target). The technique for creating a uniqueness attribute during replication is to add a computed column to every subscription member that replicates into a consolidated target table. This computed column needs to contain a different constant string for every source server (like a branch number). We recommend using the SQL function SUBSTR (substring) to compute a constant.
Use the DJRA feature List Members or Add a Column to Target Tables to add a computed column to a subscription member, as shown in Figure 24.
Figure 24. Add a Computed Column to a Subscription
You can either create the target table (and the target views) including the new target column already, or let DJRA create the new column for you. If the new computed target column is not yet present, DJRA generates an ALTER TABLE statement automatically. The following SQL excerpt shows the most interesting statements that were automatically generated by DJRA:

--* The column name REPLFLAG1 is not present in the target table
--* CHRIS.REPLFLAG.
ALTER TABLE CHRIS.REPLFLAG
  ADD REPLFLAG CHAR(8) NOT NULL WITH DEFAULT;
...
-- create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS
  (APPLY_QUAL, SET_NAME, WHOS_ON_FIRST, TARGET_OWNER, TARGET_TABLE,
   COL_TYPE, TARGET_NAME, IS_KEY, COLNO, EXPRESSION)
VALUES
  ('IFXUP02', 'SET01', 'S', 'CHRIS', 'REPLFLAG', 'C', 'REPLFLAG',
   'N', 3, 'SUBSTR (''BRANCH01'', 1, 8)');
Create the views at the target site referencing the new calculated column in the where-clause, like: WHERE REPLFLAG = ’BRANCH01’;
6.2.2.4 Aggregation
In common data consolidation examples, it is not necessary to replicate all table records created at the source sites. Instead, it may be sufficient to replicate a summary only (for example, summaries grouped by products). The IBM replication solution provides two methods for the replication of summaries:
• Base Aggregates: Summaries are built over the replication source tables
• Change Aggregates: Summaries are built over the change data tables
Both techniques can, of course, also be used when replicating into a consolidated target table; a conceptual sketch of a base aggregate follows below. For an advanced practical example, have a look at Chapter 8, especially 8.4.7.1, "DProp Support for Aggregate Tables" on page 256.
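The following sketch is conceptual only (the column choice is an assumption, reusing the SALES columns of this case study, not a definition taken from the case study itself): a base aggregate subscription member essentially materializes a grouped query like this at every subscription cycle, so only the summary rows travel to the target.

-- One summary row per product group, instead of every EPOS transaction
SELECT WGRNO,
       SUM(PIECES) AS SUM_PIECES,
       SUM(OUT_PRC) AS SUM_OUT_PRC
  FROM SJCOMP.SALES
  GROUP BY WGRNO;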
6.3 Setting Up the System Environment
This section will guide you through all installation and configuration tasks that were necessary to set up the test system for this case study. Basically, we will work along the implementation checklist that was provided in Chapter 4, "General Implementation Guidelines".
6.3.1 The System Topology
Figure 25 shows the overall test system setup for the data consolidation scenario used in this case study. The test environment consists of three database system layers:
• Layer 1: Informix Dynamic Server Version 7.3 for AIX, used as the replication source systems. The test scenario consists of three independent Informix instances on three different AIX servers, namely sky, azov, and star.
• Layer 2: IBM database middleware infrastructure. The main component of this system layer is IBM DataJoiner, Version 2.1.1 for AIX. DataJoiner is used as the database middleware gateway to enable transparent access to the Informix replication source systems. Moreover, an IBM DB2 Universal Database (UDB) instance (without database) is used as a DRDA application server to enable Apply to connect from OS/390 to DataJoiner using TCP/IP (the DB2 instance would not be necessary if we had used DRDA over SNA).
• Layer 3: IBM DB2 for OS/390, Version 5.1.1, used as the replication target system. DProp Apply for OS/390, Version 5.1 is installed at the target system.
Figure 25 shows that system layer 2 (the IBM database middleware layer) is installed on one of the AIX servers (sky), which already contained one of the Informix instances. The other two Informix instances are accessed remotely using Informix ESQL/C client software.
[Figure 25. Case Study 1—Test System Topology: host mvsip runs the DB2 for OS/390 Data Sharing Group DB2I with DProp Apply; AIX server sky runs DB2 UDB V5.2, DataJoiner V2.1 (databases DJDB01, DJDB02, DJDB03), the Informix Client SDK, and Informix V7.3 (sj_branch03); server azov runs Informix V7.3 (sj_branch01); server star runs Informix V7.3 (sj_branch02).]
Smart Remark: All network connections between all system components use TCP/IP as the network protocol.
6.3.2 Configuration Tasks
This section documents the setup tasks as we used them to prepare the systems for case study 1. Refer to Chapter 4, "General Implementation Guidelines" on page 61 for all general setup activities that are not specific to this case study.
Assumption: All three Informix server instances are installed and running.
Table 5 names all Informix instances on the dedicated AIX servers that were used during this case study. The names will be referenced later during several setup tasks.

Table 5. Informix Instances Used in this Case Study

  AIX server (AIX V4.3)    Informix Instance Name
  ---------------------    ----------------------
  sky                      sjsky_ifx01
  azov                     sjazov_ifx01
  star                     sjstar_ifx01
All Informix server instances were running "Informix Dynamic Server, Version 7.30UC7".
6.3.2.1 Install and Set Up the DataJoiner Middleware Server
One of the AIX servers belonging to the test setup is used as a branch server as well as the IBM DataJoiner middleware server. The location of this AIX server is assumed to be close to the company's headquarters.
Set Up DataJoiner to Access the Informix Servers
Based on step 1 and step 2 of the implementation checklist, the first task was to enable native Informix connectivity from AIX server sky (which, from an Informix perspective, behaves as a client to all other Informix server instances) to all other Informix instances. Use the Informix manuals (especially the Administrator's Guide for Informix Dynamic Server, Part#000-4354) for further details. On sky, we completed the following three configuration steps:
1. Install the Informix client code (we used Informix Client SDK for AIX, Version 2.10UC1).
2. Set the Informix environment variables:
   INFORMIXDIR - pointing to the Informix 'home' directory
   INFORMIXSQLHOSTS - if the sqlhosts file is located in a directory other than $INFORMIXDIR/etc
3. Set up the Informix connectivity file sqlhosts.
In order to connect to all Informix instances, the sqlhosts file used on sky was configured with the following four entries. Please be aware that we will reference these entries later when creating the DataJoiner server mappings.

#*********************************************************************
# location: $INFORMIXDIR/etc/sqlhosts
# Title:       sqlhosts
# Description: sqlhosts file to access Informix servers at IBM ITSO
#*********************************************************************
sjazov_ifx01      onsoctcp   azov   2800
sjstar_ifx01      onsoctcp   star   2801
sjsky_ifx01       onsoctcp   sky    2810
sjsky_ifx01_shm   onipcshm   sky    dummy
To check the success of this configuration step we used the Informix client interface dbaccess to natively connect to all three Informix instances. Refer to Appendix B, especially B.2.2, “Using Informix’s dbaccess” on page 329 for useful instructions on how to set up and use Informix’s client interface dbaccess.
Prepare DataJoiner to Access the Informix Servers
To set up this heterogeneous environment we used IBM DB2 DataJoiner for AIX V2.1.1, PTF U462216. The first step after loading the DataJoiner code onto the middleware server was to create an Informix data access module ("Step 4—Prepare DataJoiner to access the remote data sources" of the implementation checklist). DataJoiner will use this access module for all connections to Informix using the currently installed version of the Informix client. In compliance with the DataJoiner for AIX Planning, Installation and Configuration Guide, SC26-9145, the Informix data access module was created using the following command, logged on as user root:

make -f djxlink.makefile informix72
Edit the file djxlink.makefile before executing the make command to set the Informix environment variables accordingly. The result of executing the make command is the Informix data access module, named 'informix72'.
Remark: The name of the DataJoiner data access module we created during this step is 'informix72'. Nonetheless, we will access Informix servers running Informix Dynamic Server Version 7.3. To clarify: the name of the data access module is not related to the server version. It is just a label. If you like, change the name when building the data access module.
Set Up a DataJoiner Instance
In accordance with "Step 5—Create a DataJoiner instance" of the general implementation checklist, the DataJoiner instance was created. We created the instance with the name 'djinst3'. The instance was set up to support TCP/IP clients. Finally, three databases were created, one for each Informix replication source server.
Why Another DB2 UDB Instance?!
Well yes, that's an interesting point! If you go back to Figure 25 on page 151, you can see that the DB2 UDB instance was installed on AIX server sky. According to the information flow, note that it is placed between the DataJoiner instance and the DB2 for OS/390 replication target system. The following sections explain why and how we used the DB2 UDB instance in this case study:
DB2 UDB for AIX, Version 5.2, was not only installed to demonstrate how flexible the DB2 communication setup really is, but also to work around a current limitation of DataJoiner: Although DataJoiner V2.1.1 contains the capability to access all DRDA servers using TCP/IP, DataJoiner V2.1.1 is not enabled to be a DRDA application server through TCP/IP itself. DB2 UDB, by the way, is. Therefore, we are using DB2 UDB (without databases) simply as a DRDA gateway (DRDA application server). DB2 for OS/390 (Apply, respectively) has no knowledge of the DataJoiner instance. It only knows the DataJoiner databases and the DB2 UDB instance. DB2 for OS/390 will always "hop over the DB2 UDB instance" to connect to the DataJoiner databases.
How Does it Work:
1. Both the DB2 UDB instance and the DataJoiner instance accept TCP/IP clients. DataJoiner and DB2 UDB are assigned to different TCP/IP ports (this is done by updating the database manager configuration of both the DB2 and the DataJoiner instance, setting the SVCENAME parameter). To update the database manager configuration of the DB2 UDB instance, we logged on as DB2 instance owner and issued the following command (the port address you are using might be different):

update database manager configuration using SVCENAME 2820
Note that we will configure DB2 for OS/390 to connect to this DB2 UDB instance by assigning the DB2 UDB instance’s port number to the target
locations that we cataloged into the communication database of the DB2/390 system.
To update the database manager configuration of the DataJoiner instance, we logged on as DataJoiner instance owner and issued the following command:

update database manager configuration using SVCENAME 2822
2. Within the DB2 UDB instance, we subsequently cataloged the DataJoiner instance as a TCP/IP node, as well as all DataJoiner databases.
Catalog DataJoiner at the DB2 UDB instance:

catalog tcpip node DJSKY remote sky server 2822
Catalog the DataJoiner databases at the DB2 UDB instance:

catalog database DJDB01 at node DJSKY
catalog database DJDB02 at node DJSKY
catalog database DJDB03 at node DJSKY
3. Next, we configured the DB2 for OS/390 communication database (CDB). The values were inserted, as if the DataJoiner databases were located at the DB2 UDB instance: -------
SYSIBM.LOCATIONS -----------------------------------------------LOCATION: LINKNAME: PORT: TPN:
DJ database, DJDB01 in this case Arbitrary pointer to SYSIBM.IPNAMES The TCP/IP service port of the DataJoiner instance This column is onlt used for APPC connections
insert into SYSIBM.LOCATIONS (LOCATION, LINKNAME, PORT) values (’DJDB01’, ’UDBSKY’, ’2820’) ; insert into SYSIBM.LOCATIONS (LOCATION, LINKNAME, PORT) values (’DJDB02’, ’UDBSKY’, ’2820’) ; insert into SYSIBM.LOCATIONS (LOCATION, LINKNAME, PORT) values (’DJDB03’, ’UDBSKY’, ’2820’) ; ------
SYSIBM.IPNAMES ------------------------------------------------LINKNAME: Pointer to SYSIBM.LOCATIONS IPADDR: HOSTNAME or IP Address SECURITY_OUT: P (connect with userid and password)
Case Study 1—Point of Sale Data Consolidation, Retail
155
-- USERNAMES:
O (outbound)
insert into SYSIBM.IPNAMES (LINKNAME, IPADDR, SECURITY_OUT, USERNAMES) values (’UDBSKY’, ’sky.almaden.ibm.com’, ’P’, ’O’) ; --------
SYSIBM.USERNAMES ----------------------------------------------TYPE: AUTHID: LINKNAME: NEWAUTHID: PASSWORD:
O (Type DB2RES5 Pointer djinst3 djinst3
of translation, oh for outbound) to SYSIBM.LOCATIONS (DJ authid) (dj’s password)
insert into SYSIBM.USERNAMES (TYPE, AUTHID, LINKNAME, NEWAUTHID, PASSWORD) values (’O’, ’DB2RES5’, ’UDBSKY’, ’djinst3’, ’pwd’) ;
Refer to the IBM Redbook Wow! DRDA supports TCP/IP, SG24-2212 for further details on how to set up DRDA connectivity using TCP/IP between DB2 for OS/390 and other DB2 database servers. Remark: DB2 for OS/390 caches the tables of the communication database (CDB). Therefore, if you update your CDB tables again after the first connection attempt, you will need to recycle the DB2 Distributed Data Facility (DDF) to make your changes effective. To reposition ourselves: We just completed “Step 8—Enable DB2 clients to connect to the DataJoiner databases” of the implementation checklist. DB2 for OS/390 is DataJoiner’s client in this case.
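For reference, recycling DDF amounts to the following DB2 console commands (a sketch; the command recognition character and the required authority depend on your installation):

-STOP DDF
-START DDF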
Set Up DataJoiner to Transparently Connect to Informix
The following paragraphs provide the details that we used to configure DataJoiner's access to Informix databases. The SQL statements refine "Step 9—Create Server Mappings for all non-IBM database systems" and "Step 10—Create the Server Options" of the general heterogeneous implementation checklist. Because one DataJoiner database is needed for every non-IBM replication source database, we created one single server mapping within each of the DataJoiner databases. The SQL statements below show how we configured the DataJoiner database DJDB01:

connect to djdb01;
create server mapping from SJ_BRANCH01 to node "sjazov_ifx01"
  database "sj_branch01" type informix version 7.3 protocol "informix72";

create server option TWO_PHASE_COMMIT for server SJ_BRANCH01 setting 'N';

create user mapping from djinst3 to server SJ_BRANCH01
  authid "djinst3" password "pwd";
Remark: Even if the DataJoiner user (djinst3 in this case) is authorized to natively connect to Informix, a user mapping will be needed when the DProp control tables are created. Note that the password specified when creating the user mapping will be encrypted before it is stored within the DataJoiner catalog tables.
Set Up the Replication Administration Workstation
A Windows NT Workstation Version 4.0 PC was used as the replication administration workstation. The software components that had to be installed were:
• DB2 UDB Client Application Enabler V5
• DataJoiner Replication Administration (V2.1.1.140)
From this client, we cataloged the DataJoiner instance as a TCP/IP node and all databases directly at the DataJoiner instance (for LAN connections, no hopping over DB2 UDB is required). The DB2 for OS/390 replication target server was also cataloged at the DataJoiner instance, using the DataJoiner instance as DRDA gateway.
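A sketch of the corresponding DB2 CLP commands on the workstation (node and database names follow this chapter's examples; the DRDA gateway entry for DB2I assumes the matching DCS directory entry exists at the DataJoiner instance):

catalog tcpip node DJSKY remote sky server 2822
catalog database DJDB01 at node DJSKY
catalog database DJDB02 at node DJSKY
catalog database DJDB03 at node DJSKY
catalog database DB2I at node DJSKY authentication dcs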
Create Replication Control Tables
The replication control tables were created for DB2 for OS/390 (replication target) and for all DataJoiner databases. Notice that, because we are using a non-IBM database as a replication source, some replication control tables are natively created within the Informix source databases, and others are created within the DataJoiner databases. Refer to 3.2.3, "Control Tables Placement" on page 46 for the background. Nicknames are created within the DataJoiner
databases to enable transparent access to those control tables created at the remote data sources.
Bind Apply
After creating the replication control tables, Apply for OS/390 was bound against the replication target server (DB2 for OS/390) and against all DataJoiner databases. Refer to "Step 22—Bind DProp Apply" of the general implementation guidelines for more details about the bind task.
6.4 Nice Side Effect: Using SPUFI to Access Multi-Vendor Data
Wouldn't you like to know (from a DB2-for-OS/390 perspective) what the client-server folks are doing down there on their workstations? You, as a DBA using DB2 for OS/390, can easily find this out, after the DRDA connectivity between DB2 for OS/390 and DataJoiner and the connectivity between DataJoiner and Informix has been established, by using SPUFI on OS/390 to query Informix data (generally, all databases that can be transparently accessed through DataJoiner).
To enable SPUFI to work with DataJoiner databases, just bind SPUFI against those databases. SPUFI packages have to be bound against all new locations you want to access (in our case, all three DataJoiner databases), and the SPUFI plan has to be rebound for all locations (including those you were accessing already before). The following excerpt of the bind job shows the procedure:

DSN SYSTEM(DB2I)
BIND PACKAGE (DJDB01.DSNESPCS) MEMBER(DSNESM68) ACT(REP) ISO(CS) -
  SQLERROR(NOPACKAGE) VALIDATE(BIND)
BIND PACKAGE (DJDB02.DSNESPCS) MEMBER(DSNESM68) ACT(REP) ISO(CS) -
  SQLERROR(NOPACKAGE) VALIDATE(BIND)
BIND PACKAGE (DJDB03.DSNESPCS) MEMBER(DSNESM68) ACT(REP) ISO(CS) -
  SQLERROR(NOPACKAGE) VALIDATE(BIND)
BIND PLAN(DSNESPCS) PKLIST(*.DSNESPCS.DSNESM68) ISOLATION(CS) -
  ACTION(REPLACE)
To process any SQL against DataJoiner, and therefore any SQL against Informix, possibly using nicknames, DataJoiner’s PASSTHRU mode or transparent DDL, set the CONNECT LOCATION on the SPUFI main panel to the location name of the DataJoiner database (as defined within the DB2 for OS/390 communication database): For remote SQL processing:
10 CONNECT LOCATION ===> DJDB01
Play around! Create a table in, say Informix, using SPUFI for OS/390, to realize that there are no more limits (ask your DataJoiner administrator for the necessary database privileges).
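A small sketch of such an experiment, run from SPUFI with CONNECT LOCATION set to DJDB01 (the table name is made up for the test; SJ_BRANCH01 is the server mapping created earlier in this chapter):

SET PASSTHRU SJ_BRANCH01;
-- The following DDL and DML is executed natively by the Informix server
CREATE TABLE mvs_test (id INTEGER, note CHAR(20));
INSERT INTO mvs_test VALUES (1, 'hello from SPUFI');
SET PASSTHRU RESET;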
6.5 Implementing the Replication Design
In this section we describe which replication sources and targets we defined to consolidate Informix sales data on DB2 for OS/390.
Assumption: All Informix databases contain an identically structured sales table. The table name is SJCOMP.SALES. The Informix databases are called sj_branch01 (server azov), sj_branch02 (server star), and sj_branch03 (server sky).
6.5.1 Registering the Replication Sources
If a table located at a non-IBM database is going to be registered as a replication source, a NICKNAME for the table has to be created upfront (manually). This nickname will then be registered as a replication source using the DataJoiner Replication Administration. Therefore, the first task when preparing the registration of the SALES tables, located within the three Informix source databases, was to create a nickname for every SALES table:

CONNECT TO DJDB01;
-- Create a NICKNAME for the SALES table
CREATE NICKNAME SJCOMP.SALES FOR "SJ_BRANCH01"."sjcomp"."sales";

CONNECT TO DJDB02;
-- Create a NICKNAME for the SALES table
CREATE NICKNAME SJCOMP.SALES FOR "SJ_BRANCH02"."sjcomp"."sales";

CONNECT TO DJDB03;
-- Create a NICKNAME for the SALES table
CREATE NICKNAME SJCOMP.SALES FOR "SJ_BRANCH03"."sjcomp"."sales";
All nicknames can have the same name, because they are all created within separate databases. After creating the nicknames, the DJRA function Define One Table as a Replication Source was used to register the nicknames as replication sources. We chose the following replication source characteristics for this case study:
• Capture all available columns
• Capture after-images only
• Capture updates as updates (not as delete/insert pairs)
When generating the SQL script to actually register a non-IBM table as a replication source, DJRA determines if the object to be registered is a DB2 table or a nickname. Given that it is a nickname, DJRA automatically generates native DDL statements (in this case native Informix DDL) to create all necessary capture triggers as well as the change data table remotely at the non-IBM database.
If you want to understand how the created change capture triggers finally work, see 6.7, "Some Background on Replicating from Multi-Vendor Sources" on page 166. It introduces an overall picture of all triggers defined for a non-IBM replication source server and describes how the triggers interact to emulate all functions that, for DB2 replication sources, are provided by DProp Capture.
6.5.2 Preparation of the Target Site Union
To prepare for the target site UNION, we created the target table at the replication target server before defining the first subscription. Additionally, we created one view over the target table for each subscription replicating into the consolidated SALES table.
6.5.2.1 Creating the Target Table
We created the replication target table on DB2 for OS/390, using SPUFI:

-- Create Replication Target Table in DB2 for OS/390
CREATE TABLE SJCOMP.SALES (
  DATE       TIMESTAMP        NOT NULL,
  LOCATION   DECIMAL (4 , 0)  NOT NULL,
  COMPANY    DECIMAL (3 , 0)  NOT NULL,
  SALESTXNO  DECIMAL (15, 0)  NOT NULL,
  ITEMNO     DECIMAL (13, 0)  NOT NULL,
  PIECES     DECIMAL (7 , 0)  NOT NULL,
  OUT_PRC    DECIMAL (11, 2)  NOT NULL,
  TAX        DECIMAL (11, 2)  NOT NULL,
  WGRNO      DECIMAL (5 , 0)  NOT NULL,
  SUPPLNO    DECIMAL (7 , 0)  NOT NULL)
IN SJ390DB1.TSSALESH;

-- Create Replication Target Table Index in DB2 for OS/390
CREATE UNIQUE INDEX SALESIX
  ON SJCOMP.SALES (LOCATION, COMPANY, SALESTXNO);
COMMIT;
6.5.2.2 Creating Views Over the Target Table
We created three views over the target table, one for each replication source table (branch sales table). The WHERE clause uniquely identifies each branch:

-- Create Replication Target View for branch01 (company 63 / location 54)
CREATE VIEW DB2RES5.SALES6354 AS SELECT * FROM SJCOMP.SALES
  WHERE COMPANY = 63 AND LOCATION = 54;
-- Create Replication Target View for branch02 (company 63 / location 55)
CREATE VIEW DB2RES5.SALES6355 AS SELECT * FROM SJCOMP.SALES
  WHERE COMPANY = 63 AND LOCATION = 55;
-- Create Replication Target View for branch03 (company 63 / location 57)
CREATE VIEW DB2RES5.SALES6357 AS SELECT * FROM SJCOMP.SALES
  WHERE COMPANY = 63 AND LOCATION = 57;
COMMIT;
6.5.3 Defining Replication Subscriptions
Because each of the three source tables is located at a different source server (different DataJoiner databases, connecting to the Informix source tables), we have to define three subscription sets (one subscription set always includes one source server and one target server). After defining the sets, one member was added to each set. We used the DataJoiner Replication Administration (DJRA) to set up the subscription sets and the subscription members.
6.5.3.1 Creating Three Empty Subscription Sets
We created three subscription sets, as shown in Table 6. All subscription sets were defined with the same Apply Qualifier; that means, all subscription sets will be serviced by the same Apply job.

Table 6. Subscription Set Characteristics for the Data Consolidation Approach

  Source Server   Target Server   Apply Qualifier   Set Name   Event Name
  -------------   -------------   ---------------   --------   ----------
  DJDB01          DB2I            SALES01           BRANCH01   BRANCH01
  DJDB02          DB2I            SALES01           BRANCH02   BRANCH02
  DJDB03          DB2I            SALES01           BRANCH03   BRANCH03
Note that we chose event-driven subscription timing, using a single event for every set (to better control the replication activities for our test scenario).
6.5.3.2 Adding One Member to Each Subscription Set
After the subscription sets were generated, one member was added to every set. As target table we specified one of the previously created views:

  Source Table            Target View
  DJDB01 SJCOMP.SALES --> DB2RES5.SALES6354
  DJDB02 SJCOMP.SALES --> DB2RES5.SALES6355
  DJDB03 SJCOMP.SALES --> DB2RES5.SALES6357
Note: DJRA also supports the setup of subscription members for existing target tables or target views. That means, no target table is created if the target table (or view) already exists. However, a CREATE TABLESPACE statement is always generated, regardless of whether the target table exists or not. We simply removed the CREATE TABLESPACE statement from the SQL output that DJRA generated.
6.5.4 Starting Apply
One Apply job was created and customized for Apply Qualifier 'SALES01'. As all sets are defined for the same Apply Qualifier, this Apply job will service all three subscriptions.
6.5.4.1 The Customized Apply Job
The following job excerpt shows the invocation parameters that we used to start the Apply job:

//ASNARUN EXEC PGM=ASNAPV25,
//        PARM='SALES01 DB2I DISK'
//*             <== APPLY_QUALIFIER
//*                     <== CONTROL_SERVER
//*                          <== SPILL FILE OPTION 'DISK'
6.5.4.2 Triggering an Event
The actual replication action was triggered by inserting events into DProp's event table. The data from all three source servers was separately replicated by issuing separate inserts into the event table. The following insert into the event table, for example, will trigger the subscription replicating from branch 03:

INSERT INTO ASN.IBMSNAP_SUBS_EVENT (EVENT_NAME, EVENT_TIME)
  VALUES ('BRANCH03', CURRENT TIMESTAMP);
Remark: Apply queries the event table after every subscription cycle to see if there are new events that trigger another subscription. If there is nothing to replicate, Apply will at least query the event table every 5 minutes.
6.5.5 Some Performance Remarks
Please do not think that we are seriously going to compare Informix on AIX performance with DB2 for OS/390 performance. But to give you a feeling for what you can expect, have a quick look at Figure 26. The figure shows two bars.
Bar 1 visualizes the amount of time that it took to insert a day's worth of sales data (27,340 rows) into the sales table at Informix. (The value was taken from the performance measurement experiment in Chapter 3: 3.4, "Performance Considerations for Capture Triggers" on page 55. Even though we set up capture triggers for the Informix table during this case study, we want to eliminate the impact of change capture triggers for this comparison.)
Bar 2 shows the time Apply for OS/390 needed to replicate the captured changes (27,340 rows) to DB2 for OS/390. This time bar is divided into two sections:
• Section 1: Apply's fetch phase, fetching the data from Informix/AIX into the spill file on the host.
• Section 2: Apply's insert phase, inserting the change data from the spill file into the target table (through the target view).
Remark: The start and the end of the insert phase were exactly measured by adding two SQL statements (one of type B, one of type A) to the subscription set, each inserting the current timestamp into a separately created table. See Figure 26.
[Figure 26. Replication Performance Remarks: bar 1 shows that inserting the day's 27,340 rows into Informix on AIX took about 140 seconds (measured without change capture triggers); bar 2 shows that Apply for OS/390 needed roughly 100 seconds in total, split into a fetch phase of about 55 seconds and an insert phase of about 45 seconds.]
As expected, the inserts on the host are quicker than the inserts on AIX! Even though you might consider this to be obvious, we would like to use this result to encourage you to invest some time on performance considerations before you decide about the platform of your central data store or data warehouse.
6.6 Moving from Test to Production
The next major step will be to move the tested replication solution to the production system. As a special consideration for a large data consolidation scenario, the replication source server configuration has to be rolled out to a considerable number of source databases. As we have seen, no manual interaction is necessary to initialize the replication environment once all replication definitions are created at the replication source and target servers. Additionally, we are only dealing with one single, centrally located DataJoiner instance here, which will make this step easier.
Summarizing, the following definitions have to be created or carried over from the test site to configure the production system:
• One DataJoiner database for every Informix source server
• The replication control tables within every replication source server database, both DataJoiner and Informix
• One source table registration for every source database (which includes Informix triggers and inserts to replication control tables)
• One subscription set for every source server
• One subscription member for each subscription set
• The production target table, which has to be created manually
• One target view for each subscription member
The main issue will therefore be to clone the available setup information and all defined database objects (like change data tables or capture triggers) to meet the production requirements. Mainly, two different strategies can be followed to achieve this cloning:
• Strategy 1: DJRA provides a feature to re-generate DProp control information by re-engineering inserts to the DProp control tables from existing definitions. This feature is called the PROMOTE feature (also referred to as the CLONE feature). It is recommended to use the promote function when carrying replication definitions over from a test to a production system, because all changes made to the replication control tables after the initial setup (for example, to tune the setup) will be caught by PROMOTE.
• Strategy 2: Save all DJRA-generated or customized SQL scripts that were used to configure the test system. As an option, anonymize the scripts and generate new scripts from the anonymized examples when adding a new source server to the replication system. Objects that are unique for each production instance are:
  - CONNECT statements (either to the source server or to the control server)
  - Non-IBM database names, which are referenced in SET PASSTHRU commands or CREATE NICKNAME statements
  - References to the replication source server, the replication target server, and the replication control server, which are named within the INSERT statements that configure the replication control tables.
If separate procedures exist to create database objects for Informix and DB2/DataJoiner, divide the generated scripts into one DB2/DataJoiner part and one Informix part.
Remark: You may notice that the pruning control trigger code changes with every new replication source table that is added to a non-IBM replication source server.
Remark: When this book was written, DJRA's PROMOTE feature did not yet support heterogeneous replication sources (non-IBM change data tables and capture triggers cannot be promoted so far). Please watch for future releases.
6.7 Some Background on Replicating from Multi-Vendor Sources
Change capture for non-IBM databases, such as Informix (as demonstrated in this case study), Oracle, Microsoft SQL Server, and Sybase SQL Server, is achieved by creating capture triggers upon the multi-vendor replication source tables; these triggers queue all changes made against the source tables in a change data table. This section contains an overall picture of all the different kinds of triggers that are defined for non-IBM replication sources, and describes how the triggers interact to emulate all functions that, for DB2 replication sources, are provided by DProp Capture.
6.7.1 Using Triggers to Emulate Capture Functions
Considering a non-IBM replication source, triggers are used to emulate all the Capture functions. Capture triggers must be provided for the following three main Capture tasks:
• Capture of changes (inserts, deletes, updates)
• Pruning
• Advancing the SYNCHPOINT column in the REGISTER table
The change capture triggers feed the Change Data tables at the multi-vendor replication source database. Providing compatibility with DB2 replication sources, capture triggers can be defined to capture both before and after images, or after images only. Additionally, the DProp Capture feature to capture updates as delete-and-insert pairs can be emulated. Of course, triggers can be set up to capture either only certain columns of a replication source table or all the available columns.
The pruning trigger is used to delete records that are no longer needed from the non-IBM replication source's Change Data tables. Change Data table rows are no longer needed when all the Apply processes have replicated these records to the replication targets. The pruning trigger is defined on the pruning control table (within the non-IBM replication source database) and is invoked when Apply updates the pruning control table after successfully replicating a subscription set. Refer to 5.5.13.2, "How to Defer Pruning for Multi-Vendor Sources" on page 127 to see how to gain performance benefits by temporarily disabling the pruning trigger for non-IBM
sources. Disabling the pruning trigger can be used to emulate Capture's NOPRUNE runmode.
The reg_synch trigger is used to advance the SYNCHPOINT value in the ASN.IBMSNAP_REGISTER table, for all the registered replication sources, before the Apply program accesses the REGISTER table to see if new changes are awaiting replication. For DB2 replication sources, Capture maintains the SYNCHPOINT column to provide Apply with performance hints. Apply uses this column to see which Change Data tables received new changes since the previous replication cycle. Apply can then avoid opening cursors for those tables that did not receive any new changes.
We have seen when setting up the replication definitions for this case study that all the triggers are created natively within the non-IBM replication source database. The reg_synch trigger is defined when the control tables are created; the capture triggers are generated when a non-IBM table is defined as a replication source.
6.7.1.1 Change Capture Triggers
Although all three kinds of triggers (capture, pruning, reg_synch) have to be in place to properly capture changes for non-IBM source tables, the most interesting ones are the change capture triggers. The change capture triggers feed the change data tables that Apply will use to replicate database changes to the replication target tables. Change capture triggers are always automatically generated for the three possible DML operations. The definition of a non-IBM table as a replication source, therefore, always results in the creation of three native change capture triggers:
• One trigger for INSERT
• One trigger for UPDATE
• One trigger for DELETE
Change Capture Trigger Example
The DDL for all the necessary triggers is automatically created by DJRA. Nevertheless, we want to give you a feeling for what the capture triggers look like. Although we discussed the setup of a replication system using Informix source servers in this case study, we show the DDL to create a change capture trigger for an Oracle replication source table this time, just to remind you that the same basic setup and the same replication techniques are applicable to an Oracle environment as well. Figure 27 shows the DDL to create the capture trigger (AFTER INSERT) for the Oracle replication source table CHRIS.SALES.
It is possible to define a naming scheme for heterogeneous triggers by customizing DJRA's REXX user exit for replication sources, called SRCESVR.REX. We chose to name the insert trigger by adding the constant 'CD' to the name of the replication source table. Doing so, the name of the insert trigger is CHRIS.ISALESCD.
-- create the insert trigger for CHRIS.SALES
CREATE TRIGGER CHRIS.ISALESCD AFTER INSERT ON CHRIS.SALES
FOR EACH ROW
BEGIN
  INSERT INTO CHRIS.SALESCD (
    ORADATE, LOCATION, COMPANY, SALESTXNO, ITEMNO, PIECES, OUT_PRC,
    TAX, SUPPLNO, IBMSNAP_COMMITSEQ, IBMSNAP_INTENTSEQ,
    IBMSNAP_OPERATION, IBMSNAP_LOGMARKER )
  VALUES (
    :NEW.ORADATE, :NEW.LOCATION, :NEW.COMPANY, :NEW.SALESTXNO,
    :NEW.ITEMNO, :NEW.PIECES, :NEW.OUT_PRC, :NEW.TAX, :NEW.SUPPLNO,
    LPAD(TO_CHAR(CHRIS.SGENERATOR001.NEXTVAL), 20, '0'),
    LPAD(TO_CHAR(CHRIS.SGENERATOR001.NEXTVAL), 20, '0'),
    'I', SYSDATE );
END;
Figure 27. Example of an Oracle Change Capture Trigger (Insert Trigger)
The trigger is defined to execute after each insert operation into the source table, and it inserts a new row into the Change Data table, named CHRIS.SALESCD. All the new column values, represented by :NEW., are used when inserting the row into the Change Data table. Note that an Oracle sequence generator is used to maintain the sequence columns (IBMSNAP_COMMITSEQ and IBMSNAP_INTENTSEQ) of the Change Data table. For other non-IBM databases that do not provide a sequence generator, the sequence columns are populated with values derived from the current timestamp. Remark (Informix): Because Informix triggers have a limited length, the change capture activity is performed by Informix stored procedures that are invoked by the triggers.
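To illustrate the Informix remark, the following is a minimal sketch of that pattern; it is not the DDL that DJRA generates. All object names, the column list, and the data types here are our own assumptions, and the real generated procedure handles every source column and all the IBMSNAP columns:

-- hypothetical sketch: the trigger only passes the new row values to an
-- SPL stored procedure, which performs the actual insert into the CD table
CREATE PROCEDURE chris_isales_cd (loc CHAR(10), pieces INTEGER)
  -- for Informix, the sequencing columns are derived from the current timestamp
  INSERT INTO salescd (location, pieces, ibmsnap_operation, ibmsnap_logmarker)
  VALUES (loc, pieces, 'I', CURRENT YEAR TO FRACTION(3));
END PROCEDURE;

CREATE TRIGGER isalescd INSERT ON sales
  REFERENCING NEW AS post
  FOR EACH ROW (EXECUTE PROCEDURE chris_isales_cd(post.location, post.pieces));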
6.7.2 The Change Data Table for a Non-IBM Replication Source
You may have already noticed that there is no unit-of-work table for non-IBM replication sources (there is always one for DB2 replication sources). The reason for this is that the effects of synchronous triggers are only committed when the source application commits. When the source application aborts or performs a rollback, no data ends up in the Change Data tables. In DProp terminology, we call such a Change Data table a Consistent Change Data (CCD) table. For experts: DProp Apply handles heterogeneous Change Data tables as non-complete, non-condensed internal CCD tables.
Remark: DProp Capture, DB2's log-based change capture mechanism, reads the DB2 database log sequentially and as quickly as possible. Capture does not wait for transactions to commit or roll back. To ensure that only committed change data is replicated to the replication target tables, DProp Capture maintains a global unit-of-work table (ASN.IBMSNAP_UOW) that contains one record for every committed transaction. DProp Apply joins every Change Data table with the global unit-of-work table when replicating from a DB2 replication source. Using this technique, change data that has not yet been committed is hidden from the Apply process and therefore is not replicated.
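For DB2 sources, the shape of that join is roughly as follows. This is a simplified sketch only: the CD table name, the column list, and the host variables are our assumptions, and the statement Apply actually generates also applies the subscription predicates and the target column mapping:

-- simplified sketch of the CD/UOW join Apply uses for a DB2 source
SELECT CD.SALESTXNO, CD.PIECES, CD.IBMSNAP_OPERATION,
       UOW.IBMSNAP_LOGMARKER
FROM   SRC.SALES_CD CD, ASN.IBMSNAP_UOW UOW
WHERE  CD.IBMSNAP_UOWID = UOW.IBMSNAP_UOWID
  AND  UOW.IBMSNAP_COMMITSEQ > :last_synchpoint
  AND  UOW.IBMSNAP_COMMITSEQ <= :current_synchpoint;

For a non-IBM source, no such join is needed: Apply reads the trigger-maintained CCD table directly.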
6.7.3 How Apply Replicates the Changes from Non-IBM Sources
When Apply finally accesses the replication source server to replicate the most recent changes, it transparently accesses all non-IBM database objects through the DataJoiner nicknames. In fact, Apply has no knowledge that the replication source tables and the change data tables are located at a non-IBM source server. Figure 28 graphically represents the sequence in which Apply accesses the control tables and the change data tables to fulfill its task:
Figure 28. Replication from a Multi-Vendor Source Table. (The figure shows the business applications inserting, updating, and deleting rows in the source table of the multi-vendor database; the capture triggers feeding the CCD table; and Apply working through the reg_synch, register, CCD, and pruning control nicknames in the DataJoiner database. The numbered steps 1 through 4 are explained below.)
As the first action after connecting to the DataJoiner database, Apply executes an SQL Before statement that updates the REG_SYNCH table (in Figure 28, this operation is marked as step 1). The only purpose of this update is to invoke the reg_synch trigger, which immediately updates the SYNCHPOINT column for all registered source tables in the register table (as previously explained). The SQL Before statement that updates the REG_SYNCH table is automatically added when a subscription set is created, if the source server is a non-IBM database.
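Conceptually, the SQL Before statement is nothing more than a dummy update that fires the trigger. The following one-liner is a sketch; we assume the DJRA-generated REG_SYNCH table with its TRIGGER_ME column, so verify the names against your own generated control tables:

-- dummy update whose only effect is to fire the reg_synch trigger
UPDATE ASN.IBMSNAP_REG_SYNCH SET TRIGGER_ME = 'Y';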
Next, Apply queries the register table, as usual, to determine which change data table belongs to which registered source table. This is shown as step 2 in Figure 28. Still connected to the replication source server, Apply subsequently fetches the most recent changes over to the target server, which is shown as step 3. Because we are dealing with multi-vendor sources here, the change data table has previously been fed by the change capture triggers (assuming that the source table was changed since Apply last accessed the source server). After all changes have been applied to the target server, Apply reconnects to the source server to advance the status of the subscription with an update to the pruning control table, which is shown as step 4. This update finally invokes the pruning trigger (if it has not been disabled as described in 5.5.13.2, "How to Defer Pruning for Multi-Vendor Sources" on page 127) to prune all records from the change data table that have already been replicated.
6.8 Summary
We used case study 1 to give you a practical example of a data replication application, using:
• Informix replication source servers
• A DB2 for OS/390 replication target server
• IBM DataJoiner as central database middleware
• DProp Apply to actually move the data
We satisfied the business requirements by demonstrating an easy-to-implement advanced technique that IBM DProp provides to consolidate distributed data into a single central table at the replication target site. We additionally demonstrated that the initialization of the target table (in DProp terms: the full refresh) can be achieved without any manual interaction. Additional source servers can be added to an existing solution at any time.
After focusing on the implementation of the test environment that was used to prove all techniques, we provided ideas on how to carry a tested replication application over from a test environment to a production environment. The final part of this chapter, showing change capture triggers at work, can be used as a reference to see how the IBM replication solution integrates multi-vendor database systems into an enterprise-wide, cross-platform data replication application. (It's really that easy!)
Chapter 7. Case Study 2—Product Data Distribution, Retail
In this case study, we describe how centrally managed data can be distributed to databases in the branch offices using IBM's Data Replication solution. We will utilize the following major techniques of the IBM Data Replication Solution within this scenario to optimize the performance and the manageability of the solution:
• Replication from DB2 for OS/390 to Microsoft SQL Server
• Source-site join views
• Noncomplete, condensed internal CCDs
• Two-tier versus three-tier approach
• Pull configuration for enhanced replication performance
• Data subsetting to distribute only the data relevant to each branch
• Invoking stored procedures in the target database
7.1 The Business Problem
A retail company that has a number of branches throughout the country wants to implement a new decentralized inventory management system for its branches. A local inventory management application in each branch needs to access information about the products sold in the branch, including product line and brand information. It also needs access to the supplier data for the products. This information is managed centrally by applications in the company headquarters, which are based on DB2 for OS/390. The new inventory application in the branches, which will be implemented in Visual Basic, relies on Microsoft SQL Server as the data store. A TCP/IP connection exists between the OS/390 system at the company headquarters and the Windows NT systems in the branch offices, but no database connections are yet in place. Figure 29 shows the high-level system architecture:
Figure 29. Case Study 2—High Level System Architecture. (The central applications run against a DB2 data sharing group, DB2I, on OS/390 at the company headquarters; TCP/IP connects it to the Windows NT systems in branches 01 through n, each running the Visual Basic inventory application and Microsoft SQL Server.)
Two major approaches exist for the design of the new inventory application:
1. The inventory application accesses the required product and supplier information directly from the DB2 for OS/390 database at the company headquarters, using remote requests over the network link.
2. The application accesses a local copy of the required data held in the Microsoft SQL Server database (where all the other relevant data for the application is located as well).
The first design approach has some serious disadvantages in this scenario:
• Network outages between the head office and the branches will directly affect the availability of the new inventory application.
• Contention between the instances of the new inventory application in the branches and the central applications will have an impact on the performance of the central applications.
• The network traffic will increase, which will result in higher network costs.
• The performance of the local inventory application will be degraded due to remote database requests.
These issues lead to the conclusion that the second design approach, where local copies of the relevant data are distributed to each of the branches, is more feasible. The only issue that has to be resolved for the second approach is that the distribution of copies of the data introduces redundancy into the system. Because the required data is not static, the redundancy has to be managed to keep all the copies up-to-date.
The IBM Data Replication solution enables this design approach by managing the distribution and currency of the copies, as well as the complexity of this process. The inventory application can access the local database and rely on the correctness of the data. As no history information is required at the branch offices, only the most current change to a source table record has to be replicated. In DProp terminology, this is called transaction-consistent replication. Note: The IBM Data Replication solution supports both transaction-based replication and transaction-consistent replication. Transaction-based replication propagates every update issued by every transaction, whereas transaction-consistent replication only propagates the net results of the recent activity.
7.1.1 Source Data Model
In Figure 30 we show the subset of the source database's data model that is relevant to our case study. Each branch will copy a subset of data from the headquarters database, corresponding to the products sold at that particular branch.
Figure 30. Partial Data Model for the Retail Company Headquarters. (The entities shown are Supplier, Store, Store_Item, Items, Sales, ProdLine, and Brand.)
You can refer to Table 7 on page 206 for a description of the tables. Only the STORE_ITEM table is not described there; this table holds information about the product lines sold in each store.
7.1.2 Target Data Model Figure 31 shows the partial data model in each branch of the retail company. Although more information is required by the inventory application, only the product domain of the target data model is shown, because this is sufficient for the understanding of the case study. The table S_PRODUCT holds the information about the products available at a particular branch. The table P_ITEMS holds the information about the number of ITEMS for each product line. For other table descriptions, please refer to Table 7 on page 206.
Figure 31. Partial Data Model for a Branch of the Retail Company. (The entities shown are Supplier, S_Product, ProdLine, P_Items, and Brand.)
7.2 Architecting the Replication Solution
Following the structured approach proposed in Chapter 3, we discuss the architecture of the replication solution in terms of the system design and the replication design.
7.2.1 Data Distribution—System Design
Data will be replicated from the company headquarters to each branch. Changes are captured from DB2 for OS/390 and then replicated to the Microsoft SQL Server databases. This is called a data distribution configuration. The target tables are read-only, so you do not need to set up conflict detection. Because the applications work with the target tables, which are local copies, they do not overload the network, and the load on the central server becomes more manageable. Refer to Figure 32.
Figure 32. Data Distribution with Read-Only Target Tables. (A single read/write source table at the source server is replicated to read-only target tables at multiple target servers.)
Since the target is a non-IBM database, the Apply program cannot connect to the Microsoft SQL Server directly. It will connect to a DataJoiner database instead (with DB2 DataJoiner connected to the Microsoft SQL Server) and will apply the changes to Microsoft SQL Server targets using DB2 DataJoiner nicknames. Apply issues INSERT, UPDATE or DELETE statements against the nicknames (which, for Apply and any other DB2 client application, appear to be DB2 objects), and DataJoiner passes the SQL statements transparently to the remote data sources. Figure 33 summarizes the foregoing explanation.
Figure 33. Replicating to Non-IBM Target Tables. (Apply issues its requests against nicknames 1 through n and the replication control tables in the DataJoiner database; DataJoiner forwards the requests on behalf of Apply to target tables 1 through n in the non-IBM target database.)
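To make the nickname mechanism concrete, here is a hypothetical sketch; the nickname, server, table, and column names are our own assumptions, and the exact CREATE NICKNAME syntax should be verified against the DataJoiner documentation for your release. A nickname is created in the DataJoiner database for a remote SQL Server table, and Apply then issues ordinary SQL against it:

-- create a nickname for a branch target table (hypothetical names)
create nickname grohres3.supplier for infodb1.dbo.supplier;

-- the kind of statement Apply would then issue against the nickname;
-- DataJoiner forwards it to the SQL Server table
insert into grohres3.supplier (supp_no, supp_name)
values (4711, 'ACME Corp');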
7.2.1.1 How Many DataJoiner Instances—One or Several?
According to the discussion in Chapter 3, we have two choices when we set up the connection and replication between the source and the targets.
1. One DataJoiner instance connected to multiple branches (see Figure 34). Advantages of this implementation:
• Ease of administration
• Low cost (for example, license fees, roll-out, administration)
But there is also a disadvantage:
• If the data volume is large, the replication performance will be poor, because the DataJoiner instance will become a bottleneck, and Apply will have to push all changes to the remote targets. (See Chapter 3 for a detailed discussion of this topic.)
Figure 34. One DataJoiner Connected to Multiple Store Servers. (A single DataJoiner instance on Windows NT or AIX connects DB2 for OS/390 at the company headquarters to the Microsoft SQL Servers in branches 01 through n.)
2. One DataJoiner instance at each branch office (see Figure 35). Advantage of this implementation:
• Performance will be good, even when the data volume is large.
Disadvantages:
• Setup will be more complicated, because several DataJoiner instances must be installed.
• Higher cost, due to the number of licenses.
Figure 35. One DataJoiner for Each Branch Office. (DB2 for OS/390 at the company headquarters connects to a dedicated DataJoiner instance on Windows NT in each of branches 01 through n, in front of each branch's Microsoft SQL Server.)
Since the data volume was acceptable in this case study, we chose the first solution.
7.2.1.2 Apply Program Placement—Pull or Push?
Logically, the Apply program could run on any server that has connectivity to the source, target, and control servers. That is, we can choose to run the Apply program at the source server (on the headquarters side), which is called a Push configuration, or at the target server (on the DataJoiner side), which is called a Pull configuration.
1. In a Push configuration, the Apply program for OS/390 connects to the headquarters source server (DB2 for OS/390) and retrieves the data. Then it connects to the remote DataJoiner server and pushes the updates to the target tables in Microsoft SQL Server (through DataJoiner nicknames). In a Push configuration, the Apply program pushes the updates row by row and cannot use DB2's block fetch capability to improve network efficiency. Push techniques are often touted as reducing the overhead of having clients continually poll the server to see whether there is any new information to pull. This configuration is sufficient when tables are infrequently updated.
2. In a Pull configuration, the Apply program is located at the DataJoiner server and connects to the remote DB2 for OS/390 to retrieve the data. DB2 can use block fetch to retrieve the data across the network efficiently. After all the data is retrieved, the Apply program connects to the DataJoiner database and applies the changes to Microsoft SQL Server through DataJoiner nicknames. In a Pull configuration, the Apply program can take advantage of the database protocol's block fetch optimization.
We selected a Pull configuration here to take advantage of its optimized performance compared to a Push configuration.
7.2.2 Data Distribution—Replication Design
After discussing the decisions regarding the general system design, we introduce the replication techniques used to fulfill the business requirements of this case study.
7.2.2.1 Data Subsetting
The tables in the headquarters database are REGION, STORE, SALES, ITEMS, SUPPLIER, PRODLINE, BRAND, and STORE_ITEM. The tables SUPPLIER, PRODLINE, and BRAND will be propagated to each store completely. Regarding the ITEMS table, each store only needs the product-related information that is relevant to that store.
So we will use a row subsetting technique for the ITEMS table. Basically, row subsetting can be achieved by:
• Defining a simple predicate (WHERE clause), if the replicated table contains the subsetting columns
• Replicating a subset join, if the subsetting column is not part of the replicated table
In our example, the subsetting column (the store number) is not part of the ITEMS table, so we had to choose row subsetting using a join view:
1. Create a join view based on the STORE_ITEM and ITEMS tables. The view's definition is:
CREATE VIEW DB2RES5.S_PRODUCT AS
  SELECT S.STORE_NUM, I.ITEM_NUM, I.DESC, I.PROD_LINE_NUM, I.SUPP_NO
  FROM LIYAN.STORE_ITEM S, LIYAN.ITEMS I
  WHERE S.PRODLINE_NO = I.PROD_LINE_NUM;
This view was defined on DB2 for OS/390, which is the replication source database. Refer to Figure 36.
Figure 36. Replication of the Product Information. (At the source, the S_PRODUCT join view is defined over the ITEMS and STORE_ITEM tables; using two-tier replication, the view is replicated, subset by Store_num, into an S_PRODUCT target table in each store, Store01 through Storenn.)
Remark: Always specify the view's schema name, specify all the column names, specify a correlation id after each table name, and use those ids in the where-clause. Otherwise, you will get an error message when you try to register the view as a replication source.
2. When you add a member to this subscription set, specify a where-clause: store_num = ...;
7.2.2.2 Replication Strategy—Two-Tier or Three-Tier?
For the STORE_ITEM and ITEMS tables, since we will use a subsetting technique (join view first, then subsetting according to store_num), we will use the two-tier approach. For the other tables (BRAND, PRODLINE, and SUPPLIER), which are all needed in each store in their entirety, we will use internal CCDs to net out hot spots while updating the source tables. This reduces the number of rows that really need to be replicated if the same record (same primary key) is updated several times within one replication cycle. So we will have a two-tier topology for the STORE_ITEM and ITEMS tables, and a three-tier topology for the other tables, as shown in Figure 37.
Figure 37. Three-Tier Replication Architecture. (Tier 1 is DB2 for OS/390 at the headquarters, holding the STORE_ITEM, ITEMS, and SUPPLIER tables, the S_PRODUCT view, and Capture. Tier 2 is the middleware layer, where Apply maintains a noncomplete, condensed internal CCD table, and DataJoiner for NT provides the nicknames in the DJDB database. Tier 3 consists of the PRODUCT and SUPPLIER tables in the Microsoft SQL Server databases at stores 01 through n.)
About the CCD Tables
We use condensed, noncomplete internal CCD tables in this case study. Condensed CCD tables ensure that only the net change for a row is replicated to the targets, which reduces the network load. Noncomplete CCD tables contain only the modified rows from the source table. The CCD table that we created for the SUPPLIER table is called CCDSUPP. We use internal CCDs to benefit from the following advantages:
1. The join between the CD tables and the UOW table will be performed only once, regardless of the number of subscriptions (stores).
2. "Hot spot" updates to the same row will be eliminated; using a condensed CCD, only the last image of the row is kept, and only the most current row image propagates to the targets, not each and every change.
7.2.2.3 Invoking Stored Procedures at the Target Database
After replicating the source data, the store offices also need some specific information for their applications. This information is to be calculated from the replicated data, and a good way to do this is to use a stored procedure. So we illustrate here the DProp ability to call a stored procedure before or after the processing of a Subscription Set. Together with DataJoiner's capability of creating nicknames for remote non-IBM stored procedures, it is even possible to invoke a Microsoft SQL Server stored procedure with DProp Apply. In this case study, we used the following technique to fulfill this task:
1. In the Microsoft SQL Server target database, we created the following stored procedure:
CREATE PROCEDURE compute_item AS
  delete from p_items
  insert into p_items
    select prod_line_num, count(item_num)
    from s_product
    group by prod_line_num
This stored procedure computes the number of items for each product line sold in the store. The first statement clears the old data; the second computes the current aggregate data and inserts it into the aggregation table. Each time the Subscription Set is processed, this stored procedure is called.
2. Next, we created the stored procedure nickname in the DataJoiner database:
Create Stored Procedure Nickname c_item for infodb1.dbo.compute_item;
3. Finally, we added the stored procedure to the Subscription set definition, using DJRA (see Figure 47 on page 200).
7.3 Setting Up the System Environment
This section introduces the detailed steps and tips used to set up the test environment for this case study. Basically, we will work along the implementation checklist that is provided in Chapter 4.
7.3.1 The System Topology Figure 38 shows the topology of the environment for this case study.
186
The IBM Data Replication Solution
Figure 38. Case Study 2—System Topology. (The source is a DB2 for OS/390 data sharing group, DB2I, on host wtscpok, running DProp Capture and Apply. An Intel Windows NT server runs DataJoiner V2.1 with the DJDB database, DProp Apply, and the Microsoft SQL Server client. The targets are Microsoft SQL Server V7 databases infodb1 through infodbn on the branch1 through branchn Windows NT servers.)
In Figure 38, you can identify the following major components: 1. The company headquarters—source site: • DB2 for OS/390 V5.1.1 • DProp Capture and Apply for MVS 2. The DataJoiner server: • DataJoiner for NT V2.1.1, including DJRA • Microsoft SQL Server V7.0 client code (ODBC driver)
Case Study 2—Product Data Distribution, Retail
187
3. The branches - target sites: • Microsoft SQL Server V7.0. In this environment, we use TCP/IP to connect all the systems.
7.3.2 Configuration Tasks
This section documents the setup tasks as we used them to prepare the systems for case study 2; for the general setup activities, it refers to Chapter 4.
7.3.2.1 Set Up the Database Middleware Server
One of the NT servers in this environment acts as the IBM DataJoiner middleware server. We assume that all the Microsoft SQL Servers in the branches are already installed and running.
Set Up DataJoiner to Access the Microsoft SQL Servers
Based on steps 1 and 2 of the implementation checklist that was introduced in Chapter 4, the first task is to enable native SQL Server connectivity between DataJoiner and all the Microsoft SQL Servers. The following steps were necessary to connect the DataJoiner database to all the SQL Server instances:
• Install the Microsoft SQL Server client code on the DataJoiner machine.
• Use the SQL Server Client Network Utility on the DataJoiner server to set up the connections to each of the stores' SQL Server databases. Use the General tab in this tool (see Figure 39), select TCP/IP as the default network library, then click the Add... button to add a server. You can refer to the Microsoft SQL Server manuals for more details.
Figure 39. Configure Microsoft SQL Server Client Connectivity
To check the success of this configuration step, we used the SQL Server Enterprise Manager to natively connect to all the SQL Server instances.
Prepare DataJoiner to Access the Microsoft SQL Server Systems
The DataJoiner release that we used to set up this environment is IBM DB2 DataJoiner for NT 2.1.1 with PTF6. Register the SQL Server databases as ODBC data sources.
Create Your DataJoiner Database
One single DataJoiner database can generally be used to access multiple backend database systems.
CREATE DATABASE DJDB;
Create Server Mappings for Microsoft SQL Server Systems
A server mapping specifies how DataJoiner will subsequently access a remote data source.
connect to djdb;
create server mapping from infodb1 to node "branch1" database "infodb1"
  type mssqlserver version 7.0 protocol "djxmssql";
Create Server Options
When using DataJoiner as a gateway for DProp replication, we recommend that you always set the server option TWO_PHASE_COMMIT to NO.
create server option TWO_PHASE_COMMIT for server infodb1 'N';
Create User Mappings
DataJoiner user mappings are used to map DataJoiner userids and passwords to non-IBM database userids and passwords.
create user mapping from grohres3 to server infodb1 authid "sa" password "password";
Connect to the Replication Source Server (DB2 for OS/390)
We used DataJoiner's built-in DRDA requester to connect to the replication source server. The DRDA connectivity was implemented using the TCP/IP network protocol.
• Open the hosts file on the DataJoiner server, and insert the following line:
9.12.14.555 wtscpok
• Make sure you can ping the host (wtscpok), and then execute the following steps from the DB2 command line processor:
catalog tcpip node hostdb remote wtscpok server 33320;
catalog database sj390db1 as sj390db1 at node hostdb authentication dcs;
catalog dcs database sj390db1 as sj390db1;
• Test the connection:
connect to sj390db1 user db2res5 using pwd;
• Bind the DB2 utilities to the data source:
bind @ddcsmvs.lst blocking all grant public;
7.3.2.2 Implement the Replication Subcomponents
This section covers the installation of DProp Capture and DProp Apply, corresponding to the three-tier topology outlined in Figure 37 on page 184.
Install and Set Up DProp Capture
Install DProp Capture on the OS/390 source system. Refer to "Step 12: Install and Set Up DProp Capture, if Required" on page 71 for more details.
Install and Set Up DProp Apply
Install DProp Apply on the OS/390 system to support the three-tier approach for the BRAND, PRODLINE, and SUPPLIER tables. You do not need to install DProp Apply on the DataJoiner server, because Apply is already installed with DataJoiner.
Refer to "Step 13: Install and Set Up DProp Apply, if Required" on page 71 for additional information.
7.3.2.3 Set Up the Replication Administration Workstation
Set up the DJRA preferences so that DJRA can access the headquarters database (SJ390DB1) and the DataJoiner database (DJDB). Open DJRA, select File => Preferences, then click the Connection tab, and set the userid and password for the source and target.
7.3.2.4 Create the Replication Control Tables
You must create control tables at the source server and also at the DataJoiner server (which acts as the control server in this scenario).
Create the Control Tables at the Replication Source
Use the DJRA function Create Replication Control Tables to create all the DProp control tables at the replication source server. Select SJ390DB1 in the panel, then generate the SQL script.
Create the Control Tables at the DataJoiner Server
Use the DJRA function Create Replication Control Tables to create all the DProp control tables in the DataJoiner database (see Figure 40).
Figure 40. Creating Replication Control Tables with DJRA
7.3.2.5 Bind DProp Capture and DProp Apply After the control tables are successfully created, we are able to bind DProp Capture and DProp Apply.
Bind DProp Capture Use the job provided with the Capture for OS/390 installation media to bind the Capture component against the source database.
Bind DProp Apply
Use the job provided with the Apply for OS/390 installation media to bind Apply for OS/390 against the source database. You can refer to "Step 22: Bind DProp Apply" on page 75 for details. We must also bind the Apply component that is included in DataJoiner against the replication source server (SJ390DB1), all replication target servers (DJDB), and the replication control server (DJDB). At the DataJoiner instance, change the directory to SQLLIB\BND and use the following statements to bind Apply:
connect to SJ390DB1 user db2res5 using pwd;
bind @applyur.lst isolation ur blocking all;
bind @applycs.lst isolation cs blocking all;
connect to DJDB;
bind @applyur.lst isolation ur blocking all;
bind @applycs.lst isolation cs blocking all;
7.4 Implementing the Replication Design In this section we describe which replication sources and targets have to be defined to distribute the required data to the branch offices.
7.4.1 Define DB2 for OS/390 as Replication Source Use DJRA to generate and run the SQL script to define the OS/390 tables as replication sources (see Figure 41).
Figure 41. Register Table ITEMS as a Replication Source
Remark: You must first register all the tables before defining the join view as a replication source; then register the S_PRODUCT view (see Figure 42). For the Column capture policy, select After-images only (the option to capture both before-images and after-images would be used in an auditing scenario, for example). For the Update capture policy, if the source tables' primary key or partitioning key could be updated, then you would have to choose Updates as delete/insert pairs. Here we simply need Updates captured as updates. Since this case study is not an update-anywhere scenario, the Conflict detection level has to be set to None here. Remark: If the Capture program is running while you are defining a new replication source, you will have to reinitialize Capture so that it takes the new registration into account.
Figure 42. Register Views as Replication Sources
7.4.2 Define Empty Subscription Sets
In this case study, we create several subscription sets: some for the internal CCD tables (subscription set names: SETBRAND, SETSUPP, SETPROD) and another for the User Copy tables (the target tables in SQL Server; subscription set name: SETLY). See Figure 43 on page 195 and Figure 44 on page 196. Logically, you will create the Subscription Sets for the CCDs first, and then the Subscription Set for the User Copy tables. For the CCD Subscription Sets, the Apply Qualifier is AQCCD. For the Copy Tables Subscription Set, the Apply Qualifier is AQLY (the Apply Qualifier is used in the command to start Apply, and it is also used as part of the password-file name). Remember: The Apply Qualifier is case-sensitive!
Figure 43. Use DJRA to Create Empty Subscription Set.
We also specify the time interval for this Subscription Set as 1440 minutes, which means 24 hours. We can see from Figure 43 that there is another parameter named Blocking factor. The value you specify here will be the MAX_SYNCH_MINUTES value. If a blocking factor is specified, Apply takes this factor into account when selecting data from the change data tables (either CD or CCD). If the time span of queued transactions is greater than the number of minutes specified by MAX_SYNCH_MINUTES, Apply will try to convert a single subscription cycle into many mini-cycles, cutting the backlog down to manageable pieces. But, in doing so, Apply will never cut transactions into pieces: a transaction is always replicated completely, or not at all. This reduces the stress on the network and DBMS resources and reduces the risk of failure. We specify MAX_SYNCH_MINUTES here as 6 hours. That is, every 24 hours, Apply will run and try to split the change data into 4 subsets; that Apply instance will slice the propagation backlog into several mini-subscriptions, each mini-subscription taking at most 6 hours' worth of changes. For performance reasons, we select the target database as the control database, because the Apply program runs on the same machine; this saves Apply from going across the network to access the subscription definitions.
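If you want to verify the timing parameters after the set has been created, you can query the subscription set control table at the control server. The following is just a sketch: we assume the DProp V5 control table layout, so check the column names against your release:

-- inspect the timing parameters of the subscription sets (hypothetical check)
select apply_qual, set_name, sleep_minutes, max_synch_minutes
from asn.ibmsnap_subs_set
where apply_qual = 'AQLY';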
When creating the subscription sets for the CCD tables, we use SJ390DB1 as the control server and the target server (see Figure 44).
Figure 44. Create Empty Subscription Sets for CCDs
7.4.3 Create a Password File
Only the UNIX and Windows platforms need a password file. It contains the following information:
SERVER=SJ390DB1 USER=db2res5 PWD=pwd
SERVER=DJDB USER=grohres3 PWD=pwd
Save the file as AQLYdb2DJDB.PWD in the directory where you will invoke the Apply program. AQLY is the value of Apply qualifier we defined in the previous step (see Figure 43 on page 195). If you want to start Apply as a Windows NT service, please refer to DB2 Replication Guide and Reference, SR5H-0999.
7.4.4 Add Members to the Subscription Sets First you add members to the CCDs subscription sets (see Figure 45).
Figure 45. Add a Member to Subscription Sets
Note: In this step, the CCD table is an internal CCD; we used DJRA's target table logic user exit to customize the CREATE TABLESPACE statement for the CCD table's tablespace. The following is the DB2 for MVS part of the target table logic file:
SAY "-- in TARGSVR.REX";
SUBLOGIC_TIME_SUFFIX=SUBSTR(TIME('L'),4,2)||,
                     SUBSTR(TIME('L'),7,2)||,
                     SUBSTR(TIME('L'),10,2);
SELECT
  WHEN SUBSTR(IN_TARGET_PRDID,1,3)="DSN" THEN /* DB2 FOR MVS */
  DO;
    /* CREATE A TABLESPACE FOR THE TARGET TABLE */
    SAY "-- About to create a target table tablespace";
    SAY "CREATE TABLESPACE TS"||SUBLOGIC_TIME_SUFFIX;
    SAY "  IN SJ390DB1 SEGSIZE 4 LOCKSIZE PAGE CLOSE NO CCSID EBCDIC;";
    OUT_TARGET_TABLESPACE="SJ390DB1.TS"||SUBLOGIC_TIME_SUFFIX;
  END
Once you have added the members to the CCD subscription sets, you must add members to the User Copy subscription sets. Attention: The source tables you choose are always the real tables, not the CCDs. This would be different if you had defined external CCDs instead of internal CCDs, because in the case of external CCDs, it is the CCDs that are indicated as the sources for the dependent target tables. So when you use internal CCDs, the CCDs are really transparent: you define them as targets, but you never refer to them afterwards. Apply will, of course, take the internal CCDs into account when servicing the subscriptions. When we add a member to a subscription set, we can specify a where-clause to indicate the subset of data we want to replicate to the target. Indicate a where-clause for the PRODUCT table (see Figure 46). Remark: You must not include the word where in the where-clause input field. The DJRA panel should look like this:
Figure 46. Data Subsetting
In this step, we specify the target table qualifier as grohres3; this is a DataJoiner user, which maps to the SQL Server authentication id. You should pay attention to the following items in the generated SQL: The table that is created in the SQL Server database has the default schema "dbo", but DJRA fetches the REMOTE_AUTHID from the SYSIBM.SYSREMOTEUSER table: "sa". The generated SQL uses "sa" as the table schema when creating nicknames and indexes, so you should update the SQL script and change "sa" to "dbo". If you can create a user with the same login id and username in Microsoft SQL Server, then there is no need to update the SQL script. Note: Since DESC is a reserved word in SQL Server, you cannot create a table with this column name in the SQL Server database (it will report ODBC error 37000), so you should update the generated SQL manually and change the target table column name. Refer to Appendix D.3, "Add a Member to Subscription Sets" on page 342.
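As an illustration of that manual edit, the corrected target table DDL on the SQL Server side might look like the following. This is purely hypothetical: the column list and data types are our assumptions, and we simply renamed the offending DESC column to ITEM_DESC:

-- corrected target table DDL (schema "dbo", reserved word DESC renamed)
create table dbo.s_product (
  item_num      integer not null,
  item_desc     varchar(40),
  prod_line_num integer,
  supp_no       integer
);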
7.4.5 Add Statements or Stored Procedures to Subscription Sets As you can see in Figure 47, we indicate that the stored procedure must be called after the data has been replicated. The native Microsoft SQL Server stored procedure developed in 7.2.2.3, “Invoking Stored Procedures at the Target Database” on page 185, is referenced by name (c_item).
Figure 47. Add Stored Procedure to Subscription Sets
7.4.6 Start DProp Capture and Apply on the Host
Refer to the DB2 Replication Guide and Reference, S95H-0999, section "Capture and Apply for MVS," for a detailed description of how to operate these components in an OS/390 environment.
7.4.7 Start DProp Apply on the DataJoiner Server To do this, issue the following command from a Windows NT command window: asnapply AQLY DJDB
You can also specify a trace for the Apply program using the following command:
asnapply AQLY DJDB trcflow
This can help you when something goes wrong: you can get the error messages and SQL codes from the trace information. You can also record the trace information in a file by running the following command:
asnapply AQLY DJDB trcflow > filename
7.5 Summary In this chapter we have shown one of the most common replication scenarios, a data distribution from a central site to a number of locations. We demonstrated how to use IBM’s replication solution to distribute data to Microsoft SQL Server databases. We showed the advantages of a pull configuration for Apply, as well as a three-tier configuration with an internal CCD table to fan out copies to the target systems without putting a burden on the data source. We demonstrated the powerful data subsetting capabilities, and we showed you how to invoke native stored procedures within the target database.
Chapter 8. Case Study 3—Feeding a Data Warehouse
This case study concentrates on the design and implementation of populating a data warehouse by using log-based change capture from a central production system running On-Line Transaction Processing (OLTP) types of applications. We intend to capture data changes from the production system using DProp Capture and to maintain historic information within the data warehouse by using DProp Apply. The specific objectives of the case study are to demonstrate how DProp can be used in a data warehousing environment to:
• Populate and maintain a data warehouse in a non-IBM database
• Use join replication to denormalize data
• Automatically maintain temporal histories within the data warehouse
• Automatically maintain aggregations of data within the data warehouse
In this chapter we will also describe a technique for pushing down the replication status to a non-IBM database. This is not specifically a data warehousing issue, but it is, nevertheless, a useful trick.
8.1 The Business Problem
In this case study, a retail company has a number of retail outlets that collect electronic point-of-sale (EPOS) information. The information is consolidated from the outlets to the head office in order to perform stock and order analysis. Stock is dispatched to the outlets from a central warehouse when product levels at an outlet fall below a certain threshold. New product items are also ordered from suppliers when stock within the warehouse is low. To remain competitive and streamline its operations, the company wishes to analyze the sales data collected by the EPOS terminals. The company wishes to hold up to two years' worth of sales and related data in the data warehouse; the production system currently only maintains the most recent seven days' worth of sales data. This new business intelligence (BI) application will enable the company to control its inventory more closely and manage its supply chain more efficiently.
In order to analyze trends and forecast demand based on the sales data, the data warehouse must provide historic, time-consistent data. The retail company has decided to utilize an existing Oracle server to act as the data warehouse store. This server is located within the head office. Figure 48 summarizes the business environment of this organization.
Figure 48. The Business Environment. (EPOS information flows from the retail outlets to the DB2 for OS/390 data sharing group at the head office, which runs the stock ordering and distribution application; from there, the data is fed to the Oracle data warehouse server.)
The data warehouse is to be populated directly from the OS/390 production system. The production system also contains global store and product information which is not available in the outlets. This case study documents the process of establishing a replication environment which feeds an Oracle data warehouse from a production system running DB2 for OS/390. This data flow is shown with a bold arrow in Figure 48. Note: The replication techniques introduced in this case study will show solutions for some of the most common issues in populating data warehouses or data marts, and will be applicable for many other data warehousing situations.
To understand the operations, transformations, and denormalization which are described in this case study, we must first of all understand the source data model along with the proposed data model of the data warehouse.
8.1.1 Source Data Model The Entity-Relationship (E-R) diagram in Figure 49 shows the data model for the source data which resides on the production DB2 for OS/390 database.
Figure 49. Data Model Diagram of Source Data. (The entities shown are Supplier, Store, Items, Sales, ProdLine, Region, and Brand.)
To minimize redundant data on the production system, the data is stored in a highly normalized form. The seven tables which compose the source data model are described briefly in Table 7.

Table 7. Source Data Tables

Table Name | Description | Approx. Rows
Sales      | The central table, which records information relating to a particular sales transaction. Each record is one EPOS transaction. | 790,000 (7 days' worth of data)
Items      | Contains one row for each product that the company sells. | 38,000
Supplier   | Holds information about the suppliers of the products. | 6,500
ProdLine   | Holds information relating to the product lines into which the products are grouped. | 2,300
Brand      | Holds information about the brands with which the product lines are associated. | 469
Store      | Contains information on the stores through which the company sells its products. | 3,000
Region     | Contains geographical information on the location of the stores. | 41
8.1.2 Target Data Model The E-R diagram in Figure 50 shows the design of the data model for the target data warehouse within Oracle. The data model shows a typical data warehouse or data mart approach, which is suitable for multi-dimensional analysis.
Figure 50. Data Model Diagram of Target Data. (A star schema with the Sales fact table and the Products, Outlets, Suppliers, and Time dimension tables; Outlets and Suppliers carry Valid_From and Valid_To columns, and Products carries IBMSNAP_LOGMARKER and EXPIRED_TIMESTAMP columns.)
The Valid_From and Valid_To columns in Outlets and Suppliers and the IBMSNAP_LOGMARKER and EXPIRED_TIMESTAMP columns in Products enable those tables to maintain temporal histories and will be created manually. For more detailed information refer to 8.4.6, “Adding Temporal History Information to Target Tables” on page 250. The data model shown in Figure 50 does not show all of the DProp control columns. These columns are added to target tables during the subscription process and are automatically maintained by DProp. IBMSNAP_LOGMARKER is shown because it is used by the data warehouse applications as the start of a record’s validity period.
Table 8 provides a brief description of the tables in the target data model:

Table 8. Target Data Tables

Table Name | Description | Approx. Change Volume
Sales      | The central fact table in a star schema, which maintains a history of sales records over a period of 2 years. | 120,000 rows per day
Products   | The denormalization of the Items, ProdLine, and Brand source tables, which maintains a history of changes to products and their attributes. | 20 rows per week
Suppliers  | A simple copy of the Supplier source table, which maintains a history of changes to suppliers. | 2 rows per week
Outlets    | The denormalization of the Store and Region source tables, describing stores and their locations. This is actually a view defined over the Store and Region tables, which are replicated to the target as individual tables. | 1 row per month
Time       | Contains 1 row for each day in the analysis period. It is a standard dimension in nearly every multi-dimensional data model and is used to identify special date/time-related information, such as holidays or season indicators. It is also used to enable grouping of the analysis results according to weeks, months, or quarters. | none
The data has been denormalized into a star schema with Sales as the central fact table and three dimensions for Products, Outlets, and Time. Denormalization was performed in order to aid query performance. More complex data warehouses are likely to have more than three dimensions, but for the purpose of this study, three will suffice. Since the Time dimension table does not require any DProp replication definitions to be maintained, it will not be discussed further in this book. Section 8.4, "Implementing the Replication Design" on page 217, takes each of the target tables in turn and details the DProp steps necessary to assemble the target from the source tables provided.
Once all these steps have been completed, the data model in the warehouse will resemble the one described in Figure 50 on page 207. However, first of all, we need to describe how the replication solution is architected.
8.2 Architecting the Replication Solution In this section, we will follow our structured approach, introduced in Chapter 3, and discuss the system and replication design in the context of the data warehouse scenario described above.
8.2.1 Feeding a Data Warehouse—System Design
The following recommendations regarding the placement of the different system components apply to our data warehouse case study. Both the type and the location of the replication source and the replication target are fixed by business requirements. Since the source database is fixed, the placement of DProp Capture is also fixed: Capture must be co-located with the source database. The placement of all other components, such as DataJoiner, Apply, and the Replication Administration Workstation, is variable. The configuration shown in Figure 51 on page 214 co-locates Apply and DataJoiner on the same physical machine as Oracle. This configuration provides the maximum-performance option, since it reduces the number of network hops. In DProp terminology, the topology of this environment is known as a pull configuration, because Apply is co-located with the target and "pulls" the data from source to target. If Apply for DB2 on OS/390 were used in this configuration, it would "push" the data to the Oracle target. Pull configurations generally perform better than push configurations, because in a pull configuration, individual rows from an SQL result set can be blocked and sent across the network in large chunks, which is generally more efficient than sending one row at a time. Since data warehousing scenarios are likely to deal with large amounts of data, the optimal performance configuration was adopted for this business problem. The placement of the Replication Administration Workstation does not have any bearing on replication performance; it can be located on any 32-bit Windows workstation.
8.2.2 Feeding a Data Warehouse—Replication Design
This case study describes DProp procedures to implement four common data warehousing techniques. The four techniques described are:
• Data transformations
• Denormalization
• Maintaining history information
• Maintaining summary information
These techniques are outlined below. Suggested methods of implementing them using DProp are described in the remainder of this chapter.
8.2.2.1 Data Transformations
It is often necessary to transform and manipulate source data before placing it into a warehouse. These transformations change the data into a form which is more suitable for warehouse applications and business analysts. Transformations are also often used to translate cryptic codes (such as Region_Id) into more meaningful descriptive information (such as Region_Name). As a general rule of thumb, DProp can perform any data transformation which can be expressed in SQL, by using views over source tables, staging tables, or target tables. Alternatively, more complex transformations can be achieved by executing SQL statements or stored procedures (either DB2 or multi-vendor) at various stages during the subscription cycle. The SQL or stored procedure can operate against:
• Any table at the replication source system, including replication source and change data tables.
• Any table at the replication target system, including replication target tables.
The SQL statements or stored procedures can be executed before or after the answer set is applied.
8.2.2.2 Denormalization
Database systems used for on-line business-critical applications are tuned for high-volume transaction processing. Typically, this requires the data to be in a highly normalized form (as shown in Figure 49 on page 205). This form is optimized for fast SQL insert and update transactions, but not for the selects which will be used in the warehouse environment. A common technique in data warehousing is therefore to hold the data in a denormalized form within the warehouse, thus facilitating faster response to queries. The process of introducing redundancy and structuring the data according to business needs instead of application needs is known as denormalization. A sketch combining both ideas follows.
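The following view is a hypothetical illustration of these two techniques; the schema qualifiers and column list are our own assumptions, not the definitions used later in this chapter. It is a source-site join view that denormalizes the Store and Region tables and translates the cryptic Region_Id into the descriptive Region_Name:

-- hypothetical source-site view: denormalization plus code translation
CREATE VIEW DB2RES5.OUTLETS AS
  SELECT S.STORE_NUM, S.NAME, S.CITY, S.ZIP,
         R.REGION_ID, R.REGION_NAME
  FROM DB2RES5.STORE S, DB2RES5.REGION R
  WHERE S.REGION_ID = R.REGION_ID;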
Two techniques are described for denormalizing data within this chapter. They are: • Denormalization by using a target site join—see 8.4.3, “Using Target Site Views to Denormalize Outlet Information” on page 228. • Denormalization by using a source site join—see 8.4.4, “Using Source Site Joins to Denormalize Product Information” on page 237. Other techniques are available with DProp for denormalizing data. For example, creating views over staging tables or simulating outer joins. These techniques are not specifically covered in this chapter, but use the same type of procedures as those which are described in detail. 8.2.2.3 Maintaining History Information The ability to provide the foundation for historical data analysis is one of the major differentiators of business intelligence or data warehouse solutions as opposed to OLTP systems. Typically, the production database will only contain the most recent value for a particular entity (for example, the customer record, or supplier information for a product). It does not contain information on the value of a certain attribute as of two years ago, or information on how that entity has evolved over time (for example, information about how the packaging of a product has changed over time). In a data warehouse environment, keeping history information is essential in order to analyze trends over time. One approach to establish a record of history is to prevent deletes (and updates) in the production data from being replicated to the data warehouse environment. Instead, these events result in a new record associated with the type of operation and a timestamp of the event, being inserted into the data warehouse history table. This technique is especially well suited for audit trails, and for analyzing churn patterns in the data. DProp provides the so-called Consistent Change Data Tables (CCD Tables) as a solution for history tables of this type. See the DB2 Replication Guide and Reference, SR5H-0999 for a basic introduction on CCD tables. A more complex requirement for time-series is very common in relational snowflake or star-schema models for data warehouse or data mart solutions. It deals with modeling slowly-changing dimensions of such a data model. The structure of a star-schema or snowflake data model usually consists of a central fact table, which contains the measures to be analyzed for a particular business domain (for example, sales revenues, number of items ordered) and
a multipart key that relates these measures to the dimension tables. The dimension tables hold the various selection and grouping criteria along which the analysis is performed (for example: product, supplier, or customer information). The fact table usually records events (such as a sale). An event is associated with a single date or timestamp. Events are inserted periodically (for example, daily or weekly) into the fact table, building a history of events over time. The attribute values recorded in the dimension tables (for example, the supplier information for a product) are usually valid for a certain period of time. For example, product X was supplied by supplier A from 1997-02-01 to 1997-12-31. After this time period, the supplier for product X was switched to supplier B. If we just propagated the update of the supplier information from the production system to the dimension table, we would implicitly associate the new supplier with all events in the fact table, even those recorded prior to the change (that is, before 1997-12-31). In section 8.4.6, "Adding Temporal History Information to Target Tables" on page 250, we show how DProp can be used to automatically maintain validity time periods for attributes in dimension tables when an update to such an attribute occurs in the production table. The validity time period is denoted by two date/timestamp columns that show the beginning and the end of the validity of a specific value for that attribute. This technique enables correct time-based analysis in a multidimensional data model with facts and dimensions. It has many advantages over the traditional full extract-and-load technique of populating data warehouses. Some of these advantages are listed below:
• A more granular validity period can be maintained by DProp. The granularity of the extract-and-load technique is limited to the time between the source data snapshots used to populate the warehouse. The granularity of the DProp-maintained period can be accurate to the sub-second and can easily be changed using SQL date and timestamp functions.
• The extract-and-load technique involves keeping several snapshots of the data. These snapshots contain redundant data for all records which have not changed since the previous snapshot (thus consuming needless network bandwidth and disk space). With DProp, only the changes to the data are replicated.
• The snapshot method does not reveal a complete history of the changes made at the source. If a record changes multiple times between snapshots, only the last change is captured, so a warehouse maintained with this technique provides only partial history information and does not tell the whole story. DProp, in contrast, replicates every change made at the source, maintaining a complete history.

8.2.2.4 Maintaining Summary Information
Aggregate or summary tables contain summaries of information from the base table. The summaries can be calculated at the data warehouse when a query is issued, or they can be automatically maintained by the process which feeds the warehouse. If they are automatically maintained by the feeding process, then business intelligence applications which require the summary information can perform a simple table lookup rather than an expensive query which may involve a table scan, as the sketch below illustrates. The technique discussed in 8.4.7, “Maintaining Aggregate Information” on page 256 describes an advanced DProp technique for maintaining summary information with no impact on source tables.
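To make the difference concrete, consider this illustrative contrast. It is a sketch only: it uses the Sales fact table that appears later in this case study, and the summary table name SALES_BY_COMPANY is purely hypothetical.

-- calculated at query time: scans the large fact table
SELECT COMPANY, SUM(OUT_PRC) AS REVENUE
FROM SIMON.SALES
GROUP BY COMPANY;

-- maintained by the feeding process: a simple lookup
SELECT COMPANY, REVENUE
FROM SIMON.SALES_BY_COMPANY;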
8.3 Setting Up the System Environment
This section details the steps required to set up the heterogeneous replication environment used in this case study. It uses the implementation checklist described in 4.3, “Setting up a Heterogeneous Replication System” on page 63. Only those steps from the checklist which require further explanation are described in detail below.
8.3.1 The System Topology To summarize the discussion about the placement of the system components, and to provide an overview of the decisions made so far, the system architecture of the solution is shown in Figure 51.
[Figure 51 is a diagram. It shows DProp Capture running against DB2 for OS/390 V5.1.1; a Windows NT Replication Administration Workstation running DJRA V2.1.1.140 with the DB2 CAE, connected over TCP/IP; an RS/6000 J50 (AIX V4.3.1) running DataJoiner V2.1.1 with DDCS, DProp Apply, and SQL*Plus; and Net8 connectivity from DataJoiner to the Oracle V8.0.4 data warehouse.]
Figure 51. Case Study 3—System Topology
8.3.2 Configuration Tasks
Let us now review the detailed steps necessary to implement the replication infrastructure for this case study, according to the checklist described in Chapter 4.

8.3.2.1 Set Up the Database Middleware Server
DataJoiner requires Oracle’s Net8 (for Oracle8) or SQL*Net (for Oracle Version 7) client code to be installed on the same machine as DataJoiner before it can access Oracle databases. Refer to the Oracle Net8 Administrator’s Guide, A58230-01 for more information on how to install and configure Net8.

Advice: Net8 only provides the communication between the Oracle client and the Oracle database server. It does not provide a command line interpreter where
Oracle commands and SQL can be executed. This feature is provided by another Oracle product called SQL*Plus. It is a good idea to install SQL*Plus at the same time as Net8 so that native Oracle connectivity between client and server can be tested—see Appendix B.1.2, “Using Oracle’s SQL*Plus” on page 325 for more details.

Once native connectivity has been verified, proceed to “Step 3—Install the DataJoiner software” of the implementation checklist. When installing the DataJoiner product on the AIX machine, remember to select the DProp Apply sub-component. After installing the DataJoiner code, create the Oracle data access module using the djxlink.sh shell script. For more details, refer to “Step 4—Prepare DataJoiner to access the remote data sources” of the Implementation Checklist.

Now create the DataJoiner instance using db2icrt, and create the DataJoiner database that will be used to access the Oracle database. The following syntax was used to create the DataJoiner database for this case study:

CREATE DATABASE djdb COLLATE USING IDENTITY WITH "DataJoiner database";
As recommended in the DataJoiner Planning, Installation and Configuration Guide for AIX, SC26-9145, the COLLATE USING parameter is set to IDENTITY. This is because replication will store binary data (CHAR(nn) FOR BIT DATA in some DProp control columns) at the remote data source.

Once the DataJoiner database has been successfully created, configure DB2 database connectivity between DataJoiner and the DB2 for OS/390 subsystem which is to act as the replication source. Connectivity for this case study is established with DRDA over TCP/IP, using the following node and database definitions:

CATALOG TCPIP NODE DB2INODE REMOTE MVSIP SERVER 33320;
CATALOG DCS DATABASE SJ390DB1 AS DB2I;
CATALOG DATABASE SJ390DB1 AT NODE DB2INODE AUTHENTICATION DCS;
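Before going any further, it is worth verifying the new DRDA connection from the DataJoiner instance. A minimal sketch (the userid and password shown are placeholders, not values from this case study):

CONNECT TO DB2I USER db2res5 USING password;
SELECT COUNT(*) FROM SYSIBM.SYSTABLES;
CONNECT RESET;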
At the same time, configure DataJoiner to accept TCP/IP clients by updating the Database Manager Configuration. The Replication Administration Workstation will need to be able to connect to the DataJoiner database and will also use DataJoiner as a DRDA gateway to communicate with the DB2 for OS/390 database. Refer to “Step 8—Enable DB2 clients to connect to the DataJoiner databases” of the Implementation Checklist for more details. Now that all DB2 connectivity has been established and verified, configure connectivity from DataJoiner to Oracle. This connectivity is configured by
populating the DataJoiner catalog tables with information about the remote database. The steps below outline the procedure to configure this connectivity:

1. Identify the remote database to DataJoiner by creating a Server Mapping. The following Server Mapping was used to identify the Oracle target database in this case study:

CREATE SERVER MAPPING FROM azovora8 TO NODE "azov"
TYPE ORACLE VERSION 8.0.4 PROTOCOL "net8"
CPU RATIO 1 IO RATIO 1 COMM RATE 16000000;
2. Tell DataJoiner what options to use when accessing the remote database. The following Server Options were set for this case study:

CREATE SERVER OPTION fold_id FOR SERVER azovora8 SETTING 'n';
CREATE SERVER OPTION fold_pw FOR SERVER azovora8 SETTING 'n';
CREATE SERVER OPTION password FOR SERVER azovora8 SETTING 'y';
The fold_id and fold_pw options tell DataJoiner not to fold the userid and password specified in the CREATE USER MAPPING statement to either upper or lower case. The password option informs DataJoiner that it must send a password when connecting to Oracle.

3. Use the CREATE USER MAPPING statement to map the DataJoiner user to a user who is valid at the remote database. The following user mapping is defined for this case study:

CREATE USER MAPPING FROM USER TO SERVER azovora8
AUTHID simon PASSWORD pwd;
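At this point, end-to-end access to Oracle can be verified from within the DataJoiner database. A minimal sketch (the nickname name is hypothetical; the three-part FOR clause follows the form documented in the DataJoiner Application Programming and SQL Reference Supplement, and Oracle’s SYS.DUAL is used simply because it always exists):

CREATE NICKNAME ORA_DUAL FOR azovora8.SYS.DUAL;
SELECT * FROM ORA_DUAL;

If the SELECT returns the single DUAL row, the server mapping, server options, and user mapping are all working together.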
More details on configuring DataJoiner to access non-IBM databases can be found in the DataJoiner Planning, Installation and Configuration Guide, SC26-9150 for your platform and the DataJoiner Application Programming and SQL Reference Supplement, SC26-9148. Now that all database connectivity has been configured and verified, we need to start implementing the DProp Capture and Apply components.

8.3.2.2 Implement the Replication Subcomponents (Capture, Apply)
Install and configure Capture for MVS on the OS/390 source system. In addition, install Apply if it has not already been installed on the DataJoiner AIX machine. Refer to 4.4.2, “Implement the Replication Subcomponents (Capture, Apply)” on page 71 for more detailed information.
8.3.2.3 Set Up the Replication Administration Workstation
Choose a 32-bit Windows platform to act as the Replication Administration Workstation and configure DB2 connectivity from this workstation to the source DB2 for OS/390 system and the DataJoiner database. Install the DataJoiner Replication Administration software (DJRA) from the DataJoiner CD, or download the latest version of DJRA from the Web. This tool acts as the replication administration focal point for configuring and monitoring DProp replication. For more details, refer to “Step 16—Install DJRA, the DataJoiner Replication Administration software” of the Implementation Checklist. When DJRA has been installed, proceed to “Step 17—Set up DJRA to access the source and target databases” of the Implementation Checklist to enable DJRA to communicate with all databases within the replication scenario. In this case study, this means DB2 for OS/390 and DataJoiner (because DataJoiner is used to establish connectivity to the Oracle database).

8.3.2.4 Create the Replication Control Tables
Use the Create Replication Control Tables feature of DJRA to create the replication control tables at the source DB2 for OS/390 subsystem. Use the same feature to create the replication control tables at the DataJoiner database which is to act as the replication control server (DJDB). See 4.4.4, “Create the Replication Control Tables” on page 74 for more information.

8.3.2.5 Bind DProp Capture and DProp Apply
Use the instructions detailed in 4.4.5, “Bind DProp Capture and DProp Apply” on page 74 to bind Capture against the source database (SJ390DB1) and Apply against the source, target and control databases (SJ390DB1 and DJDB). Once the bind has completed, you are now ready to start defining replication sources (called registrations) and their associated targets (called replication subscriptions). A summary of the steps required to configure the replication is detailed in 8.4, “Implementing the Replication Design” on page 217.
8.4 Implementing the Replication Design
The following summarizes the implementation steps required to configure the replication environment for this case study. The steps are then expanded and explained in more detail in the remainder of the chapter.
Each step described below generates an SQL script from DJRA. Once the script has been modified as described within the detailed sections of this chapter, and saved, it should be executed using the Run menu option from the DJRA output window. This step is not explicitly described within the detailed sections—it is assumed.

1. Define the SALES_SET subscription set
2. Maintain a change history for suppliers
   • Register the Supplier table
   • Subscribe to the Supplier table
   • Add temporal history support to the Supplier table
   • Hide DProp control columns
3. Use target site views to denormalize outlet information
   • Register the Store and Region tables
   • Subscribe to the Store and Region tables
   • Add temporal history support to the Store table
   • Create the denormalization view
4. Use source site joins to denormalize product information
   • Register the Items, ProdLine and Brand tables
   • Subscribe to the Products view
   • Add temporal history support to the Products table
5. Use a CCD target table to manage sales facts
   • Register the Sales table
   • Subscribe to the Sales table
6. Add temporal history information to target tables
7. Maintain aggregate information
8. Push down the replication status to Oracle
9. Perform initial load of data into the data warehouse

DProp Capture must be started after the replication definitions have been created (steps 1 to 8), but before populating the data warehouse for the first time (step 9). DProp Apply may be started after the data has been loaded into the warehouse.
Several different approaches were used to replicate the tables from source to target. The aim here was to show the flexible nature of DProp and to compare and contrast the different techniques available.
8.4.1 Defining the Subscription Set
The first practical step to perform with DProp is to create the replication subscription set which will own all the individual subscriptions. The SALES_SET was created using the Create Empty Subscription Sets option of DJRA, as shown in Figure 52.
Figure 52. Create the SALES_SET Subscription Set
The subscription set timing has been defined to execute every 1440 minutes (that is, once every 24 hours) at midnight (presumably when there is little activity on the servers or network). Advice: Another option to control the timing of the replication is to use event based timing. See 3.3.2.3, “Advanced Event Based Scheduling” on page 53 for an example of how to use event based timing to execute your subscriptions once a day at midnight, on week days only. The SQL generated by DJRA can be seen in Appendix E.1, “Output from Define the SALES_SET Subscription Set” on page 347. The generated SQL was saved and then executed using the Run menu option from the DJRA output window.
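The set timing is recorded in the ASN.IBMSNAP_SUBS_SET control table and can be adjusted later with plain SQL. A sketch, assuming the Apply qualifier WHQ1 used elsewhere in this case study (the event name END_OF_DAY is hypothetical):

-- change the interval timing of the set, for example to every 12 hours
UPDATE ASN.IBMSNAP_SUBS_SET
SET SLEEP_MINUTES = 720
WHERE APPLY_QUAL = 'WHQ1' AND SET_NAME = 'SALES_SET';

-- for event based timing, Apply reads posted events from the events table
INSERT INTO ASN.IBMSNAP_SUBS_EVENT (EVENT_NAME, EVENT_TIME)
VALUES ('END_OF_DAY', CURRENT TIMESTAMP);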
8.4.2 Maintaining a Change History for Suppliers
This section describes the approach adopted to replicate the Supplier table from the source DB2 for OS/390 system to the target Oracle database. The target table is required to maintain historical information about suppliers and provide the ability for time consistent queries. These requirements can be satisfied with DProp by specifying the target table to be a complete, non-condensed CCD with an additional column to record expiry timestamps (for time consistent queries). These attributes are summarized in Table 9:

Table 9. Attributes of Supplier Target Table

  Warehouse Attribute             | DProp equivalent
  Maintain historical information | Complete, non-condensed CCD target table.
  Maintain temporal histories     | Additional TIMESTAMP column to record the expiry date of a record and SQL After to maintain the validity period.
Figure 53 shows the relationship between the source and target Supplier tables.
[Figure 53 is a diagram. Source: the Supplier table (Supp_No, Supp_Name). Target: the Supplier CCD table (Supp_No, Supp_Name, IBMSNAP_INTENTSEQ, IBMSNAP_OPERATION, IBMSNAP_COMMITSEQ, IBMSNAP_LOGMARKER, EXPIRED_TIMESTAMP) and the Suppliers view defined over it (Supplier_Number, Supplier_Name, Valid_From, Valid_To).]
Figure 53. Transformation of Supplier Table
All columns prefixed with IBMSNAP are DProp control columns which are required and are automatically maintained by Apply for CCD target tables. A view named Suppliers will be created to hide these control columns from warehouse users and also to rename the IBMSNAP_LOGMARKER and EXPIRED_TIMESTAMP columns to more meaningful names (see 8.4.2.4, “Hiding DProp Control Columns” on page 228 for more details).

8.4.2.1 Register the Supplier Table
The registration definition for the Supplier table was generated using the DJRA Define One Table as a Replication Source function (see Figure 54). Business intelligence (BI) applications generally only require After-images of columns to be replicated, because most target warehouse tables maintain histories of changes to the source table (and therefore already hold the Before-image in a separate record). Throughout this case study only After-images are replicated from source to target.
Figure 54. Define Supplier Table as a Replication Source
Disable Automatic Full Refresh
The SQL generated by DJRA was modified to disable the automatic full refresh of the target table by setting ASN.IBMSNAP_REGISTER.DISABLE_REFRESH=1. This can be seen in the following code segment, which is taken from the DJRA generated SQL:

-- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL,
PARTITION_KEYS_CHG) VALUES('N','ITSOSJ','SUPPLIER', 0 , 1 ,'Y','Y',
'ITSOSJ','CDSUPPLIER','ITSOSJ','CDSUPPLIER', 1 ,'0201',NULL,'0','N');
For a complete listing of the SQL used for registering Supplier, see Appendix E.2, “Output from Register the Supplier Table” on page 348.

Advice: Alternatively, the DJRA generated SQL could be executed unmodified and full refresh disabled at a later date using the SQL update statement:

UPDATE ASN.IBMSNAP_REGISTER SET DISABLE_REFRESH=1
WHERE SOURCE_OWNER='ITSOSJ'
This approach has the advantage that multiple registrations can have refresh disabled with a single SQL statement (in the example above, all those tables registered and owned by ITSOSJ), but suffers from the drawback that the
change is not documented in the original SQL scripts. The preferred approach is to modify the generated SQL scripts before execution. After the modifications had been made and saved, the file was executed from the DJRA output window using the Run menu option, thus defining the Supplier table as a replication source.

Advice: Full refresh is a costly process which involves direct access to the source tables rather than change capture from the logs. Ideally, it will happen only once, when the target is populated for the first time. It is a common procedure to disable this facility. Full refresh was also disabled in this particular case because we are maintaining a history (non-condensed CCD) table at the target. During an automatic full refresh, Apply issues a DELETE FROM TARGET_TABLE to remove all the rows from the target before it starts to copy the new data from the source table. This would destroy all the history information which had been maintained at the target—something we obviously want to avoid because we cannot regenerate this information. Therefore, for non-condensed CCD tables it is essential that full refresh be disabled.

With full refresh disabled, the administrator must synchronize Capture and Apply before change capture replication can be enabled. This can be done either manually or by using the Off-line load option of DJRA. Refer to 8.4.9, “Initial Load of Data into the Data Warehouse” on page 261 for more details on performing this synchronization. If automatic full refresh were enabled, Apply would automatically perform the full refresh and synchronize itself with Capture when it is started.

8.4.2.2 Subscribe to the Supplier Table
Supplier is added to the SALES_SET subscription set created in 8.4.1, “Defining the Subscription Set” on page 219 by using the Add a Member to Subscription Set function of DJRA. Figure 55 shows this definition. Although the current version of DJRA does not allow the direct creation of CCD tables at non-IBM targets, it is possible to work around this by editing the generated SQL prior to execution. Future versions of DJRA may well support this function directly.
Figure 55. Subscription Definition for Supplier Table
Note the following from the subscription definition shown in Figure 55:

• In this case, the Target table qualifier field is SIMON. This specifies the user and schema who will own the target CCD table in Oracle. This must be an already existing Oracle user.

Advice: When creating a target table at a non-IBM database, the target table qualifier field must be set to a DataJoiner user who has a user mapping defined to the remote server where the target table is to be created. DJRA uses the remote authid from this user mapping to determine the schema and owner of the remote table. This is not the case when creating CCD tables in non-IBM targets, because we are fooling DJRA into thinking the CCD table will be in the local DataJoiner database. Essentially we have to perform the mapping ourselves by specifying an existing Oracle user who will own the CCD table. If the mapping is not correct, the CREATE TABLE statement will fail during the subscription definition with the following error message:

SQL0204N "SQLNET object: Unknown " is an undefined name. SQLSTATE=42704
• Target structure should be a CCD table and the DataJoiner non-IBM target server should be (None). DJRA will issue a message and will not generate
any SQL if CCD and a non-IBM target are specified. We will edit the generated SQL in order to create the CCD in Oracle.

• At this point, no primary key should be defined. Non-condensed CCD tables by definition cannot have the same primary key columns as the source table because they contain multiple records with the same source key values (the history). Indexes may be added to warehouse tables to improve performance of applications and queries.

• Use the Setup button next to Target structure in order to set the properties of the CCD. We would like the target history table to start off as a complete copy of the source table and then to grow over time as it maintains historical information. Therefore, the target should be a complete, non-condensed CCD table - Figure 56 shows this definition. For more information on CCD table attributes, refer to the DB2 Replication Guide and Reference, SR5H-0999.

Note: The Setup button is only available on DJRA versions 2.1.1.140 and later. If you are using an earlier version of DJRA, then you will have to edit the generated SQL to ensure that the following condition has been set: ASN.IBMSNAP_SUBS_MEMBR.TARGET_CONDENSED=’N’.
Figure 56. CCD Table Attributes for Supplier
Advice: It is also possible to use the CCD properties window to include additional DProp control columns (from the ASN.IBMSNAP_UOW table) in the target warehouse. For example, the IBMSNAP_AUTHID column in the UOW table maintains information about the userid who performed the SQL operation which changed the source table. This can be replicated to a target table and can be useful for audit and tracking purposes.
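As an illustration of how the replicated control columns support this kind of audit tracking, the operation types kept in a CCD can be summarized directly at the target. A simple illustrative sketch against the Supplier CCD table created in the next step:

SELECT IBMSNAP_OPERATION, COUNT(*) AS CHANGES
FROM SIMON.SUPPLIER
GROUP BY IBMSNAP_OPERATION;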
Create CCD Table in Oracle
As previously mentioned, it is necessary to modify the SQL generated for this subscription so that the CCD table is created at the Oracle target and not in the DataJoiner database. To achieve this, modify the SQL as follows:
1. Comment out the CREATE TABLESPACE statement. Since the table will be created in Oracle and not DataJoiner, there is no need to create a DataJoiner tablespace to hold the target table.
2. Change the tablespace name in the CREATE TABLE statement to the DataJoiner Server Mapping for the Oracle database where the data warehouse will exist (in our example AZOVORA8). A DataJoiner nickname will automatically be created as part of the create table process. This uses a feature of DataJoiner called Data Definition Language (DDL) transparency, which was introduced in DataJoiner V2.1.1. For more information about the DDL transparency feature of DataJoiner, refer to the Create Table statement in the DataJoiner Application Programming and SQL Reference Supplement, SC26-9148.
3. Modify the CREATE TABLE statement to add the EXPIRED_TIMESTAMP column to support temporal histories - see 8.4.6, “Adding Temporal History Information to Target Tables” on page 250 for more information.

The SQL script segment below shows these modifications:

--* If you don't see: -- now done interpreting REXX logic file
--* TARGSVR.REX, then check your REXX code
--* -- in TARGSVR.REX
-- About to create a target table tablespace
--CREATE TABLESPACE TSSUPPLIER MANAGED BY DATABASE USING (FILE
--'/data/djinst5/djinst5/SUPPLIER.F1' 2000);
-- now done interpreting REXX logic file TARGSVR.REX
-- The target table does not yet exist
-- Not remote to DataJoiner target
-- Create the target table SIMON.SUPPLIER
CREATE TABLE SIMON.SUPPLIER(SUPP_NO DECIMAL(7 , 0) NOT NULL,SUPP_NAME
CHAR(25) NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,
IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR BIT
DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL,
EXPIRED_TIMESTAMP TIMESTAMP) IN AZOVORA8;
For a complete listing of the SQL used to create the Supplier subscription, see Appendix E.3, “Output from Subscribe to the Supplier Table” on page 349.

Advice: If using a version of DJRA earlier than 2.1.1.140, then the generated SQL would also have to be modified to remove the auto-registration of the CCD. This is the SQL insert into the ASN.IBMSNAP_REGISTER table at the end of the generated SQL. Failure to remove this record would result in SQL return code -30090, reason code 18 when Apply attempts to replicate the data to the target. This is because Apply thinks the CCD table is in DataJoiner and is attempting to update both the CCD table (in Oracle) and the Register table (in DataJoiner) in the same Unit Of Work (UOW).

8.4.2.3 Add Temporal History Support to the Supplier Table
To maintain temporal history information for the Supplier target table, use the technique described in 8.4.6, “Adding Temporal History Information to Target Tables” on page 250. The specific SQL After statements used to maintain temporal histories for the Supplier table are:

UPDATE SIMON.SUPPLIER A SET EXPIRED_TIMESTAMP =
  ( SELECT MIN(IBMSNAP_LOGMARKER) FROM SIMON.SUPPLIER B
    WHERE A.SUPP_NO = B.SUPP_NO
      AND A.EXPIRED_TIMESTAMP IS NULL
      AND B.EXPIRED_TIMESTAMP IS NULL
      AND (B.IBMSNAP_INTENTSEQ > A.IBMSNAP_INTENTSEQ))
WHERE A.EXPIRED_TIMESTAMP IS NULL
  AND A.IBMSNAP_OPERATION IN ('I','U');

UPDATE SIMON.SUPPLIER A SET EXPIRED_TIMESTAMP =
  ( SELECT B.IBMSNAP_LOGMARKER FROM SIMON.SUPPLIER B
    WHERE A.SUPP_NO = B.SUPP_NO
      AND B.IBMSNAP_OPERATION = 'D'
      AND B.IBMSNAP_INTENTSEQ = A.IBMSNAP_INTENTSEQ)
WHERE A.EXPIRED_TIMESTAMP IS NULL
  AND A.IBMSNAP_OPERATION = 'D';
Use these statements in conjunction with the information described in 8.4.6, “Adding Temporal History Information to Target Tables” on page 250 to support temporal histories within the Supplier target table.
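To illustrate the intended effect, suppose (purely hypothetically) that supplier 42 was inserted on 1997-02-01 and renamed on 1997-12-31. After the SQL After statements run, the CCD rows would carry validity periods along these lines:

  SUPP_NO | SUPP_NAME | IBMSNAP_OPERATION | IBMSNAP_LOGMARKER   | EXPIRED_TIMESTAMP
  42      | ACME CORP | I                 | 1997-02-01-10.00.00 | 1997-12-31-09.30.00
  42      | ACME INTL | U                 | 1997-12-31-09.30.00 | (null)

The open-ended (null) expiry marks the currently valid row.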
8.4.2.4 Hiding DProp Control Columns
The following SQL was used to create a view in Oracle which hides from the end users the DProp CCD control columns not required by data warehousing applications:

CREATE VIEW SIMON.SUPPLIERS AS
SELECT S.SUPP_NO as SUPPLIER_NUMBER,
       S.SUPP_NAME as SUPPLIER_NAME,
       S.IBMSNAP_LOGMARKER as VALID_FROM,
       S.EXPIRED_TIMESTAMP as VALID_TO
FROM SIMON.SUPPLIER S;
The view definition was stored in a file and executed directly from SQL*Plus. For some useful hints on using SQL*Plus, see Appendix B.1.2, “Using Oracle’s SQL*Plus” on page 325.

Advice: IBMSNAP_LOGMARKER was not hidden (just renamed) because it is used as the starting timestamp of the record’s validity period.
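With the validity columns exposed, warehouse users can run time-consistent queries directly against the view. An illustrative sketch in Oracle SQL (the date chosen is arbitrary):

SELECT SUPPLIER_NUMBER, SUPPLIER_NAME
FROM SIMON.SUPPLIERS
WHERE VALID_FROM <= TO_DATE('1997-06-30','YYYY-MM-DD')
  AND (VALID_TO IS NULL
       OR VALID_TO > TO_DATE('1997-06-30','YYYY-MM-DD'));

This returns the supplier information exactly as it was on 1997-06-30.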
8.4.3 Using Target Site Views to Denormalize Outlet Information
This section describes the approach taken to replicate and denormalize the information held in the Store and Region tables into a single entity called Outlets. Outlets describes the relationship between individual stores and the regions in which they operate. The technique used in this case is to copy the individual source tables to target tables and perform the denormalization through a view at the target site. This approach is adopted in this case study to compare and contrast the technique with the one discussed in 8.4.4, “Using Source Site Joins to Denormalize Product Information” on page 237. Performing the join at the target rather than the source also relieves the source system of having to perform join operations against base and CD tables (as discussed in 8.4.4).

To understand the replication techniques used for Store and Region, we first have to understand the applications which work on the source data. For the Region table:
• A record is inserted into the table when a new region is added.
• There are no deletes from the Region table. When a region no longer contains any stores, the region information is maintained in the table and the CONTAINS_STORES flag is updated with an ’N’.
• No other columns in the table are updated.
Over a period of time, one particular region will always be uniquely identified by a certain Region_Id (although it may not contain any stores at that particular point in time). We also know that an update to the CONTAINS_STORES column would involve either the creation of the first store in that region, or the removal of the last store within that region. Therefore, Region is updated only in association with a change in the Store table. By analyzing the source application behavior, we can now define a replication solution which will denormalize the data and allow for time-consistent queries by replicating Region as a Point In Time (PIT) and Store as a CCD. The resultant join view between the two tables at the target will provide a history of changes to Store and Region. This view is called Outlets. The data warehouse attributes and their DProp equivalents for the Store and Region target tables are summarized in Table 10.

Table 10. Attributes of Store and Region Target Tables

  Warehouse attribute             | DProp equivalent—Store | DProp equivalent—Region
  Maintain historical information | Complete, non-condensed CCD target table. | Only replicating inserts to PIT will generate a history table.
  Maintain temporal histories     | Additional TIMESTAMP column to record expiry date of a record and SQL After to maintain the validity period. | Since a change in Region is always associated with a change in Store, maintaining expiry information for Store will be sufficient.
  Denormalize data in tables      | Denormalize at the target by a join view with Region. | Denormalize at the target by a join view with Store.
Figure 57 summarizes the relationship between the source and target Store and Region tables and their denormalization through a target site view.
[Figure 57 is a diagram. Source: the Store table (Store_Num, CompNo, Name, Street, City, Zip, Region_Id) and the Region table (Region_Id, Region_Name, Contains_Stores). Target: the Store CCD table (the source columns plus IBMSNAP_INTENTSEQ, IBMSNAP_OPERATION, IBMSNAP_COMMITSEQ, IBMSNAP_LOGMARKER, EXPIRED_TIMESTAMP), the Region PIT table (Region_Id, Region_Name, IBMSNAP_LOGMARKER), and the Outlets join view (Store_Num, CompNo, Name, Street, City, Region_Id, Region_Name, Valid_From, Valid_To).]
Figure 57. Transformation of Store and Region into Outlets
Outlets is a view defined over the Store and Region target tables. For details on the view definition, please refer to 8.4.3.4, “Create the Denormalization View” on page 236. Although in this case the target table type for Region is PIT, by analyzing the application behavior and only replicating inserts we will actually create a target table which maintains historic information (because records are only appended to it).
Replicating Inserts Only
Replication of only the inserts for the Region table can be achieved by using the following predicate in the subscription definition for that table:

IBMSNAP_OPERATION = 'I'
IBMSNAP_OPERATION is a column within the CD table which is automatically generated and maintained by Capture. It contains:
• I — if the operation against the source record was an SQL INSERT.
• U — if the operation against the source record was an SQL UPDATE.
• D — if the operation against the source record was an SQL DELETE.

Advice: There are three simple approaches for removing unwanted records from the target history table:
1. The first approach, described here, is to place a predicate on the subscription definition that prevents the unwanted records from replicating. This is probably the simplest method, but also means that full refresh for the source must be disabled.
2. The second approach would be to replicate the unwanted records, and simply create a view at the target which does not include these records. This has the disadvantage of replicating unwanted records, which would consume network resources and CPU cycles. However, if at a later date deletes are required in the target history, then this method simply requires the target view to be redefined.
3. The third approach is the most flexible: Define a view over the source table and register this view as a source for replication. However, before executing the generated SQL for this registration, modify the CREATE VIEW statement for the change data view in the generated SQL and add the predicate IBMSNAP_OPERATION='I'. This way, the subscription does not even know that the filtering is occurring and all the subscriptions will be simpler once the source is set up this way. This CD-view technique also works for both full refresh and differential refresh because the predicate is defined on the CD table view and subsequently will not be applied to the source during a full refresh.

8.4.3.1 Register the Store and Region Tables
Create the registration definitions for the source Store and Region tables using the DJRA Define Multiple Tables as Replication Sources function (see Figure 58).
Figure 58. Registration of Store and Region Tables
Modify the generated SQL to disable full refresh (by setting DISABLE_REFRESH=1 in the Register table) for the Region and Store tables. The changed segments of the generated SQL are shown below:

-- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL,
PARTITION_KEYS_CHG) VALUES('N','ITSOSJ','REGION', 0 , 1 ,'Y','Y',
'ITSOSJ','CDREGION','ITSOSJ','CDREGION', 1 ,'0201',NULL,'0','N');

-- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL,
PARTITION_KEYS_CHG) VALUES('N','ITSOSJ','STORE', 0 , 1 ,'Y','Y',
'ITSOSJ','CDSTORE','ITSOSJ','CDSTORE', 1 ,'0201',NULL,'0','N');
The complete listing of the SQL executed to define these registrations can be found in Appendix E.4, “Output from Register the Store and Region Tables” on page 351. Automatic full refresh for Region is disabled because we are going to define a predicate in the subscription definition for Region, which prevents SQL updates from replicating. This predicate refers to the IBMSNAP_OPERATION column, which only exists in the CD table for Region. During full refresh,
Apply would attempt to enforce this predicate directly against the source Region table. Since the Region table does not contain this column, the full refresh would fail with error message: SQL0206N "" is not a column in an inserted table....
Full refresh for the Store table is also disabled so that the historical information held in this table does not get lost during a full refresh from the source table. For details on how the Store and Region tables were loaded into the target database, refer to 8.4.9, “Initial Load of Data into the Data Warehouse” on page 261.

8.4.3.2 Subscribe to the Store and Region Tables
Create and add both subscription definitions for Store and Region to the SALES_SET using the DJRA Add a Member to Subscription Set feature. The DJRA window used for defining the Region subscription member is shown in Figure 59.
Figure 59. Defining the Region Subscription Member
Note the following from the subscription definition shown in Figure 59:
• The Target table qualifier is DJINST5. This user must have a DataJoiner User Mapping defined to map the DataJoiner user DJINST5 to a valid Oracle user. In this case, a User Mapping exists which maps DataJoiner user DJINST5 to Oracle user SIMON. Therefore, DJRA will use SIMON as the schema name for the target table when it is created in Oracle.
• Target structure is PIT. This type of target table is directly supported by DJRA to non-IBM targets.
• The Source primary key is selected (Region_Id). PIT target tables must have a primary key (unlike CCD tables).
• The IBMSNAP_OPERATION=’I’ predicate is added to ensure that updates are not replicated.

The generated SQL from the DJRA tool for the Region subscription can be found in Appendix E.5, “Output from Subscribe to the Region Table” on page 353. The DJRA window used to define the Store replication subscription can be seen in Figure 60.
Figure 60. Subscription Definition for Store Table
Note the following from the subscription definition shown in Figure 60:
• Again, we are creating the CCD table in Oracle, and therefore the Target table qualifier field is SIMON. This is the user and schema who will own the target CCD table in Oracle. It must be an already existing Oracle user. Refer to the advice on page 224 for more information on setting Target table qualifier.
• Target structure is set to CCD and the DataJoiner non-IBM target server must be (None). The Setup button was also used to specify the CCD properties to be non-condensed, complete.
• No primary key should be defined for the target CCD table.
Create CCD Table in Oracle
Modify the generated SQL in order to add temporal history support and create the target CCD in the Oracle database. These modifications are shown below and are similar to those described in “Create CCD Table in Oracle” on page 226:

--* If you don't see: -- now done interpreting REXX logic file
--* TARGSVR.REX, then check your REXX code
--* -- in TARGSVR.REX
-- About to create a target table tablespace
-- CREATE TABLESPACE TSSTORE MANAGED BY DATABASE USING (FILE
--'/data/djinst5/djinst5/STORE.F1' 2000);
-- now done interpreting REXX logic file TARGSVR.REX
-- The target table does not yet exist
-- Not remote to DataJoiner target
-- Create the target table SIMON.STORE
CREATE TABLE SIMON.STORE(COMPNO DECIMAL(3 , 0) NOT NULL,STORE_NUM
DECIMAL(3 , 0) NOT NULL,NAME CHAR(25) NOT NULL,STREET CHAR(25) NOT
NULL,ZIP DECIMAL(5 , 0) NOT NULL,CITY CHAR(20) NOT NULL,REGION_ID
DECIMAL(3 , 0) NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT
NULL,IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR
BIT DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL,
EXPIRED_TIMESTAMP TIMESTAMP) IN AZOVORA8;
The full listing of the SQL used to define the Store subscription can be found in Appendix E.6, “Output from Subscribe to the Store Table” on page 355. 8.4.3.3 Add Temporal History Support to the Store Table Temporal history information only needs to be maintained for the Store table because a change in Region is always associated with a change in Store. To maintain temporal history information for this table use the technique described in 8.4.6, “Adding Temporal History Information to Target Tables” on page 250.
The specific SQL After statements used to maintain temporal information for the Store table are:

UPDATE SIMON.STORE A SET EXPIRED_TIMESTAMP =
  ( SELECT MIN(IBMSNAP_LOGMARKER) FROM SIMON.STORE B
    WHERE A.STORE_NUM = B.STORE_NUM AND A.COMPNO = B.COMPNO
      AND A.EXPIRED_TIMESTAMP IS NULL
      AND B.EXPIRED_TIMESTAMP IS NULL
      AND (B.IBMSNAP_INTENTSEQ > A.IBMSNAP_INTENTSEQ))
WHERE A.EXPIRED_TIMESTAMP IS NULL
  AND A.IBMSNAP_OPERATION IN ('I','U');

UPDATE SIMON.STORE A SET EXPIRED_TIMESTAMP =
  ( SELECT B.IBMSNAP_LOGMARKER FROM SIMON.STORE B
    WHERE A.STORE_NUM = B.STORE_NUM AND A.COMPNO = B.COMPNO
      AND B.IBMSNAP_OPERATION = 'D'
      AND B.IBMSNAP_INTENTSEQ = A.IBMSNAP_INTENTSEQ)
WHERE A.EXPIRED_TIMESTAMP IS NULL
  AND A.IBMSNAP_OPERATION = 'D';
Use these statements in conjunction with the information described in 8.4.6, “Adding Temporal History Information to Target Tables” on page 250 to support temporal histories within the Store target table.

8.4.3.4 Create the Denormalization View
A view is created at the Oracle target database to make it easier for the business analyst to access the store and region information. The view performs a join between the Store and Region tables on the REGION_ID column. Because the tables are relatively small, performance of the data warehouse will not be affected. The following view definition was saved to a file and then executed from Oracle’s SQL*Plus:

CREATE VIEW simon.outlets AS
SELECT s.store_num, s.compno, s.name, s.street, s.city,
       s.region_id, r.region_name,
       s.ibmsnap_logmarker as valid_from,
       s.expired_timestamp as valid_to
FROM simon.store s, simon.region r
WHERE s.region_id = r.region_id;
To execute the file in SQL*Plus, start Oracle SQL*Plus and then type:
start file_name
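Once the view exists, warehouse queries can address Outlets directly. A small illustrative sketch that lists the currently valid outlets (rows whose validity period is still open have a null Valid_To):

SELECT STORE_NUM, NAME, REGION_NAME
FROM SIMON.OUTLETS
WHERE VALID_TO IS NULL;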
8.4.4 Using Source Site Joins to Denormalize Product Information
This section describes the approach taken to replicate and denormalize the information held in the Items, ProdLine and Brand tables into a single target table called Products. Products contains information about individual products, the line which the product belongs to, and the brand of that product. The technique used is to create a view at the source site which performs the denormalization. This view is then used as the source for replication, the target table being the materialization of the source view. Since we would like to maintain historic data at the target, the target table type should be a non-condensed, complete CCD table. These attributes are summarized in Table 11.

Table 11. Replication Attributes of Items, ProdLine and Brand Tables

  Warehouse attribute                           | DProp equivalent
  Maintain historical information               | Complete, non-condensed CCD target table.
  Maintain temporal histories                   | Additional TIMESTAMP column to record expiry date of a record and SQL After to maintain the temporal information.
  Denormalize data in Items, ProdLine and Brand | Create a source site view which performs the denormalization and register this as a source for replication.
Figure 61 shows the relationship between the three source tables, the Products source site view, and the target CCD table.
[Figure 61 is a diagram. Source: the Items table (Item_Num, Desc, Prod_Line_Num, Supp_No), the ProdLine table (Prod_Line_Num, Desc, Brand_Num), the Brand table (Brand_Num, Desc), and the Products join view over them (Item_Num, Item_Description, Prod_Line_Num, Product_Line_Desc, Supplier_Num, Brand_Num, Brand_Description). Target: the Products CCD table (the view columns plus IBMSNAP_INTENTSEQ, IBMSNAP_OPERATION, IBMSNAP_COMMITSEQ, IBMSNAP_LOGMARKER, EXPIRED_TIMESTAMP).]
Figure 61. Transformation of Items, ProdLine and Brand
In order to perform change replication from a source site view, DProp requires that both the base tables which compose the view and the view itself be registered as sources for replication. The registrations of the component base tables are straightforward. The registration of the Products view automatically performs the following:
1. Creates a view which joins the CD table of ProdLine with the base tables of Items and Brand, and registers this view as a source for replication with SOURCE_VIEW_QUAL=1.
2. Creates a view which joins the CD table of Brand with the base tables of Items and ProdLine, and registers this view as a source for replication with SOURCE_VIEW_QUAL=2.
3. Creates a view which joins the CD table of Items with the base tables of ProdLine and Brand, and registers this view as a source for replication with SOURCE_VIEW_QUAL=3.

This can be seen in the generated SQL in Appendix E.8, “Output from Register the Products View” on page 361. DProp Apply will use these views to determine the change data to replicate to the target. Each of these views joins one CD table with all other base tables from the original view. Therefore, when Apply is serving this subscription cycle, it will be accessing the source tables directly (and joining these with CD tables). This is an important fact to consider when replicating from a source site view, because DProp is no longer working purely from log based change capture, but is also accessing base tables directly. This may impact the performance of the source applications.

8.4.4.1 Register the Items, ProdLine and Brand Tables
The first step when defining a view as a replication source is to define the base tables within the view as replication sources. Therefore, we have to register Items, ProdLine and Brand as replication sources, using the Define Multiple Tables as Replication Sources feature of DJRA (see Figure 62).
Figure 62. Defining Multiple Base Tables as Replication Sources
Alter the generated SQL to prevent full refresh of all the base tables by setting ASN.IBMSNAP_REGISTER.DISABLE_REFRESH=1. The modified sections of the code can be seen below:
-- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL,
PARTITION_KEYS_CHG) VALUES('N','ITSOSJ','BRAND', 0 , 1 ,'Y','Y',
'ITSOSJ','CDBRAND','ITSOSJ','CDBRAND', 1 ,'0201',NULL,'0','N');
COMMIT;
-- Disabled FULLREFRESH by setting DISABLE_REFRESH=1 above.

-- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL,
PARTITION_KEYS_CHG) VALUES('N','ITSOSJ','ITEMS', 0 , 1 ,'Y','Y',
'ITSOSJ','CDITEMS','ITSOSJ','CDITEMS', 1 ,'0201',NULL,'0','N');
COMMIT;
-- Disabled FULLREFRESH by setting DISABLE_REFRESH=1 above.

-- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL,
PARTITION_KEYS_CHG) VALUES('N','ITSOSJ','PRODLINE', 0 , 1 ,'Y','Y',
'ITSOSJ','CDPRODLINE','ITSOSJ','CDPRODLINE', 1 ,'0201',NULL,'0','N');
COMMIT;
-- Disabled FULLREFRESH by setting DISABLE_REFRESH=1 above.
The complete SQL used to register the Items, ProdLine and Brand tables is in Appendix E.7, “Output from Register the Items, ProdLine, and Brand Tables” on page 357.
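Before moving on, it can be reassuring to confirm that the registrations took effect. A quick verification sketch against the source control tables:

SELECT SOURCE_OWNER, SOURCE_TABLE, DISABLE_REFRESH
FROM ASN.IBMSNAP_REGISTER
WHERE SOURCE_OWNER = 'ITSOSJ';

The Items, ProdLine and Brand rows should all show DISABLE_REFRESH=1.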
Define and Register the Products View
The view which denormalizes the Items, ProdLine, and Brand tables is called Products and has the following definition:

CREATE VIEW DB2RES5.PRODUCTS AS
SELECT I.ITEM_NUM,
       SUBSTR(I.DESC,1,40) AS ITEM_DESCRIPTION,
       I.PROD_LINE_NUM,
       P.DESC as PRODUCT_LINE_DESC,
       I.SUPP_NO as SUPPLIER_NUM,
       P.BRAND_NUM,
       B.DESC as BRAND_DESCRIPTION
FROM ITSOSJ.ITEMS I, ITSOSJ.PRODLINE P, ITSOSJ.BRAND B
WHERE I.PROD_LINE_NUM=P.PROD_LINE_NUM AND P.BRAND_NUM=B.BRAND_NUM;
Advice: Remember to use correlation ids when creating the view. When registering the view as a replication source DJRA parses the view definition
and expects correlation ids to be present. If correlation ids are not used, DJRA will not be able to parse the view definition and the registration will fail. The view uses the SQL SUBSTR function to perform some data manipulation on the DESC column on the source system. The view was created using SPUFI on the OS/390 source system. Once the view has been created, it can be registered as a replication source using the Define DB2 Views as Replication Sources function in DJRA (shown in Figure 63).
Figure 63. Defining a DB2 View as a Replication Source
As with the registration of the base tables, the generated SQL is modified to disable full refresh for the ProductsA, ProductsB and ProductsC views (shown below):

-- register the base and change data views for component 1
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,CCD_OWNER,CCD_TABLE,CCD_OLD_SYNCHPOINT,SYNCHPOINT,
SYNCHTIME,CCD_CONDENSED,CCD_COMPLETE,ARCH_LEVEL,BEFORE_IMG_PREFIX,
CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES('N','DB2RES5','PRODUCTS',
1 , 1 ,'Y','Y','DB2RES5','PRODUCTSA','ITSOSJ','CDPRODLINE', 1 ,NULL,
NULL,NULL,NULL,NULL ,NULL,NULL,'0201',NULL,'0','N');

-- register the base and change data views for component 2
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,CCD_OWNER,CCD_TABLE,CCD_OLD_SYNCHPOINT,SYNCHPOINT,
SYNCHTIME,CCD_CONDENSED,CCD_COMPLETE,ARCH_LEVEL,BEFORE_IMG_PREFIX,
CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES('N','DB2RES5','PRODUCTS',
2 , 1 ,'Y','Y','DB2RES5','PRODUCTSB','ITSOSJ','CDBRAND', 1 ,NULL,
NULL,NULL,NULL,NULL ,NULL,NULL,'0201',NULL,'0','N');

-- register the base and change data views for component 3
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,CCD_OWNER,CCD_TABLE,CCD_OLD_SYNCHPOINT,SYNCHPOINT,
SYNCHTIME,CCD_CONDENSED,CCD_COMPLETE,ARCH_LEVEL,BEFORE_IMG_PREFIX,
CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES('N','DB2RES5','PRODUCTS',
3 , 1 ,'Y','Y','DB2RES5','PRODUCTSC','ITSOSJ','CDITEMS', 1 ,NULL,
NULL,NULL,NULL,NULL ,NULL,NULL,'0201',NULL,'0','N');
A complete listing of the SQL used to register the Products view as a replication source can be found in Appendix E.8, “Output from Register the Products View” on page 361.

8.4.4.2 Subscribe to the Products View
The subscription definition for the Products target table uses a similar technique to that described in 8.4.2.2, “Subscribe to the Supplier Table” on page 223. Figure 64 shows the DJRA function used to add Products to the SALES_SET subscription set.
Figure 64. Add Products View Subscription Definition
Note the following from the subscription definition shown in Figure 64:
• Only the DB2RES5.PRODUCTS view needs to be selected as a source for replication. DJRA hides all the complexity of the base tables and generated views at this point.
• Once again, because the target CCD is to be created in Oracle, the Target table qualifier field is set to SIMON. This is the user and schema who will own the target CCD table in Oracle. It must be an existing Oracle user. Refer to the advice on page 224 for more information on setting the Target table qualifier.
• The DataJoiner non-IBM target is defined as (None). CCDs are not directly supported by DJRA to non-IBM targets. We will modify the generated SQL prior to execution in order to create the CCD table in Oracle.
• No primary key is defined initially.
Create CCD Table in Oracle
The SQL generated from the above DJRA definition is modified in a similar way to that described in “Create CCD Table in Oracle” on page 226. You can find the complete version of the SQL script in Appendix E.9, “Output from
Subscribe to the Products View” on page 362. Only the modified components are shown below:

--* If you don't see: -- now done interpreting REXX logic file
--* TARGSVR.REX, then check your REXX code
--* -- in TARGSVR.REX
-- About to create a target table tablespace
-- CREATE TABLESPACE TSPRODUCTS MANAGED BY DATABASE USING (FILE
--'/data/djinst5/djinst5/PRODUCTS.F1' 2000);
-- now done interpreting REXX logic file TARGSVR.REX
-- The target table does not yet exist
-- Not remote to DataJoiner target
-- Create the target table SIMON.PRODUCTS
CREATE TABLE SIMON.PRODUCTS(ITEM_NUM DECIMAL(13 , 0) NOT NULL,
ITEM_DESCRIPTION CHAR(40) NOT NULL,PROD_LINE_NUM DECIMAL(7 , 0) NOT
NULL,PRODUCT_LINE_DESC CHAR(30) NOT NULL,SUPPLIER_NUM DECIMAL(13 , 0)
NOT NULL,BRAND_NUM DECIMAL(7 , 0) NOT NULL,BRAND_DESCRIPTION CHAR(30)
NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,
IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR BIT
DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL,
EXPIRED_TIMESTAMP TIMESTAMP) IN AZOVORA8;
--* Commit work at target server DJDB
COMMIT;
8.4.4.3 Add Temporal History Support to the Products Table
Use the SQL After statements below in conjunction with the information described in 8.4.6, “Adding Temporal History Information to Target Tables” on page 250 to support temporal histories within the Products target table.

UPDATE SIMON.PRODUCTS A SET EXPIRED_TIMESTAMP =
  ( SELECT MIN(IBMSNAP_LOGMARKER) FROM SIMON.PRODUCTS B
    WHERE A.ITEM_NUM = B.ITEM_NUM
      AND A.EXPIRED_TIMESTAMP IS NULL
      AND B.EXPIRED_TIMESTAMP IS NULL
      AND (B.IBMSNAP_INTENTSEQ > A.IBMSNAP_INTENTSEQ))
WHERE A.EXPIRED_TIMESTAMP IS NULL
  AND A.IBMSNAP_OPERATION IN ('I','U');

UPDATE SIMON.PRODUCTS A SET EXPIRED_TIMESTAMP =
  ( SELECT B.IBMSNAP_LOGMARKER FROM SIMON.PRODUCTS B
    WHERE A.ITEM_NUM = B.ITEM_NUM
      AND B.IBMSNAP_OPERATION = 'D'
      AND B.IBMSNAP_INTENTSEQ = A.IBMSNAP_INTENTSEQ)
WHERE A.EXPIRED_TIMESTAMP IS NULL
  AND A.IBMSNAP_OPERATION = 'D';
8.4.5 Using a CCD Target Table to Manage the Sales Facts
This section describes the configuration used to replicate the Sales table from DB2 for OS/390 to the Oracle warehouse. The Sales table on the OS/390 contains daily records of all the transactions made within each of the stores. It holds these records for a period of 7 days. Each evening, a batch job is run to remove all records which are older than 7 days. By the nature of the source application, there will never be any updates made to the Sales table. SQL inserts are performed to record each sale transaction, and SQL deletes are performed in batch to remove records for housekeeping purposes.

The batch deletes should not be replicated, because they are performed only for housekeeping purposes and have no significance within the warehouse (this will also help to reduce by half the number of changes made to the Sales table that are replicated). Since only inserts are copied, the Sales table can be replicated to either a PIT or a CCD target and still maintain history information. A PIT target table would save space and take less network bandwidth compared to a CCD table, because a CCD table has the overhead of maintaining three additional DProp control columns. However, a PIT target table requires a primary key, and one is not readily definable on the target table because the uniqueness of a row cannot be guaranteed even using all target columns. Therefore we have chosen to make the target table a CCD table.

Advice: An alternative approach would be to generate some kind of artificial key on the target table and then replicate to a PIT target (the IBMSNAP_INTENTSEQ DProp control column could be used as the artificial key in this case). This is a valid work-around in this example because there are no updates or deletes replicated to the target. If updates and/or deletes were replicated, Apply would attempt to use this artificial key to identify the rows in the target to update/delete. This would fail because the artificial key does not exist in the source table.

Table 12 summarizes the attributes of the Sales target table:

Table 12. Replication Attributes of Sales Table

  Warehouse attribute                                  | DProp equivalent
  Maintain historical information.                     | Complete, non-condensed CCD target table.
  Do not replicate batch delete from source to target. | Apply predicate to Sales subscription to prevent deletes from replicating.
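Because the source application issues nothing but inserts and housekeeping deletes, the subscription predicate can simply accept the inserts. A sketch of the kind of clause involved (the exact Where clause used is the one entered in the subscription definition shown later in Figure 67):

IBMSNAP_OPERATION = 'I'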
Notice that temporal histories do not need to be maintained in the Sales table. Temporal histories are often required on the dimension tables of a star schema within a warehouse, but not on the central fact table (which usually records events). In this case, the event is a sale and the date of the sale is recorded in the SALE_DATE column. Figure 65 below shows the relationship between the source and target Sales tables.
[Figure 65 is a diagram. Source: the Sales table (Date, BasArtNo, Location, Company, StoreNo, Pieces, Out_Prc, Tax, Transfer_Date, Process_Date). Target: the Sales CCD table (Sale_Date, BasArtNo, Location, Company, Pieces, Out_Prc, Tax, Transfer_Date, Process_Date, plus IBMSNAP_INTENTSEQ, IBMSNAP_OPERATION, IBMSNAP_COMMITSEQ, IBMSNAP_LOGMARKER); the Date column is renamed to Sale_Date and StoreNo is not carried to the target.]
Figure 65. Transformation of Sales
8.4.5.1 Register the Sales Table
Since the Sales table will be the most volatile table, it was decided to capture and replicate only the columns which are required in the warehouse. Figure 66 shows the Define One Table as a Replication Source window of DJRA used to register the Sales table.
Figure 66. Registration Definition for Sales Table
Edit the generated SQL to disable full refresh. Also modify the CREATE TABLESPACE statement to create a large DB2 for OS/390 tablespace with enough primary and secondary storage to hold the large amounts of change data expected for the Sales table. The modified SQL can be seen below:

-- in SRCESVR.REX, about to create a change data tablespace
--CREATE TABLESPACE TSSALES
-- IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC;
CREATE TABLESPACE TSSALES IN sj390db1 SEGSIZE 4 LOCKSIZE TABLE
CLOSE NO CCSID EBCDIC USING STOGROUP SJDB1SG2
PRIQTY 180000 SECQTY 5000;
--* Source table DB2RES5.SALES already has CDC attribute, no need to
--* alter.
-- selecting 'X' as the before-image prefix character
-- create the cd/ccd table for DB2RES5.SALES
CREATE TABLE DB2RES5.CDSALES(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT
NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,
IBMSNAP_OPERATION CHAR(1) NOT NULL,DATE DATE NOT NULL,BASARTNO
DECIMAL(13 , 0) NOT NULL,LOCATION DECIMAL(4 , 0) NOT NULL,COMPANY
DECIMAL(3 , 0) NOT NULL,PIECES DECIMAL(7 , 0) NOT NULL,OUT_PRC
DECIMAL(11 , 2) NOT NULL,TAX DECIMAL(11 , 2) NOT NULL,TRANSFER_DATE
TIMESTAMP NOT NULL,PROCESS_DATE TIMESTAMP NOT NULL)
IN SJ390DB1.TSSALES;
-- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL,
PARTITION_KEYS_CHG) VALUES('N','DB2RES5','SALES', 0 , 1 ,'Y','Y',
'DB2RES5','CDSALES','DB2RES5','CDSALES', 1 ,'0201',NULL,'0','N');
A full listing of the SQL used to register the Sales table can be found in Appendix E.10, “Output from Register the Sales Table” on page 365. For more information on the DB2 for OS/390 CREATE TABLESPACE statement, refer to the DB2 for OS/390 V5 SQL Reference, SC26-8966.
Advice: When working with source tables of high volatility, it is often necessary to modify the automatically generated CREATE TABLESPACE statement to ensure that the tablespaces for CD and CCD tables have sufficient disk space.
8.4.5.2 Subscribe to the Sales Table
The subscription definition for the Sales table is the final subscription to be added to the SALES_SET subscription set. Special consideration needs to be taken when handling the Sales table because of its size and expected change volume. This is often the case when dealing with the central fact table in a star schema. The initial size of the Sales target table is 87 MB, with an estimated change volume of 14 MB per day. To manage these large data and change volumes, it is often necessary to define a tablespace at the target capable of managing large amounts of data. In this case, the following command was used to create a tablespace in Oracle capable of holding the sales information:

CREATE TABLESPACE BIGTS
  DATAFILE '/oracle8/u01/oradata/ora8/bigts.dbf' SIZE 90M
  AUTOEXTEND ON NEXT 15M;
Define the tablespace directly from within SQL*Plus. It will have an initial size of 90M and will be able to automatically extend in chunks of 15M. For more information on managing Oracle tablespaces, see the Oracle8 Administrator’s Guide, A58397-01. Once the Oracle tablespace has been created, the Add a Member to Subscription Set feature of DJRA is used to create the subscription for the Sales table (see Figure 67).
Figure 67. Subscription Definition for Sales Table
The target table attributes are similar to those described in detail in 8.4.2.2, “Subscribe to the Supplier Table” on page 223. The Where clause was added to the subscription definition to prevent the batch deletes from replicating to the target table. Modify the SQL generated from this subscription as follows:
1. Change the insert into ASN.IBMSNAP_SUBS_COLS for the TARGET_NAME column from DATE to SALE_DATE. This is because Oracle does not allow a column named DATE to be defined within a table.
2. Correspondingly, change the CREATE TABLE statement to create a column called SALE_DATE and not DATE.
3. Comment out the CREATE TABLESPACE command because the CCD will be created in Oracle and not in DataJoiner.
4. Change the CREATE TABLE statement to replace the DataJoiner tablespace name with the Oracle Server Mapping so the CCD table will be created in Oracle.
5. Add the REMOTE OPTION clause to the CREATE TABLE statement to ensure that the target table is created in the specially defined Oracle tablespace (BIGTS).
The modified SQL is shown below:

-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
  TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQ1','SALES_SET','S','SIMON','SALES','A','SALE_DATE','N', 1,
  'DATE');
-- About to create a target table tablespace
-- CREATE TABLESPACE TSSALES MANAGED BY DATABASE USING (FILE
--   '/data/djinst5/djinst5/SALES.F1' 2000 );
-- now done interpreting REXX logic file TARGSVR.REX
-- The target table does not yet exist
-- Not remote to DataJoiner target
-- Create the target table SIMON.SALES
CREATE TABLE SIMON.SALES(
  SALE_DATE         DATE NOT NULL,
  BASARTNO          DECIMAL(13, 0) NOT NULL,
  LOCATION          DECIMAL(4, 0) NOT NULL,
  COMPANY           DECIMAL(3, 0) NOT NULL,
  PIECES            DECIMAL(7, 0) NOT NULL,
  OUT_PRC           DECIMAL(11, 2) NOT NULL,
  TAX               DECIMAL(11, 2) NOT NULL,
  TRANSFER_DATE     TIMESTAMP NOT NULL,
  PROCESS_DATE      TIMESTAMP NOT NULL,
  IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,
  IBMSNAP_OPERATION CHAR(1) NOT NULL,
  IBMSNAP_COMMITSEQ CHAR(10) FOR BIT DATA NOT NULL,
  IBMSNAP_LOGMARKER TIMESTAMP NOT NULL)
  IN AZOVORA8 REMOTE OPTION 'TABLESPACE BIGTS';
A full listing of the SQL used to define the Sales subscription can be found in Appendix E.11, “Output from Subscribe to the Sales Table” on page 366.
Now that all the replication registrations and subscriptions have been defined, we need to look in more detail at how to use DProp to support temporal histories, maintain aggregate information, and finally load the data into the warehouse.
8.4.6 Adding Temporal History Information to Target Tables
DProp is designed to automatically maintain history target tables by using CCDs. By adding an additional column to the target and some SQL After statements to the subscription set, DProp can also easily maintain temporal history information within the target table. To maintain temporal histories using DProp:
• Use the IBMSNAP_LOGMARKER DProp control column within the target table to represent the start of the validity period of each record. This column contains the timestamp of when the record was changed on the source system. DProp maintains this information and automatically replicates it to the target table.
• Add an additional column to each target table for which a validity period (or temporal history) is required. This column will contain the end of the validity period for each record. Like the IBMSNAP_LOGMARKER column, it should be of TIMESTAMP format.
• Define SQL After statements which maintain the validity period for the records.
The additional column is added to the target table(s) by editing the DJRA generated SQL for the subscription. In this case study, the column is called EXPIRED_TIMESTAMP.
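For illustration, the edited CREATE TABLE for a dimension target such as Supplier simply gains one nullable column. The sketch below assumes the Supplier CCD target from 8.4.2.2; the data types of the business columns are assumptions, and the real edit is made in the DJRA-generated subscription script:

-- A sketch only: SUPP_NO/SUPP_NAME types are assumptions
CREATE TABLE SIMON.SUPPLIER(
  SUPP_NO           DECIMAL(5, 0) NOT NULL,
  SUPP_NAME         VARCHAR(50) NOT NULL,
  IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,
  IBMSNAP_OPERATION CHAR(1) NOT NULL,
  IBMSNAP_COMMITSEQ CHAR(10) FOR BIT DATA NOT NULL,
  IBMSNAP_LOGMARKER TIMESTAMP NOT NULL,
  EXPIRED_TIMESTAMP TIMESTAMP)   -- added manually; NULL = currently valid
  IN AZOVORA8;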
Using SQL After Statements to Maintain Validity Periods
SQL After statements execute after the replication subscription set has successfully completed. The two pseudo SQL statements shown below can be used to maintain temporal history information for the table:

UPDATE <target_owner>.<target_table> A
SET EXPIRED_TIMESTAMP =
  (SELECT MIN(IBMSNAP_LOGMARKER)
   FROM <target_owner>.<target_table> B
   WHERE A.<source_key_column1> = B.<source_key_column1>
     AND A.<source_key_column2> = B.<source_key_column2>
     AND A.EXPIRED_TIMESTAMP IS NULL
     AND B.EXPIRED_TIMESTAMP IS NULL
     AND (B.IBMSNAP_INTENTSEQ > A.IBMSNAP_INTENTSEQ))
WHERE A.EXPIRED_TIMESTAMP IS NULL
  AND A.IBMSNAP_OPERATION IN ('I','U');

UPDATE <target_owner>.<target_table> A
SET EXPIRED_TIMESTAMP =
  (SELECT B.IBMSNAP_LOGMARKER
   FROM <target_owner>.<target_table> B
   WHERE A.<source_key_column1> = B.<source_key_column1>
     AND A.<source_key_column2> = B.<source_key_column2>
     AND B.IBMSNAP_OPERATION = 'D'
     AND B.IBMSNAP_INTENTSEQ = A.IBMSNAP_INTENTSEQ)
WHERE A.EXPIRED_TIMESTAMP IS NULL
  AND A.IBMSNAP_OPERATION = 'D';
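For concreteness, the first statement instantiated for the Supplier target (assuming SUPP_NO is the only source key column) might read:

-- A sketch: the Supplier target with a single source key column
UPDATE SIMON.SUPPLIER A
SET EXPIRED_TIMESTAMP =
  (SELECT MIN(IBMSNAP_LOGMARKER)
   FROM SIMON.SUPPLIER B
   WHERE A.SUPP_NO = B.SUPP_NO
     AND A.EXPIRED_TIMESTAMP IS NULL
     AND B.EXPIRED_TIMESTAMP IS NULL
     AND B.IBMSNAP_INTENTSEQ > A.IBMSNAP_INTENTSEQ)
WHERE A.EXPIRED_TIMESTAMP IS NULL
  AND A.IBMSNAP_OPERATION IN ('I','U');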
The first SQL works by scanning through the table for records with the same source key column value(s) and placing a timestamp in the EXPIRED_TIMESTAMP column of the oldest of these records. The oldest record is identified as the one with the lowest value in IBMSNAP_INTENTSEQ. The IBMSNAP_LOGMARKER value of the new record is used as the timestamp which is inserted into the EXPIRED_TIMESTAMP column of the old record. In other words, the start of the validity period of the new record becomes the end of the validity period of
the old record. This method provides a validity period for all records in the table, so that a particular record is valid from the value in IBMSNAP_LOGMARKER to the value in EXPIRED_TIMESTAMP. If EXPIRED_TIMESTAMP is NULL, then the record is valid at the current point in time.
The second SQL statement is used to provide additional handling for source records which are deleted. This statement looks for records that record a delete operation against the source. It updates the EXPIRED_TIMESTAMP column of such records with the IBMSNAP_LOGMARKER of the same record. In effect, it closes the record’s validity period immediately. It is included to respect one of the basic principles of life-span modeling, which states that the start and end dates represent the time in which the object is true in the modeled reality. Any query requesting information on the object outside of its modeled validity period should result in false or an SQLCODE 100 being returned. If you leave the end date of a deleted record open, temporal queries will return true, which is the wrong answer. The point is that the object was logically deleted, and thus the state history must reflect this.
For example, consider the following target CCD table:

KeyCol  IBMSNAP_LOGMARKER           IBMSNAP_OPERATION  EXPIRED_TIMESTAMP
A       1999-03-26-11.37.30.000000  I                  1999-03-26-13.40.30.000000
A       1999-03-26-13.40.30.000000  U
B       1999-03-26-11.37.30.000000  I
C       1999-03-26-15.22.21.000000  I
The record with the source key column value ’A’ is updated at the source, and ’B’ is deleted. These changes are replicated to the CCD table. After Apply has replicated these changes to the target, but before the SQL After statements which maintain the temporal histories are executed, the table will contain the following data:

KeyCol  IBMSNAP_LOGMARKER           IBMSNAP_OPERATION  EXPIRED_TIMESTAMP
A       1999-03-26-11.37.30.000000  I                  1999-03-26-13.40.30.000000
A       1999-03-26-13.40.30.000000  U
A       1999-03-26-18.12.08.000000  U
B       1999-03-26-11.37.30.000000  I
B       1999-03-26-18.12.08.000000  D
C       1999-03-26-15.22.21.000000  I
The update and the delete have been recorded in the CCD table, but the validity period has not been changed. Once the SQL After statements have been executed, the target table will contain:

KeyCol  IBMSNAP_LOGMARKER           IBMSNAP_OPERATION  EXPIRED_TIMESTAMP
A       1999-03-26-11.37.30.000000  I                  1999-03-26-13.40.30.000000
A       1999-03-26-13.40.30.000000  U                  1999-03-26-18.12.08.000000
A       1999-03-26-18.12.08.000000  U
B       1999-03-26-11.37.30.000000  I                  1999-03-26-18.12.08.000000
B       1999-03-26-18.12.08.000000  D                  1999-03-26-18.12.08.000000
C       1999-03-26-15.22.21.000000  I
For both ’A’ and ’B’, the old record now has an expiry timestamp which is the same as the new record’s initial timestamp. The new record for ’A’ contains NULL in the EXPIRED_TIMESTAMP column, which means it is valid at the current point in time. The new record for ’B’ has an expiry timestamp which is the same as its initial timestamp, indicating that ’B’ does not exist at the current point in time.
Advice: It is possible to combine the two SQL update statements used to maintain temporal histories into a single update statement containing an embedded CASE expression (a sketch follows Figure 68). Using this technique, temporal histories may be maintained within a single pass of the table (instead of the two-pass approach adopted here). However, Oracle does not support the CASE expression, and therefore DataJoiner is forced to compensate for this lack of functionality by effectively adopting a two-pass approach. Therefore, in effect, no added benefit would be achieved (but the SQL would be more complex).
When the SQL After or SQL Before becomes complex, it may be easier to write the SQL in the native dialect of the target (because you are more familiar with its SQL dialect), hold it in a stored procedure at the target database, and use the technique described in 7.4.5, “Add Statements or Stored Procedures to Subscription Sets” on page 199 to call the stored procedure and execute the SQL directly at the target database.
Figure 68 shows the Add Statements or Procedures to Subscription Sets function of DJRA used to add temporal history capability to the Supplier table.
Figure 68. Adding the SQL After to Support Temporal Histories
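As an illustration of the single-statement variant mentioned in the advice above, the two updates could be folded into one pass using CASE. This is a sketch only, again assuming SUPP_NO as the single source key; as noted, it brings no benefit when DataJoiner must compensate for Oracle:

-- A sketch: one-pass maintenance of the validity period using CASE
UPDATE SIMON.SUPPLIER A
SET EXPIRED_TIMESTAMP =
  CASE A.IBMSNAP_OPERATION
    WHEN 'D' THEN A.IBMSNAP_LOGMARKER
    ELSE (SELECT MIN(B.IBMSNAP_LOGMARKER)
          FROM SIMON.SUPPLIER B
          WHERE A.SUPP_NO = B.SUPP_NO
            AND B.EXPIRED_TIMESTAMP IS NULL
            AND B.IBMSNAP_INTENTSEQ > A.IBMSNAP_INTENTSEQ)
  END
WHERE A.EXPIRED_TIMESTAMP IS NULL;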
It is important to add the SQL After statements in the order in which they are presented at the start of this section. Similar SQL After statements were used to add temporal history support to the Store and Products target tables. For details of the specific SQL used, refer to 8.4.2.3, “Add Temporal History Support to the Supplier Table” on page 227 for the Supplier table; 8.4.3.3, “Add Temporal History Support to the Store Table” on page 235 for the Store table; and 8.4.4.3, “Add Temporal History Support to the Products Table” on page 244 for the Products table. The SQL generated to add SQL After statements to the Supplier table can be seen in Appendix E.12, “SQL After to Support Temporal Histories for Supplier Table” on page 369.
Tuning the SQL After
To optimize performance of the SQL After which maintains temporal histories, it is advisable to create a unique index on the IBMSNAP_INTENTSEQ column and the column(s) used as the primary key on the source table. For Supplier, the following unique index is created on the target table using Oracle’s SQL*Plus:

CREATE UNIQUE INDEX tempidx ON simon.supplier (SUPP_NO, IBMSNAP_INTENTSEQ);
ANALYZE TABLE simon.supplier COMPUTE STATISTICS;
The ANALYZE command is used to gather statistics for the table and index, and is similar to DB2’s RUNSTATS command. We recommend creating the index and analyzing the data after the initial load of the data into the target; this way, the statistics will be more accurate.
DataJoiner will not automatically recognize the new Oracle index. To make DataJoiner aware of the index, connect to the DataJoiner database and create an index on the Supplier nickname using the following syntax:

CREATE UNIQUE INDEX tempidx ON simon.supplier (SUPP_NO, IBMSNAP_INTENTSEQ);
This does not actually create an index on the nickname; it just populates the DataJoiner global catalog so that DataJoiner knows there is an index on the Oracle table. It is also advisable to use DB2 RUNSTATS against the nickname in order to ensure that the DataJoiner global statistics are up-to-date. For the Supplier target table, the following SQL was issued from the DB2 Command Line while connected to the DataJoiner database in order to update the global statistics:
RUNSTATS ON TABLE simon.supplier WITH DISTRIBUTION AND INDEXES ALL
Advice: If the tables are large, performing RUNSTATS against nicknames may take a long time to complete. In this case, use the getstats utility, which can be downloaded from http://www.software.ibm.com/data/datajoiner (the DataJoiner home page). Alternatively, the DataJoiner global statistics can be updated manually by using the SYSSTAT views. See the DataJoiner SQL Reference and Application Programming Supplement, SC26-9148 for more information.
Finally, we need to tell DataJoiner that the collating sequence used within the Oracle database is the same as the collating sequence used within the local DataJoiner database. This allows DataJoiner to push down order-dependent operations (such as ORDER BY, MIN, MAX, SELECT DISTINCT) to Oracle. If we do not set this option, DataJoiner must retrieve the necessary data from Oracle and perform the ordering locally—this is usually far less efficient because far more data is transferred from Oracle to DataJoiner. We use the DataJoiner COLSEQ server option to do this. In this case study, the option is created for the AZOVORA8 server mapping by issuing the following SQL from the DB2 Command Line:

CREATE SERVER OPTION colseq FOR SERVER azovora8 SETTING 'y'
This server option only needs to be created once, as it applies to the whole Oracle server. By creating the COLSEQ server option and setting it to "Y", performance can improve dramatically. For example, consider the Products target table, which contains 37,000 rows. Without the server option, the SQL After statement took several minutes to execute. After creating the option, execution time for the SQL After was less than 5 seconds.
For more details on DataJoiner server options, please refer to the DataJoiner Application Programming and SQL Reference Supplement, SC26-9148. For more information about tuning DataJoiner in the heterogeneous environment, please refer to the DataJoiner Administration Supplement, SC26-9146.
8.4.6.1 Defining a Time Consistent Query
Time consistent data can now be returned from the data warehouse by adding the following predicates to queries:

(SALE_DATE >= IBMSNAP_LOGMARKER)
AND (SALE_DATE < EXPIRED_TIMESTAMP OR EXPIRED_TIMESTAMP IS NULL)
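For example, a query joining the Sales facts to the Products dimension as of the time of each sale would carry these predicates. The sketch below mirrors the star join shown in 8.5 and uses only column names that appear there:

SELECT P.PRODUCT_LINE_DESC, SUM(S.OUT_PRC)
FROM SIMON.SALES S, SIMON.PRODUCTS P
WHERE S.BASARTNO = P.ITEM_NUM
  AND S.SALE_DATE >= P.IBMSNAP_LOGMARKER
  AND (S.SALE_DATE < P.EXPIRED_TIMESTAMP OR P.EXPIRED_TIMESTAMP IS NULL)
GROUP BY P.PRODUCT_LINE_DESC;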
However, it is important to establish proper business rules in order to interpret timestamps correctly during analysis. For example, the above predicates
assume that a record is valid and will appear in the answer set if its validity period starts at exactly the same time as the sale occurred. Is this a valid assumption, or should the record be valid when the end of its validity period exactly matches the sale date? This can only be determined when proper business rules have been defined.
Advice: Instead of using NULL as the default in the EXPIRED_TIMESTAMP column, a date such as "9999-12-31" could be used. A value of "9999-12-31" in the EXPIRED_TIMESTAMP column would then indicate that the record is currently valid, and the predicate to return time consistent data could be simplified to:

(SALE_DATE >= IBMSNAP_LOGMARKER AND SALE_DATE < EXPIRED_TIMESTAMP)
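A minimal sketch of that variant in Oracle, assuming the EXPIRED_TIMESTAMP column already exists on the Supplier target; the SQL After statements would then have to set and test '9999-12-31' instead of NULL:

-- Hypothetical: give EXPIRED_TIMESTAMP a far-future default instead of NULL
ALTER TABLE simon.supplier
  MODIFY (EXPIRED_TIMESTAMP DEFAULT TO_DATE('9999-12-31','YYYY-MM-DD'));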
8.4.7 Maintaining Aggregate Information
Pre-defined summary tables are very useful in data warehousing scenarios where most of the common summary information requested by users can be pre-determined. Therefore, a common warehousing technique is to pre-calculate these summaries and store them in table(s) at the data warehouse.
8.4.7.1 DProp Support for Aggregate Tables
DProp provides support for two types of aggregate target tables:
1. A base aggregate is the result of a query against the base source table involving one or more SQL column functions and a GROUP BY clause.
2. A change aggregate is the result of a query against the change data table involving one or more SQL column functions and a GROUP BY clause.
Change aggregates are useful for trend analysis—they summarize recent activity, but they do not summarize the overall state of your data. That is, they can help tell you where your business is going, but not where your business is.
Apply does not maintain base aggregate tables from log-based changed data capture. It maintains base aggregates by querying the application base tables directly. These tables may be large, and contention may occur between Apply and your OLTP transactions when Apply is accessing the source table(s). Change aggregates are relatively inexpensive to maintain because Apply queries the change data table, and not the base table. Not only does this avoid contention with your OLTP applications, but change data tables are usually much smaller than application tables. For more information on DProp
aggregate tables, refer to the DB2 Replication Guide and Reference, S95H-0999.
8.4.7.2 Maintaining a Base Aggregate Table from a Change Aggregate Subscription
The technique described below combines the benefits of using base aggregate target tables to summarize your source data with the low-cost maintenance option of change aggregate tables. It is a practical implementation of the technique described in the white paper D13 Using Data Replication in Data Warehousing Scenarios. This paper can be found on the Web at http://www.software.ibm.com/data/dpropr/library.html and contains useful information on how to maintain data warehouses using DProp.
Figure 69 shows pictorially how the scheme works:
1. The base aggregate subscription (BASEAGG_SET) runs once to populate the target aggregate table (SIMON.AGGREGATES).
2. Once the base aggregate subscription has finished, it is disabled and the change aggregate subscription (CHGAGG_SET) takes over maintenance of the target base aggregate table. The change aggregate set aggregates the change data into the movement table (SIMON.MOVEMENT).
3. This information is then used to adjust the values in the base aggregate table at the end of the subscription cycle (using SQL After); a sketch of this adjustment follows the example query below.
Figure 69. Maintain Base Aggregate Table from Change Aggregate Subscription (the source table DB2RES5.SALES initializes the base aggregate SIMON.AGGREGATES via the base aggregate subscription (1); the CD and UOW tables feed the change aggregate, which maintains the movement table SIMON.MOVEMENT (2); the movement table is then used to maintain SIMON.AGGREGATES (3))
Let us consider an example for this case study. A common query against the warehouse is to find the total number of items sold and the total price of all these items, broken down by store. The following SQL statement can be used to provide this analysis:

SELECT company, location, sum(pieces), sum(out_prc)
FROM sales
GROUP BY company, location
We would like to have this information precalculated and stored within the warehouse. By using the process described, it is possible to maintain such a target aggregate table from a change aggregate subscription. The SQL script detailed in Appendix E.13, “Maintain Base Aggregate Table from Change Aggregate Subscription” on page 370 was used to maintain the aggregate shown above within the data warehouse (and contains detailed comments on how the scheme works). Advice: Use the Replication Analyzer with the DEEPCHECK option to check the validity of the SQL Before and SQL After statements before starting the subscription.
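As a rough sketch of the adjustment step (step 3 above), the SQL After might fold the movement rows into the base aggregate as shown below. The column names SUM_PIECES and SUM_OUTPRC are those used in the query that follows; everything else is hypothetical, and the full, commented script is in Appendix E.13:

-- A sketch only: add the deltas accumulated in SIMON.MOVEMENT to the
-- matching rows of the base aggregate table, assuming the movement table
-- holds exactly one row per (COMPANY, LOCATION) per cycle
UPDATE SIMON.AGGREGATES A
SET (SUM_PIECES, SUM_OUTPRC) =
  (SELECT A.SUM_PIECES + M.SUM_PIECES, A.SUM_OUTPRC + M.SUM_OUTPRC
   FROM SIMON.MOVEMENT M
   WHERE M.COMPANY = A.COMPANY AND M.LOCATION = A.LOCATION)
WHERE EXISTS
  (SELECT 1 FROM SIMON.MOVEMENT M
   WHERE M.COMPANY = A.COMPANY AND M.LOCATION = A.LOCATION);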
In the example used in this case study, if we select the data from the base aggregate table, then a summation of the total number of items sold and the total price of those items, grouped by store number and company number, is displayed. However, a simple join with the Outlets view will yield the store name and city. For example, the following query may be issued from SQL*Plus against the data warehouse:

SELECT o.NAME, o.CITY, a.SUM_PIECES, a.SUM_OUTPRC
FROM OUTLETS o, SIMON.AGGREGATES a
WHERE o.STORE_NUM = a.LOCATION
  AND o.COMPNO = a.COMPANY
  AND o.Valid_To IS NULL
ORDER BY a.SUM_OUTPRC DESC;
This will produce the following output:

NAME                      CITY        SUM_PIECES  SUM_OUTPRC
------------------------  ----------  ----------  ----------
PARF. SIMON SUED GMBH     NUERNBERG        40121   766878.23
PARF. SIMON NORD GMBH     HAMBURG          19885   748414.80
PARF. SIMON SUED GMBH     FRANKFURT        25932   597813.25
CHRIS’ PARFUEMERIE GMBH   DORTMUND         12260   539272.99
PARF. SIMON SUED GMBH     MANNHEIM         15828   514384.91
PARF. SIMON SUED GMBH     MUENCHEN         13428   510742.99
PARF. SIMON SUED GMBH     WIESBADEN        17003   431994.30
PARF. SIMON WEST GMBH     BREMEN            9560   397343.00
CHRIS’ PARFUEMERIE GMBH   ESSEN             8801   388603.62
PARF. SIMON WEST GMBH     DUEREN           13668   371753.06
PARF. SIMON SUED          DARMSTADT        17521   367486.95
The Valid_To is NULL predicate is added to ensure that only those records from the Outlets table which are valid at the present time are used. The ORDER BY clause will order the data so that the stores which take the most money will appear first in the report. This technique will work for SQL column functions AVG, COUNT and SUM. It is not possible to use the technique with the MIN and MAX column functions (these functions will still have to be maintained directly from the source tables by using standard base aggregate subscriptions). If Capture is cold-started, then you will probably need to reactivate the base aggregate set and deactivate the change aggregate set to refresh the base aggregate table. Once the refresh is complete, the base aggregate set will automatically be deactivated, and the change aggregate set will be activated.
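If the sets are toggled manually after a Capture cold start, the activation flag lives in the Subscription Set table. A sketch, using the ACTIVATE column of ASN.IBMSNAP_SUBS_SET and the set names introduced above (the exact column values to use should be verified against the DB2 Replication Guide and Reference):

-- Hypothetical sketch: reactivate the base aggregate set and deactivate
-- the change aggregate set after a Capture cold start
UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE = 1
WHERE APPLY_QUAL='WHQ1' AND SET_NAME='BASEAGG_SET' AND WHOS_ON_FIRST='S';
UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE = 0
WHERE APPLY_QUAL='WHQ1' AND SET_NAME='CHGAGG_SET' AND WHOS_ON_FIRST='S';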
8.4.8 Pushing Down the Replication Status to Oracle
DProp maintains current replication status information in the Apply Subscription Set table (ASN.IBMSNAP_SUBS_SET). DBAs can use the information within this table to determine the success or failure of a subscription cycle.
The Subscription Set table is located at the control server, which is always a DB2 database (in this case study, the DB2 database within DataJoiner). The DBA of the multi-vendor target database is likely to want the status information in a format (and database) which is familiar to them. They are unlikely to want to log on to a DB2 database just to find the status of the replication subscriptions. The technique described below is used to create a status table in the heterogeneous target database and populate that table with replication status information from the Subscription Set table:
1. Connect to the DataJoiner database that is acting as the control server and create the status table in Oracle using syntax similar to the following:

CREATE TABLE SIMON.DPROP_STATUS(
  APPLY_QUAL      CHAR(18),
  SET_NAME        CHAR(18),
  STATUS          SMALLINT,
  LASTRUN         TIMESTAMP,
  LASTSUCCESS_RUN TIMESTAMP,
  CONSISTENT_TO   TIMESTAMP)
  IN AZOVORA8;
By using DataJoiner’s DDL transparency feature, the DPROP_STATUS table is created in the Oracle target server (AZOVORA8) and a nickname is automatically generated for that table. The DPROP_STATUS table will contain one row for each subscription set, recording information about the current status of the subscription, the last time it was run, the last time it was successfully run, and a timestamp indicating the point in time to which the target data is consistent with the source data.
2. Insert an initial record into the DPROP_STATUS table for each subscription set which requires monitoring:

INSERT INTO SIMON.DPROP_STATUS VALUES(
  'WHQ1','SALES_SET',NULL,NULL,NULL,NULL);
3. Now insert an SQL After statement into the ASN.IBMSNAP_SUBS_STMTS table for each subscription set being monitored. The SQL After statement updates the DPROP_STATUS Oracle table (by using the nickname) with the current information for that subscription from the Subscription Set table. The sample code below was used for the WHQ1 subscription set:

-- insert the SQL AFTER into the table
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
  BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES)
VALUES('WHQ1','SALES_SET','S','A', 4,'E',
  'UPDATE SIMON.DPROP_STATUS
   SET (STATUS, LASTRUN, LASTSUCCESS_RUN, CONSISTENT_TO) =
     (SELECT STATUS, LASTRUN, LASTSUCCESS, SYNCHTIME
      FROM ASN.IBMSNAP_SUBS_SET
      WHERE APPLY_QUAL=''WHQ1'' AND SET_NAME=''SALES_SET''
        AND WHOS_ON_FIRST=''S'')',
  '0000002000');
Be sure that this is the last SQL After statement for the subscription set. This is done by setting the STMT_NUMBER column higher than any other value for that set. The SQL After statement can be added to the subscription set either by using the DJRA Add Statements or Procedures to Subscription Sets function, or by inserting the information directly into the Subscription Statements table (as shown above).
4. We also need to update the Subscription Set table to tell Apply that we have added an SQL After statement to the set:

UPDATE ASN.IBMSNAP_SUBS_SET
SET AUX_STMTS = AUX_STMTS + 1
WHERE APPLY_QUAL='WHQ1' AND SET_NAME='SALES_SET' AND WHOS_ON_FIRST='S';
Now, whenever the WHQ1 subscription set executes, Apply will automatically update the status information in the Subscription Set table. The SQL After statement will then be executed, which copies this status information into Oracle by using DataJoiner. The multi-vendor DBA can now access DProp status information using the tools and techniques with which they are familiar. For example, the Oracle DBA could use the following SQL query from SQL*Plus to obtain the status of the last replication cycle:

SELECT APPLY_QUAL, SET_NAME, STATUS,
       TO_CHAR(LASTRUN,'YYYY-MM-DD-HH24:MI:SS'),
       TO_CHAR(LASTSUCCESS_RUN,'YYYY-MM-DD-HH24:MI:SS'),
       TO_CHAR(CONSISTENT_TO,'YYYY-MM-DD-HH24:MI:SS')
FROM DPROP_STATUS;
8.4.9 Initial Load of Data into the Data Warehouse
Now that all the replication definitions are in place, we can proceed with loading the data into the data warehouse and then synchronizing Capture and Apply, so that the target tables can be maintained by change capture from the source tables.
Note: Before performing the steps detailed in this section, Capture must be running at the source server. Refer to the DB2 Replication Guide and Reference, S95H-0999 for details on how to start Capture for MVS.
Since the subscription definition has full refresh disabled, the initial full refresh of the data and the synchronization of Capture and Apply must be performed manually. The DJRA Off-line Load utility can be used to help load data into the warehouse manually. The utility will only unload/load data one subscription set at a time; therefore, we will have to unload/load all the data from the SALES_SET at once. The four steps that the Off-line Load utility guides you through are these:
1. Prepare the tables for the off-line load:
   • Disable full refresh for the subscription set members.
   • Disable the subscription set.
   • Initiate change capture by performing synchpoint translation for each source table.
2. Unload the data from the source tables.
3. Load the data into the target tables.
4. Reactivate the subscription set.
Steps 1 and 4 are performed by the Off-line Load utility. Steps 2 and 3, the unloading and loading of the data, must be performed manually by the replication administrator. There are many ways in which the unload and load tasks can be performed. The most suitable method is usually determined by the volume of data being loaded into the target. Several of the most common alternatives are described below.
8.4.9.1 Using SQL INSERT...SELECT from DataJoiner
A quick and easy way to transfer relatively small amounts of data is to use the distributed request facility inherent in DataJoiner. To use this, a DataJoiner Server Mapping and User Mapping need to be defined for the DB2 for OS/390 system where the source data resides. We assume that a server mapping already exists for the Oracle target database. Once the DB2 for OS/390 Server Mapping has been defined, create a nickname for each replication source table, as sketched below.
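A sketch of such a nickname definition follows; the server mapping name SJ390DB1 is an assumption, and the nickname DJINST5.REGION_SOURCE is the one used in the INSERT below:

-- Hypothetical names: map the OS/390 source table DB2RES5.REGION
-- through the server mapping SJ390DB1 to a local DataJoiner nickname
CREATE NICKNAME DJINST5.REGION_SOURCE FOR SJ390DB1.DB2RES5.REGION;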
Now, for a PIT target type, the table can be populated using SQL similar to the following (which was used for the Region table):

INSERT INTO DJINST5.REGION (REGION_ID, REGION_NAME, IBMSNAP_LOGMARKER)
SELECT REGION_ID, REGION_NAME, '1997-12-01' AS IBMSNAP_LOGMARKER
FROM DJINST5.REGION_SOURCE
Advice: We need to ensure that the timestamp we initially load into the IBMSNAP_LOGMARKER column is either the same as or earlier than the minimum date from the central fact table (because we use this column to denote the start of the validity period). If we do not do this, then the predicate described in 8.4.6.1, “Defining a Time Consistent Query” on page 255 may not return all the valid rows, because the SALE_DATE may be before the initial timestamp marking the start of that record’s validity period. In this case study, the following SQL was issued against the source Sales table to find the correct timestamp to use:

SELECT MIN(DATE) FROM DB2RES5.SALES
For a CCD table, we have to generate values for the additional DProp control columns. The example below shows the SQL used to populate the Supplier table:

INSERT INTO SIMON.SUPPLIER (SUPP_NO, SUPP_NAME, IBMSNAP_INTENTSEQ,
  IBMSNAP_OPERATION, IBMSNAP_COMMITSEQ, IBMSNAP_LOGMARKER)
SELECT SUPP_NO, SUPP_NAME,
  x'00000000000000000001' AS IBMSNAP_INTENTSEQ,
  'I' AS IBMSNAP_OPERATION,
  x'00000000000000000001' AS IBMSNAP_COMMITSEQ,
  '1997-12-01' AS IBMSNAP_LOGMARKER
FROM DJINST5.SUPPLIER_SOURCE
Default values must be generated for the DProp control columns because they do not exist within the source table and the columns are defined as NOT NULL on the target table.
8.4.9.2 Using DataJoiner’s EXPORT/IMPORT Utilities
The IMPORT and EXPORT utilities within DataJoiner can also be useful for transferring data from source to target. The utilities can be used directly
against nicknames, or, if a direct connection is made to a DB2 family database, directly against the table. The simplest format to use when transferring data using these utilities is IXF. We used the following SQL script to export data from the Products view on DB2 for OS/390 and import the data into the Oracle Products table by using a DataJoiner nickname:

-- Manual addition to export the data
CONNECT TO SJ390DB1 USER db2res5 using;
EXPORT TO products.ixf OF IXF
  SELECT ITEM_NUM, ITEM_DESCRIPTION, PROD_LINE_NUM, PRODUCT_LINE_DESC,
         SUPPLIER_NUM, BRAND_NUM, BRAND_DESCRIPTION,
         x'00000000000000000001' AS IBMSNAP_INTENTSEQ,
         'I' AS IBMSNAP_OPERATION,
         x'00000000000000000001' AS IBMSNAP_COMMITSEQ,
         '1997-12-01' AS IBMSNAP_LOGMARKER
  FROM db2res5.products;
CONNECT RESET;
-- Manual addition to import the data
CONNECT TO DJDB USER djinst5 using;
IMPORT FROM products.ixf OF IXF INSERT INTO simon.products;
CONNECT RESET;
Since the target Oracle table is a CCD, additional column values are generated for the DProp control columns on export.
Advice: EXPORT will create the IXF file on the machine where the EXPORT command is issued. If there is a significant amount of data in the file, then it should be transferred to the machine where the Oracle target database resides before using the IMPORT command. This will dramatically improve the performance of the IMPORT, because DataJoiner will be able to perform the SQL inserts locally against the Oracle database (and not across the network). Of course, this is only possible if DataJoiner is on the same machine as Oracle.
8.4.9.3 Using DSNTIAUL and Oracle’s SQL*Loader Utility
For large amounts of data—such as those that will be encountered in the initial population of a data warehouse, or when populating Very Large Databases (VLDBs)—an alternative unload and load mechanism must be found. The method described below bypasses DataJoiner completely and uses native unload/load mechanisms.
The most efficient method to transfer large amounts of data between DB2 for OS/390 and Oracle is to use the DB2 for OS/390 DSNTIAUL sample program to unload the data and Oracle’s SQL*Loader utility to import the data. This section describes how this mechanism can be used to perform the initial load of the Sales target table (which contains 787,000 records and is approximately 87 MB).
Use DSNTIAUL to Unload the Data
DSNTIAUL is a DB2 for OS/390 sample assembler program which is shipped in the sample library; its source code is part of the DB2 for OS/390 machine readable materials. SQL select statements can be used as input to DSNTIAUL, the outputs being two Basic Sequential Access Method (BSAM) fixed format block files—containing data which is suitable for the DB2 for OS/390 LOAD utility. This data can also be used as input for the Oracle SQL*Loader utility. The JCL used for invoking the DSNTIAUL program in our environment is shown below:

//DB2RES5$ JOB (999,POK),'DSNTIAUL',
//   CLASS=A,MSGCLASS=T,MSGLEVEL=(1,1),TIME=1440,
//   NOTIFY=DB2RES5
//*
//DELETE   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELETE DB2RES5.SYSREC00;
  DELETE DB2RES5.SYSPUNCH;
  SET MAXCC = 0;
/*
//*
//UNLOAD   EXEC PGM=IKJEFT01,DYNAMNBR=20,COND=(4,LT)
//STEPLIB  DD DISP=SHR,DSN=DB2V510.SDSNLOAD
//DBRMLIB  DD DISP=SHR,DSN=DB2V510I.DBRMLIB.DATA
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSREC00 DD DSN=DB2RES5.SYSREC00,UNIT=SYSDA,
//   VOL=SER=PSOFT6,SPACE=(CYL,(500,0),RLSE),DISP=(,CATLG)
//SYSPUNCH DD DSN=DB2RES5.SYSPUNCH,UNIT=SYSDA,
//   VOL=SER=SAP007,SPACE=(1024,(15,15)),DISP=(,CATLG)
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
  DSN S(DB2I)
  RUN PROGRAM(DSNTIAUL) PLAN(DSNTIB51) PARMS('SQL') LIB('DB2V510I.RUNLIB.LOAD')
/*
//SYSIN    DD *
  SELECT CHAR(DATE,ISO) AS DATE,
         CHAR(BASARTNO) AS BASARTNO,
         CHAR(LOCATION) AS LOCATION,
         CHAR(COMPANY) AS COMPANY,
         CHAR(PIECES) AS PIECES,
         CHAR(OUT_PRC) AS OUT_PRC,
         CHAR(TAX) AS TAX,
         CHAR(DATE(TRANSFER_DATE),ISO) AS TRANSFER_DATE,
         CHAR(DATE(PROCESS_DATE),ISO) AS PROCESS_DATE,
         CHAR(CURRENT DATE,ISO) AS IBMSNAP_LOGMARKER
  FROM DB2RES5.SALES;
/*
//
This JCL will probably need modification to meet particular site requirements and configurations. When using DSNTIAUL, it is important to estimate the size of the dataset which will be created and use the SPACE allocation of the SYSREC00 DD statement to ensure there is sufficient disk space available. Use the RLSE parameter to shorten the data set to the space occupied by the data at the time the data set is closed. In the above job, two files are created:

SYSREC00   contains the exported data from the Sales table.
SYSPUNCH   contains the corresponding control statements for the DB2 for OS/390 LOAD utility.
The SYSPUNCH file is similar to the control file required by Oracle’s SQL*Loader and can be transferred to AIX along with the data file and then edited to comply with the Oracle control format. The SQL select statement used to extract the data is at the bottom of the JCL file. In order to overcome the differences in the representations of various data types (for example, decimal, integer) between OS/390 and AIX, the externalized data must be converted to character prior to the unload. It is then converted back to the corresponding data type for the target tables during the load. The Sales table contained three data types, which were converted to CHARACTER using the techniques described below:
• DECIMAL columns are converted to CHARACTER format by using the CHAR SQL function. For example: CHAR(TAX) AS TAX.
• DATE columns are converted to CHARACTER format using the CHAR SQL function with an additional parameter indicating the format of the date within the character field. For example: CHAR(DATE,ISO) AS DATE.
• TIMESTAMP columns are converted to CHARACTER by first using the DATE function to convert the TIMESTAMP to a DATE type. The result is subsequently converted to CHARACTER using the same method as that described for the DATE type above. For example: CHAR(DATE(PROCESS_DATE),ISO) AS PROCESS_DATE. The time information in the TIMESTAMP is lost when it is converted to a DATE. This is acceptable in this situation because, even though the TRANSFER_DATE and PROCESS_DATE columns were of TIMESTAMP type, they only contained DATE information.
For more information on CHAR, DATE, and other DB2 functions, refer to the DB2 for OS/390 V5 SQL Reference, SC26-8966.
Although the target Sales table is a CCD, many of the DProp control columns can be omitted from the SQL used by DSNTIAUL. This is because they can be added as constant values from the SQL*Loader control file. This reduces the amount of data which is held in the export file created by DSNTIAUL, and consequently the amount of data which will be transferred across the network.
Advice: The only DProp control column to be added to the export file is IBMSNAP_LOGMARKER. This can be added using the CURRENT DATE DB2 special register and not the CURRENT TIMESTAMP special register. This is because Oracle does not have the same precision for timestamps as DB2. In fact, Oracle stores all its time and date information in columns of type DATE, which can only hold data accurate to the second (not 100,000ths of a second like DB2).
The SYSPUNCH output file generated by DSNTIAUL is shown below:

LOAD DATA LOG NO INDDN SYSREC00 INTO TABLE TBLNAME
(
  DATE               POSITION(   1)  CHAR( 10),
  BASARTNO           POSITION(  11)  CHAR( 15),
  LOCATION           POSITION(  26)  CHAR(  6),
  COMPANY            POSITION(  32)  CHAR(  5),
  PIECES             POSITION(  37)  CHAR(  9),
  OUT_PRC            POSITION(  46)  CHAR( 17),
  TAX                POSITION(  63)  CHAR( 17),
  TRANSFER_DATE      POSITION(  80)  CHAR( 10),
  PROCESS_DATE       POSITION(  90)  CHAR( 10),
  IBMSNAP_LOGMARKER  POSITION( 100)  CHAR( 10)
)
The first few lines of the SYSREC00 output dataset follow:

1997-12-01 0000000012211. 0246. 063. 0000001. 000000033.00 000000004.301900-01-011997-12-041999-03-26
1997-12-01 0000000019471. 0083. 061. 0000001. 000000029.50 000000003.851900-01-011997-12-041999-03-26
1997-12-01 0000000019489. 0083. 061. 0000001. 000000029.50 000000003.851900-01-011997-12-041999-03-26
1997-12-01 0000000019513. 0054. 063. 0000001. 000000022.00 000000002.871900-01-011997-12-041999-03-26
1997-12-01 0000000019794. 0109. 063. 0000001. 000000011.95 000000001.561900-01-011997-12-041999-03-26
1997-12-01 0000000019935. 0041. 061. 0000001. 000000021.95 000000002.861900-01-011997-12-041999-03-26
1997-12-01 0000000019976. 0116. 061. 0000001. 000000019.95 000000002.601900-01-011997-12-041999-03-26
1997-12-01 0000000019984. 0086. 062. 0000001. 000000007.95 000000001.041900-01-011997-12-041999-03-26
1997-12-01 0000000022913. 0054. 063. 0000001. 000000009.95 000000001.301900-01-011997-12-041999-03-26
1997-12-01 0000000022913. 0063. 062. 0000001. 000000003.95 000000000.521900-01-011997-12-041999-03-21
As can be seen, all the data is in a readable character format. For more information on supporting data movements between different platforms with DSNTIAUL refer to Chapter 5 "Data Extractions" of the redbook, Migrating and Managing Data on RS/6000 SP with DB2 Parallel Edition, SG24-4658.
Transfer the Data and Control File to AIX
Transfer both the SYSPUNCH control file and the SYSREC00 data file to the AIX machine hosting Oracle using ftp (in ASCII format). Rename the files to sales.ctl and sales.dat respectively to conform to Oracle’s naming standards (although this is not required).

Prepare the Control File for SQL*Loader
Edit the sales.ctl file to create the following Oracle SQL*Loader control file:

LOAD DATA
-- LOG NO
INDDN "sales.dat"
DISCARDFILE "sales.dis"
INSERT
-- REPLACE
INTO TABLE simon.sales
(
  SALE_DATE          POSITION(1:10)    DATE 'YYYY-MM-DD',
  BASARTNO           POSITION(11:25)   DECIMAL EXTERNAL,
  LOCATION           POSITION(26:31)   DECIMAL EXTERNAL,
  COMPANY            POSITION(32:36)   DECIMAL EXTERNAL,
  PIECES             POSITION(37:45)   DECIMAL EXTERNAL,
  OUT_PRC            POSITION(46:62)   DECIMAL EXTERNAL,
  TAX                POSITION(63:79)   DECIMAL EXTERNAL,
  TRANSFER_DATE      POSITION(80:89)   DATE 'YYYY-MM-DD',
  PROCESS_DATE       POSITION(90:99)   DATE 'YYYY-MM-DD',
  IBMSNAP_INTENTSEQ  CONSTANT '00000000000000000001',
  IBMSNAP_OPERATION  CONSTANT 'I',
  IBMSNAP_COMMITSEQ  CONSTANT '00000000000000000001',
  IBMSNAP_LOGMARKER  POSITION(100:109) DATE 'YYYY-MM-DD'
)
As you can see, this file is somewhat similar in format to the SYSPUNCH file generated by DSNTIAUL. A brief summary follows:
1. The header information identifies the following items to the SQL*Loader:
   • Name of the input file (INDDN)
   • Name of the file used to hold records that failed the WHEN clause (DISCARDFILE)
   • Table load method (INSERT)
   • Name of the table to be loaded (INTO TABLE)
2. Column 1 of the main body of the control file is the name of the target column into which the data will be loaded. The name of the DATE column was changed to SALE_DATE because Oracle does not allow columns named DATE.
3. Column 2 contains positional information describing the starting and ending positions of the data within the data file. If this column contains the keyword CONSTANT, then the value following the keyword will be inserted into all rows.
4. Column 3 contains the data type specification:
   • DECIMAL EXTERNAL means that the data is a decimal number represented in character format.
   • DATE 'YYYY-MM-DD' means that the data is a date, and describes the format of that date. The format can be any valid Oracle date mask used with the TO_DATE function.
Full details of the control file format and SQL*Loader parameters can be found in the Oracle8 Utilities Guide, A58244-01.
Load the Data into Oracle using SQL*Loader
The final step is to use Oracle’s SQL*Loader utility to load the data. SQL*Loader is invoked by using the following command from the AIX command line:

sqlload USERID=simon/simon CONTROL=sales.ctl DATA=sales.dat DIRECT=TRUE
The DIRECT=TRUE parameter tells the loader to use the direct path option. This option creates preformatted data blocks and inserts these blocks directly into the table. This avoids the overhead of issuing multiple SQL inserts and the associated database logging, and is similar in spirit to the DB2 UDB LOAD utility. Besides the discard file, SQL*Loader also creates a .log file, which contains a log of the work done, and a .bad file, which contains all the records which could not be loaded into the target.
8.5 A Star Join Example Against the Data Warehouse Target Tables
A typical query which may be used against the data warehouse described in this case study is shown below:

SELECT T.YEAR, T.MONTH, O.REGION_NAME, P.PRODUCT_LINE_DESC,
       SUM(S.OUT_PRC)
FROM SIMON.SALES S, SIMON.OUTLETS O, SIMON.PRODUCTS P, SIMON.TIME T
WHERE S.BASARTNO = P.ITEM_NUM
  AND S.LOCATION = O.STORE_NUM
  AND S.COMPANY = O.COMPNO
  AND T.YEAR IN (1997,1998)
  AND S.SALE_DATE >= O.VALID_FROM
  AND (S.SALE_DATE < O.VALID_TO OR O.VALID_TO IS NULL)
  AND S.SALE_DATE >= P.IBMSNAP_LOGMARKER
  AND (S.SALE_DATE < P.EXPIRED_TIMESTAMP OR P.EXPIRED_TIMESTAMP IS NULL)
GROUP BY T.YEAR, T.MONTH, O.REGION_NAME, P.PRODUCT_LINE_DESC
ORDER BY T.YEAR, T.MONTH, O.REGION_NAME, 5 DESC;
The query produces a summary of the total sales recorded in the Sales table during 1997 and 1998 grouped by region and product line. Essentially, it tells us the best (and worst) selling product lines by region over a 2-year period.
8.6 Summary
In this chapter we have seen how to maintain a data warehouse within Oracle from changes captured from a DB2 for OS/390 system. Specific techniques have been discussed showing how to use DProp to maintain historic information, denormalize data, and maintain temporal histories within the target data warehouse. Advanced techniques showing how to push down the replication status to the warehouse and how to maintain base aggregate tables from change aggregate subscriptions have also been demonstrated. Many of the techniques discussed within this chapter apply not only to data warehousing situations, but to any replication situation where DProp is the product of choice.
Chapter 9. Case Study 4—Sales Force Automation, Insurance
This scenario illustrates the update-anywhere capability of DProp. Changes performed in the source tables are replicated towards the target tables, and changes performed in the target tables are replicated back towards the source tables. We will also illustrate the conflict detection mechanism involved with update-anywhere replication.
The scenario also illustrates the use of data replication in an occasionally connected, mobile environment. In such an environment, not all the data will be replicated from the source server towards all the target servers. Each target database will receive only the subset of rows that are of interest for that particular target database. The subsetting will be done according to a geographical criterion (agency code, for example). Since the partitioning data is not present in every source table, this scenario will also illustrate the use of view registrations to implement the subsetting technique.
The objectives of the scenario may be summarized as follows:
• Illustrate update-anywhere replication between DB2 UDB and Microsoft Access.
• Explain the row-subsetting technique, based on the use of source views.
• Illustrate the conflict detection mechanism between DB2 UDB updates and Microsoft Access updates.
Remark
At the time this book was written, the DB2 DataPropagator for Microsoft Jet product was still in test phase, so the results described below should be considered with some degree of caution.
9.1 The Business Problem
The scenario describes a sales force automation application for an insurance company. The insurance company’s head office owns the corporate data and runs the reference applications. The insurance company has several agencies spread all over the country, and sales representatives in each agency.
Each sales representative in the company is equipped with a laptop running Windows 95 and Microsoft Access. On his laptop, the sales representative has all the information he needs to prepare his customers’ interviews and manage his business. That way, sales representatives can visit their customers more often and more efficiently.
Each sales representative is attached to only one agency, and each customer is usually managed by only one sales representative. The sales representative’s Microsoft Access tables contain all the data pertaining to all the customers that are attached to the sales representative’s agency. If a sales representative is not available, he can ask one of his colleagues from the same agency to replace him for a specific customer case. Sales representatives do not have access to the data that belong to other agencies.
Periodically, once a day for example, the sales representatives connect to the head office. They start the replication to transmit their updates to the head office, and refresh their own data with the updates from the head office. The updates from the head office are the ones that were originated directly by the head office people, or that were previously originated by other sales representatives, from other laptops.
A customer is allowed to move from one agency to another, if his address changes, for example. We will explain how to deal with this specific issue.
9.1.1 Data Model
There are four source DB2 UDB tables at the head office (see Figure 70):
• CUSTOMERS
• CONTRACTS
• VEHICLES
• ACCIDENTS
Also, there are the equivalent four target tables in Microsoft Access. The target tables have the same structure as the source tables.
Figure 70. Data Model (CUSTOMERS: CUSTNO, ..., AGENCY; CONTRACTS: CONTRACT, ..., CUSTNO; VEHICLES: PLATENUM, ..., CUSTNO; ACCIDENTS: CUSTNO, ACCNUM)
The SQL statements that we used to create the tables are shown in Appendix F.1, “Structures of the Tables” on page 381. In our scenario the source database is called SJNTDWH1, and the schema of the source tables is called IWH.
9.1.2 Comments about the Table Structures
When you look at the fields that are present in the tables, you notice that the customer number is present in all the tables, but the agency number is present only in the CUSTOMERS table. Since we want to be able to transmit, to a specific sales representative, only the rows that relate to the customers of his agency, we will need to create join views between the CUSTOMERS table and the three other tables. These views, which will be used as replication sources, are detailed in a later section.
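As a preview, such a join view might look like the following sketch. The view name and column list are hypothetical (the actual view definitions appear later); the point is simply that the AGENCY column is exposed alongside each contract so that subscriptions can subset rows by agency:

-- A sketch: expose the agency code alongside each contract
CREATE VIEW IWH.CONTRACTS_BY_AGENCY AS
  SELECT T.CONTRACT, T.CUSTNO, C.AGENCY
  FROM IWH.CONTRACTS T, IWH.CUSTOMERS C
  WHERE T.CUSTNO = C.CUSTNO;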
Since we will be replicating join views, we must take care of the issue involving "double-delete" (or "simultaneous-delete"): what happens if a row is deleted from the CUSTOMERS table and the corresponding row, from the join point of view, is also deleted from the CONTRACTS table during the same replication cycle? The problem is that, since the row was deleted from both components of the join, it does not appear in the views (base views and CD views), and so the double-delete is not replicated.
There are ways to deal with this issue. A well-known technique is to define a side CCD table for one of the components of the join. This CCD table should be condensed and non-complete (you can define it as complete, but this is not necessary) and located on the target server. The IBMSNAP_OPERATION column of this CCD table is used to detect the deletes. The most common way to do this is to add an SQL After statement in the definition of the subscription set. The SQL statement will remove, from the target table, all the rows for which the IBMSNAP_OPERATION is equal to "D" in the CCD table (a sketch follows below).
But in this scenario we are replicating between DB2 and Microsoft Access, and we cannot create a CCD table in a Microsoft Access database. Furthermore, DB2 DataPropagator for Microsoft Jet does not allow the use of SQL After statements. So if we wanted to really deal with the double-delete issue in this scenario, we would need to:
• Create a CCD table on the source server. This means that we would have an extra Apply program running on the source server to feed the CCD table.
• Insert some code in the ASNJDONE user exit so that, after each replication, it would connect to the source server, read the content of the CCD table, and delete the target rows if IBMSNAP_OPERATION is equal to "D".
In fact, in most real production environments, the double-delete issue does not exist, because one of the components of the join never has deletes. In most cases the CUSTOMERS table will only have inserts and updates, no deletes. If a customer cancels all his contracts, an indicator will be updated in the CUSTOMERS table to show that this customer is no longer active. In our scenario, we will assume that the CUSTOMERS table has no deletes, and so the double-delete issue disappears.
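For reference, the double-delete clean-up mentioned above is typically an SQL After statement along the following lines. This is a sketch with placeholder table names; as explained, it is neither needed nor usable with Microsoft Jet targets in this scenario:

-- A sketch: remove target rows whose key appears as a delete in the side CCD
DELETE FROM <target_owner>.CONTRACTS T
WHERE EXISTS
  (SELECT 1 FROM <ccd_owner>.CCDCUSTOMERS C
   WHERE C.CUSTNO = T.CUSTNO
     AND C.IBMSNAP_OPERATION = 'D');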
9.2 Update-Anywhere Replication versus Multi-Site Update Programming
In a traditional multi-site update application, the program simultaneously accesses all the databases, and must be written in such a way that if one
database update fails, all the other databases are also rolled back (this kind of programming uses a database feature called two-phase commit). If the number of sites increases, and specifically when those sites are laptops, such a synchronous application is not viable, because you can never have all the connections up and running at the same time. That is why an asynchronous solution, such as DProp replication, is so helpful. Each site receives the updates when it needs or wishes to receive them.
But each solution has its own constraints. With an asynchronous update-anywhere replication scenario, you must be aware that conflicts can occur if several sites update the same table row during the same replication cycle. In the sales force automation application described above, the risk of conflict is limited by the fact that each sales representative only receives the data that pertains to customers attached to his own agency. Each sales representative is asked to update only his own customers’ data, and any exception needs the approval of the sales representative who is responsible for the customer, so that two sales representatives do not update the same data at the same time.
In a DB2-towards-Microsoft-Access replication scenario, the Microsoft Access tables are called row-replicas. Row-replica is the only target table type allowed for such a scenario. The conflict detection mechanism that is used with row-replicas differs from the one that is used with other types of update-anywhere scenarios (DB2-to-DB2 scenarios, for example), so we will illustrate how the conflict detection mechanism operates here and show how the conflict is reported.
In this scenario, conflicts would normally be limited to cases where a sales representative updates a table row and a staff member at the head office also updates the same row during the same day, for example. If a conflict occurs, the head office (source server) will always win. You should always try to find a way to prevent conflicts from occurring. For example, in our scenario, a convention should be established between the sales representatives and the head office so that they do not update the same tables on the same day.
9.3 Architecting the Replication Solution
This section details the software components that are necessary to implement the scenario.
We will be replicating data between a DB2 UDB for Windows NT server and several Microsoft Access databases located on other Windows NT servers or
on laptops running Windows 95, using DB2 DataPropagator for Microsoft Jet (also referred to as ASNJET in this chapter).
9.3.1 MS Jet Update-Anywhere Scenario—System Design
This scenario does not require the full functions of the DataJoiner product. Only the administration component (DJRA) and the ASNJET component (the equivalent of DProp Apply, but for Microsoft Access) are required from the DataJoiner installation CD-ROM. The complete solution involves the following components:
• Source site:
  • DB2 UDB Enterprise Edition for Windows NT, including the Capture component of DProp.
• Target workstations (either Windows 95 or Windows NT):
  • Microsoft Access.
  • CAE (DB2 Client Application Enabler).
  • The ASNJET component of DProp (IBM DB2 DataPropagator for Microsoft Jet). There is no separate Capture component on the target side; ASNJET provides both the Capture and the Apply functions.
• Administration workstation:
  • DataJoiner Replication Administration (DJRA).
The administration workstation will be used to create and fill the DProp control tables. It can be a separate Windows NT or Windows 95 box, or it can be any one of the target workstations. Probably the best solution is to install DJRA on a target workstation that will also be used for tests. The administration workstation is only required during the set-up phase and then to maintain, if necessary, the replication environment. It is not required by the run-time components.
In the implementation described here, the connections between the different components of the solution use TCP/IP. If the source server were a DB2 for OS/390 host or an AS/400, the only difference would be the need to install DDCS or DB2 Connect (or DB2 UDB Enterprise Edition, since it includes DB2 Connect, or DataJoiner, since it includes DDCS), either on a separate Windows NT server that would operate as a gateway (DB2 Connect Enterprise Edition), or on each target workstation (DB2 Connect Personal Edition).
9.3.2 MS Jet Update-Anywhere—Replication Design

ASNJET uses a pull/push replication mode: it pulls the updates from DB2 UDB towards Microsoft Access, and pushes the updates from Microsoft Access towards DB2 UDB.

In this replication scenario, the DProp control tables must be located in a DB2 database, so we will create them in the source server. This is important because it means that the administration workstation will only need access to the source server. You can define all the replication sources and all the subscription sets even before you configure the target workstations. You can even let ASNJET create the target database and the target tables for you: if you do not create them yourself, ASNJET creates them automatically the first time it is run.

The DProp control tables are the same as for any other DProp replication scenario, except that there are two additional tables:
• ASN.IBMSNAP_SCHEMA_CHG: Used to signal modifications to a subscription.
• ASN.IBMSNAP_SUBS_TGTS: Used by ASNJET to maintain the list of the row-replica table names. It enables ASNJET to automatically delete a row-replica table if the corresponding subscription definition was removed since the last synchronization (a query sketch for both tables follows this list).
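Once these control tables have been created (see 9.4), a quick way to verify them is to query them directly on the source server. This is only a sketch; it assumes you can connect to the source database (SJNTDWH1, the name used later in this chapter):

db2 connect to SJNTDWH1
db2 "select * from ASN.IBMSNAP_SCHEMA_CHG"
db2 "select * from ASN.IBMSNAP_SUBS_TGTS"
db2 connect reset

Both tables are empty until subscriptions are defined and processed; section 9.6.2 shows their typical contents.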
9.4 Setting Up the System Environment

In this section, we discuss the system topology needed to support this case study, and explain which steps are necessary to install and configure the corresponding components.
9.4.1 The System Topology

In Figure 71 we show the overall picture for this scenario.
Figure 71. Case Study 4—System Topology (diagram: a Windows NT head-office server hosting the DB2 UDB tables, Capture, and the DProp control tables; an administration workstation with CAE and DJRA; and several sales representative laptops, each running Windows 95 with CAE, ASNJET, the MS Access tables, and the local applications)
From the system topology diagram shown above, you can see that ASNJET replaces the function of the Apply component. No additional functionality of DataJoiner is needed in this scenario. The control tables are located at the central DB2 UDB server, which acts as the master-copy for the mobile clients. To establish the database connectivity, DB2 CAE is implemented on the mobile clients. The sales representatives access their local copy of the
customer information in MS Access through locally installed applications (for example, for claims handling, or updates of customer data). In the insurance company’s head-office, applications typically do batch-type processing of claims and policies, accessing the DB2 UDB database, which contains the information gathered from all the sales representatives. The replication administration workstation is also located in the head-office and basically contains the DJRA component.
9.4.2 Configuration Tasks

This section details the implementation steps that you have to follow. Just use it as a cookbook. The first section gives you a summary implementation checklist, to be used as a reminder, and the second section explains the scenario's specific steps in more detail. The steps that are not specific to a heterogeneous replication environment, such as the installation of DB2 UDB or CAE (DB2 Client Application Enabler), are not detailed here, because they are already explained in the product documentation.

Remark: You will notice that, unlike other Apply components, ASNJET does not require any bind operation. The only necessary binds are:
• The bind for Capture on the source server.
• The binds for CAE (from the administration workstation and target workstations towards the source server). Note that if CAE is at the same level of maintenance on all the workstations, the binds for CAE need only be done once.

9.4.2.1 Summary Implementation Checklist
Some activities described in the General Implementation Guidelines (see Chapter 4, “General Implementation Guidelines” on page 61) are not necessary for this scenario. In particular, all the activities related to the DataJoiner product itself are not necessary. This means that the section called "Set Up the Database Middleware Server" is not needed. It is replaced by a section called "Set Up the Database Connectivity between the Source Server and the Target Workstations". You can see the general implementation diagram in Figure 72.
Figure 72. General Implementation Steps (diagram: set up database connectivity between source and targets; implement the replication subcomponents; set up the replication administration workstation; create the replication control tables; bind DProp Capture, not Apply)
We also assume that DB2 UDB is already installed on the source server, that you have already created the source database and the source tables, and that Microsoft Access is already installed on the target workstations. The summary implementation checklist is as follows:

Set up database connectivity between the source and targets:
• On the source server:
  • Set up the underlying communication subsystem (TCP/IP, ...)
  • Update the database manager configuration with the appropriate configuration settings
  • Set the DB2COMM environment variable
• On each target workstation:
  • Set up the underlying communication subsystem (TCP/IP, ...)
  • Install CAE (DB2 Client Application Enabler)
  • Use the Client Configuration Assistant to configure the connectivity and to register the source database as an ODBC data source
  • Connect to the source database and bind the db2cli.lst and db2ubind.lst files. If you install the same level of CAE on all the workstations, these binds need only be done once, from the first configured workstation.

Implement the replication subcomponents (see Chapter 4.4.2, “Implement the Replication Subcomponents (Capture, Apply)” on page 71):
• On the source server, since the Capture component of DProp is included in DB2 UDB for Windows NT, it is already installed. But you need to:
  • Modify the source database configuration to use LOGRETAIN ON
  • Increase the APPLHEAPSZ parameter
  • Perform a backup of the source database
• On each target workstation, install the following software:
  • Microsoft DAO (Data Access Objects). Note: DAO is the programming interface to Microsoft Jet (the database engine that Microsoft Access uses). It provides a framework for directly accessing and manipulating database objects.
  • IBM DB2 DataPropagator for Microsoft Jet

Set up your replication administration workstation (see Chapter 4.4.3, “Set Up the Replication Administration Workstation” on page 71):
• Install CAE.
• Set up the database connectivity between the administration workstation and the source server. No database connection is necessary between the administration workstation and the target workstations, because all the DProp control tables will be located at the source server.
• Install DJRA (DataJoiner Replication Administration).
• Perform the DJRA set-up.
• Create the directories where you will store the SQL scripts that will be generated by DJRA.

Create the replication control tables (see Chapter 4.4.4, “Create the Replication Control Tables” on page 74):
• On the administration workstation, use DJRA to generate and run an SQL script to create the DProp control tables in the source server database.

Bind DProp Capture (see Chapter 4.4.5, “Bind DProp Capture and DProp Apply” on page 74):
• On the source server, bind the Capture component on the source database.
Final status of your replication setup (see Chapter 4.4.6, “Status After Implementing the System Design” on page 76):
• On the source server, create the join views over the source tables, to implement the subsetting technique.
• On the administration workstation, use DJRA to generate and run SQL scripts to define the replication sources.
• On the administration workstation, use DJRA to generate and run SQL scripts to define the replication targets, for only one target workstation. Then duplicate the SQL scripts, adapt the scripts for the other target workstations, and run the scripts.
• On the source server, start Capture.
• On each target workstation, create the password file that will be used by ASNJET to connect to the source server.
• On each target workstation, start ASNJET.

9.4.2.2 Detailed Implementation Steps
Follow the detailed steps indicated below.
Set Up Database Connectivity Source / Targets
• On the source server:
  • Set up the underlying communication subsystem (TCP/IP, ...). Not detailed here.
  • Update the database manager configuration with the appropriate configuration settings:

    db2 update dbm cfg using SVCENAME=xxxx

  • Set the DB2COMM environment variable:

    DB2COMM=TCPIP
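Put together, the source-side sequence could look like this (a sketch; the service name db2csjnt is an example, so use the TCP/IP service name defined for your instance in the services file):

db2 update dbm cfg using SVCENAME db2csjnt
db2set DB2COMM=TCPIP
db2stop
db2start

The instance must be restarted for the DB2COMM setting to take effect.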
• On each target workstation:
  • Set up the underlying communication subsystem (TCP/IP, ...). Not detailed here.
  • Install CAE (DB2 Client Application Enabler). Not detailed here.
  • Use the Client Configuration Assistant to configure the connectivity and to register the source database as an ODBC data source. Not detailed here.
  • Connect to the source database and bind the db2cli.lst and db2ubind.lst files. If you install the same level of CAE on all the
workstations, these binds need only be done once, from the first configured workstation. To do this, select the DB2 Command Window, then go to the SQLLIB\BND directory, and use the following commands:

db2 connect to SJNTDWH1 user USERID using PASSWORD
db2 bind @db2cli.lst blocking all grant public
db2 bind @db2ubind.lst blocking all grant public
db2 terminate
Implement the Replication Subcomponents
(See Chapter 4.4.2, “Implement the Replication Subcomponents (Capture, Apply)” on page 71.)
• On the source server, since the Capture component of DProp is included in DB2 UDB for Windows NT, it is already installed. But you need to:
  • Modify the source database configuration to use LOGRETAIN ON
  • Increase the APPLHEAPSZ parameter
  • Perform a backup of the source database
  To do this, use the following commands:

  db2 update database configuration for SJNTDWH1 using LOGRETAIN ON
  db2 update database configuration for SJNTDWH1 using APPLHEAPSZ 2048
  db2 backup database SJNTDWH1
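To verify that the new settings were picked up, you can display the database configuration afterwards (a sketch):

db2 get database configuration for SJNTDWH1

Look for LOGRETAIN = ON and APPLHEAPSZ = 2048 in the output. Note that the backup is not optional: after LOGRETAIN is turned on, DB2 places the database in backup pending state until a backup has been taken.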
• On each target workstation, install the following software:
  • Microsoft DAO (Data Access Objects). (You can find it at the following address: http://www.nesbitt.com/bctech.html .) Not detailed here. Note: DAO is the programming interface to Microsoft Jet (the database engine that Microsoft Access uses). It provides a framework for directly accessing and manipulating database objects.
  • IBM DB2 DataPropagator for Microsoft Jet: During the installation of DB2 DataPropagator for Microsoft Jet, you will be prompted to enter a value for the ASNJETPATH environment variable. This variable indicates in which directory DB2 DataPropagator for Microsoft Jet will find the files it needs (the password file, for example; see details below), and where it will create its own files (log, trace, ...). It is also in this directory that DB2 DataPropagator for Microsoft Jet will create the target Microsoft Access database (file DBSR0001.MDB for sales representative 1, for example), if the target database does not exist.
For example, indicate the value C:\SQLLIB\BIN for the ASNJETPATH variable. If the target workstation is running Windows NT, you can also create this variable as a Windows NT system variable. Remark: Be careful, because the variable may already have been defined as a user variable. If this is the case, remove the user variable definition. If the target workstation is running Windows 95, you can set the value for ASNJETPATH just before starting ASNJET, using the following command:

set ASNJETPATH=C:\SQLLIB\BIN
Set Up the Replication Administration Workstation
(See Chapter 4.4.3, “Set Up the Replication Administration Workstation” on page 71.)
• Install DB2 CAE.
• Set up the database connectivity between the administration workstation and the source server. No database connection is necessary between the administration workstation and the target workstations, because all the DProp control tables will be located at the source server. Do exactly as for the target workstations. Make sure the source database is registered as an ODBC data source. If it is not, add it using either the Client Configuration Assistant or the ODBC utility from the Windows NT Control Panel (the source database must be registered as a System DSN). Not detailed here.
• Install DJRA (DataJoiner Replication Administration).
• Perform the DJRA set-up. To do this, follow this path: Start => Programs => DataJoiner for Windows NT => Replication => Replication Administration. The DB2 DataJoiner Replication Administration main panel is then displayed (see Figure 73):
Figure 73. DB2 DataJoiner Replication Administration Main Panel
Select File => Preference, then select the Connection tab. Select the source database, then choose Modify, and enter the userid and password that will be used to connect to the source database, then select OK. Note: The userid and password must have been defined on the source server. Select OK again to return to DJRA's main panel.
• Create the directories where you will store the SQL scripts that will be generated by DJRA. For example:
  • ASNJET\SCRIPTS\CONTROL: Will contain the SQL script used to create the DProp control tables (script generated by DJRA).
  • ASNJET\SCRIPTS\SOURCES: Will contain the SQL scripts used to define the source tables and the source views as replication sources (scripts generated by DJRA).
  • ASNJET\SCRIPTS\TARGET1: Will contain the SQL scripts used to create the subscription sets and the subscription members for sales representative 1 (scripts generated by DJRA).
  • ASNJET\SCRIPTS\TARGET2: Will contain the SQL scripts used to create the subscription sets and the subscription members for sales representative 2 (scripts copied from ASNJET\SCRIPTS\TARGET1 and then adapted).
  • ASNJET\SCRIPTS\TARGETx: And so on, one directory per additional sales representative (a command sketch for creating these directories follows this list).
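For example, the directories could be created from a command prompt like this (a sketch; any drive and layout you prefer will do):

mkdir C:\ASNJET\SCRIPTS\CONTROL
mkdir C:\ASNJET\SCRIPTS\SOURCES
mkdir C:\ASNJET\SCRIPTS\TARGET1
mkdir C:\ASNJET\SCRIPTS\TARGET2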
Create the Replication Control Tables
(See Chapter 4.4.4, “Create the Replication Control Tables” on page 74.)
• On the administration workstation, use DJRA to generate and run an SQL script to create the DProp control tables in the source server database. To do this, select the Create Replication Control Tables option from DJRA's main panel. The following panel is then displayed (see Figure 74):
Figure 74. Create Replication Control Tables
Select Generate SQL. DJRA opens an editor window and generates the SQL script to create the control tables. Check that you have the Satisfactory completion message at the end of the generated script. You can directly update the generated script if you wish to change the names of the tablespaces, for example. Then select File => Save As..., and give a name to the generated SQL script. You can then run the SQL script by selecting the Run option in the toolbar of the editor window. Then select the Cancel option from the Create Replication Control Tables panel. Another way to run an SQL script is to use the Run or Edit an SQL file option from DJRA's main panel.
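A generated script can also be run from a DB2 command window (a sketch; the file name is whatever you chose when saving, here assumed to be CONTROL.SQL in the directory created earlier):

db2 -tvf C:\ASNJET\SCRIPTS\CONTROL\CONTROL.SQL |more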
Bind DProp Capture
(See Chapter 4.4.5, “Bind DProp Capture and DProp Apply” on page 74.)
• On the source server, bind the Capture component on the source database. This activity must be done only once, after the control tables have been created.
To do this, select the DB2 command window, then go to the SQLLIB\BND directory, and use the following commands:

db2 connect to SJNTDWH1 user USERID using PASSWORD
db2 bind @capture.lst isolation ur blocking all grant public
db2 terminate
9.5 Implementing the Replication Design

After establishing the infrastructure for the replication solution, we will now discuss how to set up the source registrations, and define the subscriptions to meet the requirements for mobile update-anywhere replication.
9.5.1 Creating Source Views to Enable Subsetting

On the source server, create the join views over the source tables, to implement the subsetting technique. We want to be able to transmit, to a specific Microsoft Access database, only the rows that relate to the customers of a specific agency. The customer number is present in all the tables, but the AGENCY column is present only in the CUSTOMERS table. So we must create join views, joining the CUSTOMERS table with all the source tables that do not have the AGENCY code, and it is these views that will be used as replication sources:

-- View VCONTRACTS (Contracts + Agency code):
CREATE VIEW IWH.VCONTRACTS
  (CONTRACT, CONTYPE, CUSTNO, LIMITED, BASEFARE,
   TAXES, CREDATE, AGENCY)
AS SELECT A.CONTRACT, A.CONTYPE, A.CUSTNO, A.LIMITED, A.BASEFARE,
   A.TAXES, A.CREDATE, B.AGENCY
FROM IWH.CONTRACTS A, IWH.CUSTOMERS B
WHERE A.CUSTNO = B.CUSTNO ;

-- View VVEHICLES (Vehicles + Agency code):
CREATE VIEW IWH.VVEHICLES
  (PLATENUM, CONTRACT, CUSTNO, BRAND, MODEL, COACHWORK, ENERGY,
   POWER, ENGINEID, VALUE, FACTORDATE, ALARM, ANTITHEFT, AGENCY)
AS SELECT A.PLATENUM, A.CONTRACT, A.CUSTNO, A.BRAND, A.MODEL,
   A.COACHWORK, A.ENERGY, A.POWER, A.ENGINEID, A.VALUE,
   A.FACTORDATE, A.ALARM, A.ANTITHEFT, B.AGENCY
FROM IWH.VEHICLES A, IWH.CUSTOMERS B
WHERE A.CUSTNO = B.CUSTNO ;

-- View VACCIDENTS (Accidents + Agency code):
CREATE VIEW IWH.VACCIDENTS
  (CUSTNO, ACCNUM, TOWN, REPAIRCOST, STATUS, ACCDATE, AGENCY)
AS SELECT A.CUSTNO, A.ACCNUM, A.TOWN, A.REPAIRCOST, A.STATUS,
   A.ACCDATE, B.AGENCY
FROM IWH.ACCIDENTS A, IWH.CUSTOMERS B
WHERE A.CUSTNO = B.CUSTNO ;
Using these join views as replication sources, we will be able to define subscriptions with row-selection predicates such as:

(WHERE) AGENCY = 25
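Before registering the views, a quick sanity check of the predicate can be run on the source server (a sketch; it simply counts the rows a subscription for agency 25 would select):

db2 "select count(*) from IWH.VCONTRACTS where AGENCY = 25"
db2 "select count(*) from IWH.VVEHICLES where AGENCY = 25"
db2 "select count(*) from IWH.VACCIDENTS where AGENCY = 25"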
Remark: In join views used as replication sources (VCONTRACTS for example), all the copied columns must come from only one of the tables referenced in the join view. For example, the columns we will define in the target CONTRACTS row-replica table will all come from the source CONTRACTS table. The other columns in the join view (the AGENCY column) can only be referenced in the subscription predicate. This means we are not allowed to include the AGENCY column in the target CONTRACTS table. And it also means that the replication from Microsoft Access to DB2 UDB is not able to update more than one component of the join view.
9.5.2 Registering the Replication Sources

On the administration workstation, use DJRA to generate and run SQL scripts to define the replication sources (see the details below). You must first define the physical tables as replication sources before you can define the join views as replication sources. For our scenario, we performed the following tasks:
• Define the CUSTOMERS, CONTRACTS, VEHICLES and ACCIDENTS tables as replication sources.
• Define the VCONTRACTS, VVEHICLES and VACCIDENTS views as replication sources.

When you generate an SQL script, always choose a meaningful script name so that you will be able to remember the purpose of the script. For example, we generated the following scripts: regcust.sql, regcont.sql, regvehi.sql, regacci.sql, regvcont.sql, regvvehi.sql and regvacci.sql (reg stands for "registration", which is a synonym for "define a replication source").

9.5.2.1 Registering the Contracts Table
Select the Define One Table as a Replication Source option from DJRA's main panel. The following panel is then displayed (see Figure 75):
Figure 75. Define One Table as a Replication Source
Indicate the source table qualifier (IWH), and press the Build List Using Filter button. Then choose the CONTRACTS table from the list of source tables, specify that you will need all the source columns, that you want to capture both before-images and after-images, that you want to capture the updates as updates, and choose a standard conflict detection level. The panel should now look like this (see Figure 76):
Figure 76. Define the CONTRACTS Table as a Replication Source
Select Generate SQL to generate the regcont.sql script. See the generated SQL script in Appendix F.2, “SQL Script to Define the CONTRACTS Table as a Replication Source” on page 383. Save and run the generated SQL script, then select Cancel to come back to DJRA's main panel.

Remarks:
• If you want DJRA to generate an SQL script that uses your own naming conventions (names of the CD tables, for example), you can press the Edit Logic button before you generate the SQL script.
• For the CUSTOMERS table, we chose exactly the same parameters, except for the update capture policy. We decided that since a customer
can move from one agency to another, and since the AGENCY field will be used in the filtering predicate, any update of the AGENCY field should be captured as a delete-and-insert pair. That way, if a customer moves from one agency to another, the "old" agency will receive the delete row, and the "new" agency will receive the insert row. If this option is not chosen, the "new" agency will receive the new record, but the old record will remain in the "old" agency. Of course, depending on your real business needs, you will choose one logic or the other, because perhaps you would not want to remove the history of "old" customers from the "old" agency.

9.5.2.2 Registering the VContracts View
Select the Define DB2 Views as Replication Sources option from DJRA's main panel. The following panel is then displayed (see Figure 77):
Figure 77. Define DB2 Views as Replication Sources
Indicate the source view qualifier (IWH), and press the Build List Using Filter button. The following panel is then displayed (see Figure 78):
Figure 78. Define DB2 Views as Replication Sources - Continued...
Select the IWH.VCONTRACTS view, then select Generate SQL. See the generated SQL script in Appendix F.3, “SQL Script to Define the VCONTRACTS View as a Replication Source” on page 385. Save and run the generated SQL script, then select Cancel to come back to DJRA’s main panel.
9.5.3 Defining the Replication Subscriptions

So far we have defined the replication sources (tables and views). We will now use DJRA to define the replication targets: On the administration workstation, use DJRA to generate and run SQL scripts to define the replication targets for the first target workstation (see the details below). After you have done this, you will duplicate the SQL scripts, adapt the scripts for the other target workstations, and run the scripts.

The definition of a subscription (that is, a replication target) using DJRA is a two-step process. You must first define an empty subscription set, and then you must add members to this empty subscription set. For this scenario, we will create one subscription set for each table, so we will have only one member per subscription set. An alternative would have been, for example, to create one subscription set including the four members. The performance of the replication would have been a little bit better, but you must
be aware that all the tables in the same set "share the same fate". For example, if one needs a full-refresh, they all need a full-refresh; and if one is in error, they are all considered in error.

Remark: If you intend to define referential integrity constraints between the Microsoft Access tables, which is very likely since it is an update-anywhere environment, then you must use the same subscription set for all the tables that are linked by referential integrity constraints. In general, you will define the same referential integrity constraints between the Microsoft Access tables as the ones you have defined between the DB2 UDB tables. For example, if we had defined referential integrity constraints between our four tables CUSTOMERS, CONTRACTS, VEHICLES and ACCIDENTS (using the CUSTNO field as a foreign key), we would create only one subscription set, CUST0001.

Warning: Neither DJRA nor ASNJET will automatically create the referential integrity constraints between the Microsoft Access tables. You will have to define these constraints yourself. This is important because, as we will see later, if you do not create the Microsoft Access tables yourself, ASNJET will create them for you the first time it is run, but it will not create the constraints. In that case you will have to add the referential integrity constraints after ASNJET has created the tables.

9.5.3.1 Defining Subscriptions for the First Sales Representative
Select the Create Empty Subscription Sets option from DJRA's main panel. The following panel is then displayed (see Figure 79):
Figure 79. Create Empty Subscription Sets
Select the Microsoft Jet check box for Target servers, and enter the name of the Microsoft Access database, for example, DBSR0001 (for DataBase for Sales Representative 0001). Each sales representative will have only one Microsoft Access database. The control server must be the same as the source server.

Choose the Apply Qualifier. It must be unique in the replication network. Choose for example AQSR0001 (for Apply Qualifier for Sales Representative 0001). Each sales representative will use only one Apply Qualifier.

Set name: We decided to create one set per target table, so you can choose a set name such as CUST0001 (for Set for the CUSTOMERS table for sales representative 0001).

Subscription set timing: Choose a small value for testing purposes (2 minutes, for example). In production, the ASNJET program will most probably be run with the MOBILE option, and so this frequency information will not be used.

Your DJRA panel should now look like this (see Figure 80):
Figure 80. Create Empty Subscription Sets - Continued ...
Select Generate SQL. See the generated SQL script in Appendix F.4, “SQL Script to Create the CUST0001 Empty Subscription Set” on page 386. Save and run the generated SQL script. Always remember to give a meaningful script name (such as SETCUST.SQL, for example). Then select Cancel to come back to DJRA’s main panel. Repeat the same operations for the three other subscription sets: CONT0001 (for CONTRACTS), VEHI0001 (for VEHICLES) and ACCI0001 (for ACCIDENTS). From DJRA’s main panel, select the Add a member to Subscription Sets option. The following panel is then displayed (see Figure 81):
Figure 81. Add a Member in Subscription Sets
Select SJNTDWH1 as control server (same as source server), then click on the top Build List button. The four subscription sets are displayed. Select CONT0001. You will receive a message saying Target structure must be row replica for server DBSR0001. Simply answer OK.

Then select the second Build List button. This will display the list of defined replication sources. Select IWH.VCONTRACTS (VCONTRACTS appears twice in the list; simply select the first one). Do not select IWH.CONTRACTS. Specify that you want all columns, and indicate the target table characteristics:
• Qualifier: IWH
• Target table name: CONTRACTS (it does not need to be VCONTRACTS)
• Target structure: Row-replica
• And since the source is a view, you must indicate the name(s) of the primary key column(s): CONTRACT+

Then enter the filtering predicate in the where clause field:

(AGENCY = 25)
The screen should now look like this (see Figure 82):
Figure 82. Add a Member in Subscription Sets - Continued...
Select Generate SQL. See the generated SQL script in Appendix F.5, “SQL Script to Add a Member to the CONT0001 Empty Subscription Set” on page 387. Save and run the generated SQL script. Select a meaningful script name (such as MBRCONT.SQL, for example). Then select Cancel to come back to DJRA’s main panel. Repeat the same operations to add members for VEHICLES, ACCIDENTS and CUSTOMERS.
Warning: For the CUSTOMERS table, there is a difference: The source is not a view, it is the base table itself. So, in the list of source tables, select the IWH.CUSTOMERS table. The SQL script generated for the CUSTOMERS table will, of course, have some differences compared to the other ones. The SQL script generation using DJRA is now finished.

9.5.3.2 Defining Subscriptions for the Other Sales Representatives
We have seen the definition of the replication targets for the first target workstation. We will now explain how we can easily copy these definitions for the other target workstations. So far, we have generated and run the SQL scripts to define the subscription sets and subscription members for the first sales representative. We stored these SQL scripts in directory ASNJET\SCRIPTS\TARGET1:
• SETCUST.SQL: Subscription set for the target CUSTOMERS table
• SETCONT.SQL: Subscription set for the target CONTRACTS table
• SETVEHI.SQL: Subscription set for the target VEHICLES table
• SETACCI.SQL: Subscription set for the target ACCIDENTS table
• MBRCUST.SQL: Subscription member for the target CUSTOMERS table
• MBRCONT.SQL: Subscription member for the target CONTRACTS table
• MBRVEHI.SQL: Subscription member for the target VEHICLES table
• MBRACCI.SQL: Subscription member for the target ACCIDENTS table

Now, we will create the equivalent SQL scripts for sales representative 2. To do this, use the following steps:
• Copy the content of ASNJET\SCRIPTS\TARGET1 towards ASNJET\SCRIPTS\TARGET2.
• Update SETCUST.SQL: Replace the string '0001' with '0002' everywhere.
• Update SETCONT.SQL: Replace the string '0001' with '0002' everywhere.
• Update SETVEHI.SQL: Replace the string '0001' with '0002' everywhere.
• Update SETACCI.SQL: Replace the string '0001' with '0002' everywhere.
• Update MBRCUST.SQL:
  • Replace the string '0001' with '0002' everywhere.
  • Find the filtering predicate 'AGENCY = 25' (there should be only one occurrence) and replace the 25 with the appropriate value for sales representative 2.
• Update MBRCONT.SQL:
  • Replace the string '0001' with '0002' everywhere.
  • Find the filtering predicate 'AGENCY = 25' (Important: There are several occurrences) and replace the 25 with the appropriate value for sales representative 2.
• Update MBRVEHI.SQL and MBRACCI.SQL exactly like you did for MBRCONT.SQL.
• Then, run all these new SQL scripts: SETxxxx.SQL first and MBRxxxx.SQL second. To run the scripts you can use either the Run or Edit an SQL File option from DJRA's main panel, or the following commands from a DB2 command window:

db2 -tvf C:\ASNJET\SCRIPTS\TARGET2\SETCUST.SQL |more
db2 -tvf C:\ASNJET\SCRIPTS\TARGET2\SETCONT.SQL |more

and so on...
9.5.3.3 Finalizing the Replication Setup
So far we have defined all the content of the DProp control tables. We will now see the remaining activities that are necessary to complete the setup.
• On the source server, start Capture. Go to the DB2 command window, and type the following command:

asnccp SJNTDWH1 cold trace
Or you can configure the Capture program to run as an NT service. Not detailed here; see the Replication Guide and Reference, S95H-0999, for more information.
• On each target workstation, create the password file that will be used by ASNJET to connect to the source server. The password file must be created in the directory from which ASNJET will be started. It must also be the directory indicated by the ASNJETPATH variable. The simplest way is to create it in C:\SQLLIB\BIN, but be careful if you uninstall CAE and reinstall it later, because you will have to check that the password file is still present in C:\SQLLIB\BIN. The name of the password file must be Apply_Qualifier.PWD. So, in our scenario, it will be AQSR0001.PWD for sales representative 1, AQSR0002.PWD for sales representative 2, and so on.
The content will be identical for all the sales representatives:

SERVER=SJNTDWH1
USER=USERID
PWD=PASSWORD
In this expression, USERID and PASSWORD are the userid and password that ASNJET will use to connect to the source server (SJNTDWH1).
• On each target workstation, start ASNJET. To do this, go to the DB2 command window, and type the following command (for sales representative 1):

ASNJET AQSR0001 SJNTDWH1 NOTIFY MOBILE TRCFLOW
For sales representative 2, the command would be the same except for the Apply Qualifier: indicate AQSR0002 instead of AQSR0001. The MOBILE parameter tells ASNJET that it must process all the eligible subscription sets only once and then stop. If you do not use the MOBILE parameter (NOMOBILE is the default value), you will have to stop ASNJET yourself, using one of the two possible methods:
• Ctrl-Break
• Or the following command (for sales representative 1):

ASNJSTOP AQSR0001
9.5.4 Focus on Major Pitfalls

• Increase the value of the APPLHEAPSZ parameter in the source database configuration, to avoid an SQLCODE -954, SQLSTATE 57011 error: Not enough space is available in the application heap to process the statement.
• Before starting ASNJET, you should close any row-replica table that you previously opened and updated through Microsoft Access, so that the updates you made can be replicated by ASNJET.
• When you are replicating from a join view (VCONTRACTS, for example), all the copied columns must come from only one of the tables referenced in the join view. For example, the columns we defined in the target CONTRACTS row-replica table all come from the source CONTRACTS table. The other columns in the join view (the AGENCY column) can only be referenced in the subscription predicate. This means we are not allowed to include the AGENCY column in the target CONTRACTS table.
• Run ASNJET with the NOTIFY parameter so that detected conflicts are reported in the conflict table in the Microsoft Access target database.
• Remember that the level of conflict detection is equivalent to standard, and the CONFLICT_LEVEL associated with the source table registration is ignored. But there are differences compared to a homogeneous (DB2 to DB2) update-anywhere scenario:
  • The conflict detection is performed at row level only (row by row, not transaction by transaction).
  • If a conflict is detected, the source always wins.
  • If an insert or update collides with a delete, the delete wins regardless of source or target. This is not considered a conflict.
• Check that you have followed these restrictions:
  • A row-replica table can only have one source, either a user table or a registered view.
  • A row-replica table cannot be a multi-site union.
  • A row-replica table must be referenced by one and only one subscription set.
  • The SQL statements or CALL statements features are not supported.
  • The source table primary key columns must be one of the following data types: INTEGER, SMALLINT, DECIMAL, CHAR, VARCHAR, DATE, TIME (a catalog query sketch to check this follows the list).
• Choose different Apply Qualifier names, subscription set names, and target database names for all the targets. In this scenario we chose:
  • AQSRxxxx for the Apply Qualifiers
  • DBSRxxxx for the target databases
  • CONTxxxx, CUSTxxxx, VEHIxxxx, and ACCIxxxx for the subscription sets
In these expressions, xxxx is a number that uniquely identifies the target.
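A quick way to check the key column data types against this list is to query the DB2 UDB catalog on the source server (a sketch; repeat for each source table):

db2 "select colname, typename from syscat.columns where tabschema = 'IWH' and tabname = 'CONTRACTS'"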
9.6 Replication Results for Sales Representative 1

For sales representative 1, we want to replicate only the data that is related to AGENCY 25. Let's have a look at the source tables and the DProp control tables just before we start the ASNJET program for the first time. In fact, two laptops (for sales representatives 1 and 2) have been configured at that time, both having the same subsetting predicate (AGENCY = 25). Capture is running on the source DB2 UDB database.
9.6.1 Contents of the Source Tables at the Beginning

The source tables (and views) contain the following data for AGENCY 25:

CUSTOMERS table:
db2 select CUSTNO, LNAME, FNAME, AGENCY, SALESREP from IWH.CUSTOMERS where AGENCY = 25

CUSTNO   LNAME     FNAME      AGENCY SALESREP
-------- --------- ---------- ------ --------
00000001 SMITH     John       25     250001.
00000003 HARRIS    Simon      25     250002.
00000004 LENKE     Christian  25     250001.
00000007 MUSSET    Cecile     25     250001.
00000008 GOLDRING  Rob        25     250001.
00000009 PURNELL   Micks      25     250002.
00000014 LOSA      Veronique  25     250001.
00000015 BARRON    Elsa       25     250002.
00000017 YU        Percy      25     250001.
00000018 LI        Shu        25     250002.
VCONTRACTS view:
db2 select CONTRACT, CUSTNO, BASEFARE, CREDATE, AGENCY from IWH.VCONTRACTS where AGENCY = 25

CONTRACT CUSTNO   BASEFARE CREDATE    AGENCY
-------- -------- -------- ---------- ------
1        00000001 1250.00  05/25/1998 25
3        00000003 2500.00  09/13/1998 25
4        00000004 1250.00  10/26/1998 25
7        00000007 1250.00  12/08/1998 25
8        00000008 1000.00  12/12/1998 25
9        00000009 1250.00  12/25/1998 25
14       00000014 1000.00  02/05/1999 25
15       00000015 1250.00  02/08/1999 25
17       00000017 1250.00  02/14/1999 25
18       00000018 2500.00  02/15/1999 25
VVEHICLES view:
db2 select PLATENUM, CONTRACT, CUSTNO, BRAND, MODEL, AGENCY from IWH.VVEHICLES where AGENCY = 25

PLATENUM     CONTRACT CUSTNO   BRAND    MODEL   AGENCY
------------ -------- -------- -------- ------- ------
CA-000000001 1        00000001 VOLVO    440     25
CA-000000003 3        00000003 RENAULT  LAGUNA  25
CA-000000004 4        00000004 TOYOTA   V2      25
CA-000000007 7        00000007 GM       1000    25
CA-000000008 8        00000008 MAZDA    COROLA  25
CA-000000009 9        00000009 MERCEDES XTRA    25
CA-000000014 14       00000014 CHRYSLER VOYAGER 25
CA-000000015 15       00000015 JAGUAR   XXS     25
CA-000000017 17       00000017 RENAULT  SAFRANE 25
CA-000000018 18       00000018 MERCEDES 300     25
VACCIDENTS view:
db2 select * from IWH.VACCIDENTS where AGENCY = 25

CUSTNO   ACCNUM TOWN       REPAIRCOST STATUS ACCDATE    AGENCY
-------- ------ ---------- ---------- ------ ---------- ------
00000009 1.     SAN JOSE   0.00       E      01/02/1999 25
00000018 1.     SANTA CRUZ 7500.00    R      03/05/1999 25
9.6.2 Contents of the Main Control Tables at the Beginning

(Only the most interesting columns are shown below.)

ASN.IBMSNAP_REGISTER table:

SOURCE_    SOURCE_   CD_TABLE    PHYS_CHANGE BEFORE_IMG CONFLICT PARTIT.
TABLE      VIEW_QUAL             _TABLE      _PREFIX    _LEVEL   KEYS_CHG
---------- --------- ----------- ----------- ---------- -------- --------
CUSTOMERS  0         CDCUSTOMERS CDCUSTOMERS X          1        Y
CONTRACTS  0         CDCONTRACTS CDCONTRACTS X          1        N
VEHICLES   0         CDVEHICLES  CDVEHICLES  X          1        N
ACCIDENTS  0         CDACCIDENTS CDACCIDENTS X          1        N
VACCIDENTS 1         VACCIDENTSA CDCUSTOMERS -          0        N
VACCIDENTS 2         VACCIDENTSB CDACCIDENTS -          0        N
VVEHICLES  1         VVEHICLESA  CDCUSTOMERS -          0        N
VVEHICLES  2         VVEHICLESB  CDVEHICLES  -          0        N
VCONTRACTS 1         VCONTRACTSA CDCUSTOMERS -          0        N
VCONTRACTS 2         VCONTRACTSB CDCONTRACTS -          0        N
Notice that there is 1 row in the REGISTER table for each source physical table, and 2 rows for each view defined as a Replication Source. The SOURCE_VIEW_QUAL column indicates whether the row is for a table or for a view.

ASN.IBMSNAP_SUBS_SET table:

APPLY_QUAL SET_NAME WHOS_ON SOURCE_  SOURCE_  TARGET_  TARGET_
                    _FIRST  SERVER   ALIAS    SERVER   ALIAS
---------- -------- ------- -------- -------- -------- --------
AQSR0001   CUST0001 S       SJNTDWH1 SJNTDWH1 MSJET    DBSR0001
AQSR0001   CUST0001 F       MSJET    DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0001   CONT0001 S       SJNTDWH1 SJNTDWH1 MSJET    DBSR0001
AQSR0001   CONT0001 F       MSJET    DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0001   VEHI0001 S       SJNTDWH1 SJNTDWH1 MSJET    DBSR0001
AQSR0001   VEHI0001 F       MSJET    DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0001   ACCI0001 S       SJNTDWH1 SJNTDWH1 MSJET    DBSR0001
AQSR0001   ACCI0001 F       MSJET    DBSR0001 SJNTDWH1 SJNTDWH1
AQSR0002   CUST0002 S       SJNTDWH1 SJNTDWH1 MSJET    DBSR0002
AQSR0002   CUST0002 F       MSJET    DBSR0002 SJNTDWH1 SJNTDWH1
AQSR0002   CONT0002 S       SJNTDWH1 SJNTDWH1 MSJET    DBSR0002
AQSR0002   CONT0002 F       MSJET    DBSR0002 SJNTDWH1 SJNTDWH1
AQSR0002   VEHI0002 S       SJNTDWH1 SJNTDWH1 MSJET    DBSR0002
AQSR0002   VEHI0002 F       MSJET    DBSR0002 SJNTDWH1 SJNTDWH1
AQSR0002   ACCI0002 S       SJNTDWH1 SJNTDWH1 MSJET    DBSR0002
AQSR0002   ACCI0002 F       MSJET    DBSR0002 SJNTDWH1 SJNTDWH1
In the SUBS_SET table, notice that there are 2 rows for each subscription set. The row with a WHOS_ON_FIRST value of "F" represents the replication from Microsoft Access towards DB2 UDB, and the row with a WHOS_ON_FIRST value of "S" represents the replication from DB2 UDB towards Microsoft Access. You can also notice that the MSJET string is used as a generic database name for Microsoft Access databases, and the real name of the Microsoft Access database is indicated in the SOURCE_ALIAS and TARGET_ALIAS columns.

ASN.IBMSNAP_SUBS_MEMBR table:

APPLY_QUAL SET_NAME WHOS_ON SOURCE_    SOURCE_   TARGET_   TARG._ PREDICATES
                    _FIRST  TABLE      VIEW_QUAL TABLE     STRUCT
---------- -------- ------- ---------- --------- --------- ------ -------------
AQSR0001   CUST0001 S       CUSTOMERS  0         CUSTOMERS 9      (AGENCY = 25)
AQSR0001   CUST0001 F       CUSTOMERS  0         CUSTOMERS 1      -
AQSR0001   CONT0001 S       VCONTRACTS 1         CONTRACTS 9      (AGENCY = 25)
AQSR0001   CONT0001 S       VCONTRACTS 2         CONTRACTS 9      (AGENCY = 25)
AQSR0001   CONT0001 F       CONTRACTS  0         CONTRACTS 1      -
AQSR0001   VEHI0001 S       VVEHICLES  1         VEHICLES  9      (AGENCY = 25)
AQSR0001   VEHI0001 S       VVEHICLES  2         VEHICLES  9      (AGENCY = 25)
AQSR0001   VEHI0001 F       VEHICLES   0         VEHICLES  1      -
AQSR0001   ACCI0001 S       VACCIDENTS 1         ACCIDENTS 9      (AGENCY = 25)
AQSR0001   ACCI0001 S       VACCIDENTS 2         ACCIDENTS 9      (AGENCY = 25)
AQSR0001   ACCI0001 F       ACCIDENTS  0         ACCIDENTS 1      -
AQSR0002   CUST0002 S       CUSTOMERS  0         CUSTOMERS 9      (AGENCY = 25)
AQSR0002   CUST0002 F       CUSTOMERS  0         CUSTOMERS 1      -
AQSR0002   CONT0002 S       VCONTRACTS 1         CONTRACTS 9      (AGENCY = 25)
AQSR0002   CONT0002 S       VCONTRACTS 2         CONTRACTS 9      (AGENCY = 25)
AQSR0002   CONT0002 F       CONTRACTS  0         CONTRACTS 1      -
AQSR0002   VEHI0002 S       VVEHICLES  1         VEHICLES  9      (AGENCY = 25)
AQSR0002   VEHI0002 S       VVEHICLES  2         VEHICLES  9      (AGENCY = 25)
AQSR0002   VEHI0002 F       VEHICLES   0         VEHICLES  1      -
AQSR0002   ACCI0002 S       VACCIDENTS 1         ACCIDENTS 9      (AGENCY = 25)
AQSR0002   ACCI0002 S       VACCIDENTS 2         ACCIDENTS 9      (AGENCY = 25)
AQSR0002   ACCI0002 F       ACCIDENTS  0         ACCIDENTS 1      -
In the SUBS_MEMBR table, notice that the TARGET_STRUCTURE column has a new value of 9 to indicate Microsoft Access row-replica tables. If you look at the rows that deal with CONTRACTS, for example, you can also notice that DJRA was clever enough to understand that the replication from DB2 UDB towards Microsoft Access should use the view as source and the real table as target, whereas the other way (Access towards DB2) should use the real table as source and also the real table as target. The target is not a view. And finally, notice that the subsetting predicate is only present for the DB2 towards Access replication.

ASN.IBMSNAP_PRUNCNTL table:

TARGET_ TARGET_   SYNCH SOURCE_    SOURCE_   APPLY_   SET_NAME CNTL_    TARG.
SERVER  TABLE     POINT TABLE      VIEW_QUAL QUAL              SERVER   STRUC
------- --------- ----- ---------- --------- -------- -------- -------- -----
MSJET   CUSTOMERS -     CUSTOMERS  0         AQSR0001 CUST0001 SJNTDWH1 9
MSJET   CONTRACTS -     VCONTRACTS 1         AQSR0001 CONT0001 SJNTDWH1 9
MSJET   CONTRACTS -     VCONTRACTS 2         AQSR0001 CONT0001 SJNTDWH1 9
MSJET   VEHICLES  -     VVEHICLES  1         AQSR0001 VEHI0001 SJNTDWH1 9
MSJET   VEHICLES  -     VVEHICLES  2         AQSR0001 VEHI0001 SJNTDWH1 9
MSJET   ACCIDENTS -     VACCIDENTS 1         AQSR0001 ACCI0001 SJNTDWH1 9
MSJET   ACCIDENTS -     VACCIDENTS 2         AQSR0001 ACCI0001 SJNTDWH1 9
MSJET   CUSTOMERS -     CUSTOMERS  0         AQSR0002 CUST0002 SJNTDWH1 9
MSJET   CONTRACTS -     VCONTRACTS 1         AQSR0002 CONT0002 SJNTDWH1 9
MSJET   CONTRACTS -     VCONTRACTS 2         AQSR0002 CONT0002 SJNTDWH1 9
MSJET   VEHICLES  -     VVEHICLES  1         AQSR0002 VEHI0002 SJNTDWH1 9
MSJET   VEHICLES  -     VVEHICLES  2         AQSR0002 VEHI0002 SJNTDWH1 9
MSJET   ACCIDENTS -     VACCIDENTS 1         AQSR0002 ACCI0002 SJNTDWH1 9
MSJET   ACCIDENTS -     VACCIDENTS 2         AQSR0002 ACCI0002 SJNTDWH1 9
ASN.IBMSNAP_SCHEMA_CHG table:

APPLY_QUAL SET_NAME LAST_CHANGED
---------- -------- --------------------------
AQSR0001   CONT0001 1999-03-17-11.26.24.906000
AQSR0001   ACCI0001 1999-03-17-11.28.50.343000
AQSR0001   VEHI0001 1999-03-17-11.30.40.250000
AQSR0001   CUST0001 1999-03-17-11.32.10.359001
AQSR0002   CUST0002 1999-03-17-15.00.29.953000
AQSR0002   CONT0002 1999-03-17-15.00.46.843001
AQSR0002   VEHI0002 1999-03-17-15.00.59.296001
AQSR0002   ACCI0002 1999-03-17-15.01.09.953000
ASN.IBMSNAP_SUBS_TGTS is empty.
ASN.IBMSNAP_TRACE table:

OPERATION DESCRIPTION
--------- ----------------------------------------------------------
INIT      ASN0100I: The Capture program initialization is successful
PARM      ASN0103I: The Capture program started with SERVER_NAME
          SJNTDWH1; the START_TYPE is COLD ...
9.6.3 Start ASNJET to Perform the Initial Full-Refresh

Start ASNJET to perform the initial full-refresh for sales representative 1 by entering the following command from the laptop:

ASNJET AQSR0001 SJNTDWH1 NOTIFY MOBILE TRCFLOW
9.6.4 Results of the Initial Full-Refresh

Since ASNJET was started with the MOBILE option, it processed all the subscriptions only once, and then it stopped. It first created the target Microsoft Access database (DBSR0001), then it created the target tables inside this target database, and performed the initial full-refresh.
A first look at the content of the ASN.IBMSNAP_APPLYTRAIL table shows that everything worked fine (STATUS = 0 and SET_INSERTED is correct). Not all the columns are shown below:

APPLY_   SET_NAME WHOS_ON MASS EFF SET_ STAT SYNCHPOINT  SOURCE_  TARGET_
QUAL              _FIRST  DEL. MBR INS.                  SERVER   SERVER
-------- -------- ------- ---- --- ---- ---- ----------- -------- --------
AQSR0001 CUST0001 S       Y    1   10   0    x’30..3035’ SJNTDWH1 MSJET
AQSR0001 CUST0001 F       N    0   0    0    -           MSJET    SJNTDWH1
AQSR0001 CONT0001 S       Y    1   10   0    x’30..3039’ SJNTDWH1 MSJET
AQSR0001 CONT0001 F       N    0   0    0    -           MSJET    SJNTDWH1
AQSR0001 ACCI0001 S       Y    1   2    0    x’30..3133’ SJNTDWH1 MSJET
AQSR0001 ACCI0001 F       N    0   0    0    -           MSJET    SJNTDWH1
AQSR0001 VEHI0001 S       Y    1   10   0    x’30..3137’ SJNTDWH1 MSJET
AQSR0001 VEHI0001 F       N    0   0    0    -           MSJET    SJNTDWH1
Notice that the EFFECTIVE_MEMBERS column is equal to 0 for the replication from Microsoft Access towards DB2 UDB. When you look at the ASN.IBMSNAP_TRACE table, you can see that Capture has been triggered by ASNJET to start capturing the updates for the source tables (see the GOCAPT messages):

OPERATION DESCRIPTION
--------- ----------------------------------------------------------
INIT      ASN0100I: The Capture program initialization is successful
PARM      The Capture program started with SERVER_NAME SJNTDWH1; ...
GOCAPT    Change Capture started for ... table name is CUSTOMERS ...
GOCAPT    Change Capture started for ... table name is CUSTOMERS ...
GOCAPT    Change Capture started for ... table name is CONTRACTS ...
GOCAPT    Change Capture started for ... table name is CUSTOMERS ...
GOCAPT    Change Capture started for ... table name is ACCIDENTS ...
GOCAPT    Change Capture started for ... table name is CUSTOMERS ...
GOCAPT    Change Capture started for ... table name is VEHICLES ...
There are several GOCAPT messages for table CUSTOMERS because it is used in several view registrations. The ASN.IBMSNAP_PRUNCNTL table also shows that Capture has started capturing the updates (the fields SYNCHTIME and SYNCHPOINT are no longer NULL):
TARGET_ TARGET_   SYNCHTIME  SYNCHPOINT      SOURCE_    APPLY_   SET_NAME
SERVER  TABLE                                TABLE      QUAL
------- --------- ---------- --------------- ---------- -------- --------
MSJET   CUSTOMERS 1999-03-.. x’0..08497B115’ CUSTOMERS  AQSR0001 CUST0001
MSJET   CONTRACTS 1999-03-.. x’0..08497C33B’ VCONTRACTS AQSR0001 CONT0001
MSJET   CONTRACTS 1999-03-.. x’0..08497C4E9’ VCONTRACTS AQSR0001 CONT0001
MSJET   VEHICLES  1999-03-.. x’0..08497EF7F’ VVEHICLES  AQSR0001 VEHI0001
MSJET   VEHICLES  1999-03-.. x’0..08497F13D’ VVEHICLES  AQSR0001 VEHI0001
MSJET   ACCIDENTS 1999-03-.. x’0..08497D95D’ VACCIDENTS AQSR0001 ACCI0001
MSJET   ACCIDENTS 1999-03-.. x’0..08497DB0B’ VACCIDENTS AQSR0001 ACCI0001
MSJET   CUSTOMERS -          x’0..08497B115’ CUSTOMERS  AQSR0002 CUST0002
MSJET   CONTRACTS -          x’0..08497C33B’ VCONTRACTS AQSR0002 CONT0002
MSJET   CONTRACTS -          x’0..08497C4E9’ VCONTRACTS AQSR0002 CONT0002
MSJET   VEHICLES  -          x’0..08497EF7F’ VVEHICLES  AQSR0002 VEHI0002
MSJET   VEHICLES  -          x’0..08497F13D’ VVEHICLES  AQSR0002 VEHI0002
MSJET   ACCIDENTS -          x’0..08497D95D’ VACCIDENTS AQSR0002 ACCI0002
MSJET   ACCIDENTS -          x’0..08497DB0B’ VACCIDENTS AQSR0002 ACCI0002
The ASN.IBMSNAP_SCHEMA_CHG table has also been updated. Only the rows for sales representative 2 remain; all the others have been removed:

APPLY_QUAL SET_NAME LAST_CHANGED
---------- -------- --------------------------
AQSR0002   CUST0002 1999-03-17-15.00.29.953000
AQSR0002   CONT0002 1999-03-17-15.00.46.843001
AQSR0002   VEHI0002 1999-03-17-15.00.59.296001
AQSR0002   ACCI0002 1999-03-17-15.01.09.953000
And the ASN.IBMSNAP_SUBS_TGTS table, which was empty before the full-refresh, now contains 4 rows:

APPLY_   SET_NAME WHOS_ON TARGET_TABLE LAST_POSTED
QUAL              _FIRST
-------- -------- ------- ------------ --------------------------
AQSR0001 CUST0001 S       CUSTOMERS    1999-03-18-13.18.44.187000
AQSR0001 CONT0001 S       CONTRACTS    1999-03-18-13.18.50.562001
AQSR0001 ACCI0001 S       ACCIDENTS    1999-03-18-13.18.55.890001
AQSR0001 VEHI0001 S       VEHICLES     1999-03-18-13.19.01.031001
When you look at the target side, you can see that in fact two Microsoft Access databases were created by ASNJET (see Figure 83):
Figure 83. Microsoft Access Databases Created by ASNJET
DBSR0001.mdb is the real target database. IBMSNAP_DUMMY_DBSR0001.mdb is a dummy database. It needs to be there, but you do not need to be concerned about it! The target database (DBSR0001) contains the four target tables, plus some complementary control tables (see Figure 84):
Figure 84. Tables in the Target Database DBSR0001
The complementary control tables are the following:
• IBMSNAP_ERROR_INFO: If an error had occurred, this table would contain additional error information to identify the row-replica table and the row that caused the error. This table is empty now.
• IBMSNAP_ERROR_MESSAGE: If an error had occurred, this table would contain the error codes and error messages. This table is empty now.
• IBMSNAP_GUID_KEY: Maps Microsoft Jet table identifiers and row identifiers to primary key values. Now it contains exactly 32 rows: 10 corresponding to CUSTOMERS, 10 corresponding to CONTRACTS, 10 corresponding to VEHICLES, plus 2 that correspond to the 2 rows in ACCIDENTS.
• IBMSNAP_S_GENERATION: Synchronization generations table used to prevent cyclic updates from replicating back to the DB2 UDB database. It contains only one row.
• IBMSNAP_SIDE_INFORMATION: If conflicts are detected, this table contains the names of the conflict tables. This table is empty now.
• And one IBMSNAP_target_table_CONFLICT table associated with each target table. If conflicts are detected, these tables contain the rejected updates, that is, the updates that were done on the Microsoft Access side, but that were rejected because they were in conflict with updates from the DB2 UDB database. These tables are all empty now.

Now, open each target table to check that the content is equivalent to that of the corresponding source table, according to the subsetting predicate (AGENCY = 25). For example, the content of the target CONTRACTS table is the following (see Figure 85):
Figure 85. Content of the CONTRACTS Table
9.6.5 Replicating Updates

Since we have successfully performed the initial full-refresh, we should now check that changes (inserts, updates, and deletes) are correctly replicated both ways. We will do this in a 3-step process:
1. Update a source DB2 UDB table, start ASNJET, and check that the update is replicated towards Microsoft Access.
2. Update a row-replica table, start ASNJET, and check that the update is replicated towards DB2 UDB.
3. Update a row in a source DB2 UDB table, update the same row in the corresponding row-replica table, start ASNJET, and check how ASNJET processes the conflict.
9.6.5.1 Replication from DB2 UDB towards Microsoft Access
Use the following command, for example, to update the CONTRACTS table in DB2 UDB:

db2 update IWH.CONTRACTS set taxes = 500 where contract = 14
Before starting ASNJET, check that Capture has had the time to capture the update. For example, you can perform a SELECT over the Change Data table (IWH.CDCONTRACTS) that is associated with the CONTRACTS table. Then start ASNJET, using the same command as before:

ASNJET AQSR0001 SJNTDWH1 NOTIFY MOBILE TRCFLOW
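The Capture check mentioned above could look like this (a sketch; IWH.CDCONTRACTS is the CD table name used in this scenario, and the IBMSNAP_OPERATION column shows whether the captured change is an insert, update, or delete):

db2 connect to SJNTDWH1 user USERID using PASSWORD
db2 "select IBMSNAP_OPERATION, CONTRACT, TAXES from IWH.CDCONTRACTS"
db2 terminate

If the update does not show up yet, wait for the next Capture cycle and run the query again.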
After ASNJET has stopped, enter Microsoft Access and open the CONTRACTS table. The TAXES column now contains a value of 500 for contract number 14. If you query the ASN.IBMSNAP_APPLYTRAIL table (on the DB2 UDB side), you can also see that ASNJET has added the following rows:

APPLY_   SET_NAME WHOS_ON MASS EFF SET SET SET SOURCE_  TARGET_
QUAL              _FIRST  DEL. MBR INS DEL UPD SERVER   SERVER
-------- -------- ------- ---- --- --- --- --- -------- --------
AQSR0001 CUST0001 F       N    0   0   0   0   MSJET    SJNTDWH1
AQSR0001 CONT0001 S       N    1   0   0   1   SJNTDWH1 MSJET
AQSR0001 CONT0001 F       N    0   0   0   0   MSJET    SJNTDWH1
AQSR0001 ACCI0001 S       N    0   0   0   0   SJNTDWH1 MSJET
AQSR0001 ACCI0001 F       N    0   0   0   0   MSJET    SJNTDWH1
AQSR0001 VEHI0001 S       N    0   0   0   0   SJNTDWH1 MSJET
AQSR0001 VEHI0001 F       N    0   0   0   0   MSJET    SJNTDWH1
9.6.5.2 Replication from Microsoft Access towards DB2 UDB
Enter Microsoft Access and open the CONTRACTS table. Update the TAXES value for contract number 8: set the TAXES value to 800. Then close the Microsoft Access CONTRACTS table. Then start ASNJET, using the same command as before:

ASNJET AQSR0001 SJNTDWH1 NOTIFY MOBILE TRCFLOW
After ASNJET has stopped, query the CONTRACTS table in DB2 UDB. The TAXES column now contains a value of 800 for contract number 8. If you query the ASN.IBMSNAP_APPLYTRAIL table, you can also see that ASNJET has added the following rows:
APPLY_   SET_NAME WHOS_ON MASS EFF SET SET SET SOURCE_  TARGET_
QUAL              _FIRST  DEL. MBR INS DEL UPD SERVER   SERVER
-------- -------- ------- ---- --- --- --- --- -------- --------
AQSR0001 CUST0001 S       N    0   0   0   0   SJNTDWH1 MSJET
AQSR0001 CUST0001 F       N    0   0   0   0   MSJET    SJNTDWH1
AQSR0001 CONT0001 S       N    0   0   0   0   SJNTDWH1 MSJET
AQSR0001 CONT0001 F       N    1   0   0   1   MSJET    SJNTDWH1
AQSR0001 ACCI0001 S       N    0   0   0   0   SJNTDWH1 MSJET
AQSR0001 ACCI0001 F       N    0   0   0   0   MSJET    SJNTDWH1
AQSR0001 VEHI0001 S       N    0   0   0   0   SJNTDWH1 MSJET
AQSR0001 VEHI0001 F       N    0   0   0   0   MSJET    SJNTDWH1
9.6.5.3 Conflict Detection
Now we will update the same row on both sides and see how ASNJET behaves and how it reports the conflict. Update the CONTRACTS table in DB2 UDB:

db2 update IWH.CONTRACTS set taxes = 2000 where contract = 17

(We put 2000 instead of 100.)
Update the CONTRACTS table in Microsoft Access. For contract 17, change the BASEFARE column from 1250 to 5000. Then close the Microsoft Access table. We now have:
• 17 - 1250 - 2000 in DB2 UDB
• 17 - 5000 - 100 in Microsoft Access

Check that Capture has captured the update on the DB2 UDB side (query the Change Data table: IWH.CDCONTRACTS). Then restart ASNJET. When ASNJET has ended, check the content of the CONTRACTS tables. We now have:
• 17 - 1250 - 2000 in DB2 UDB (unchanged)
• 17 - 1250 - 2000 in Microsoft Access

So we can see that the DB2 update has won the conflict and both databases are left in a consistent state. In the Microsoft Access database, look at the IBMSNAP_IWH_CONFLICT_CONTRACTS table. It contains one row that
shows that one update was rejected because of the conflict detection (see Figure 86).
Figure 86. Content of the Conflict Table for Contracts
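To double-check the winning values on the DB2 UDB side, a query such as the following can be used (a sketch; the columns are those defined for the CONTRACTS table earlier in this chapter):

db2 "select CONTRACT, BASEFARE, TAXES from IWH.CONTRACTS where CONTRACT = 17"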
9.6.5.4 Data Integrity Considerations
Within a network of DB2 databases, DB2 DataPropagator supports an update-anywhere model that is able to detect transaction conflicts. ASNJET supports an update-anywhere model, but with weaker row-conflict detection (similar to the standard Microsoft Jet replication model). As we have seen in the preceding section, ASNJET reports synchronization conflicts in conflict tables in a very similar way to the built-in Microsoft Jet replication feature. This process can result in a loss of updates. Synchronization conflicts are handled on a row-by-row basis, so some updates might be flagged as conflicting while other updates replicate to the DB2 database. If this situation is not acceptable, you need to program your own resolutions for all potential update conflicts.
9.7 Operational Aspects

In this section we will study only the operational aspects that are specific to the DB2 / Microsoft Access replication scenario. The operational aspects that are not specific to this scenario are described in the first part of this book.
9.7.1 Operational Implications

Following is a discussion of operational and administrative topics that are important during the production phase of the new environment.

9.7.1.1 Network
Any kind of network (for example, WAN, LAN, or phone lines) is suitable for DataPropagator for Microsoft Jet. But DataPropagator for Microsoft Jet will most often be used to replicate data towards occasionally connected workstations, using phone lines. This means that special attention must be
paid to the subsetting predicates, to limit the number of rows that each workstation receives. This is particularly important for the initial full refresh, which can last several hours for large tables if no subsetting is defined. The same is true after the initial full refresh, if some jobs (monthly jobs, for example) update most rows of a large source table.
Important: ASNJET does not automatically establish the communication link between the target and the source. It does not, for example, dial any phone number itself. You must establish the communication before ASNJET is started, and end the communication after ASNJET has ended. This can be achieved using any communications software.
9.7.1.2 Security
The general database security considerations apply. In addition, a password file must be defined on each target workstation, named as follows:
Apply_Qualifier.PWD
This file contains the userids and passwords that ASNJET uses when it connects to the source server and to the control server. If the control server is co-located with the source server, the password file contains only one row.
9.7.1.3 Scheduling
ASNJET can be started with either the MOBILE parameter or the NOMOBILE parameter (the default):
• With the NOMOBILE parameter, the general scheduling considerations of any replication scenario apply. The subscription sets can be processed according to a timing frequency, according to the arrival of specific events, or both. In this mode, ASNJET does not stop automatically; the user must stop it explicitly.
• With the MOBILE parameter, ASNJET ignores the timing frequency defined in the control tables. It processes all the eligible subscription sets exactly once, and then stops automatically. This is probably the option that will be chosen most often, especially if the target workstations are occasionally connected laptops (see the batch file sketch at the end of this section).
9.7.1.4 Locking
The general locking considerations of any replication scenario apply here. Additionally, users should close any Microsoft Access table they have been updating before starting the ASNJET program.
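Tying the network and scheduling points together, a mobile user's replication session can be wrapped in a small batch file like the following sketch. DIALER is a placeholder for whatever communications software establishes and drops the phone connection; the ASNJET command is the one used throughout this scenario:

REM SYNC.BAT - one replication cycle from an occasionally connected laptop
REM DIALER is a placeholder, not a real utility
DIALER CONNECT HEADOFFICE
ASNJET AQSR0001 SJNTDWH1 NOTIFY MOBILE
DIALER DISCONNECT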
9.7.1.5 Recovery
In case of severe problems on the laptop, such as the accidental loss of one of the target tables, recovery is very easy. Simply delete the target database file (DBSR0001.mdb in our scenario) and the dummy database file (IBMSNAP_DUMMY_DBSR0001.mdb in our scenario). ASNJET will automatically recreate everything the next time it is run (see the command sketch at the end of this section).
9.7.1.6 Administration and Maintenance
Locate all the DProp control tables at the source server, so that you can easily check from a single point whether everything is working fine, whether each workstation is getting its data and when, and so on. Furthermore, since the process of defining the source and target tables does not require any connection to the target Microsoft Access databases, all the setup can be prepared even before the target workstations are configured.
ASNJET performs part of the administration work itself. This enables ASNJET to:
• Automatically create the target Microsoft Access databases and tables on the laptop.
• Automatically maintain the structure of the target tables according to the definitions contained in the control tables. For example, if a column must be added to the structure of a source table, you use DJRA to add the new column's definition to the replication control tables, and ASNJET automatically adds the new column to the target the next time it is run. The same mechanism also applies if you remove a subscription set from the DProp control tables: ASNJET automatically drops the corresponding target Microsoft Access tables.
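As a concrete illustration of the recovery procedure described in 9.7.1.5, the following commands, run on the laptop, are all that is needed (the file names are those used in this scenario):

del DBSR0001.mdb
del IBMSNAP_DUMMY_DBSR0001.mdb
REM ASNJET recreates both databases and all target tables on its next run:
ASNJET AQSR0001 SJNTDWH1 NOTIFY MOBILE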
9.7.2 Monitoring and Problem Determination
The first place to look, to check whether everything is running fine and whether all the target workstations have received (and transmitted) their data, is the ASN.IBMSNAP_SUBS_SET table in the DB2 UDB database. There are four columns to check in this table:
• ACTIVATE: Indicates whether a subscription set is active (value 1) or not (value 0). If a set is not active, it is probably because you decided that it should not be processed. In practice, you only need to check the three other columns listed below, for the rows whose ACTIVATE column equals 1.
• STATUS: This should be 0 for all the active subscription sets. If the value is -1 for some rows, those subscription sets had errors; see below for how to determine what went wrong.
• LASTRUN: Indicates when the subscription set was last processed. Check that it is recent for all the active subscription sets. If some values appear old, the corresponding target workstations have not started their ASNJET program recently, and you may want to contact those users to find out why.
• LASTSUCCESS: Indicates when the subscription set was last processed successfully. Check that it is recent and equal to the LASTRUN value. If it is not equal to LASTRUN, at least the last processing was not successful.
Once you have looked at the ASN.IBMSNAP_SUBS_SET table, you know which subscription sets are OK and which are not. For those with a problem, you must now determine what went wrong. To do this, first look at the ASN.IBMSNAP_APPLYTRAIL table; in most cases you will find helpful information there. The most interesting columns are:
• SQLCODE: The SQL error code. See the DB2 reference documentation for the description of each SQLCODE.
• SQLSTATE: The SQLSTATE code. See the DB2 reference documentation for the description of each SQLSTATE.
• APPERRM: The error message. Make sure that you check only the rows that correspond to the LASTRUN time, and not older rows.
Sometimes the information in the ASN.IBMSNAP_APPLYTRAIL table will not be enough to determine the cause of the error directly. There are also cases where errors are not reported in the ASN.IBMSNAP_APPLYTRAIL table at all; for example, if ASNJET encountered a severe error before it had time to write its information to that table. That is why it is so important to check the ASN.IBMSNAP_SUBS_SET table first. When the APPLYTRAIL information is not enough to determine the cause of an error, restart ASNJET with the TRCFLOW parameter, like this:
ASNJET AQSR0001 SJNTDWH1 NOTIFY MOBILE TRCFLOW > TRACE.TRC
This creates a file called TRACE.TRC, containing the details of everything ASNJET has been doing. In some cases you will find, for example, that the error is an ODBC error, and you should then:
• Refer to the ODBC documentation to retrieve the meaning of the ODBC error message.
• Check whether you have the latest level of the ODBC driver.
On the Capture side, there is not much to check: look at the ASN.IBMSNAP_TRACE table for error messages. If you suspect that updates were not captured, check that the IBMSNAP_TRACE table contains a GOCAPT message for the source table. You can also, of course, start Capture with a trace (be careful, the parameter is TRACE, not TRCFLOW).
On the target side, you can also find useful error information in the following Microsoft Access tables:
• IBMSNAP_ERROR_MESSAGE: Contains the error codes and error messages.
• IBMSNAP_ERROR_INFO: Contains error information that helps identify the row-replica table and the row that caused the error.
If you think that some updates should have been replicated from the Microsoft Access tables to the DB2 tables but were not, it is probably because a conflict was detected; look at the following two tables in the Microsoft Access database:
• IBMSNAP_SIDE_INFORMATION: Contains the names of the conflict tables.
• IBMSNAP_target_table_CONFLICT: Contains the rejected updates.
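To sum up the checks described in this section, the following two queries flag the active subscription sets that need attention and retrieve the corresponding Apply trail details. This is only a sketch, built from the columns introduced above; AQSR0001 is this scenario's Apply Qualifier:

SELECT APPLY_QUAL, SET_NAME, STATUS, LASTRUN, LASTSUCCESS
  FROM ASN.IBMSNAP_SUBS_SET
  WHERE ACTIVATE = 1
    AND (STATUS = -1 OR LASTSUCCESS < LASTRUN);

-- then, for the failing sets, inspect the most recent Apply trail rows:
SELECT SET_NAME, SQLCODE, SQLSTATE, APPERRM
  FROM ASN.IBMSNAP_APPLYTRAIL
  WHERE APPLY_QUAL = 'AQSR0001'
  ORDER BY LASTRUN DESC;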
9.8 Benefits of this Solution
There are several advantages in using DataPropagator for Microsoft Jet:
• Mobile professionals, such as sales representatives, can connect to their corporate network occasionally and start ASNJET to automatically receive a subset of the corporate data and transmit their own updates to the head office. After the replication process has ended, they disconnect from the network and use their laptop applications with the data that is stored locally.
• The process of defining the source and target tables does not require any connection to the target Microsoft Access databases. This enables easy, centralized management of the whole setup.
• The target Microsoft Access databases and tables can be created automatically on the laptops when ASNJET is run for the first time.
• If the structure of the source tables must be changed (to add a new column to a table, for example), you only need to update the replication control tables, using DJRA. ASNJET will automatically add the new column the next time it is run. The same mechanism also applies if you remove a subscription set from the DProp control tables: ASNJET will automatically drop the corresponding target tables.
• In case of severe problems on the laptop, such as the accidental loss of one of the target tables, it is very easy to recover. Simply delete the target database file on the laptop (DBSR0001.mdb in our scenario) and the dummy database file (IBMSNAP_DUMMY_DBSR0001.mdb in our scenario), and ASNJET will automatically recreate everything the next time it is run.
9.8.1 Other Configuration Options
There are not many configuration options for this scenario:
• The control server must reside in a DB2 database, so it is convenient to co-locate it with the source server. In theory, any other DB2 server could be used, provided that it is accessible from the target workstations. But if the target workstations are laptops that use phone lines to access the source DB2 database, you should locate the control server at the source server.
• You can define the target database and the target tables yourself, but since ASNJET is able to create them for you, why not use this facility?
9.9 Summary
In this scenario we have illustrated the following capabilities of the DB2 DataPropagator for Microsoft Jet component (ASNJET):
• Update-anywhere replication between DB2 and Microsoft Access, in an occasionally connected, mobile environment. We have seen that, to achieve this goal, the ASNJET program uses both the push and the pull modes of replication.
• Data subsetting, meaning that each target user (sales representative) receives only the rows that are of interest to him.
We have also discussed the following related topics:
• Replication from join views defined as replication sources. The purpose of the views was to add the column involved in the row-subsetting criterion (AGENCY) to the replication source definitions.
• The use of a partitioning column (Agency) led us to ask Capture to capture the updates to the CUSTOMERS table as delete-and-insert pairs, to enable a customer to move from one agency to another.
• We also discussed the double-delete issue, which must be studied each time join views are used as replication sources.
• Conflict detection, the process that enables ASNJET to deal with cases where the same rows are updated in DB2 tables and the corresponding Microsoft Access tables during the same replication cycle.
In the implementation checklist, we have also seen the following interesting points:
• The implementation does not require the installation of the complete DataJoiner product. Only the ASNJET and DJRA components are required from the DataJoiner installation CD-ROM.
• The process of defining the source and target tables does not require any connection to the target Microsoft Access databases, so it is possible to define the whole content of the control tables even if the target workstations have not yet been configured.
• The target Microsoft Access databases and tables do not need to be created before starting the replication, because ASNJET is able to create them automatically when it is started for the first time.
Among the operational aspects, we have also seen a very important point:
• You can very easily recover from any important loss of data in the target Microsoft Access tables. Simply delete the target database files, and ASNJET will automatically recreate the tables the next time it is run.
This replication solution therefore fits perfectly the needs of people who want to exchange data between geographically dispersed micro-computers, equipped with Microsoft Access, and a central DB2 server.
Appendix A. Index to Data Replication Tips, Tricks, Techniques
This appendix contains a table (Table 13) that points you to all the tips, tricks, and smart techniques described within this redbook. It provides a quick and easy way to find a certain technique in the book.
Table 13. Index to Data Replication Tips, Tricks, and Techniques
• Defining read-only copies with Referential Integrity constraints: 3.3.1.2, “Read-only Copies with Referential Integrity” on page 50
• Creating an event generator: 3.3.2.3, “Advanced Event Based Scheduling” on page 53
• Listing server mappings: “Step 9: Create Server Mappings for All the Non-IBM Databases” on page 69
• Listing user mappings: “Step 11: Create User Mappings” on page 70
• Listing nicknames: “Step 19: Create the Control Tables at Replication Source Servers” on page 74
• Understanding how an automatic full refresh is performed: 5.2.2.1, “Initial Refresh Maintained by Apply” on page 86
• How to perform a manual refresh or off-line load: 5.2.2.3, “Manual Refresh / Off-line Load” on page 89
• How to prune CCD tables (including internal CCDs): 5.3.2.2, “Pruning of CCD Tables” on page 92
• How to automatically prune the Apply Trail table: 5.3.2.3, “Pruning of the APPLYTRAIL Table” on page 93
• Deleting AS/400 Journal receivers: 5.3.2.4, “Journals Management on AS/400” on page 95
• Checking to see if the Capture process is running: 5.4.3.1, “Monitoring the Capture Process” on page 101
• How to check for Capture errors: 5.4.3.2, “Detecting Capture Errors” on page 102
• How to determine the current Capture lag: 5.4.3.3, “Capture Lag” on page 102
• How to resolve a gap with a Capture cold start: “Resolving the Gap with a Capture COLD Start” on page 104
• How to resolve a gap without a Capture cold start: “Resolving the Gap Manually” on page 104
• Checking to see if the Apply process is running: 5.4.4.1, “Monitoring Apply Processes” on page 106
• Monitoring the status of a subscription set: 5.4.4.2, “Monitoring the Subscription Status” on page 106
• Monitoring subscription set latency: 5.4.4.4, “Monitoring Subscription Set Latency” on page 108
• Adding customized logic to a subscription (ASNDONE): 5.4.4.7, “Utilizing Apply’s ASNDONE User Exit” on page 111
• Monitoring DataJoiner: 5.4.5, “Monitoring the Database Middleware Server” on page 116
• Tuning replication performance: 5.5, “Tuning Replication Performance” on page 117
• How to defer pruning for multi-vendor replication sources: 5.5.13.2, “How to Defer Pruning for Multi-Vendor Sources” on page 127
• How to deactivate subscription sets: 5.6.1, “Deactivating Subscription Sets” on page 129
• How to disable full refresh for all subscriptions: 5.6.2.1, “Disable Full Refresh for All Subscriptions” on page 129
• How to disable full refresh for certain subscriptions: 5.6.2.2, “Allow Full Refresh for Certain Subscriptions” on page 130
• Forcing a full refresh: 5.6.3, “Full Refresh on Demand” on page 132
• Dropping Capture triggers: 5.6.4, “Dropping Unnecessary Capture Triggers for Non-IBM Sources” on page 133
• Changing the Apply Qualifier or set name for a subscription set: 5.6.6, “Changing Apply Qualifier or Set Name for a Subscription Set” on page 134
• Using SPUFI on OS/390 to access non-IBM databases: 6.4, “Nice Side Effect: Using SPUFI to Access Multi-Vendor Data” on page 158
• Invoking stored procedures at a remote, non-IBM target server: 7.2.2.3, “Invoking Stored Procedures at the Target Database” on page 185
• How to maintain a change history (CCD) table in a non-IBM target: 8.4.2, “Maintaining a Change History for Suppliers” on page 220
• How to denormalize data using target-site views: 8.4.3, “Using Target Site Views to Denormalize Outlet Information” on page 228
• How to denormalize data using source-site views: 8.4.4, “Using Source Site Joins to Denormalize Product Information” on page 237
• How to only replicate certain SQL operations: 8.4.3, “Using Target Site Views to Denormalize Outlet Information” on page 228
• Using Point-in-Time target tables to maintain historic information: 8.4.3, “Using Target Site Views to Denormalize Outlet Information” on page 228
• How to maintain temporal histories using DProp: 8.4.6, “Adding Temporal History Information to Target Tables” on page 250
• How to maintain base aggregate information from a change aggregate subscription: 8.4.7, “Maintaining Aggregate Information” on page 256
• How to push down the replication status to non-IBM targets: 8.4.8, “Pushing Down the Replication Status to Oracle” on page 259
• How to load data from a DB2 for OS/390 source to an Oracle target by using DataJoiner’s INSERT...SELECT...: 8.4.9.1, “Using SQL INSERT....SELECT.... from DataJoiner” on page 262
• How to load data from a DB2 for OS/390 source to an Oracle target by using DataJoiner’s EXPORT/IMPORT utilities: 8.4.9.2, “Using DataJoiner’s EXPORT/IMPORT Utilities” on page 263
• How to load data from a DB2 for OS/390 source to an Oracle target by using DSNTIAUL and Oracle’s SQL*Loader: 8.4.9.3, “Using DSNTIAUL and Oracle’s SQL*Loader Utility” on page 264
• Dealing with the double delete issue when replicating join views: 9.1.2, “Comments about the Table Structures” on page 273
Appendix B. Non-IBM Database Stuff
In this appendix we document some useful hints and tips on multi-vendor databases that were discovered during the writing of this book. It can be used as a quick reference for performing some simple tasks. For full information about configuring the non-IBM clients and databases, always refer to the documentation for the particular database software.
B.1 Oracle Stuff
B.1.1 Configuring Oracle Connectivity
The Oracle client really comes in two components: SQL*Net or Net8, and SQL*Plus. The basic network connectivity is provided by SQL*Net for Oracle version 7 and by Net8 for Oracle8. SQL*Plus sits on top of the basic network connectivity component and provides a command line interpreter (similar to the DB2 Client Application Enabler). We advise you to install both of these components when working with Oracle.
Connectivity from the Oracle client to the server is established by creating the tnsnames.ora file. This file contains the information the client needs to access the Oracle server and is equivalent to the DB2 node and database directories. It is a simple text file which is usually created using the Oracle utilities (for Oracle8, use the Net8 Assistant to configure connectivity and create the tnsnames.ora file).
Advice: Many Oracle DBAs have copies of the tnsnames.ora files used within their organizations. Ask the DBA for permission to copy this pre-configured file to your workstation.
For more information about configuring Oracle clients, see the Oracle Net8 Administrator’s Guide, A58230-01.
B.1.2 Using Oracle’s SQL*Plus
Once the client has been configured, the connection from client to Oracle server can be tested using SQL*Plus. To connect to the Oracle server, type the following on the operating system command line:
sqlplus username/password@servicename
In this command, servicename is the entry in the tnsnames.ora file which identifies the Oracle server.
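For reference, a minimal tnsnames.ora entry defining a service named HQ might look like the following sketch (the host name, port, and SID are illustrative, not taken from this book's scenarios):

HQ =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = azov)(PORT = 1521))
    (CONNECT_DATA = (SID = ORA8))
  )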
For example: sqlplus scott/tiger@HQ
Scott is the sample userid provided with Oracle, and tiger is Scott’s password. If this userid has been revoked, contact the Oracle DBA for a valid userid and password. When logged on, the SQL*Plus prompt will be SQLPLUS>. Here are a few useful tips once you have logged onto the Oracle server:
• End all SQL*Plus commands with a semicolon ( ; ).
• To find the structure of an Oracle table, use this command: DESCRIBE <table_name>;
• To find out who you are logged onto Oracle as, issue the command: SELECT * FROM USER_USERS
• To invoke SQL*Plus and use a file as input, use the command: sqlplus user/pwd@orainst @<input_file>
Put a quit; at the end of the input file and SQL*Plus stops when finished.
• Use spool <output_file>; to dump output to an output file, and spool off; to stop dumping the output to a file.
• Use COMMIT; to commit the changes. There is no auto-commit.
B.1.3 The Oracle Data Dictionary
The Oracle data dictionary is divided into three sets of views, according to the prefix of the view name:
USER_ views contain information about objects owned by the current user.
ALL_ views contain everything in the USER_ views, plus information about objects on which the user has been granted privileges.
DBA_ views contain information on all objects within the database.
Table 14 provides details on some of the more useful Oracle data dictionary views.
Table 14. Useful Oracle Data Dictionary Tables

USER_TABLES (synonym: Tabs)
  Columns: Table_Name, Tablespace_Name, Num_Rows, ...
  Description: All tables owned by the user.

USER_OBJECTS (synonym: Obj)
  Columns: Object_Name, Object_Type, Status, ...
  Description: All database objects owned by the user.

USER_TAB_COLUMNS (synonym: Cols)
  Columns: Table_Name, Column_Name, Data_Type, Data_Length, Data_Precision, Data_Scale, ...
  Description: All columns in tables owned by the user.
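For example, to list the tables owned by the current user, together with the row counts recorded by the optimizer statistics (a simple illustration of the USER_TABLES view described above):

SELECT Table_Name, Num_Rows FROM USER_TABLES;

The synonym form, SELECT Table_Name, Num_Rows FROM Tabs; returns the same result.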
B.1.4 Oracle Error Messages
To get details for Oracle error codes, use the oerr facility. From the operating system command line, type:
oerr yyy xxx
In this command:
yyy is the three-letter error code prefix
xxx is the SQL return code
For example, if the error was ORA-12154, then type:
oerr ora 12154
Advice: If your path does not include $ORACLE_HOME/bin, change to directory $ORACLE_HOME/bin before issuing the oerr command.
B.1.5 Oracle Server
The Oracle server manager can be used to start and stop the Oracle server and to perform other administration functions. The server manager can be started in either line or menu mode:
svrmgrl - starts the server manager in line mode
svrmgrm - starts the server manager in menu mode
To start Oracle, issue the startup command from the server manager; to stop Oracle, use the shutdown command. Before issuing these commands you usually have to issue the connect internal command. For more information
about using the Oracle server manager, see the Oracle8 Administrator’s Guide, A58397-01.
B.1.6 Oracle Listener
The Oracle listener is the process which allows remote clients to connect to the server. The listener must be running on the server before clients can connect to the Oracle database. If the Oracle database is running but you are still unable to connect, ensure that the listener service has been started on the server. The listener can be started using the following command:
lsnrctl start
To obtain a list of other possible listener command parameters, type: lsnrctl help
B.1.7 Other Useful Oracle Tools
TNSPING and TRCROUTE are two very useful Oracle tools to aid in debugging connectivity problems.
TNSPING is similar to the TCP/IP ping command, except that it pings the Oracle database to see if basic database connectivity is working. For example, if your Oracle server is named AZOV, type the following from the operating system command line:
tnsping azov
The Trace Route Utility (TRCROUTE) allows you to discover what path or route a connection is taking from a client to a server. If a problem is encountered, TRCROUTE returns an error stack to the client, which makes troubleshooting easier. For information on how to use the TRCROUTE utility, see the Oracle8 Administrator’s Guide, A58397-01.
B.1.8 More Information
The Oracle home page on the Web is at: http://www.oracle.com
B.2 Informix Stuff
B.2.1 Configuring Informix Connectivity
The Informix client is configured by updating the sqlhosts file (similar to the DB2 node and database directories). On UNIX, the sqlhosts file resides, by default, in the $INFORMIXDIR/etc directory. As an alternative, you can set the INFORMIXSQLHOSTS environment variable to the full pathname and filename of a file that contains the sqlhosts information. You can enter information in the sqlhosts file by using a standard text editor (copy a sample from $INFORMIXDIR/etc/sqlhosts.std). The table-like structure of the file is shown in the example below:

dbservername    nettype     hostname    port    options
sjazov_ifx01    onsoctcp    azov        2800
sjstar_ifx01    onsoctcp    azov        2801
sjsky_ifx01     onsoctcp    sky         2810
Advice: Like the Oracle tnsnames.ora file, many Informix DBAs will have a copy of this file customized for use within their organization. If you ask nicely, they will usually allow you to copy the file to your Informix client.
B.2.2 Using Informix’s dbaccess
Once the sqlhosts file has been configured, use Informix’s client interface, dbaccess, to connect from the client to the Informix server. The dbaccess program can operate in a number of different modes, depending on how it is invoked:
dbaccess starts dbaccess in interactive (menu) mode.
dbaccess <database> - starts dbaccess in command line mode and connects to the database specified. The dash ( - ) at the end is required. Once logged onto dbaccess, the prompt will be >. End all dbaccess commands and SQL with a semicolon ( ; ).
dbaccess <database> <input_file>.sql executes commands and SQL statements from an input file. Be aware that dbaccess only accepts input from files with the .sql extension.
dbaccess -e <database> <input_file>.sql > <output_file> 2>&1 executes commands and SQL statements from an input file, redirecting the output to an output file. The -e option echoes the statements to standard output.
Advice: There is no auto-commit on/off switch in dbaccess. All statements in an SQL file are automatically committed, unless you explicitly open a transaction using the BEGIN WORK; statement. To end the transaction, use either COMMIT; or ROLLBACK;. For example:

BEGIN WORK;
INSERT INTO.... VALUES (...);
INSERT INTO.... VALUES (...);
INSERT INTO.... VALUES (...);
INSERT INTO.... VALUES (...);
COMMIT;
B.2.3 Informix Error Messages
To find out the meaning of Informix error codes, use the finderr facility. From the operating system command line, type:
finderr <msgnum>
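For example, Informix reports a plain SQL syntax error as error -201, so a typical lookup would be (illustrative):

finderr 201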
Advice: If your path does not include $INFORMIXDIR/bin, change to directory $INFORMIXDIR/bin before issuing the finderr command.
B.2.4 More Information
The Informix home page is at: http://www.informix.com
Informix on-line manuals can be downloaded from the Web.
B.3 Microsoft SQL Server Stuff
B.3.1 Configuring Microsoft SQL Server Connectivity
Microsoft SQL Server uses ODBC as its native database connectivity protocol. ODBC drivers are automatically installed on your Windows workstation when you install the SQL Server client utilities. However, you will still need to create specific Data Source Names (DSNs) after the ODBC driver has been installed. Use the ODBC Data Source Administrator tool, which can be found in the Windows NT Control Panel, to configure a DSN. When configuring the DSN, be sure to configure it as a System DSN.
B.3.2 Using the Microsoft Client OSQL
Microsoft supplies a command line utility called OSQL with SQL Server 7. It provides the same functionality as ISQL (which was supplied with SQL Server 6.5 and used DB-Library), but uses ODBC as the underlying database communication protocol. You can use OSQL to run Transact-SQL statements, system procedures, and script files. The syntax for invoking OSQL is summarized in the following statement:
osql -Y -S<server> -D<database> -U<user> -P<password> -c<terminator> -i<input_file> -p
The invocation parameters for OSQL are detailed in Table 15.

Table 15. Invocation Parameters for OSQL

Parameter   Meaning
-Y          Chained transactions - autocommit off
-S          Microsoft server to connect to
-D          Database at server <server>
-U          SQL Server logon
-P          SQL Server password
-c          Overrides the SQL Server default command termination character (the default terminator is ’go’); example: -c";"

To change databases in SQL Server, specify: use <database>
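Putting these pieces together, a short interactive session might look like the following (the server name, login, and password are illustrative; pubs is the sample database shipped with SQL Server):

osql -SAZOV -Usa -Psecret
1> use pubs
2> go
1> sp_helpdb
2> go
1> exit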
Smart Advice: To execute commands in SQL Server, type go after entering the command in OSQL. Microsoft also provides a graphical user interface called the SQL Server Query Analyzer.
B.3.3 Microsoft SQL Server Data Dictionary
Table 16 summarizes some of the more useful Microsoft SQL Server data dictionary tables. These data dictionary tables are owned by user dbo.
Table 16. Useful SQL Server Data Dictionary Tables

Table Name   Columns              Description
sysobjects   name, type           Information on all objects in the database.
syscolumns   name, type, length   All columns in tables, views, and arguments in stored procedures.
sysusers     name                 Specifies who can use the database.
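For example, to list all user tables in the current database (in sysobjects, object type 'U' denotes a user table):

select name from sysobjects where type = 'U'
go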
B.3.4 Helpful SQL Server Stored Procedures
Microsoft SQL Server makes extensive use of stored procedures. Table 17 lists some of the more useful ones.
Table 17. Microsoft SQL Server Stored Procedures
Stored Procedure   Description
sp_help            Lists the objects in the database.
sp_help <table>    Gives the table structure for <table>.
sp_helpdb          Provides information about the databases on the server.
B.3.5 Microsoft SQL Server Error Messages
To display a description for an error code, use OSQL to log on to a Microsoft SQL Server, and then type:
use master
go
select error, description from sysmessages where error=<errorno>
go
B.3.6 Microsoft SQL Server Administration
Microsoft SQL Server can be started from a command prompt by typing one of the following:
net start mssqlserver
sqlservr
net start SQLServerAgent
The server can be shut down using the SHUTDOWN Transact-SQL statement, which can be issued from any query tool capable of issuing Transact-SQL. Use SHUTDOWN NOWAIT to stop SQL Server immediately, without waiting for transactions to complete (but recovery time will be increased).
B.3.7 ODBCPing
This utility checks database connectivity from the client to Microsoft SQL Server databases accessed via ODBC. The syntax of the command is:
ODBCPING [-S Server | -D DSN] [-U login id] [-P Password]
B.3.8 More Information
The Microsoft home page is at: http://www.microsoft.com
The homepage for SQL Server is at: http://www.microsoft.com/backoffice/sql/default.htm
B.4 Sybase SQL Server Stuff
Even though we did not provide a Sybase case study, we want to share with you some information on how to use the Sybase client.
B.4.1 Configuring Sybase SQL Server Connectivity
The Sybase client is configured by updating the interfaces file (similar to the DB2 node and database directories). On UNIX, the interfaces file resides, by default, in the $SYBASE directory. The interfaces file can be edited manually, or configured using Sybase’s sybinit utility. The following sample interfaces file provides connectivity information for two Sybase servers, namely SYBSVR1 and SYBSVR2:

SYBSVR1
    master tcp ether 137.12.111.33 4000
    query tcp ether 137.12.111.33 4000
SYBSVR2
    master tcp ether 137.12.111.42 3048
    query tcp ether 137.12.111.42 3048
B.4.2 Using the Sybase Client isql
Sybase supplies a command line utility called isql with Sybase SQL Server. The syntax for invoking isql is summarized in the following statement:
isql -Y -S<server> -D<database> -U<user> -P<password> -c<terminator> -i<input_file> -p
The invocation parameters for isql are detailed in Table 18.

Table 18. Invocation Parameters for isql

Parameter   Meaning
-Y          Chained transactions - autocommit off
-S          Sybase server to connect to
-D          Database at server <server>
-U          SQL Server logon (user)
-P          SQL Server password
-c          Overrides the SQL Server default command termination character (the default terminator is ’go’); example: -c";"

To change databases in SQL Server, specify: use <database>
Advice: To execute commands in SQL Server, type go after entering the command in isql.
B.4.3 Sybase SQL Server Data Dictionary
Table 19 summarizes some of the more useful Sybase SQL Server data dictionary tables. These data dictionary tables are owned by user dbo.
Table 19. Useful SQL Server Data Dictionary Tables
Table Name   Columns              Description
sysobjects   name, type           Information on all objects in the database.
syscolumns   name, type, length   All columns in tables, views, and arguments in stored procedures.
sysusers     name                 Specifies who can use the database.
B.4.4 Helpful SQL Server Stored Procedures
Sybase SQL Server makes extensive use of stored procedures. Table 20 lists some of the more useful ones.

Table 20. Sybase SQL Server Stored Procedures

Stored Procedure   Description
sp_help            Lists the objects in the database.
sp_help <table>    Gives the table structure for <table>.
sp_helpdb          Provides information about the databases on the server.
B.4.5 Sybase SQL Server Error Messages
To display a description for an error code, use isql to log on to a Sybase SQL Server, and then type:
use master
go
select error, description from sysmessages where error=<errorno>
go
B.4.6 More Information
The Sybase home page is at: http://www.sybase.com
Appendix C. General Implementation Checklist
Set Up the Database Middleware Server
Step 1 - Install the Non-IBM Client Code to Access Non-IBM Data Sources
Step 2 - Prepare and Check Native Access to the Remote Data Sources
Step 3 - Install the DataJoiner Software
Step 4 - Prepare DataJoiner to Access the Remote Data Sources
Step 5 - Create a DataJoiner Instance
Step 6 - Create the DataJoiner Databases
Step 7 - Connect DataJoiner to Other DB2 or DataJoiner Databases
Step 8 - Enable DB2 Clients to Connect to the DataJoiner Databases
Step 9 - Create Server Mappings for All Non-IBM Database Systems
Step 10 - Create the Server Options
Step 11 - Create the User Mappings

Implement the Replication Subcomponents (Capture, Apply)
Step 12 - Install and Set Up DProp Capture (If Required)
Step 13 - Install and Set Up DProp Apply (If Required)

Set Up the Replication Administration Workstation
Step 14 - Install DB2 Client Application Enabler (DB2 CAE)
Step 15 - Establish DB2 Connectivity
Step 16 - Install the DataJoiner Replication Administration Software (DJRA)
Step 17 - Set Up DJRA to Access the Source and Target Databases
Step 18 - Modify the DJRA User Exits (Optional)

Create the Replication Control Tables
Step 19 - Set Up the DProp Control Tables at the Replication Source Servers
Step 20 - Set Up the DProp Control Tables at the Replication Control Servers

Bind DProp Capture and DProp Apply
Step 21 - Bind DProp Capture (If Required)
Step 22 - Bind DProp Apply
Appendix D. DJRA Generated SQL for Case Study 2
This appendix contains the SQL generated by DJRA for the various replication definitions that were configured in case study 2.
D.1 Define Replication Sources
1. Define table (ITEMS) as replication source:
--* echo input: TABLEREG SJ390DB1 LIYAN ITEMS AFTER NONEEXCLUDED
--* DELETEINSERT NONE N
-- using SRCESVR.REX as the REXX logic filename
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’
-- then check your REXX code.
-- now done interpreting REXX password file PASSWORD.REX
-- connect to the source-server
CONNECT TO SJ390DB1 USER db2res5 USING pwd;
-- USERID=DB2RES5 SOURCE_ALIAS alias=SJ390DB1  23 Mar 1999 9:42am
-- PRDID=DSN0501
-- 1 candidate registrations, 16 already known to be registered
-- The following tables are candidates for registration:
--    1 table   LIYAN.ITEMS
-- registration candidate #1 LIYAN.ITEMS -- LIYAN.ITEMS is assumed a USER table -- reading REXX logic file SRCESVR.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- in SRCESVR.REX, about to create a change data tablespace CREATE TABLESPACE TS420598 IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC; -- now done interpreting REXX logic file SRCESVR.REX -- enable change data capture ALTER TABLE LIYAN.ITEMS DATA CAPTURE CHANGES; -- selecting ’X’ as the before-image prefix character -- create the cd/ccd table for LIYAN.ITEMS CREATE TABLE LIYAN.LIYANCD_ITEMS(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL,ITEM_NUM DECIMAL(13 , 0) NOT NULL, DESC VARCHAR(150) NOT NULL,PROD_LINE_NUM DECIMAL(7 , 0) NOT NULL, SUPP_NO DECIMAL(13 , 0) NOT NULL) IN SJ390DB1.TS420598;
-- create the index for the change data table for LIYAN.ITEMS CREATE TYPE 2 UNIQUE INDEX LIYAN.CDI00LIYANCD_ITEMS ON LIYAN.LIYANCD_ITEMS (IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC); -- insert a registration record into ASN.IBMSNAP_REGISTER INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL,
PARTITION_KEYS_CHG) VALUES(’N’,’LIYAN’,’ITEMS’, 0 , 1 ,’Y’,’Y’,’LIYAN’, ’LIYANCD_ITEMS’,’LIYAN’,’LIYANCD_ITEMS’, 0 ,’0201’,NULL,’0’,’Y’); COMMIT; -- Satisfactory completion at 9:42am
2. Define view (S_PRODUCT, based on STORE_ITEM and ITEMS) as replication source:
--* Calling VIEWREG for source table DB2RES5.S_PRODUCT
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
-- input view OWNER=DB2RES5 input view NAME=S_PRODUCT
-- connect to the source server
CONNECT TO SJ390DB1 USER db2res5 USING pwd;
-- USERID=DB2RES5 SOURCE_ALIAS alias=SJ390DB1  23 Mar 1999 10:28am
--* The view definition to be registered=’CREATE VIEW
--* DB2RES5.S_PRODUCT AS SELECT S.STORE_NUM,I.ITEM_NUM,I.DESC,
--* I.PROD_LINE_NUM,I.SUPP_NO FROM LIYAN.STORE_ITEM S, LIYAN.ITEMS I
--* WHERE S.PRODLINE_NO=I.PROD_LINE_NUM’
-- create the change data view for component 1
CREATE VIEW DB2RES5.S_PRODUCTA AS SELECT S.IBMSNAP_UOWID, S.IBMSNAP_INTENTSEQ,S.IBMSNAP_OPERATION,S.STORE_NUM,I.ITEM_NUM,I.DESC, I.PROD_LINE_NUM,I.SUPP_NO FROM LIYAN.LIYANCD_STORE_ITEM S, LIYAN.ITEMS I WHERE S.PRODLINE_NO=I.PROD_LINE_NUM;
-- register the base and change data views for component 1
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,CCD_OWNER,CCD_TABLE,CCD_OLD_SYNCHPOINT,SYNCHPOINT, SYNCHTIME,CCD_CONDENSED,CCD_COMPLETE,ARCH_LEVEL,BEFORE_IMG_PREFIX, CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES(’N’,’DB2RES5’,’S_PRODUCT’, 1 , 1 ,’Y’,’Y’,’DB2RES5’,’S_PRODUCTA’,’LIYAN’,’LIYANCD_STORE_ITEM’, 0 , NULL,NULL,NULL,NULL,NULL ,NULL,NULL,’0201’,NULL,’0’,’N’);
-- create the change data view for component 2
CREATE VIEW DB2RES5.S_PRODUCTB AS SELECT I.IBMSNAP_UOWID, I.IBMSNAP_INTENTSEQ,I.IBMSNAP_OPERATION,S.STORE_NUM,I.ITEM_NUM,I.DESC, I.PROD_LINE_NUM,I.SUPP_NO FROM LIYAN.STORE_ITEM S, LIYAN.LIYANCD_ITEMS I WHERE S.PRODLINE_NO=I.PROD_LINE_NUM;
-- register the base and change data views for component 2
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,CCD_OWNER,CCD_TABLE,CCD_OLD_SYNCHPOINT,SYNCHPOINT, SYNCHTIME,CCD_CONDENSED,CCD_COMPLETE,ARCH_LEVEL,BEFORE_IMG_PREFIX, CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES(’N’,’DB2RES5’,’S_PRODUCT’, 2 , 1 ,’Y’,’Y’,’DB2RES5’,’S_PRODUCTB’,’LIYAN’,’LIYANCD_ITEMS’, 0 ,NULL, NULL,NULL,NULL,NULL ,NULL,NULL,’0201’,NULL,’0’,’N’);
COMMIT;
-- Satisfactory completion at 10:28am
D.2 Create Empty Subscription Sets
1. Create subscription sets for CCD tables (using SUPPLIER as an example):
--* Calling ADDSET for set 1 : AQCCD/setsupp
--* echo input: ADDSET SJ390DB1 AQCCD SETSUPP SJ390DB1 SJ390DB1
--* 19990318111900 R 2 NULL 30
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* CONNECTing TO SJ390DB1 USER db2res5 USING pwd;
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I    ’
--* CONNECTing TO SJ390DB1 USER db2res5 USING pwd;
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I    ’
--* connect to the CNTL_ALIAS CONNECT TO SJ390DB1 USER db2res5 USING pwd; -- current USERID=DB2RES5 CNTL_ALIAS alias=SJ390DB1 18 Mar 1999 11:21am -- create a new row in IBMSNAP_SUBS_SET INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE,APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, SOURCE_SERVER,SOURCE_ALIAS,TARGET_SERVER,TARGET_ALIAS, STATUS,LASTRUN, REFRESH_TIMING,SLEEP_MINUTES,EVENT_NAME,MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LE VEL) VALUES (0 , ’AQCCD’ , ’SETSUPP’ , ’S’ , ’DB2I ’ , ’SJ390DB1’ , ’DB2I ’ , ’SJ390DB1’ , 0 , ’1999-03-18-11.19.00’ , ’R’ , 2 ,NULL , 30 , 0 ,’0201’); -- create a new row in IBMSNAP_SUBS_SET INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE, APPLY_QUAL, SET_NAME, WHOS_ON_FIRST, SOURCE_SERVER, SOURCE_ALIAS, TARGET_SERVER, TARGET_ALIAS, STATUS, LASTRUN, REFRESH_TIMING, SLEEP_MINUTES, EVENT_NAME, MAX_SYNCH_MINUTES, AUX_STMTS, ARCH_LEVEL) VALUES (0 , ’AQCCD’ , ’SETSUPP’ , ’F’ , ’DB2I ’ , ’SJ390DB1’ , ’DB2I ’, ’SJ390DB1’ , 0 , ’1999-03-18-11.19.00’ , ’R’ , 2 ,NULL , 30 , 0 ,’0201’); --* commit work at SJ390DB1 COMMIT; -- Satisfactory completion of ADDSET at 11:21am
2. Create empty subscription sets for user copy tables:
--* Calling ADDSET for set 1 : AQLY/SETLY
--* echo input: ADDSET DJDB AQLY SETLY SJ390DB1 DJDB 19990323103300 R
--* 2 NULL 30
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* CONNECTing TO SJ390DB1 USER db2res5 USING pwd;
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I    ’
--* CONNECTing TO DJDB;
--* The ALIAS name ’DJDB’ matches the RDBNAM ’DJDB’
--* connect to the CNTL_ALIAS
CONNECT TO DJDB;
-- current USERID=GROHRES3 CNTL_ALIAS alias=DJDB  23 Mar 1999 10:35am
-- create a new row in IBMSNAP_SUBS_SET INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE,APPLY_QUAL,SET_NAME, WHOS_ON_FIRST,SOURCE_SERVER,SOURCE_ALIAS,TARGET_SERVER,TARGET_ALIAS, STATUS,LASTRUN,REFRESH_TIMING,SLEEP_MINUTES,EVENT_NAME,
MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’AQLY’ , ’SETLY’ , ’S’ , ’DB2I    ’ , ’SJ390DB1’ , ’DJDB’ , ’DJDB’ , 0 , ’1999-03-23-10.33.00’ , ’R’ , 2 ,NULL , 30 , 0 ,’0201’);
-- create a new row in IBMSNAP_SUBS_SET INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE,APPLY_QUAL,SET_NAME, WHOS_ON_FIRST,SOURCE_SERVER,SOURCE_ALIAS,TARGET_SERVER,TARGET_ALIAS, STATUS,LASTRUN,REFRESH_TIMING,SLEEP_MINUTES,EVENT_NAME, MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’AQLY’ , ’SETLY’ , ’F’ , ’DJDB’ , ’DJDB’ , ’DB2I ’ , ’SJ390DB1’ , 0 , ’1999-03-23-10.33.00’ , ’R’ , 2 ,NULL , 30 , 0 ,’0201’); --* commit work at DJDB COMMIT; -- Satisfactory completion of ADDSET at 10:35am
D.3 Add a Member to Subscription Sets
1. Add a member to subscription sets to create a CCD table (using BRAND as an example):
--* Calling d:\Program Files\DPRTools\addmembr.rex for AQCCD/SETBRAND pair # 1
--* Echo input: ADDMEMBR SJ390DB1 AQCCD SETBRAND ITSOSJ BRAND
--* NONEEXECLUDED CCD=YNNYN BRAND_NUM+ DB2RES5 CCDBRAND NODATAJOINER U
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* Connect to the CNTL_ALIAS
CONNECT TO SJ390DB1 USER db2res5 USING pwd;
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I    ’
--* Current USERID=DB2RES5 CNTL_ALIAS alias=  18 Mar 1999 12:04pm
--* Fetching from the ASN.IBMSNAP_SUBS_SET table at SJ390DB1
--* CONNECTing TO SJ390DB1 USER db2res5 USING pwd;
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I    ’
--* Fetching from the ASN.IBMSNAP_REGISTER table at SJ390DB1 -- using REXX logic file CNTLSVR.REX --* If you don’t see: ’--* now done interpreting REXX logic file --* CNTLSVR.REX’, then check your REXX code --* The subscription predicate was not changed by the user logic in --* CNTLSVR.REX --* now done interpreting REXX logic file CNTLSVR.REX -- create a new row in IBMSNAP_SUBS_MEMBR INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL, SET_NAME, WHOS_ON_FIRST, SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE, TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES ( ’AQCCD’ , ’SETBRAND’ , ’S’ , ’ITSOSJ’ , ’BRAND’ , 0 ,’DB2RES5’,’CCDBRAND’ ,’Y’ ,’N’ , 3 ,NULL); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’AQCCD’,’SETBRAND’ , ’S’,’DB2RES5’,’CCDBRAND’ ,’A’,’BRAND_NUM’,’Y’, 1 ,’BRAND_NUM’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’AQCCD’,’SETBRAND’ , ’S’,’DB2RES5’,’CCDBRAND’ ,’A’,’DESC’,’N’, 2 ,’DESC’); --* I noticed the set subscription is inactive
UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1 WHERE APPLY_QUAL=’AQCCD’ AND SET_NAME=’SETBRAND’ AND WHOS_ON_FIRST=’S’;
--* Commit work at cntl_ALIAS SJ390DB1
COMMIT;
--* Connect to the SOURCE_ALIAS
CONNECT TO SJ390DB1 USER db2res5 USING pwd;
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I    ’
--* record the subscription in the pruning control table at the
--* source server
INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER, TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL, SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DB2I’, ’DB2RES5’,’CCDBRAND’,’ITSOSJ’,’BRAND’, 0 ,’AQCCD’,’SETBRAND’,’DB2I ’, 3 ,’SJ390DB1’);
--* Commit work at source_ALIAS SJ390DB1
COMMIT;
--* Connect to the TARGET_ALIAS
CONNECT TO SJ390DB1 USER db2res5 USING pwd;
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I    ’
-- using REXX logic file TARGSVR.REX --* If you don’t see: -- now done interpreting REXX logic file --* TARGSVR.REX, then check your REXX code -- in TARGSVR.REX -- About to create a target table tablespace CREATE TABLESPACE TS045974 IN DSNDB04 SEGSIZE 4 LOCKSIZE PAGE CLOSE NO; -- now done interpreting REXX logic file TARGSVR.REX -- The target table does not yet exist -- Not remote to DataJoiner target -- Create the target table DB2RES5.CCDBRAND CREATE TABLE DB2RES5.CCDBRAND(BRAND_NUM DECIMAL(7 , 0) NOT NULL, DESC CHAR(30) NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_LOGMARKER TIMESTAMP NOT NULL) IN DSNDB04.TS045974; CREATE INDEX DB2RES5.CCDBRANDX ON DB2RES5.CCDBRAND(IBMSNAP_COMMITSEQ ASC, IBMSNAP_INTENTSEQ ASC); -- Create an index for the TARGET DB2RES5.CCDBRAND CREATE TYPE 2 UNIQUE INDEX DB2RES5.CCDBRAND ON DB2RES5.CCDBRAND( BRAND_NUM ASC); -- Auto-register the TARGET as an ’internal CCD’ UPDATE ASN.IBMSNAP_REGISTER SET CCD_OWNER=’DB2RES5’, CCD_TABLE=’CCDBRAND’,CCD_COMPLETE=’N’,CCD_CONDENSED=’Y’ WHERE SOURCE_OWNER=’ITSOSJ’ AND SOURCE_TABLE=’BRAND’; --* Commit work at target server SJ390DB1 COMMIT; --* Satisfactory completion of ADDMEMBR at 12:05pm
2. Add a member to subscription sets to create a user copy table (using BRAND as an example):
--* Calling d:\Program Files\DPRTools\addmembr.rex for LY610/SET610 pair # 3
--* Echo input: ADDMEMBR DJDB LY610 SET610 ITSOSJ BRAND NONEEXECLUDED
--* UCOPY DESC+ GROHRES3 BRAND INFODB1 U
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* Connect to the CNTL_ALIAS
CONNECT TO DJDB;
--* The ALIAS name ’DJDB’ matches the RDBNAM ’DJDB’
--* Current USERID=GROHRES3 CNTL_ALIAS alias=  18 Mar 1999 11:58am
--* Fetching from the ASN.IBMSNAP_SUBS_SET table at DJDB
--* CONNECTing TO SJ390DB1 USER db2res5 USING pwd;
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I    ’
--* Fetching from the ASN.IBMSNAP_REGISTER table at SJ390DB1
-- using REXX logic file CNTLSVR.REX
--* If you don’t see: ’--* now done interpreting REXX logic file
--* CNTLSVR.REX’, then check your REXX code
--* The subscription predicate was not changed by the user logic in
--* CNTLSVR.REX
--* now done interpreting REXX logic file CNTLSVR.REX
-- create a new row in IBMSNAP_SUBS_MEMBR
INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE, TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES ( ’LY610’ , ’SET610’ , ’S’ , ’ITSOSJ’ , ’BRAND’ , 0 ,’GROHRES3’,’BRAND’,’Y’,’Y’, 8 ,NULL);
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’LY610’,’SET610’ , ’S’,’GROHRES3’,’BRAND’ ,’A’,’BRAND_NUM’,’N’,1 ,’BRAND_NUM’);
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’LY610’,’SET610’ , ’S’,’GROHRES3’,’BRAND’ ,’A’,’DESCI’,’Y’, 2 ,’DESC’); (*1)
Note 1: Here we changed the TARGET_NAME from DESC to DESCI manually, corresponding to the manual change in the CREATE TABLE statement below.
--* Commit work at cntl_ALIAS DJDB
COMMIT;
--* Connect to the SOURCE_ALIAS
CONNECT TO SJ390DB1 USER db2res5 USING pwd;
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I    ’
--* record the subscription in the pruning control table at the
--* source server
INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER, TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL, SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DJDB’,’GROHRES3’,’BRAND’,’ITSOSJ’,’BRAND’, 0 ,’LY610’,’SET610’,’DJDB’, 8 ,’DJDB’);
--* Commit work at source_ALIAS SJ390DB1
COMMIT;
--* Connect to the TARGET_ALIAS
CONNECT TO DJDB;
--* Set DJ two_phase commit off
SET SERVER OPTION TWO_PHASE_COMMIT TO ’N’ FOR SERVER INFODB1;
--* The ALIAS name ’DJDB’ matches the RDBNAM ’DJDB’
-- using REXX logic file TARGSVR.REX
--* If you don’t see: -- now done interpreting REXX logic file
--* TARGSVR.REX, then check your REXX code
-- in TARGSVR.REX
-- now done interpreting REXX logic file TARGSVR.REX
-- The target table does not yet exist
--* The target server is DataJoiner and no ’nickname’ yet exists for
--* the target table, making it necessary to passthru DataJoiner to
--* create the target table, then create a nickname for the new target
--* in DataJoiner
SET PASSTHRU INFODB1;
-- Assuming MS SqlServer data types for the target table
-- Create the target table GROHRES3.BRAND
CREATE TABLE BRAND(BRAND_NUM DECIMAL(7 , 0) NOT NULL,DESCI CHAR(30) NOT NULL); (*2)
Note 2: The BRAND table has a column called DESC, but MS SQL Server uses DESC as a keyword and does not support a table with such a column name, so we updated the script here manually (and also updated the statement that inserts rows into ASN.IBMSNAP_SUBS_COLS above).
-- Create an index for the TARGET GROHRES3.BRAND
CREATE UNIQUE INDEX BRANDX ON dbo.BRAND(DESCI);
--* Returning now to a local DataJoiner context
COMMIT;
SET PASSTHRU RESET;
--* Create a DataJoiner nickname for the new target table in INFODB1
CREATE NICKNAME GROHRES3.BRAND FOR "INFODB1"."dbo"."BRAND";
--* Please resist the temptation to edit the CREATE NICKNAME
--* definition above. DPRTOOLS relies on both the name and the
--* qualifier of the nickname matching the name and qualifier of the
--* target table.
--* Commit work at target server DJDB
COMMIT;
--* Satisfactory completion of ADDMEMBR at 11:58am
D.4 Add Stored Procedure to Subscription Sets
Add the stored procedure to the subscription set in order to invoke it after the Apply program has applied the changed data to the target.
--* Calling ADDSTMT for AQLY/SETLY pair # 4
--* echo input: ADDSTMT DJDB AQLY SETLY S A 1 C 0000002000 C_ITEM
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* connect to the CNTL_ALIAS
CONNECT TO DJDB USER grohres3 USING pwd;
--* The ALIAS name ’DJDB’ matches the RDBNAM ’DJDB’
-- current USERID=GROHRES3 CNTL_ALIAS alias=  25 Mar 1999 8:28pm
--* Fetching from the ASN.IBMSNAP_SUBS_SET table at DJDB
--* Fetching from the ASN.IBMSNAP_SUBS_STMTS table at DJDB
--* There is no conflicting entry in the ASN.IBMSNAP_SUBS_STMTS table,
--* so Ok to add.
--* Unable to verify stored procedure names in advance, so will
--* continue, assuming that the stored procedure name is Ok.
-- create a new row in IBMSNAP_SUBS_STMTS
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES) VALUES(’AQLY’,’SETLY’,’S’,’A’, 1 ,’C’,’C_ITEM’,’0000002000’);
-- increment the AUX_STMTS counter in IBMSNAP_SUBS_SET
UPDATE ASN.IBMSNAP_SUBS_SET SET AUX_STMTS=AUX_STMTS + 1 WHERE APPLY_QUAL=’AQLY’ AND SET_NAME=’SETLY’ AND WHOS_ON_FIRST=’S’;
--* commit work at cntl_ALIAS DJDB
COMMIT;
-- Satisfactory completion of ADDSTMT at 8:28pm
Appendix E. DJRA Generated SQL for Case Study 3
This appendix contains the SQL generated by DJRA for the various replication definitions that were configured in case study 3. The modifications to the generated SQL are shown in bold typeface.
E.1 Output from Define the SALES_SET Subscription Set
--* File Name: create_sales_set.sql
--*
--* Calling ADDSET for set 1 : WHQ3/SALES_SET
--*
--* echo input: ADDSET DJDB WHQ3 SALES_SET SJ390DB1 DJDB
--* 19990314000000 R 1440 NULL 30
--*
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’ then check your REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* CONNECTing TO SJ390DB1 USER db2res5 USING pwd ;
--*
--* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I    ’
--*
--* CONNECTing TO DJDB USER djinst5 USING pwd;
--*
--* The ALIAS name ’DJDB’ matches the RDBNAM ’DJDB’
--*
--* connect to the CNTL_ALIAS
--*
CONNECT TO DJDB USER djinst5 USING pwd;
-- current USERID=DJINST5 CNTL_ALIAS alias=DJDB  15 Mar 1999 11:36am
-- create a new row in IBMSNAP_SUBS_SET
INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE,APPLY_QUAL,SET_NAME, WHOS_ON_FIRST,SOURCE_SERVER,SOURCE_ALIAS,TARGET_SERVER,TARGET_ALIAS, STATUS,LASTRUN,REFRESH_TIMING,SLEEP_MINUTES,EVENT_NAME, MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’WHQ1’ , ’SALES_SET’ , ’S’ , ’DB2I    ’ , ’SJ390DB1’ , ’DJDB’ , ’DJDB’ , 0 , ’1999-03-14-00.00.00’ , ’R’ , 1440 ,NULL , 30 , 0 ,’0201’);
-- create a new row in IBMSNAP_SUBS_SET
INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE,APPLY_QUAL,SET_NAME, WHOS_ON_FIRST,SOURCE_SERVER,SOURCE_ALIAS,TARGET_SERVER,TARGET_ALIAS, STATUS,LASTRUN,REFRESH_TIMING,SLEEP_MINUTES,EVENT_NAME, MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’WHQ1’ , ’SALES_SET’ , ’F’ , ’DJDB’ , ’DJDB’ , ’DB2I    ’ , ’SJ390DB1’ , 0 , ’1999-03-14-00.00.00’ , ’R’ , 1440 ,NULL , 30 , 0 ,’0201’);
--* commit work at DJDB
--*
COMMIT;
-- Satisfactory completion of ADDSET at 11:36am
E.2 Output from Register the Supplier Table
--* File Name: register_supplier.sql
--*
--* echo input: TABLEREG SJ390DB1 ITSOSJ SUPPLIER AFTER NONEEXCLUDED
--* DELETEINSERTUPDATE NONE
--*
-- using SRCESVR.REX as the REXX logic filename
-- using REXX password file PASSWORD.REX
-- If you don’t see: ’-- now done interpreting...’
-- then check your REXX code.
-- now done interpreting REXX password file PASSWORD.REX
-- connect to the source-server
CONNECT TO SJ390DB1 USER db2res5 USING pwd ;
-- USERID=DB2RES5  SOURCE_ALIAS alias=SJ390DB1  11 Mar 1999 3:13pm
-- PRDID=DSN0501
-- 1 candidate registrations, 9 already known to be registered
-- The following tables are candidates for registration:
--    1 table   ITSOSJ.SUPPLIER
-- registration candidate #1 ITSOSJ.SUPPLIER
-- ITSOSJ.SUPPLIER is assumed a USER table
-- reading REXX logic file SRCESVR.REX
-- If you don’t see: ’-- now done interpreting...’
-- then check your REXX code.
-- in SRCESVR.REX, about to create a change data tablespace
CREATE TABLESPACE TSSUPPLI IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC;
-- now done interpreting REXX logic file SRCESVR.REX
-- enable change data capture
ALTER TABLE ITSOSJ.SUPPLIER DATA CAPTURE CHANGES;
-- selecting ’X’ as the before-image prefix character
-- create the cd/ccd table for ITSOSJ.SUPPLIER
CREATE TABLE ITSOSJ.CDSUPPLIER(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL,SUPP_NO DECIMAL(7 , 0) NOT NULL, SUPP_NAME CHAR(25) NOT NULL) IN SJ390DB1.TSSUPPLI;
-- create the index for the change data table for ITSOSJ.SUPPLIER
CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000CDSUPPLIER ON ITSOSJ.CDSUPPLIER(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
-- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL, PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’SUPPLIER’, 0 , 1 ,’Y’,’Y’, ’ITSOSJ’,’CDSUPPLIER’,’ITSOSJ’,’CDSUPPLIER’, 1 ,’0201’,NULL,’0’,’N’);
COMMIT; -- Satisfactory completion at 3:13pm
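Again as a sketch of our own, the new registration can be verified at the source server (connect to SJ390DB1) by querying the register control table:

SELECT SOURCE_OWNER, SOURCE_TABLE, CD_OWNER, CD_TABLE, DISABLE_REFRESH
FROM ASN.IBMSNAP_REGISTER
WHERE SOURCE_OWNER = 'ITSOSJ' AND SOURCE_TABLE = 'SUPPLIER';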
E.3 Output from Subscribe to the Supplier Table --* File Name: subs_supplier1.sql --* -- ***************************************************** -- File was edited as follows : -- 1. CREATE TABLE statement changed : -- a) CREATE TABLESPACE statement commented out. -- b) TABLESPACE changed to Oracle server mapping -- AZOVORA8 (nickname will automatically be -- created). -- c) EXPIRED_TIMESTAMP column added for temporal histories -- ***************************************************** --* --* Calling C:\DPRTools\addmembr.rex for WHQ1/SALES_SET pair # 3 --* --* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET ITSOSJ SUPPLIER --* NONEEXECLUDED CCD=NYNNN NONE SIMON SUPPLIER NODATAJOINER U --* -- using REXX password file PASSWORD.REX -- If you don't see: '-- now done interpreting...' then check your REXX code -- now done interpreting REXX password file PASSWORD.REX --* Connect to the CNTL_ALIAS --* CONNECT TO DJDB USER djinst5 USING pwd; --* The ALIAS name 'DJDB ' matches the RDBNAM 'DJDB' --* --* Current USERID=DJINST5 CNTL_ALIAS alias= 15 Mar 1999 1:55pm --* Fetching from the ASN.IBMSNAP_SUBS_SET table at DJDB --* --* CONNECTing TO SJ390DB1 USER db2res5 USING pwd ; --* --* The ALIAS name 'SJ390DB1' maps to RDBNAM 'DB2I ' --* --* Fetching from the ASN.IBMSNAP_REGISTER table at SJ390DB1 --* -- using REXX logic file CNTLSVR.REX --* If you don't see: '--* now done interpreting REXX logic file CNTLSVR.REX', then check your REXX code --* The subscription predicate was not changed by the user logic in CNTLSVR.REX --* now done interpreting REXX logic file CNTLSVR.REX
-- create a new row in IBMSNAP_SUBS_MEMBR INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE, TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES ( ’WHQ1’ , ’SALES_SET’ , ’S’ , ’ITSOSJ’ , ’SUPPLIER’ , 0 ,’SIMON’, ’SUPPLIER’ ,’N’ ,’Y’ , 3 ,NULL); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SUPPLIER’ ,’A’,’SUPP_NO’,’N’, 1 ,’SUPP_NO’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SUPPLIER’ ,’A’,’SUPP_NAME’, ’N’, 2 ,’SUPP_NAME’); --* I noticed the set subscription is inactive --* UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1 WHERE APPLY_QUAL=’WHQ1’ AND SET_NAME=’SALES_SET’ AND WHOS_ON_FIRST=’S’; --* Commit work at cntl_ALIAS DJDB --* COMMIT; --* Connect to the SOURCE_ALIAS --* CONNECT TO SJ390DB1 USER db2res5 USING pwd ; --* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I ’ --* --* record the subscription in the pruning control table at the --* source server --* INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER, TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL, SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DJDB’,’SIMON’, ’SUPPLIER’,’ITSOSJ’,’SUPPLIER’, 0 ,’WHQ1’,’SALES_SET’,’DJDB’, 3 , ’DJDB’); --* Commit work at source_ALIAS SJ390DB1 --* COMMIT; --* Connect to the TARGET_ALIAS CONNECT TO DJDB USER djinst5 USING pwd; --* The ALIAS name ’DJDB ’ matches the RDBNAM ’DJDB’ --* -- using REXX logic file TARGSVR.REX --* If you don’t see: -- now done interpreting REXX logic file --* TARGSVR.REX, then check your REXX code --* -- in TARGSVR.REX -- About to create a target table tablespace --CREATE TABLESPACE TSSUPPLIER MANAGED BY DATABASE USING (FILE ’/data/djinst5/djinst5/SUPPLIER.F1’ 2000 ); -- now done interpreting REXX logic file TARGSVR.REX -- The target table does not yet exist -- Not remote to DataJoiner target -- Create the target table SIMON.SUPPLIER CREATE TABLE SIMON.SUPPLIER(SUPP_NO DECIMAL(7 , 0) NOT NULL,SUPP_NAME CHAR(25) NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL, EXPIRED_TIMESTAMP TIMESTAMP) IN AZOVORA8;
--* Commit work at target server DJDB --* COMMIT;
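Because the target table was created IN AZOVORA8 through DataJoiner's DDL transparency, a nickname SIMON.SUPPLIER now exists in DJDB, and the Oracle CCD table can be queried directly from DataJoiner once Apply has replicated data. A minimal verification query (our own example, not DJRA output):

SELECT SUPP_NO, SUPP_NAME, IBMSNAP_OPERATION, IBMSNAP_LOGMARKER, EXPIRED_TIMESTAMP
FROM SIMON.SUPPLIER
ORDER BY IBMSNAP_COMMITSEQ, IBMSNAP_INTENTSEQ;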
E.4 Output from Register the Store and Region Tables --* File Name: register_store+region.sql --* --* Calling TABLEREG for source table ITSOSJ.REGION --* --* echo input: TABLEREG SJ390DB1 ITSOSJ REGION AFTER NONEXCLUDED --* DELETEINSERTUPDATE NONE N --* -- using SRCESVR.REX as the REXX logic filename -- using REXX password file PASSWORD.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- now done interpreting REXX password file PASSWORD.REX -- connect to the source-server CONNECT TO SJ390DB1 USER db2res5 USING pwd; -- USERID=DB2RES5 -- PRDID=DSN0501
-- SOURCE_ALIAS alias=SJ390DB1 17 Mar 1999 9:27am
-- 1 candidate registrations, 10 already known to be registered -- The following tables are candidates for registration: -- 1 table ITSOSJ.REGION
-- registration candidate #1 ITSOSJ.REGION -- ITSOSJ.REGION is assumed a USER table -- reading REXX logic file SRCESVR.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- in SRCESVR.REX, about to create a change data tablespace CREATE TABLESPACE TSREGION IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC; -- now done interpreting REXX logic file SRCESVR.REX --* Source table ITSOSJ.REGION already has CDC attribute, no need to --* alter. -- selecting ’X’ as the before-image prefix character -- create the cd/ccd table for ITSOSJ.REGION CREATE TABLE ITSOSJ.CDREGION(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL,REGION_ID DECIMAL(3 , 0) NOT NULL, REGION_NAME CHAR(30)) IN SJ390DB1.TSREGION; -- create the index for the change data table for ITSOSJ.REGION CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI0000000CDREGION ON ITSOSJ.CDREGION(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC); -- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL, PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’REGION’, 0 , 1 ,’Y’,’Y’, ’ITSOSJ’,’CDREGION’,’ITSOSJ’,’CDREGION’, 1 ,’0201’,NULL,’0’,’N’); COMMIT; -- Satisfactory completion at 9:27am --* --* Calling TABLEREG for source table ITSOSJ.STORE --* --* echo input: TABLEREG SJ390DB1 ITSOSJ STORE AFTER NONEXCLUDED --* DELETEINSERTUPDATE NONE N --* -- using SRCESVR.REX as the REXX logic filename -- using REXX password file PASSWORD.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- now done interpreting REXX password file PASSWORD.REX -- connect to the source-server CONNECT TO SJ390DB1 USER db2res5 USING pwd ; -- USERID=DB2RES5 -- PRDID=DSN0501
-- SOURCE_ALIAS alias=SJ390DB1 17 Mar 1999 9:27am
-- 1 candidate registrations, 10 already known to be registered -- The following tables are candidates for registration: -- 1 table ITSOSJ.STORE
-- registration candidate #1 ITSOSJ.STORE -- ITSOSJ.STORE is assumed a USER table -- reading REXX logic file SRCESVR.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- in SRCESVR.REX, about to create a change data tablespace CREATE TABLESPACE TSSTORE IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC; -- now done interpreting REXX logic file SRCESVR.REX --* Source table ITSOSJ.STORE already has CDC attribute, no need to --* alter. -- selecting ’X’ as the before-image prefix character -- create the cd/ccd table for ITSOSJ.STORE CREATE TABLE ITSOSJ.CDSTORE(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL,COMPNO DECIMAL(3 , 0) NOT NULL, STORE_NUM DECIMAL(3 , 0) NOT NULL,NAME CHAR(25) NOT NULL,STREET CHAR( 25) NOT NULL,ZIP DECIMAL(5 , 0) NOT NULL,CITY CHAR(20) NOT NULL, REGION_ID DECIMAL(3 , 0) NOT NULL) IN SJ390DB1.TSSTORE ; -- create the index for the change data table for ITSOSJ.STORE CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000000CDSTORE ON ITSOSJ.CDSTORE( IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
-- insert a registration record into ASN.IBMSNAP_REGISTER INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL, PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’STORE’, 0 , 1 ,’Y’,’Y’, ’ITSOSJ’,’CDSTORE’,’ITSOSJ’,’CDSTORE’, 1 ,’0201’,NULL,’0’,’N’); COMMIT; -- Satisfactory completion at 9:27am
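The script notes that both source tables already carry the change data capture attribute. To confirm this yourself on DB2 for OS/390, the catalog can be queried (a sketch, assuming SELECT authority on the catalog tables):

SELECT NAME, DATACAPTURE
FROM SYSIBM.SYSTABLES
WHERE CREATOR = 'ITSOSJ' AND NAME IN ('REGION', 'STORE');

A DATACAPTURE value of 'Y' corresponds to the DATA CAPTURE CHANGES attribute.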
E.5 Output from Subscribe to the Region Table --* File Name: subs_region.sql --* --* Calling C:\DPRTools\addmembr.rex for WHQ1/SALES_SET pair # 1 --* --* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET ITSOSJ REGION --* NONEEXECLUDED PIT REGION_ID+ DJINST5 REGION AZOVORA8 U --* 'IBMSNAP_OPERATION = 'I'' --* -- using REXX password file PASSWORD.REX -- If you don't see: '-- now done interpreting...' then check your REXX code -- now done interpreting REXX password file PASSWORD.REX --* Connect to the CNTL_ALIAS --* CONNECT TO DJDB USER djinst5 USING pwd; --* The ALIAS name 'DJDB ' matches the RDBNAM 'DJDB' --* --* Current USERID=DJINST5 CNTL_ALIAS alias= 17 Mar 1999 9:42am --* Fetching from the ASN.IBMSNAP_SUBS_SET table at DJDB --* --* CONNECTing TO SJ390DB1 USER db2res5 USING pwd ; --* --* The ALIAS name 'SJ390DB1' maps to RDBNAM 'DB2I ' --* --* Fetching from the ASN.IBMSNAP_REGISTER table at SJ390DB1 --* -- using REXX logic file CNTLSVR.REX --* If you don't see: '--* now done interpreting REXX logic file CNTLSVR.REX', then check your REXX code --* The subscription predicate was not changed by the user logic in CNTLSVR.REX --* now done interpreting REXX logic file CNTLSVR.REX
-- create a new row in IBMSNAP_SUBS_MEMBR INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE, TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES ( ’WHQ1’ , ’SALES_SET’ , ’S’ , ’ITSOSJ’ , ’REGION’ , 0 ,’DJINST5’, ’REGION’,’Y’,’Y’, 4 , ’IBMSNAP_OPERATION = ’’I’’’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’DJINST5’,’REGION’ ,’A’,’REGION_ID’, ’Y’, 1 ,’REGION_ID’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’DJINST5’,’REGION’ ,’A’,’REGION_NAME’, ’N’, 2 ,’REGION_NAME’); --* Commit work at cntl_ALIAS DJDB --* COMMIT; --* Connect to the SOURCE_ALIAS --* CONNECT TO SJ390DB1 USER db2res5 USING pwd ; --* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I ’ --* --* record the subscription in the pruning control table at the --* source server --* INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER, TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL, SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DJDB’, ’DJINST5’,’REGION’,’ITSOSJ’,’REGION’, 0 ,’WHQ1’,’SALES_SET’,’DJDB’, 4 , ’DJDB’); --* Commit work at source_ALIAS SJ390DB1 --* COMMIT; --* Connect to the TARGET_ALIAS CONNECT TO DJDB USER djinst5 USING pwd; --* Set DJ two_phase commit off --* SET SERVER OPTION TWO_PHASE_COMMIT TO ’N’ FOR SERVER AZOVORA8; --* The ALIAS name ’DJDB ’ matches the RDBNAM ’DJDB’ --* -- using REXX logic file TARGSVR.REX --* If you don’t see: -- now done interpreting REXX logic file --* TARGSVR.REX, then check your REXX code --* -- in TARGSVR.REX -- now done interpreting REXX logic file TARGSVR.REX -- The target table does not yet exist --* The target server is DataJoiner and no ’nickname’ yet exists for --* the target table, making it necessary to passthru DataJoiner to --* create the target table, then create a nickname for the new target --* in DataJoiner --* SET PASSTHRU AZOVORA8; -- Assuming Oracle data types for the target table -- Create the target table DJINST5.REGION CREATE TABLE SIMON.REGION(REGION_ID DECIMAL(3 , 0) NOT NULL, REGION_NAME CHAR(30),IBMSNAP_LOGMARKER DATE); -- Create an index for the TARGET DJINST5.REGION CREATE UNIQUE INDEX SIMON.REGION ON SIMON.REGION(REGION_ID ASC);
--* Returning now to a local DataJoiner context --* COMMIT; SET PASSTHRU RESET; --* Create a DataJoiner nickname for the new target table in AZOVORA8 --* CREATE NICKNAME DJINST5.REGION FOR AZOVORA8.SIMON.REGION; --* Please resist the temptation to edit the CREATE NICKNAME --* definition above. DPRTOOLS relies on both the name and the --* qualifier of the nickname matching the name and qualifier of the --* target table. --* --* Fixup DataJoiner data type mapping --* ALTER NICKNAME DJINST5.REGION SET COLUMN REGION_ID LOCAL TYPE DECIMAL( 3 , 0); --* Commit work at target server DJDB --* COMMIT; --* Satisfactory completion of ADDMEMBR at 9:42am
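The ALTER NICKNAME statement above is needed because DataJoiner's default reverse mapping for the Oracle NUMBER column would otherwise not match the DECIMAL(3,0) definition of the replication key exactly. Once Apply has run, the point-in-time copy can be checked through the nickname from DJDB; for example (our own verification query, not part of the generated script):

SELECT REGION_ID, REGION_NAME, IBMSNAP_LOGMARKER
FROM DJINST5.REGION
ORDER BY REGION_ID;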
E.6 Output from Subscribe to the Store Table --* File Name: subs_store.sql --* -- ***************************************************** -- File was edited as follows : -- 1. CREATE TABLE statement changed : -- a) CREATE TABLESPACE statement commented out. -- b) TABLESPACE changed to Oracle server mapping -- AZOVORA8 (nickname will automatically be -- created). -- c) EXPIRED_TIMESTAMP column added for temporal histories -- ***************************************************** --* --* Calling C:\DPRTools\addmembr.rex for WHQ1/SALES_SET pair # 1 --* --* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET ITSOSJ STORE --* NONEEXECLUDED CCD=NYNNN NONE SIMON STORE NODATAJOINER U --* 'IBMSNAP_OPERATION IN ('I','U')' --* -- using REXX password file PASSWORD.REX -- If you don't see: '-- now done interpreting...' then check your REXX code -- now done interpreting REXX password file PASSWORD.REX --* Connect to the CNTL_ALIAS --* CONNECT TO DJDB USER djinst5 USING pwd; --* The ALIAS name 'DJDB ' matches the RDBNAM 'DJDB' --* --* Current USERID=DJINST5 CNTL_ALIAS alias= 17 Mar 1999 9:56am --* Fetching from the ASN.IBMSNAP_SUBS_SET table at DJDB --* --* CONNECTing TO SJ390DB1 USER db2res5 USING pwd ; --* --* The ALIAS name 'SJ390DB1' maps to RDBNAM 'DB2I '
--* --* Fetching from the ASN.IBMSNAP_REGISTER table at SJ390DB1 --* -- using REXX logic file CNTLSVR.REX --* If you don't see: '--* now done interpreting REXX logic file CNTLSVR.REX', then check your REXX code --* The subscription predicate was not changed by the user logic in CNTLSVR.REX --* now done interpreting REXX logic file CNTLSVR.REX
-- create a new row in IBMSNAP_SUBS_MEMBR INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE, TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES ( ’WHQ1’ , ’SALES_SET’ , ’S’ , ’ITSOSJ’ , ’STORE’ , 0 ,’SIMON’,’STORE’ ,’N’ ,’Y’ , 3 ,NULL); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’COMPNO’,’N’, 1 , ’COMPNO’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’STORE_NUM’,’N’, 2 ,’STORE_NUM’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’NAME’,’N’, 3 , ’NAME’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’STREET’,’N’, 4 , ’STREET’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’ZIP’,’N’, 5 , ’ZIP’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’CITY’,’N’, 6 , ’CITY’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’STORE’ ,’A’,’REGION_ID’,’N’, 7 ,’REGION_ID’); --* Commit work at cntl_ALIAS DJDB --*
COMMIT; --* Connect to the SOURCE_ALIAS --* CONNECT TO SJ390DB1 USER db2res5 USING pwd ; --* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I ’ --* --* record the subscription in the pruning control table at the --* source server --* INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER, TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL, SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DJDB’,’SIMON’, ’STORE’,’ITSOSJ’,’STORE’, 0 ,’WHQ1’,’SALES_SET’,’DJDB’, 3 ,’DJDB’); --* Commit work at source_ALIAS SJ390DB1 --* COMMIT; --* Connect to the TARGET_ALIAS CONNECT TO DJDB USER djinst5 USING pwd; --* The ALIAS name ’DJDB ’ matches the RDBNAM ’DJDB’ --* -- using REXX logic file TARGSVR.REX --* If you don’t see: -- now done interpreting REXX logic file --* TARGSVR.REX, then check your REXX code --* -- in TARGSVR.REX -- About to create a target table tablespace -- CREATE TABLESPACE TSSTORE MANAGED BY DATABASE USING (FILE ’/data/djinst5/djinst5/STORE.F1’ 2000 ); -- now done interpreting REXX logic file TARGSVR.REX -- The target table does not yet exist -- Not remote to DataJoiner target -- Create the target table SIMON.STORE CREATE TABLE SIMON.STORE(COMPNO DECIMAL(3 , 0) NOT NULL,STORE_NUM DECIMAL(3 , 0) NOT NULL,NAME CHAR(25) NOT NULL,STREET CHAR(25) NOT NULL,ZIP DECIMAL(5 , 0) NOT NULL,CITY CHAR(20) NOT NULL,REGION_ID DECIMAL(3 , 0) NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL, EXPIRED_TIMESTAMP TIMESTAMP) IN AZOVORA8; --* Commit work at target server DJDB --* COMMIT; --* Satisfactory completion of ADDMEMBR at 9:56am
E.7 Output from Register the Items, ProdLine, and Brand Tables --* File Name: register_items+prodline+brand.sql --* --* Calling TABLEREG for source table ITSOSJ.BRAND --*
--* echo input: TABLEREG SJ390DB1 ITSOSJ BRAND AFTER NONEXCLUDED --* DELETEINSERTUPDATE NONE --* -- using SRCESVR.REX as the REXX logic filename -- using REXX password file PASSWORD.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- now done interpreting REXX password file PASSWORD.REX -- connect to the source-server CONNECT TO SJ390DB1 USER db2res5 USING pwd ; -- USERID=DB2RES5 -- PRDID=DSN0501
-- SOURCE_ALIAS alias=SJ390DB1 9 Mar 1999 3:27pm
-- 1 candidate registrations, 3 already known to be registered -- The following tables are candidates for registration: -- 1 table ITSOSJ.BRAND
-- registration candidate #1 ITSOSJ.BRAND -- ITSOSJ.BRAND is assumed a USER table -- reading REXX logic file SRCESVR.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- in SRCESVR.REX, about to create a change data tablespace CREATE TABLESPACE TSBRAND IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC; -- now done interpreting REXX logic file SRCESVR.REX
-- enable change data capture ALTER TABLE ITSOSJ.BRAND DATA CAPTURE CHANGES; -- selecting ’X’ as the before-image prefix character -- create the cd/ccd table for ITSOSJ.BRAND CREATE TABLE ITSOSJ.CDBRAND(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL,BRAND_NUM DECIMAL(7 , 0) NOT NULL, DESC CHAR(30) NOT NULL) IN SJ390DB1.TSBRAND ; -- create the index for the change data table for ITSOSJ.BRAND CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000000CDBRAND ON ITSOSJ.CDBRAND( IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC); -- insert a registration record into ASN.IBMSNAP_REGISTER INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL, PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’BRAND’, 0 , 1 ,’Y’,’Y’, ’ITSOSJ’,’CDBRAND’,’ITSOSJ’,’CDBRAND’, 1,’0201’,NULL,’0’,’N’); COMMIT; -- Disabled FULLREFRESH by setting DISABLE_REFRESH=1 above.
-- Satisfactory completion at 3:27pm --* --* Calling TABLEREG for source table ITSOSJ.ITEMS --* --* echo input: TABLEREG SJ390DB1 ITSOSJ ITEMS AFTER NONEXCLUDED --* DELETEINSERTUPDATE NONE --* -- using SRCESVR.REX as the REXX logic filename -- using REXX password file PASSWORD.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- now done interpreting REXX password file PASSWORD.REX -- connect to the source-server CONNECT TO SJ390DB1 USER db2res5 USING pwd ; -- USERID=DB2RES5 -- PRDID=DSN0501
-- SOURCE_ALIAS alias=SJ390DB1 9 Mar 1999 3:27pm
-- 1 candidate registrations, 3 already known to be registered -- The following tables are candidates for registration: -- 1 table ITSOSJ.ITEMS
-- registration candidate #1 ITSOSJ.ITEMS -- ITSOSJ.ITEMS is assumed a USER table -- reading REXX logic file SRCESVR.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- in SRCESVR.REX, about to create a change data tablespace CREATE TABLESPACE TSITEMS IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC; -- now done interpreting REXX logic file SRCESVR.REX
-- enable change data capture ALTER TABLE ITSOSJ.ITEMS DATA CAPTURE CHANGES; -- selecting ’X’ as the before-image prefix character -- create the cd/ccd table for ITSOSJ.ITEMS CREATE TABLE ITSOSJ.CDITEMS(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL,ITEM_NUM DECIMAL(13 , 0) NOT NULL, DESC VARCHAR(150) NOT NULL,PROD_LINE_NUM DECIMAL(7 , 0) NOT NULL, SUPP_NO DECIMAL(13 , 0) NOT NULL) IN SJ390DB1.TSITEMS ; -- create the index for the change data table for ITSOSJ.ITEMS CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000000CDITEMS ON ITSOSJ.CDITEMS( IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC); -- insert a registration record into ASN.IBMSNAP_REGISTER INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL, PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’ITEMS’, 0 , 1 ,’Y’,’Y’, ’ITSOSJ’,’CDITEMS’,’ITSOSJ’,’CDITEMS’, 1 ,’0201’,NULL,’0’,’N’);
COMMIT; -- Disabled FULLREFRESH by setting DISABLE_REFRESH=1 above.
-- Satisfactory completion at 3:27pm --* --* Calling TABLEREG for source table ITSOSJ.PRODLINE --* --* echo input: TABLEREG SJ390DB1 ITSOSJ PRODLINE AFTER NONEXCLUDED --* DELETEINSERTUPDATE NONE --* -- using SRCESVR.REX as the REXX logic filename -- using REXX password file PASSWORD.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- now done interpreting REXX password file PASSWORD.REX -- connect to the source-server CONNECT TO SJ390DB1 USER db2res5 USING pwd ; -- USERID=DB2RES5 -- PRDID=DSN0501
-- SOURCE_ALIAS alias=SJ390DB1 9 Mar 1999 3:27pm
-- 1 candidate registrations, 3 already known to be registered -- The following tables are candidates for registration: -- 1 table ITSOSJ.PRODLINE
-- registration candidate #1 ITSOSJ.PRODLINE -- ITSOSJ.PRODLINE is assumed a USER table -- reading REXX logic file SRCESVR.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- in SRCESVR.REX, about to create a change data tablespace CREATE TABLESPACE TSPRODLI IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC; -- now done interpreting REXX logic file SRCESVR.REX
-- enable change data capture ALTER TABLE ITSOSJ.PRODLINE DATA CAPTURE CHANGES; -- selecting ’X’ as the before-image prefix character -- create the cd/ccd table for ITSOSJ.PRODLINE CREATE TABLE ITSOSJ.CDPRODLINE(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL,PROD_LINE_NUM DECIMAL(7 , 0) NOT NULL,DESC CHAR(30) NOT NULL,BRAND_NUM DECIMAL(7 , 0) NOT NULL) IN SJ390DB1.TSPRODLI; -- create the index for the change data table for ITSOSJ.PRODLINE CREATE TYPE 2 UNIQUE INDEX ITSOSJ.CDI00000CDPRODLINE ON ITSOSJ.CDPRODLINE(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC); -- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL, PARTITION_KEYS_CHG) VALUES(’N’,’ITSOSJ’,’PRODLINE’, 0 , 1 ,’Y’,’Y’, ’ITSOSJ’,’CDPRODLINE’,’ITSOSJ’,’CDPRODLINE’, 1 ,’0201’,NULL,’0’,’N’); COMMIT; -- Disabled FULLREFRESH by setting DISABLE_REFRESH=1 above.
-- Satisfactory completion at 3:27pm
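All three registrations deliberately leave DISABLE_REFRESH=1, so Apply cannot perform an automatic full refresh from these sources. Should a manual full refresh later become necessary, the flag can be reset at the source server; for example (mirroring the statement that E.13 uses to toggle the SALES registration):

CONNECT TO SJ390DB1 USER db2res5 USING pwd;
UPDATE ASN.IBMSNAP_REGISTER SET DISABLE_REFRESH=0
WHERE SOURCE_OWNER='ITSOSJ' AND SOURCE_TABLE='BRAND';
COMMIT;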
E.8 Output from Register the Products View --* File Name: register_products_view.sql --* --* Calling VIEWREG for source table DB2RES5.PRODUCTS --* -- using REXX password file PASSWORD.REX -- If you don’t see: ’-- now done interpreting...’ then check your REXX code -- now done interpreting REXX password file PASSWORD.REX -- input view OWNER=DB2RES5 input view NAME=PRODUCTS -- connect to the source server CONNECT TO SJ390DB1 USER db2res5 USING pwd ; -- USERID=DB2RES5 SOURCE_ALIAS alias=SJ390DB1 9 Mar 1999 3:46pm --* The view definition to be registered=’CREATE VIEW PRODUCTS AS --* SELECT I.ITEM_NUM, SUBSTR(I.DESC,1,40) AS ITEM_DESCRIPTION, --* I.PROD_LINE_NUM, P.DESC AS PRODUCT_LINE_DESC, I.SUPP_NO AS --* SUPPLIER_NUM, P.BRAND_NUM, B.DESC AS BRAND_DESCRIPTION FROM --* ITSOSJ.ITEMS I, ITSOSJ.PRODLINE P, ITSOSJ.BRAND B WHERE --* I.PROD_LINE_NUM=P.PROD_LINE_NUM AND P.BRAND_NUM=B.BRAND_NUM’ --* -- create the change data view for component 1 CREATE VIEW DB2RES5.PRODUCTSA AS SELECT P.IBMSNAP_UOWID, P.IBMSNAP_INTENTSEQ,P.IBMSNAP_OPERATION,I.ITEM_NUM, SUBSTR(I.DESC,1,40) AS ITEM_DESCRIPTION, I.PROD_LINE_NUM, P.DESC AS PRODUCT_LINE_DESC, I.SUPP_NO AS SUPPLIER_NUM, P.BRAND_NUM, B.DESC AS BRAND_DESCRIPTION FROM ITSOSJ.ITEMS I, ITSOSJ.CDPRODLINE P, ITSOSJ.BRAND B WHERE I.PROD_LINE_NUM=P.PROD_LINE_NUM AND P.BRAND_NUM=B.BRAND_NUM; -- register the base and change data views for component 1 INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,CCD_OWNER,CCD_TABLE,CCD_OLD_SYNCHPOINT,SYNCHPOINT, SYNCHTIME,CCD_CONDENSED,CCD_COMPLETE,ARCH_LEVEL,BEFORE_IMG_PREFIX, CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES(’N’,’DB2RES5’,’PRODUCTS’, 1 , 1 ,’Y’,’Y’,’DB2RES5’,’PRODUCTSA’,’ITSOSJ’,’CDPRODLINE’, 1 ,NULL,NULL, NULL,NULL,NULL ,NULL,NULL,’0201’,NULL,’0’,’N’); -- create the change data view for component 2 CREATE VIEW DB2RES5.PRODUCTSB AS SELECT B.IBMSNAP_UOWID, B.IBMSNAP_INTENTSEQ,B.IBMSNAP_OPERATION,I.ITEM_NUM, SUBSTR(I.DESC,1,40) AS ITEM_DESCRIPTION, I.PROD_LINE_NUM, P.DESC AS PRODUCT_LINE_DESC, I.SUPP_NO AS SUPPLIER_NUM, P.BRAND_NUM, B.DESC AS BRAND_DESCRIPTION FROM ITSOSJ.ITEMS I, ITSOSJ.PRODLINE P, ITSOSJ.CDBRAND B WHERE
I.PROD_LINE_NUM=P.PROD_LINE_NUM AND
P.BRAND_NUM=B.BRAND_NUM;
-- register the base and change data views for component 2 INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,CCD_OWNER,CCD_TABLE,CCD_OLD_SYNCHPOINT,SYNCHPOINT, SYNCHTIME,CCD_CONDENSED,CCD_COMPLETE,ARCH_LEVEL,BEFORE_IMG_PREFIX, CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES(’N’,’DB2RES5’,’PRODUCTS’, 2 , 1 ,’Y’,’Y’,’DB2RES5’,’PRODUCTSB’,’ITSOSJ’,’CDBRAND’, 1 ,NULL,NULL, NULL,NULL,NULL ,NULL,NULL,’0201’,NULL,’0’,’N’); -- create the change data view for component 3 CREATE VIEW DB2RES5.PRODUCTSC AS SELECT I.IBMSNAP_UOWID, I.IBMSNAP_INTENTSEQ,I.IBMSNAP_OPERATION,I.ITEM_NUM, SUBSTR(I.DESC,1,40) AS ITEM_DESCRIPTION, I.PROD_LINE_NUM, P.DESC AS PRODUCT_LINE_DESC, I.SUPP_NO AS SUPPLIER_NUM, P.BRAND_NUM, B.DESC AS BRAND_DESCRIPTION FROM ITSOSJ.CDITEMS I, ITSOSJ.PRODLINE P, ITSOSJ.BRAND B WHERE I.PROD_LINE_NUM=P.PROD_LINE_NUM AND P.BRAND_NUM=B.BRAND_NUM; -- register the base and change data views for component 3 INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,CCD_OWNER,CCD_TABLE,CCD_OLD_SYNCHPOINT,SYNCHPOINT, SYNCHTIME,CCD_CONDENSED,CCD_COMPLETE,ARCH_LEVEL,BEFORE_IMG_PREFIX, CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES(’N’,’DB2RES5’,’PRODUCTS’, 3 , 1 ,’Y’,’Y’,’DB2RES5’,’PRODUCTSC’,’ITSOSJ’,’CDITEMS’, 1 ,NULL,NULL, NULL,NULL,NULL ,NULL,NULL,’0201’,NULL,’0’,’N’); COMMIT;
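Each of the three change data views joins one CD table with the other two base tables, so together the components cover changes arriving through any table of the join. To see the change rows that Apply would fetch for the ITEMS component, for instance, the corresponding view can be queried at SJ390DB1 (an illustrative query of our own):

SELECT ITEM_NUM, ITEM_DESCRIPTION, PROD_LINE_NUM, IBMSNAP_OPERATION
FROM DB2RES5.PRODUCTSC;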
E.9 Output from Subscribe to the Products View --* File Name: subs_products.sql --* -- ***************************************************** -- File was edited as follows : -- 1. CREATE TABLE statement changed : -- a) CREATE TABLESPACE statement commented out. -- b) TABLESPACE changed to Oracle server mapping -- AZOVORA8 (nickname will automatically be -- created). -- c) EXPIRED_TIMESTAMP column added for temporal histories -- *****************************************************
--* --* Calling C:\DPRTools\addmembr.rex for WHQ1/SALES_SET pair # 2 --* --* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET DB2RES5 PRODUCTS --* NONEEXECLUDED CCD NONE SIMON PRODUCTS NODATAJOINER U --* -- using REXX password file PASSWORD.REX -- If you don’t see: ’-- now done interpreting...’ then check your REXX code -- now done interpreting REXX password file PASSWORD.REX --* Connect to the CNTL_ALIAS --* CONNECT TO DJDB USER djinst5 USING pwd;
--* The ALIAS name 'DJDB ' matches the RDBNAM 'DJDB' --* --* Current USERID=DJINST5 CNTL_ALIAS alias= 9 Mar 1999 4:43pm --* Fetching from the ASN.IBMSNAP_SUBS_SET table at DJDB --* --* CONNECTing TO SJ390DB1 USER db2res5 USING pwd ; --* --* The ALIAS name 'SJ390DB1' maps to RDBNAM 'DB2I ' --* --* Fetching from the ASN.IBMSNAP_REGISTER table at SJ390DB1 --* -- using REXX logic file CNTLSVR.REX --* If you don't see: '--* now done interpreting REXX logic file CNTLSVR.REX', then check your REXX code --* The subscription predicate was not changed by the user logic in CNTLSVR.REX --* now done interpreting REXX logic file CNTLSVR.REX
-- create a new row in IBMSNAP_SUBS_MEMBR INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE, TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES ( ’WHQ1’ , ’SALES_SET’ , ’S’ , ’DB2RES5’ , ’PRODUCTS’ , 1 ,’SIMON’, ’PRODUCTS’,’N’,’Y’, 3 , NULL); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’PRODUCTS’ ,’A’,’ITEM_NUM’, ’N’, 1 ,’ITEM_NUM’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’PRODUCTS’ ,’A’, ’ITEM_DESCRIPTION’,’N’, 2 ,’ITEM_DESCRIPTION’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’PRODUCTS’ ,’A’, ’PROD_LINE_NUM’,’N’, 3 ,’PROD_LINE_NUM’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’PRODUCTS’ ,’A’, ’PRODUCT_LINE_DESC’,’N’, 4 ,’PRODUCT_LINE_DESC’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’PRODUCTS’ ,’A’, ’SUPPLIER_NUM’,’N’, 5 ,’SUPPLIER_NUM’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’PRODUCTS’ ,’A’, ’BRAND_NUM’,’N’, 6 ,’BRAND_NUM’);
-- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’PRODUCTS’ ,’A’, ’BRAND_DESCRIPTION’,’N’, 7 ,’BRAND_DESCRIPTION’); -- create a new row in IBMSNAP_SUBS_MEMBR INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE, TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES ( ’WHQ1’ , ’SALES_SET’ , ’S’ , ’DB2RES5’ , ’PRODUCTS’ , 2 ,’SIMON’, ’PRODUCTS’,’N’,’Y’, 3 , NULL); -- create a new row in IBMSNAP_SUBS_MEMBR INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE, TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES ( ’WHQ1’ , ’SALES_SET’ , ’S’ , ’DB2RES5’ , ’PRODUCTS’ , 3 ,’SIMON’, ’PRODUCTS’,’N’,’Y’, 3 , NULL); --* I noticed the set subscription is inactive --* UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1 WHERE APPLY_QUAL=’WHQ1’ AND SET_NAME=’SALES_SET’ AND WHOS_ON_FIRST=’S’; --* Commit work at cntl_ALIAS DJDB --* COMMIT; --* Connect to the SOURCE_ALIAS --* CONNECT TO SJ390DB1 USER db2res5 USING pwd ; --* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I ’ --* --* record the subscription in the pruning control table at the --* source server --* INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER, TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL, SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DJDB’,’SIMON’, ’PRODUCTS’,’DB2RES5’,’PRODUCTS’, 1 ,’WHQ1’,’SALES_SET’,’DJDB’, 3 , ’DJDB’); --* record the subscription in the pruning control table at the --* source server --* INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER, TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL, SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DJDB’,’SIMON’, ’PRODUCTS’,’DB2RES5’,’PRODUCTS’, 2 ,’WHQ1’,’SALES_SET’,’DJDB’, 3 , ’DJDB’); --* record the subscription in the pruning control table at the --* source server --* INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER, TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL, SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DJDB’,’SIMON’, ’PRODUCTS’,’DB2RES5’,’PRODUCTS’, 3 ,’WHQ1’,’SALES_SET’,’DJDB’, 3 , ’DJDB’);
--* Commit work at source_ALIAS SJ390DB1 --* COMMIT; --* Connect to the TARGET_ALIAS CONNECT TO DJDB USER djinst5 USING pwd; --* The ALIAS name ’DJDB ’ matches the RDBNAM ’DJDB’ --* -- using REXX logic file TARGSVR.REX --* If you don’t see: -- now done interpreting REXX logic file --* TARGSVR.REX, then check your REXX code --* -- in TARGSVR.REX -- About to create a target table tablespace -- CREATE TABLESPACE TSPRODUCTS MANAGED BY DATABASE USING (FILE ’/data/djinst5/djinst5/PRODUCTS.F1’ 2000 ); -- now done interpreting REXX logic file TARGSVR.REX -- The target table does not yet exist -- Not remote to DataJoiner target -- Create the target table SIMON.PRODUCTS CREATE TABLE SIMON.PRODUCTS(ITEM_NUM DECIMAL(13 , 0) NOT NULL, ITEM_DESCRIPTION CHAR(40) NOT NULL,PROD_LINE_NUM DECIMAL(7 , 0) NOT NULL,PRODUCT_LINE_DESC CHAR(30) NOT NULL,SUPPLIER_NUM DECIMAL(13 , 0) NOT NULL,BRAND_NUM DECIMAL(7 , 0) NOT NULL,BRAND_DESCRIPTION CHAR(30) NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL, EXPIRED_TIMESTAMP TIMESTAMP) IN AZOVORA8; --* Commit work at target server DJDB --* COMMIT; --* Satisfactory completion of ADDMEMBR at 4:43pm
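Because the PRODUCTS view was registered as three components, the subscription above creates three member rows and three pruning control rows, all feeding the single target SIMON.PRODUCTS. A quick way to confirm this (a sketch, run at the source server SJ390DB1):

SELECT SOURCE_VIEW_QUAL, APPLY_QUAL, SET_NAME
FROM ASN.IBMSNAP_PRUNCNTL
WHERE TARGET_OWNER = 'SIMON' AND TARGET_TABLE = 'PRODUCTS';

Three rows, with SOURCE_VIEW_QUAL values 1, 2, and 3, should be returned.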
E.10 Output from Register the Sales Table --* File Name: register_sales.sql --* --* echo input: TABLEREG SJ390DB1 DB2RES5 SALES AFTER DEPTNO,IN_PRC, --* NO_CUST,SUPPLNO,WGRNO DELETEINSERTUPDATE NONE N --* -- using SRCESVR.REX as the REXX logic filename -- using REXX password file PASSWORD.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- now done interpreting REXX password file PASSWORD.REX -- connect to the source-server CONNECT TO SJ390DB1 USER db2res5 USING pwd ; -- USERID=DB2RES5 -- PRDID=DSN0501
-- SOURCE_ALIAS alias=SJ390DB1 17 Mar 1999 3:16pm
-- 1 candidate registrations, 12 already known to be registered -- The following tables are candidates for registration: -- 1 table DB2RES5.SALES
-- registration candidate #1 DB2RES5.SALES -- DB2RES5.SALES is assumed a USER table -- reading REXX logic file SRCESVR.REX -- If you don’t see: ’-- now done interpreting...’ -- then check your REXX code. -- in SRCESVR.REX, about to create a change data tablespace --CREATE TABLESPACE TSSALES -- IN SJ390DB1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC; create tablespace TSSALES in sj390db1 SEGSIZE 4 LOCKSIZE TABLE CLOSE NO CCSID EBCDIC USING STOGROUP SJDB1SG2 PRIQTY 180000 SECQTY 5000; -- now done interpreting REXX logic file SRCESVR.REX --* Source table DB2RES5.SALES already has CDC attribute, no need to --* alter. -- selecting ’X’ as the before-image prefix character -- create the cd/ccd table for DB2RES5.SALES CREATE TABLE DB2RES5.CDSALES(IBMSNAP_UOWID CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL,DATE DATE NOT NULL,BASARTNO DECIMAL( 13 , 0) NOT NULL,LOCATION DECIMAL(4 , 0) NOT NULL,COMPANY DECIMAL(3 , 0) NOT NULL,PIECES DECIMAL(7 , 0) NOT NULL,OUT_PRC DECIMAL(11 , 2) NOT NULL,TAX DECIMAL(11 , 2) NOT NULL,TRANSFER_DATE TIMESTAMP NOT NULL, PROCESS_DATE TIMESTAMP NOT NULL) IN SJ390DB1.TSSALES ; -- create the index for the change data table for DB2RES5.SALES CREATE TYPE 2 UNIQUE INDEX DB2RES5.CDI00000000CDSALES ON DB2RES5.CDSALES(IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC); -- insert a registration record into ASN.IBMSNAP_REGISTER INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER, SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED, SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE, DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL, PARTITION_KEYS_CHG) VALUES(’N’,’DB2RES5’,’SALES’, 0 , 1 ,’Y’,’Y’, ’DB2RES5’,’CDSALES’,’DB2RES5’,’CDSALES’, 1 ,’0201’,NULL,’0’,’N’); COMMIT; -- Satisfactory completion at 3:16pm
E.11 Output from Subscribe to the Sales Table --* File Name: subs_sales.sql --* --* Calling C:\DPRTools\addmembr.rex for WHQ1/SALES_SET pair # 1 --* --* Echo input: ADDMEMBR DJDB WHQ1 SALES_SET DB2RES5 SALES --* NONEEXECLUDED CCD=NYNNN NONE SIMON SALES NODATAJOINER U --* ’IBMSNAP_OPERATION = ’I’’ --* -- using REXX password file PASSWORD.REX
-- If you don't see: '-- now done interpreting...' then check your REXX code -- now done interpreting REXX password file PASSWORD.REX --* Connect to the CNTL_ALIAS --* CONNECT TO DJDB USER djinst5 USING pwd; --* The ALIAS name 'DJDB ' matches the RDBNAM 'DJDB' --* --* Current USERID=DJINST5 CNTL_ALIAS alias= 17 Mar 1999 5:55pm --* Fetching from the ASN.IBMSNAP_SUBS_SET table at DJDB --* --* CONNECTing TO SJ390DB1 USER db2res5 USING pwd ; --* --* The ALIAS name 'SJ390DB1' maps to RDBNAM 'DB2I ' --* --* Fetching from the ASN.IBMSNAP_REGISTER table at SJ390DB1 --* -- using REXX logic file CNTLSVR.REX --* If you don't see: '--* now done interpreting REXX logic file CNTLSVR.REX', then check your REXX code --* The subscription predicate was not changed by the user logic in CNTLSVR.REX --* now done interpreting REXX logic file CNTLSVR.REX
-- create a new row in IBMSNAP_SUBS_MEMBR INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE, TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES ( ’WHQ1’ , ’SALES_SET’ , ’S’ , ’DB2RES5’ , ’SALES’ , 0 ,’SIMON’, ’SALES’ ,’N’ ,’Y’ , 3 , ’IBMSNAP_OPERATION = ’’I’’’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’SALE_DATE’,’N’, 1 , ’DATE’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’BASARTNO’,’N’, 2 ,’BASARTNO’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’LOCATION’,’N’, 3 ,’LOCATION’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’COMPANY’,’N’, 4 ,’COMPANY’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’PIECES’,’N’, 5 , ’PIECES’);
-- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’OUT_PRC’,’N’, 6 ,’OUT_PRC’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’TAX’,’N’, 7 , ’TAX’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’TRANSFER_DATE’, ’N’, 8 ,’TRANSFER_DATE’); -- Create a new row in IBMSNAP_SUBS_COLS INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQ1’,’SALES_SET’ , ’S’,’SIMON’,’SALES’ ,’A’,’PROCESS_DATE’, ’N’, 9 ,’PROCESS_DATE’); --* Commit work at cntl_ALIAS DJDB --* COMMIT; --* Connect to the SOURCE_ALIAS --* CONNECT TO SJ390DB1 USER db2res5 USING pwd ; --* The ALIAS name ’SJ390DB1’ maps to RDBNAM ’DB2I ’ --* --* record the subscription in the pruning control table at the --* source server --* INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER, TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL, SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS)VALUES(’DJDB’,’SIMON’, ’SALES’,’DB2RES5’,’SALES’, 0 ,’WHQ1’,’SALES_SET’,’DJDB’, 3 ,’DJDB’); --* Commit work at source_ALIAS SJ390DB1 --* COMMIT; --* Connect to the TARGET_ALIAS CONNECT TO DJDB USER djinst5 USING pwd; --* The ALIAS name ’DJDB ’ matches the RDBNAM ’DJDB’ --* -- using REXX logic file TARGSVR.REX --* If you don’t see: -- now done interpreting REXX logic file --* TARGSVR.REX, then check your REXX code --* -- in TARGSVR.REX -- About to create a target table tablespace -- CREATE TABLESPACE TSSALES MANAGED BY DATABASE USING (FILE ’/data/djinst5/djinst5/SALES.F1’ 2000 ); -- now done interpreting REXX logic file TARGSVR.REX
-- The target table does not yet exist -- Not remote to DataJoiner target -- Create the target table SIMON.SALES CREATE TABLE SIMON.SALES(SALE_DATE DATE NOT NULL,BASARTNO DECIMAL(13 , 0) NOT NULL,LOCATION DECIMAL(4 , 0) NOT NULL,COMPANY DECIMAL(3 , 0) NOT NULL,PIECES DECIMAL(7 , 0) NOT NULL,OUT_PRC DECIMAL(11 , 2) NOT NULL, TAX DECIMAL(11 , 2) NOT NULL,TRANSFER_DATE TIMESTAMP NOT NULL, PROCESS_DATE TIMESTAMP NOT NULL,IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_OPERATION CHAR(1) NOT NULL,IBMSNAP_COMMITSEQ CHAR(10) FOR BIT DATA NOT NULL,IBMSNAP_LOGMARKER TIMESTAMP NOT NULL) IN AZOVORA8 REMOTE OPTION ’TABLESPACE BIGTS’; --* Commit work at target server DJDB --* COMMIT; --* Satisfactory completion of ADDMEMBR at 5:55pm
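Because the member predicate restricts this CCD copy to IBMSNAP_OPERATION = 'I', only insert operations should ever reach the target. After Apply has run, this can be checked from DJDB through the automatically created nickname (our own example):

SELECT IBMSNAP_OPERATION, COUNT(*)
FROM SIMON.SALES
GROUP BY IBMSNAP_OPERATION;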
E.12 SQL After to Support Temporal Histories for Supplier Table --* File Name: sqlafter_supplier.sql --* --* --* Calling ADDSTMT for WHQ1/SALES_SET pair # 1 --* --* echo input: ADDSTMT DJDB WHQ1 SALES_SET S A 1 E 0000002000 UPDATE --* SIMON.SUPPLIER A SET EXPIRED_TIMESTAMP= (SELECT MIN( --* IBMSNAP_LOGMARKER) FROM SIMON.SUPPLIER B WHERE --* A.SUPP_NO=B.SUPP_NO AND A.EXPIRED_TIMESTAMP IS NULL AND --* B.EXPIRED_TIMESTAMP IS NULL AND (B.IBMSNAP_INTENTSEQ > --* A.IBMSNAP_INTENTSEQ)) WHERE A.EXPIRED_TIMESTAMP IS NULL --* -- using REXX password file PASSWORD.REX -- If you don’t see: ’-- now done interpreting...’ then check your REXX code -- now done interpreting REXX password file PASSWORD.REX --* connect to the CNTL_ALIAS --* CONNECT TO DJDB USER djinst5 USING pwd; --* The ALIAS name ’DJDB ’ matches the RDBNAM ’DJDB’ --* -- current USERID=DJINST5 CNTL_ALIAS alias= 17 Mar 1999 2:24pm --* Fetching from the ASN.IBMSNAP_SUBS_SET table at DJDB --* --* Fetching from the ASN.IBMSNAP_SUBS_STMTS table at DJDB --* --* There is no conflicting entry in the ASN.IBMSNAP_SUBS_STMTS table, --* so Ok to add. --* --* CONNECTing TO DJDB USER djinst5 USING pwd; --* --* The SQL_STMT: UPDATE SIMON.SUPPLIER A SET EXPIRED_TIMESTAMP= ( --* SELECT MIN(IBMSNAP_LOGMARKER) FROM SIMON.SUPPLIER B WHERE --* A.SUPP_NO=B.SUPP_NO AND A.EXPIRED_TIMESTAMP IS NULL AND --* B.EXPIRED_TIMESTAMP IS NULL AND (B.IBMSNAP_INTENTSEQ > --* A.IBMSNAP_INTENTSEQ)) WHERE A.EXPIRED_TIMESTAMP IS NULL; passed --* validation at the server with alias name DJDB . --* -- create a new row in IBMSNAP_SUBS_STMTS
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES) VALUES(’WHQ1’,’SALES_SET’,’S’,’A’, 1 ,’E’,’UPDATE SIMON.SUPPLIER A SET EXPIRED_TIMESTAMP= (SELECT MIN(IBMSNAP_LOGMARKER) FROM SIMON.SUPPLIER B WHERE A.SUPP_NO=B.SUPP_NO AND A.EXPIRED_TIMESTAMP IS NULL AND B.EXPIRED_TIMESTAMP IS NULL AND ( B.IBMSNAP_INTENTSEQ > A.IBMSNAP_INTENTSEQ)) WHERE A.EXPIRED_TIMESTAMP IS NULL’,’0000002000’); -- increment the AUX_STMTS counter in IBMSNAP_SUBS_SET UPDATE ASN.IBMSNAP_SUBS_SET SET AUX_STMTS=AUX_STMTS + 1 WHERE APPLY_QUAL=’WHQ1’ AND SET_NAME=’SALES_SET’ AND WHOS_ON_FIRST=’S’; --* commit work at cntl_ALIAS DJDB --* COMMIT; -- Satisfactory completion of ADDSTMT at 2:24pm
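With this SQL-after statement in place, every superseded Supplier row version carries the timestamp at which it expired, so the history table supports point-in-time queries. One possible formulation is sketched below; the timestamp is an arbitrary example, and 'D' rows are excluded because a delete row merely closes the validity of the preceding version:

SELECT SUPP_NO, SUPP_NAME
FROM SIMON.SUPPLIER
WHERE IBMSNAP_OPERATION <> 'D'
AND IBMSNAP_LOGMARKER <= TIMESTAMP('1999-03-17-12.00.00')
AND (EXPIRED_TIMESTAMP IS NULL
OR EXPIRED_TIMESTAMP > TIMESTAMP('1999-03-17-12.00.00'));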
E.13 Maintain Base Aggregate Table from Change Aggregate Subscription ----* File Name: aggregat.sql ---- The following SQL will be used to create a summary table from -- the SALES table : -- SELECT company, location, sum(pieces), sum(out_prc) -- FROM sales GROUP BY company,location --- This will be maintained initially from a base aggregate -- (for full refresh only), and subsequently from a change -- aggregate subscription. --- Be sure to register the SALES table with XPIECES and XOUT_PRC -- as before-image columns. If the table has already been -- registered, you can re-register the table using DJRA. -- Alternatively, you can uncomment the following SQL, which adds -- the before-image columns to the SALES CD table, and updates -- the register table with information on the before-image -- prefix. --- ALTER TABLE DB2RES5.CDSALES ADD XPIECES DECIMAL(7,0); -- ALTER TABLE DB2RES5.CDSALES ADD XOUT_PRC DECIMAL(11,2); -- Change register record for Sales by updating BEFORE_IMG_PREFIX -- UPDATE ASN.IBMSNAP_REGISTER SET BEFORE_IMG_PREFIX='X' WHERE -- SOURCE_OWNER='DB2RES5' AND SOURCE_TABLE='SALES'; --- COMMIT ---- Connect to the DataJoiner database and create the -- MOVEMENT and AGGREGATE tables. -CONNECT TO DJDB USER djinst5 ; --- Create the Movement table. This needs to be similar to the -- AGGREGATES table, but has an additional IBMSNAP_OPERATION -- column. The table uses DataJoiner's DDL transparency feature -- to create the table in Oracle and also create a nickname
-- for it. -CREATE TABLE SIMON.MOVEMENT( COMPANY INTEGER NOT NULL, LOCATION INTEGER NOT NULL, DIFFERENCE_PIECES INTEGER NOT NULL, DIFFERENCE_OUTPRC DECIMAL(20,2) NOT NULL, IBMSNAP_OPERATION CHAR(1) NOT NULL, IBMSNAP_HLOGMARKER TIMESTAMP NOT NULL, IBMSNAP_LLOGMARKER TIMESTAMP NOT NULL) IN AZOVORA8; --- Note: The IBMSNAP_OPERATION column above is not referenced -- when updating the SIMON.AGGREGATES table. Rather, having -- the value in the SIMON.MOVEMENT table more clearly shows -- the intermediate aggregations. --- Create an index for the SIMON.MOVEMENT table on the source -- columns which are used for GROUPING. -SET PASSTHRU AZOVORA8; CREATE INDEX MOVEMENTX ON SIMON.MOVEMENT(COMPANY,LOCATION ASC); SET PASSTHRU RESET; --- Now update the DJ global catalog so it is aware of the -- index just created. -CREATE INDEX MOVEMENTX ON SIMON.MOVEMENT(COMPANY,LOCATION ASC); --- Create the AGGREGATES table. This is the table which will -- hold the final summary information. Again, the table is created -- in Oracle using DataJoiner's DDL transparency. -CREATE TABLE SIMON.AGGREGATES( COMPANY INTEGER NOT NULL, LOCATION INTEGER NOT NULL, SUM_PIECES INTEGER, SUM_OUTPRC DECIMAL(20,2), IBMSNAP_HLOGMARKER TIMESTAMP NOT NULL, IBMSNAP_LLOGMARKER TIMESTAMP NOT NULL) IN AZOVORA8; --- Create an index for the target SIMON.AGGREGATES table -- on the columns used for GROUPING. -SET PASSTHRU AZOVORA8; CREATE UNIQUE INDEX AGGREGATESX ON SIMON.AGGREGATES(COMPANY,LOCATION ASC); SET PASSTHRU RESET; --- Now update the DJ global catalog with information that -- this index exists. -CREATE UNIQUE INDEX AGGREGATESX ON SIMON.AGGREGATES(COMPANY,LOCATION ASC); ---- Because DB2 for OS/390 disallows unions in views, a 'union' -- must be simulated: multiple subscription members are used, -- requiring target views to differentiate the separate members -- copying insert, update, and delete operations. -- The views are created in DataJoiner, over the nickname -- for the MOVEMENT table. --- For INSERTs.... CREATE VIEW SIMON.MOVEMENTI AS SELECT * FROM SIMON.MOVEMENT; --
-- For UPDATEs.... CREATE VIEW SIMON.MOVEMENTU AS SELECT * FROM SIMON.MOVEMENT; --- For DELETEs.... CREATE VIEW SIMON.MOVEMENTD AS SELECT * FROM SIMON.MOVEMENT; ----- Create a new row in IBMSNAP_SUBS_SET for the base aggregate -- "AGGREGATES" subscription - BASEAGG_SET. -INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE,APPLY_QUAL,SET_NAME, WHOS_ON_FIRST,SOURCE_SERVER,SOURCE_ALIAS,TARGET_SERVER,TARGET_ALIAS, STATUS,LASTRUN,REFRESH_TIMING,SLEEP_MINUTES,EVENT_NAME, MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (1 , ’WHQAGG’ , ’BASEAGG_SET’ , ’S’ , ’SJ390DB1’ , ’SJ390DB1’ , ’DJDB’ , ’DJDB’ , 0 , ’1999-01-05-19.19.00’ , ’R’ , 1 ,NULL , 15 , 0 ,’0201’); --- Create a new row in IBMSNAP_SUBS_SET for the change aggregate -- "MOVEMENT" subscription - CHGAGG_SET. -INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE,APPLY_QUAL,SET_NAME, WHOS_ON_FIRST,SOURCE_SERVER,SOURCE_ALIAS,TARGET_SERVER,TARGET_ALIAS, STATUS,LASTRUN,REFRESH_TIMING,SLEEP_MINUTES,EVENT_NAME, MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES (0 , ’WHQAGG’ , ’CHGAGG_SET’ , ’S’ , ’SJ390DB1’ , ’SJ390DB1’ , ’DJDB’ , ’DJDB’ , 0 , ’1999-01-05-19.19.00’ , ’R’ , 1 ,NULL , 15 , 0 ,’0201’); ---- Create a new row in IBMSNAP_SUBS_MEMBR for the BASE aggregate. -INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE, TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES ( ’WHQAGG’ , ’BASEAGG_SET’ , ’S’ , ’DB2RES5’ , ’SALES’ , 0 , ’SIMON’,’AGGREGATES’,’A’,’N’, 5 ,’1=1 GROUP BY COMPANY,LOCATION’); --- The dummy predicate above, 1=1, can be substituted with a real -- filtering predicate. The aggregate subscription requires a -- predicate of some kind preceding the GROUP BY clause. ---- Create a new row in IBMSNAP_SUBS_COLS for the BASEAGG_SET aggregate INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQAGG’,’BASEAGG_SET’ , ’S’,’SIMON’,’AGGREGATES’ ,’A’, ’COMPANY’,’N’, 1 ,’COMPANY’); --- Create a new row in IBMSNAP_SUBS_COLS for the BASEAGG_SET aggregate INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQAGG’,’BASEAGG_SET’ , ’S’,’SIMON’,’AGGREGATES’ ,’A’, ’LOCATION’,’N’, 2 ,’LOCATION’); --- Create a new row in IBMSNAP_SUBS_COLS for the BASEAGG_SET aggregate INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION) VALUES(’WHQAGG’,’BASEAGG_SET’ , ’S’,’SIMON’,’AGGREGATES’ ,’F’, ’SUM_PIECES’,’N’, 3 ,’SUM(PIECES)’); --- Create a new row in IBMSNAP_SUBS_COLS for the BASEAGG_SET aggregate INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES(’WHQAGG’,’BASEAGG_SET’ , ’S’,’SIMON’,’AGGREGATES’ ,’F’, ’SUM_OUTPRC’,’N’, 4 ,’SUM(OUT_PRC)’); ---- First of all, enable Full Refresh for the base table so -- that the initial load of the base_aggregate can take place. -- Full refresh will be disabled after the initial full refresh -- has completed. -- This is necessary, because the SALES table has Full Refresh -- disabled. If Full Refresh is enabled for the source table, -- the next two INSERTs into SUBS_STMTS can be commented out. --- ’G’ in BEFORE_OR_AFTER means execute this SQL before reading -- the REGISTER table. -- ’S’ in BEFORE_OR_AFTER means execute this SQL before reading -- the CD table. -INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES) VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’G’, 1 ,’E’, ’UPDATE ASN.IBMSNAP_REGISTER SET DISABLE_REFRESH=0 WHERE SOURCE_OWNER=’’DB2RES5’’ AND SOURCE_TABLE=’’SALES’’ ’,’0000002000’); --- Now disable full refresh -INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES) VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’S’, 2 ,’E’, ’UPDATE ASN.IBMSNAP_REGISTER SET DISABLE_REFRESH=1 WHERE SOURCE_OWNER=’’DB2RES5’’ AND SOURCE_TABLE=’’SALES’’ ’,’0000002000’); ---- Add an SQL-before statement to remove all rows from the AGGREGATES -- table, just in case this is a re-run and the table is not empty. --- COMMIT SQL statements are added to avoid a DUOW error. These are -- probably only required for non-IBM targets. --- COMMIT INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES) VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’B’, 3,’E’, ’COMMIT’,’0000002000’); --- Clear out the AGGREGATES table INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES) VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’B’, 4 ,’E’, ’DELETE FROM SIMON.AGGREGATES’,’0000002000’); --- and another COMMIT INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES) VALUES(’WHQAGG’,’BASEAGG_SET’,’S’,’A’, 5,’E’, ’COMMIT’,’0000002000’); ---- Now activate the change aggregate set CHGAGG_SET which will -- take over the subscription once the initial full refresh -- has taken place. -INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST, BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT)
VALUES('WHQAGG','BASEAGG_SET','S','A', 6 ,'E',
 'UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1
  WHERE APPLY_QUAL=''WHQAGG'' AND SET_NAME=''CHGAGG_SET''
  AND WHOS_ON_FIRST=''S''');
--
-- COMMIT.....
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES)
VALUES('WHQAGG','BASEAGG_SET','S','B', 7 ,'E', 'COMMIT','0000002000');
--
-- Add an SQL-after statement to turn off the AGGREGATES subscription
-- once it has completed successfully.
-- Attach this to the MOVEMENT subscription, so that the AGGREGATES
-- subscription will not be self-modifying.
-- Older levels of Apply code did not allow a subscription to modify
-- its own ACTIVATE value.
--
-- COMMIT first of all to avoid DUOW problems....
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT)
VALUES('WHQAGG','CHGAGG_SET','S','A', 1 ,'E', 'COMMIT');
-
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT)
VALUES('WHQAGG','CHGAGG_SET','S','A', 2 ,'E',
 'UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=0
  WHERE APPLY_QUAL=''WHQAGG'' AND SET_NAME=''BASEAGG_SET''
  AND WHOS_ON_FIRST=''S''');
--
-- Increment the AUX_STMTS counter in IBMSNAP_SUBS_SET
-- for both the BASEAGG_SET and the CHGAGG_SET.
-
UPDATE ASN.IBMSNAP_SUBS_SET SET AUX_STMTS=AUX_STMTS + 7
 WHERE APPLY_QUAL='WHQAGG' AND SET_NAME='BASEAGG_SET'
 AND WHOS_ON_FIRST='S';
-
UPDATE ASN.IBMSNAP_SUBS_SET SET AUX_STMTS=AUX_STMTS + 2
 WHERE APPLY_QUAL='WHQAGG' AND SET_NAME='CHGAGG_SET'
 AND WHOS_ON_FIRST='S';
--
-- Create a new row in IBMSNAP_SUBS_MEMBR to fetch insert
-- operations from the source CD table into view SIMON.MOVEMENTI.
-- This member just fetches the aggregated INSERTs.
-
INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE,
 TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES
 ('WHQAGG' ,'CHGAGG_SET' ,'S' ,'DB2RES5' ,'SALES' , 0 ,
 'SIMON','MOVEMENTI','A','N', 6 ,
 'IBMSNAP_OPERATION=''I'' GROUP BY COMPANY,LOCATION,IBMSNAP_OPERATION');
--
-- Now add the columns for this subscription member to the
-- SUBS_COLS table.
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTI
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTI' ,'A',
 'COMPANY','N', 1 ,'COMPANY');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTI
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTI' ,'A',
 'LOCATION','N', 2 ,'LOCATION');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTI
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTI' ,'F',
 'DIFFERENCE_PIECES','N', 3 ,'SUM(PIECES)');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTI
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTI' ,'F',
 'DIFFERENCE_OUTPRC','N', 4 ,'SUM(OUT_PRC)');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTI
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTI' ,'A',
 'IBMSNAP_OPERATION','N', 5 ,'IBMSNAP_OPERATION');
--
-- Create a new row in IBMSNAP_SUBS_MEMBR to fetch UPDATE
-- operations from the source CD table into view SIMON.MOVEMENTU.
-- This member just fetches the aggregated UPDATEs.
-
INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE,
 TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES
 ('WHQAGG' ,'CHGAGG_SET' ,'S' ,'DB2RES5' ,'SALES' , 0 ,
 'SIMON','MOVEMENTU','A','N', 6 ,
 'IBMSNAP_OPERATION=''U'' GROUP BY COMPANY,LOCATION,IBMSNAP_OPERATION');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTU
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTU' ,'A',
 'COMPANY','N', 1 ,'COMPANY');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTU
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTU' ,'A',
 'LOCATION','N', 2 ,'LOCATION');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTU
-- Because this is an update, DIFFERENCE_PIECES is calculated by
-- subtracting the before-image sum from the after-image sum.
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTU' ,'C',
 'DIFFERENCE_PIECES','N', 3 ,'SUM(PIECES)-SUM(XPIECES)');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTU
-- Because this is an update, DIFFERENCE_OUTPRC is calculated by
-- subtracting the before-image sum from the after-image sum.
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTU' ,'F',
 'DIFFERENCE_OUTPRC','N', 4 ,'SUM(OUT_PRC)-SUM(XOUT_PRC)');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTU
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTU' ,'A',
 'IBMSNAP_OPERATION','N', 5 ,'IBMSNAP_OPERATION');
--
-- Create a new row in IBMSNAP_SUBS_MEMBR to fetch DELETE
-- operations from the source CD table into view SIMON.MOVEMENTD.
-- This member just fetches the aggregated DELETEs.
-
INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE,
 TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES
 ('WHQAGG' ,'CHGAGG_SET' ,'S' ,'DB2RES5' ,'SALES' , 0 ,
 'SIMON','MOVEMENTD','A','N', 6 ,
 'IBMSNAP_OPERATION=''D'' GROUP BY COMPANY,LOCATION,IBMSNAP_OPERATION');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTD
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTD' ,'A',
 'COMPANY','N', 1 ,'COMPANY');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTD
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTD' ,'A',
 'LOCATION','N', 2 ,'LOCATION');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTD
-- The PIECES value is negated before going into the
-- MOVEMENT table (since the value has been deleted) and
-- must be subtracted from the sum.
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTD' ,'F',
 'DIFFERENCE_PIECES','N', 3 ,'-SUM(PIECES)');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTD
-- The OUT_PRC value is negated before going into the
-- MOVEMENT table (since the value has been deleted) and
-- must be subtracted from the sum.
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTD' ,'F',
 'DIFFERENCE_OUTPRC','N', 4 ,'-SUM(OUT_PRC)');
--
-- Create a new row in IBMSNAP_SUBS_COLS for MOVEMENTD
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('WHQAGG','CHGAGG_SET' ,'S','SIMON','MOVEMENTD' ,'A',
 'IBMSNAP_OPERATION','N', 5 ,'IBMSNAP_OPERATION');
--
-- The IBMSNAP_LLOGMARKER and IBMSNAP_HLOGMARKER columns will be
-- automatically maintained by the Apply process, representing
-- the interval of the change aggregation.
--
-- Remove old rows from the MOVEMENT table before the SQL-after
-- statements run again. Otherwise, they will be double-counted
-- in the AGGREGATES table.
--
-- COMMIT......
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES)
VALUES('WHQAGG','CHGAGG_SET','S','B', 2 ,'E','COMMIT','0000002000');
-
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES)
VALUES('WHQAGG','CHGAGG_SET','S','B', 3 ,'E',
 'DELETE FROM SIMON.MOVEMENT','0000002000');
--
-- COMMIT......
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES)
VALUES('WHQAGG','CHGAGG_SET','S','A', 4 ,'E','COMMIT','0000002000');
--
-- Add an SQL-after statement to adjust AGGREGATES for
-- INSERT/UPDATE/DELETE to SALES.
-- This is the main guts of the logic.
--
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES)
VALUES('WHQAGG','CHGAGG_SET','S','A', 5 ,'E',
 'UPDATE SIMON.AGGREGATES A SET SUM_PIECES=
   (SELECT CASE
     WHEN SUM(DIFFERENCE_PIECES) IS NULL THEN A.SUM_PIECES
     ELSE SUM(DIFFERENCE_PIECES) + A.SUM_PIECES END
    FROM SIMON.MOVEMENT M
    WHERE A.COMPANY=M.COMPANY AND A.LOCATION=M.LOCATION),
  SUM_OUTPRC=
   (SELECT CASE
     WHEN SUM(DIFFERENCE_OUTPRC) IS NULL THEN A.SUM_OUTPRC
     ELSE SUM(DIFFERENCE_OUTPRC) + A.SUM_OUTPRC END
    FROM SIMON.MOVEMENT M
    WHERE A.COMPANY=M.COMPANY AND A.LOCATION=M.LOCATION),
  IBMSNAP_HLOGMARKER=
   (SELECT CASE
     WHEN MAX(M.IBMSNAP_HLOGMARKER) IS NULL THEN A.IBMSNAP_HLOGMARKER
     ELSE MAX(M.IBMSNAP_HLOGMARKER) END
    FROM SIMON.MOVEMENT M)','0000002000');
--
-- Add some more SQL-after to add rows when new COMPANYs and
-- LOCATIONs are created.
--
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES)
VALUES('WHQAGG','CHGAGG_SET','S','A', 6 ,'E','COMMIT','0000002000');
-
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES)
VALUES('WHQAGG','CHGAGG_SET','S','A', 7 ,'E',
 'INSERT INTO SIMON.AGGREGATES
   (COMPANY,LOCATION,SUM_PIECES,SUM_OUTPRC,
    IBMSNAP_LLOGMARKER,IBMSNAP_HLOGMARKER)
  SELECT COMPANY,LOCATION,DIFFERENCE_PIECES,DIFFERENCE_OUTPRC,
    IBMSNAP_LLOGMARKER,IBMSNAP_HLOGMARKER
  FROM SIMON.MOVEMENT M
  WHERE NOT EXISTS
    (SELECT * FROM SIMON.AGGREGATES E
     WHERE E.COMPANY=M.COMPANY AND E.LOCATION=M.LOCATION)',
 '0000002000');
-
INSERT INTO ASN.IBMSNAP_SUBS_STMTS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 BEFORE_OR_AFTER,STMT_NUMBER,EI_OR_CALL,SQL_STMT,ACCEPT_SQLSTATES)
VALUES('WHQAGG','CHGAGG_SET','S','A', 8 ,'E','COMMIT','0000002000');
--
-- Increment the AUX_STMTS counter in IBMSNAP_SUBS_SET for
-- the CHGAGG_SET.
-
UPDATE ASN.IBMSNAP_SUBS_SET SET AUX_STMTS=AUX_STMTS + 7
 WHERE APPLY_QUAL='WHQAGG' AND SET_NAME='CHGAGG_SET'
 AND WHOS_ON_FIRST='S';
--
COMMIT;
CONNECT RESET;
--
-- Connect to the source system and record the BASEAGG_SET and
-- CHGAGG_SET subscriptions in the pruning control table.
-
CONNECT TO SJ390DB1 USER DB2RES5 ;
-
INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER,
 TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL,
 SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS) VALUES('DJDB',
 'SIMON','AGGREGATES','DB2RES5','SALES', 0 ,'WHQAGG',
 'BASEAGG_SET','DJDB', 5 ,'DJDB');
-
INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER,
 TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL,
 SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS) VALUES('DJDB',
 'SIMON','MOVEMENTI','DB2RES5','SALES', 0 ,'WHQAGG',
 'CHGAGG_SET','DJDB', 6 ,'DJDB');
-
INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER,
 TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL,
 SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS) VALUES('DJDB',
 'SIMON','MOVEMENTU','DB2RES5','SALES', 0 ,'WHQAGG',
 'CHGAGG_SET','DJDB', 6 ,'DJDB');
-
INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER,
 TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL,
 SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS) VALUES('DJDB',
 'SIMON','MOVEMENTD','DB2RES5','SALES', 0 ,'WHQAGG',
 'CHGAGG_SET','DJDB', 6 ,'DJDB');
-
COMMIT;
-
CONNECT RESET;
--
-- Finished.
-- ************************************************************
-- Now check that all the SQL-Before and SQL-After statements
-- are valid by using the Replication Analyzer with the
-- DEEPCHECK option.
-- ************************************************************
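Before starting Apply, it is also worth confirming that the activation handoff between the two sets is wired up as intended. The following query is not part of the DJRA output; it is a small verification sketch against the subscription control table used above. Initially, BASEAGG_SET should show ACTIVATE=1 and CHGAGG_SET should show ACTIVATE=0; the SQL-after statements swap these values once the initial full refresh has completed.

-- Verification sketch (not generated by DJRA): check the
-- activation handoff between BASEAGG_SET and CHGAGG_SET.
SELECT SET_NAME, ACTIVATE, AUX_STMTS
  FROM ASN.IBMSNAP_SUBS_SET
 WHERE APPLY_QUAL='WHQAGG' AND WHOS_ON_FIRST='S';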
Appendix F. DJRA Generated SQL for Case Study 4

This appendix contains the SQL that DJRA generated for the replication definitions configured in case study 4. Our modifications to the generated SQL are shown in bold typeface.
F.1 Structures of the Tables

The following SQL statements were used to create the tables:

-- Customers table
CREATE TABLE IWH.CUSTOMERS (
  CUSTNO    CHAR(8) NOT NULL,
  LNAME     CHAR(20),
  FNAME     CHAR(15),
  SEX       CHAR(1),
  BIRTHDATE DATE,
  AGENCY    INTEGER NOT NULL,
  SALESREP  DECIMAL(6, 0) NOT NULL,
  ADDRESS   CHAR(50),
  LICNB     CHAR(12),
  LICCAT    CHAR(1),
  LICDATE   DATE,
  PRIMARY KEY (CUSTNO))
DATA CAPTURE CHANGES ;

COMMENT ON IWH.CUSTOMERS (
  CUSTNO    IS 'Customer number',
  LNAME     IS 'Last name',
  FNAME     IS 'First name',
  SEX       IS 'Sex',
  BIRTHDATE IS 'Birth date',
  AGENCY    IS 'Agency code',
  SALESREP  IS 'Sales rep in charge of the customer',
  ADDRESS   IS 'Customer Address',
  LICNB     IS 'Driving licence number',
  LICCAT    IS 'Driving licence category',
  LICDATE   IS 'Driving licence date') ;

-- Contracts table
CREATE TABLE IWH.CONTRACTS (
  CONTRACT INTEGER NOT NULL,
  CONTYPE  CHAR(2) NOT NULL,
  CUSTNO   CHAR(8) NOT NULL,
  LIMITED  CHAR(1),
  BASEFARE DECIMAL(7, 2),
  TAXES    DECIMAL(7, 2),
  CREDATE  DATE,
  PRIMARY KEY (CONTRACT))
DATA CAPTURE CHANGES ;

COMMENT ON IWH.CONTRACTS (
  CONTRACT IS 'Contract number',
  CONTYPE  IS 'Contract type',
  CUSTNO   IS 'Customer number',
  LIMITED  IS 'Warranty excludes fire/glass break',
  BASEFARE IS 'Annual base fare',
  TAXES    IS 'Taxes',
  CREDATE  IS 'Creation date') ;

-- Vehicles table
CREATE TABLE IWH.VEHICLES (
  PLATENUM   CHAR(12) NOT NULL,
  CONTRACT   INTEGER NOT NULL,
  CUSTNO     CHAR(8) NOT NULL,
  BRAND      CHAR(10),
  MODEL      CHAR(10),
  COACHWORK  CHAR(1),
  ENERGY     CHAR(2),
  POWER      DECIMAL(4, 0),
  ENGINEID   CHAR(10),
  VALUE      DECIMAL(10, 0),
  FACTORDATE DATE,
  ALARM      CHAR(1),
  ANTITHEFT  CHAR(1),
  PRIMARY KEY (PLATENUM))
DATA CAPTURE CHANGES ;

COMMENT ON IWH.VEHICLES (
  PLATENUM   IS 'Plate-number',
  CONTRACT   IS 'Contract number',
  CUSTNO     IS 'Customer number',
  BRAND      IS 'Brand',
  MODEL      IS 'Model',
  COACHWORK  IS 'Coachwork type code',
  ENERGY     IS 'Energy type',
  POWER      IS 'Power',
  ENGINEID   IS 'Engine identification number',
  VALUE      IS 'Initial purchase value',
  FACTORDATE IS 'Date of exit from factory',
  ALARM      IS 'Alarm feature code',
  ANTITHEFT  IS 'Anti-theft feature code') ;
-- Accidents table
CREATE TABLE IWH.ACCIDENTS (
  CUSTNO     CHAR(8) NOT NULL,
  ACCNUM     DECIMAL(5, 0) NOT NULL,
  TOWN       CHAR(15),
  REPAIRCOST DECIMAL(10, 2),
  STATUS     CHAR(1),
  ACCDATE    DATE,
  PRIMARY KEY (CUSTNO, ACCNUM))
DATA CAPTURE CHANGES ;

COMMENT ON IWH.ACCIDENTS (
  CUSTNO     IS 'Customer number',
  ACCNUM     IS 'Accident record number',
  TOWN       IS 'Town where accident happened',
  REPAIRCOST IS 'Repair cost',
  STATUS     IS 'Status',
  ACCDATE    IS 'Accident Date') ;
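Note that the AGENCY attribute, which is used later to subset contracts by agency, belongs to IWH.CUSTOMERS rather than IWH.CONTRACTS; this is why F.3 registers a join view over the two tables. Purely for orientation, the underlying join on CUSTNO looks like this (an illustrative query, not part of the generated scripts):

-- Illustrative only: the CONTRACTS-to-CUSTOMERS join on CUSTNO
-- that the VCONTRACTS view in F.3 is built upon.
SELECT A.CONTRACT, A.CONTYPE, B.AGENCY
  FROM IWH.CONTRACTS A, IWH.CUSTOMERS B
 WHERE A.CUSTNO = B.CUSTNO;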
F.2 SQL Script to Define the CONTRACTS Table as a Replication Source

Notice: We adapted the generated SQL script before running it, to change the name of the Change Data table.

--* echo input: TABLEREG SJNTDWH1 IWH CONTRACTS BOTH NONEEXCLUDED
--* DELETEINSERTUPDATE STANDARD N
--*
-- using SRCESVR.REX as the REXX logic filename
-- using REXX password file PASSWORD.REX
-- If you don't see: '-- now done interpreting...'
-- then check your REXX code.
-- now done interpreting REXX password file PASSWORD.REX
-- connect to the source-server
CONNECT TO SJNTDWH1 USER DBADMIN USING pwd;
-- USERID=DBADMIN SOURCE_ALIAS alias=SJNTDWH1
-- PRDID=SQL0502 15 Mar 1999 6:04pm
-- source server SJNTDWH1 is not a DataJoiner server,
-- 1 candidate registrations, 4 already known to be registered
-- The following tables are candidates for registration:
-- 1 table IWH.CONTRACTS
-- registration candidate #1 IWH.CONTRACTS
-- IWH.CONTRACTS is assumed a USER table
-- reading REXX logic file SRCESVR.REX
-- If you don't see: '-- now done interpreting...'
-- then check your REXX code.
-- in SRCESVR.REX, about to create a change data tablespace
-- CREATE TABLESPACE TS041488 MANAGED BY DATABASE USING (FILE
-- 'C:\TS041488.F1' 500 );
-- now done interpreting REXX logic file SRCESVR.REX
--* Source table IWH.CONTRACTS already has CDC attribute, no need to
--* alter.
-- selecting 'X' as the before-image prefix character
-- create the cd/ccd table for IWH.CONTRACTS
CREATE TABLE IWH.CDCONTRACTS(
  IBMSNAP_UOWID     CHAR(10) FOR BIT DATA NOT NULL,
  IBMSNAP_INTENTSEQ CHAR(10) FOR BIT DATA NOT NULL,
  IBMSNAP_OPERATION CHAR(1) NOT NULL,
  CONTRACT  INTEGER NOT NULL,
  XCONTRACT INTEGER,
  CONTYPE   CHAR(2) NOT NULL,
  XCONTYPE  CHAR(2),
  CUSTNO    CHAR(8) NOT NULL,
  XCUSTNO   CHAR(8),
  LIMITED   CHAR(1),
  XLIMITED  CHAR(1),
  BASEFARE  DECIMAL(7, 2),
  XBASEFARE DECIMAL(7, 2),
  TAXES     DECIMAL(7, 2),
  XTAXES    DECIMAL(7, 2),
  CREDATE   DATE,
  XCREDATE  DATE) ;
-- create the index for the change data table for IWH.CONTRACTS
CREATE UNIQUE INDEX IWH.CDICONTRACTS ON IWH.CDCONTRACTS(
  IBMSNAP_UOWID ASC, IBMSNAP_INTENTSEQ ASC);
-- insert a registration record into ASN.IBMSNAP_REGISTER
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
 SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
 SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
 DISABLE_REFRESH,ARCH_LEVEL,BEFORE_IMG_PREFIX,CONFLICT_LEVEL,
 PARTITION_KEYS_CHG) VALUES('N','IWH','CONTRACTS', 0 , 1 ,'Y','Y','IWH',
 'CDCONTRACTS','IWH','CDCONTRACTS', 0 ,'0201','X','1','N') ;
COMMIT;
-- Satisfactory completion at 6:04pm
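The registration can be checked immediately after the script completes. The following SELECT is a verification sketch (it is not part of the DJRA output) that confirms the CONTRACTS registration and its change data table:

-- Verification sketch (not DJRA output): confirm the registration.
SELECT SOURCE_OWNER, SOURCE_TABLE, CD_OWNER, CD_TABLE, DISABLE_REFRESH
  FROM ASN.IBMSNAP_REGISTER
 WHERE SOURCE_OWNER='IWH' AND SOURCE_TABLE='CONTRACTS';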
F.3 SQL Script to Define the VCONTRACTS View as a Replication Source

--*
--* Calling VIEWREG for source table IWH.VCONTRACTS
--*
-- using REXX password file PASSWORD.REX
-- If you don't see: '-- now done interpreting...' then check your
-- REXX code
-- now done interpreting REXX password file PASSWORD.REX
-- input view OWNER=IWH  input view NAME=VCONTRACTS
-- connect to the source server
CONNECT TO SJNTDWH1 USER DBADMIN USING pwd;
-- USERID=DBADMIN SOURCE_ALIAS alias=SJNTDWH1 15 Mar 1999 6:40pm
--* The view definition to be registered='CREATE VIEW IWH.VCONTRACTS (
--* CONTRACT, CONTYPE, CUSTNO, LIMITED, BASEFARE, TAXES, CREDATE,
--* AGENCY) AS SELECT A.CONTRACT, A.CONTYPE, A.CUSTNO, A.LIMITED,
--* A.BASEFARE, A.TAXES, A.CREDATE, B.AGENCY FROM IWH.CONTRACTS A,
--* IWH.CUSTOMERS B WHERE A.CUSTNO = B.CUSTNO '
--*
-- create the change data view for component 1
CREATE VIEW IWH.VCONTRACTSA (IBMSNAP_UOWID,IBMSNAP_INTENTSEQ,
 IBMSNAP_OPERATION,CONTRACT, CONTYPE, CUSTNO, LIMITED, BASEFARE,
 TAXES, CREDATE, AGENCY) AS
 SELECT B.IBMSNAP_UOWID,B.IBMSNAP_INTENTSEQ,B.IBMSNAP_OPERATION,
  A.CONTRACT, A.CONTYPE, A.CUSTNO, A.LIMITED, A.BASEFARE, A.TAXES,
  A.CREDATE, B.AGENCY
 FROM IWH.CONTRACTS A, IWH.CDCUSTOMERS B
 WHERE A.CUSTNO = B.CUSTNO ;
-- register the base and change data views for component 1
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
 SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
 SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
 DISABLE_REFRESH,CCD_OWNER,CCD_TABLE,CCD_OLD_SYNCHPOINT,SYNCHPOINT,
 SYNCHTIME,CCD_CONDENSED,CCD_COMPLETE,ARCH_LEVEL,BEFORE_IMG_PREFIX,
 CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES('N','IWH','VCONTRACTS', 1 ,
 1 ,'Y','Y','IWH','VCONTRACTSA','IWH','CDCUSTOMERS', 0 ,NULL,NULL,NULL,
 NULL,NULL,NULL,NULL,'0201',NULL,'0','N');
-- create the change data view for component 2
CREATE VIEW IWH.VCONTRACTSB (IBMSNAP_UOWID,IBMSNAP_INTENTSEQ,
 IBMSNAP_OPERATION,CONTRACT, CONTYPE, CUSTNO, LIMITED, BASEFARE,
 TAXES, CREDATE, AGENCY) AS
 SELECT A.IBMSNAP_UOWID,A.IBMSNAP_INTENTSEQ,A.IBMSNAP_OPERATION,
  A.CONTRACT, A.CONTYPE, A.CUSTNO, A.LIMITED, A.BASEFARE, A.TAXES,
  A.CREDATE, B.AGENCY
 FROM IWH.CDCONTRACTS A, IWH.CUSTOMERS B
 WHERE A.CUSTNO = B.CUSTNO ;
-- register the base and change data views for component 2
INSERT INTO ASN.IBMSNAP_REGISTER(GLOBAL_RECORD,SOURCE_OWNER,
 SOURCE_TABLE,SOURCE_VIEW_QUAL,SOURCE_STRUCTURE,SOURCE_CONDENSED,
 SOURCE_COMPLETE,CD_OWNER,CD_TABLE,PHYS_CHANGE_OWNER,PHYS_CHANGE_TABLE,
 DISABLE_REFRESH,CCD_OWNER,CCD_TABLE,CCD_OLD_SYNCHPOINT,SYNCHPOINT,
 SYNCHTIME,CCD_CONDENSED,CCD_COMPLETE,ARCH_LEVEL,BEFORE_IMG_PREFIX,
 CONFLICT_LEVEL,PARTITION_KEYS_CHG) VALUES('N','IWH','VCONTRACTS', 2 ,
 1 ,'Y','Y','IWH','VCONTRACTSB','IWH','CDCONTRACTS', 0 ,NULL,NULL,NULL,
 NULL,NULL,NULL,NULL,'0201',NULL,'0','N');
COMMIT;
-- Satisfactory completion at 6:40pm
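A registered view produces one row in ASN.IBMSNAP_REGISTER per view component, distinguished by SOURCE_VIEW_QUAL. The following query (a verification sketch, not DJRA output) should therefore return two rows for VCONTRACTS:

-- Verification sketch (not DJRA output): one row per view component.
SELECT SOURCE_VIEW_QUAL, CD_TABLE, PHYS_CHANGE_TABLE
  FROM ASN.IBMSNAP_REGISTER
 WHERE SOURCE_OWNER='IWH' AND SOURCE_TABLE='VCONTRACTS';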
F.4 SQL Script to Create the CONT0001 Empty Subscription Set

--*
--* Calling ADDSET for set 1 : AQSR0001/CONT0001
--*
--* echo input: ADDSET SJNTDWH1 AQSR0001 CONT0001 SJNTDWH1 DBSR0001
--* NULL 19990316095500 R 2 30 TARGET=MSJET
--*
-- using REXX password file PASSWORD.REX
-- If you don't see: '-- now done interpreting...' then check your
-- REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* CONNECTing TO SJNTDWH1 USER DBADMIN USING pwd;
--*
--* The ALIAS name 'SJNTDWH1' matches the RDBNAM 'SJNTDWH1'
--*
--* connect to the CNTL_ALIAS
--*
CONNECT TO SJNTDWH1 USER DBADMIN USING pwd;
-- current USERID=DBADMIN CNTL_ALIAS alias=SJNTDWH1 16 Mar 1999 9:57am
-- create a new row in IBMSNAP_SUBS_SET
INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE,APPLY_QUAL,SET_NAME,
 WHOS_ON_FIRST,SOURCE_SERVER,SOURCE_ALIAS,TARGET_SERVER,TARGET_ALIAS,
 STATUS,LASTRUN,REFRESH_TIMING,SLEEP_MINUTES,EVENT_NAME,
 MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES
 (0 ,'AQSR0001' ,'CONT0001' ,'S' ,'SJNTDWH1' ,'SJNTDWH1' ,'MSJET' ,
 'DBSR0001' , 0 ,'1999-03-16-09.55.00' ,'R' , 2 ,NULL , 30 , 0 ,'0201');
-- create a new row in IBMSNAP_SUBS_SET
INSERT INTO ASN.IBMSNAP_SUBS_SET( ACTIVATE,APPLY_QUAL,SET_NAME,
 WHOS_ON_FIRST,SOURCE_SERVER,SOURCE_ALIAS,TARGET_SERVER,TARGET_ALIAS,
 STATUS,LASTRUN,REFRESH_TIMING,SLEEP_MINUTES,EVENT_NAME,
 MAX_SYNCH_MINUTES,AUX_STMTS,ARCH_LEVEL) VALUES
 (0 ,'AQSR0001' ,'CONT0001' ,'F' ,'MSJET' ,'DBSR0001' ,'SJNTDWH1' ,
 'SJNTDWH1' , 0 ,'1999-03-16-09.55.00' ,'R' , 2 ,NULL , 30 , 0 ,'0201');
--* commit work at SJNTDWH1
--*
COMMIT;
-- Satisfactory completion of ADDSET at 9:57am
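Because this is a replica (update-anywhere) configuration, ADDSET creates two rows for the set: one with WHOS_ON_FIRST='S' for the flow from SJNTDWH1 down to the Microsoft Jet target, and one with WHOS_ON_FIRST='F' for the flow back up. A verification sketch (not DJRA output):

-- Verification sketch (not DJRA output): both directions of the
-- replica set should be present ('S' down, 'F' up).
SELECT WHOS_ON_FIRST, SOURCE_ALIAS, TARGET_ALIAS, ACTIVATE
  FROM ASN.IBMSNAP_SUBS_SET
 WHERE APPLY_QUAL='AQSR0001' AND SET_NAME='CONT0001';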
F.5 SQL Script to Add a Member to the CONT0001 Empty Subscription Set

--*
--* Calling C:\SQLLIB\DPRTools\addmembr.rex for AQSR0001/CONT0001 pair # 2
--*
--* Echo input: ADDMEMBR SJNTDWH1 AQSR0001 CONT0001 IWH VCONTRACTS
--* NONEEXCLUDED ROWREPLICA CONTRACT+ IWH CONTRACTS NODATAJOINER U
--* MSJET '(AGENCY = 25)'
--*
-- using REXX password file PASSWORD.REX
-- If you don't see: '-- now done interpreting...' then check your
-- REXX code
-- now done interpreting REXX password file PASSWORD.REX
--* Connect to the CNTL_ALIAS
--*
CONNECT TO SJNTDWH1 USER DBADMIN USING pwd;
--* The ALIAS name 'SJNTDWH1' matches the RDBNAM 'SJNTDWH1'
--*
--* Current USERID=DBADMIN CNTL_ALIAS alias= 16 Mar 1999 11:24am
--*
--* Fetching from the ASN.IBMSNAP_SUBS_SET table at SJNTDWH1
--*
--* CONNECTing TO SJNTDWH1 USER DBADMIN USING pwd;
--*
--* The ALIAS name 'SJNTDWH1' matches the RDBNAM 'SJNTDWH1'
--*
--* Fetching from the ASN.IBMSNAP_REGISTER table at SJNTDWH1
--*
--* Because you are defining a replica subscription from a source view,
--* and not a source table, some unusual steps will be taken. The
--* subscription down the hierarchy will name the view as the source
--* and the replica as the target. The subscription up the hierarchy
--* will name the replica as the source and the dominant table within
--* the view as the target. The dominant table being the component of
--* the source view which contributes the most columns to the source
--* view. Only columns from the dominant component in the source view
--* will be included in the replica.
--*
--* The source view IWH.VCONTRACTS includes 7 columns from the
--* dominant view component IWH.CONTRACTS.
--*
--* The source view IWH.VCONTRACTS also includes 2 columns from view
--* component IWH.CUSTOMERS, which will not be selected but can be
--* referenced in your subscription predicates.
--*
-- using REXX logic file CNTLSVR.REX
--* If you don't see: '--* now done interpreting REXX logic file
--* CNTLSVR.REX', then check your REXX code
--* The subscription predicate was not changed by the user logic
--* in CNTLSVR.REX
--* now done interpreting REXX logic file CNTLSVR.REX
-- create a new row in IBMSNAP_SUBS_MEMBR
INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE,
 TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES
 ('AQSR0001' ,'CONT0001' ,'S' ,'IWH' ,'VCONTRACTS' , 1 ,'IWH',
 'CONTRACTS','Y','Y', 9 ,'(AGENCY = 25)');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'S','IWH','CONTRACTS' ,'A','CONTRACT',
 'Y', 1 ,'CONTRACT');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'S','IWH','CONTRACTS' ,'A','CONTYPE',
 'N', 2 ,'CONTYPE');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'S','IWH','CONTRACTS' ,'A','CUSTNO',
 'N', 3 ,'CUSTNO');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'S','IWH','CONTRACTS' ,'A','LIMITED',
 'N', 4 ,'LIMITED');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'S','IWH','CONTRACTS' ,'A','BASEFARE',
 'N', 5 ,'BASEFARE');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'S','IWH','CONTRACTS' ,'A','TAXES',
 'N', 6 ,'TAXES');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'S','IWH','CONTRACTS' ,'A','CREDATE',
 'N', 7 ,'CREDATE');
-- create a new row in IBMSNAP_SUBS_MEMBR
INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE,
 TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES
 ('AQSR0001' ,'CONT0001' ,'S' ,'IWH' ,'VCONTRACTS' , 2 ,'IWH',
 'CONTRACTS','Y','Y', 9 ,'(AGENCY = 25)');
--* I noticed the set subscription is inactive
--*
UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1
 WHERE APPLY_QUAL='AQSR0001' AND SET_NAME='CONT0001'
 AND WHOS_ON_FIRST='S';
-- Create a new row in IBMSNAP_SUBS_MEMBR
INSERT INTO ASN.IBMSNAP_SUBS_MEMBR( APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,TARGET_OWNER,TARGET_TABLE,
 TARGET_CONDENSED,TARGET_COMPLETE,TARGET_STRUCTURE,PREDICATES) VALUES
 ('AQSR0001' ,'CONT0001' ,'F' ,'IWH' ,'CONTRACTS' , 0 ,'IWH',
 'CONTRACTS','Y','Y', 1 ,NULL);
--* Assuming the set subscription is inactive, I'll activate it.
--*
UPDATE ASN.IBMSNAP_SUBS_SET SET ACTIVATE=1
 WHERE APPLY_QUAL='AQSR0001' AND SET_NAME='CONT0001'
 AND WHOS_ON_FIRST='F';
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'F','IWH','CONTRACTS' ,'A','CONTRACT',
 'Y', 1 ,'CONTRACT');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'F','IWH','CONTRACTS' ,'A','CONTYPE',
 'N', 2 ,'CONTYPE');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'F','IWH','CONTRACTS' ,'A','CUSTNO',
 'N', 3 ,'CUSTNO');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'F','IWH','CONTRACTS' ,'A','LIMITED',
 'N', 4 ,'LIMITED');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'F','IWH','CONTRACTS' ,'A','BASEFARE',
 'N', 5 ,'BASEFARE');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'F','IWH','CONTRACTS' ,'A','TAXES',
 'N', 6 ,'TAXES');
-- Create a new row in IBMSNAP_SUBS_COLS
INSERT INTO ASN.IBMSNAP_SUBS_COLS(APPLY_QUAL,SET_NAME,WHOS_ON_FIRST,
 TARGET_OWNER,TARGET_TABLE,COL_TYPE,TARGET_NAME,IS_KEY,COLNO,EXPRESSION)
VALUES('AQSR0001','CONT0001' ,'F','IWH','CONTRACTS' ,'A','CREDATE',
 'N', 7 ,'CREDATE');
-- CREATE A NEW ROW IN IBMSNAP_SCHEMA_CHG
INSERT INTO ASN.IBMSNAP_SCHEMA_CHG( APPLY_QUAL,SET_NAME,LAST_CHANGED)
 VALUES ('AQSR0001' ,'CONT0001' , CURRENT TIMESTAMP);
--* Commit work at cntl_ALIAS SJNTDWH1
--*
COMMIT;
--* Connect to the SOURCE_ALIAS
--*
CONNECT TO SJNTDWH1 USER DBADMIN USING pwd;
--* The ALIAS name 'SJNTDWH1' matches the RDBNAM 'SJNTDWH1'
--*
--* record the subscription in the pruning control table at the
--* source server
--*
INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER,
 TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL,
 SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS) VALUES('MSJET','IWH',
 'CONTRACTS','IWH','VCONTRACTS', 1 ,'AQSR0001','CONT0001','SJNTDWH1',
 9 ,'SJNTDWH1');
--* record the subscription in the pruning control table at the
--* source server
--*
INSERT INTO ASN.IBMSNAP_PRUNCNTL( TARGET_SERVER,TARGET_OWNER,
 TARGET_TABLE,SOURCE_OWNER,SOURCE_TABLE,SOURCE_VIEW_QUAL,APPLY_QUAL,
 SET_NAME,CNTL_SERVER,TARGET_STRUCTURE,CNTL_ALIAS) VALUES('MSJET','IWH',
 'CONTRACTS','IWH','VCONTRACTS', 2 ,'AQSR0001','CONT0001','SJNTDWH1',
 9 ,'SJNTDWH1');
--* Commit work at source_ALIAS SJNTDWH1
--*
COMMIT;
--* Satisfactory completion of ADDMEMBR at 11:24am
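Before starting the Apply process for this subscription, the pruning control entries can be checked at the source server. The following query is a verification sketch (not part of the DJRA output); one row per view component of the replica subscription is expected:

-- Verification sketch (not DJRA output): one pruning control row
-- per view component (SOURCE_VIEW_QUAL 1 and 2).
SELECT SOURCE_VIEW_QUAL, TARGET_SERVER, TARGET_TABLE
  FROM ASN.IBMSNAP_PRUNCNTL
 WHERE APPLY_QUAL='AQSR0001' AND SET_NAME='CONT0001';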
Appendix G. Special Notices

This publication is intended to help database administrators and replication specialists to learn more about the features of IBM's replication solution available in multi-vendor database environments. The information in this publication is not intended as the specification of any programming interfaces that are provided by IBM DB2 DataPropagator, IBM DB2 Universal Database, and IBM DataJoiner. See the PUBLICATIONS section of the IBM Programming Announcement for IBM DB2 DataPropagator, IBM DB2 Universal Database, and IBM DataJoiner for more information about what publications are considered to be product documentation.

References in this publication to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent program that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program or service.

Information in this book was developed in conjunction with use of the equipment specified, and is limited in application to those specific hardware and software products and levels.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to the IBM Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY 10594 USA.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact IBM Corporation, Dept. 600A, Mail Drop 1329, Somers, NY 10589 USA. Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The information about non-IBM ("vendor") products in this manual has been supplied by the vendor and IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

Any pointers in this publication to external Web sites are provided for convenience only and do not in any manner serve as an endorsement of these Web sites.

Any performance data contained in this document was determined in a controlled environment, and therefore, the results that may be obtained in other operating environments may vary significantly. Users of this document should verify the applicable data for their specific environment.

The following document contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples contain the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

Reference to PTF numbers that have not been released through the normal distribution process does not imply general availability. The purpose of including these reference numbers is to alert IBM customers to specific information relative to the implementation of the PTF when it becomes available to each customer according to the normal IBM PTF distribution process.

The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:

AIX, AS/400, DATABASE 2, DataJoiner, DataPropagator, DB2, DRDA, IBM, IMS,
MVS/ESA, OS/400, OS/390, OS/2, PowerPC, QMF, RISC System/6000, VM/ESA, VSE/ESA

The following terms are trademarks of other companies:

Informix, Informix Dynamic Server, Informix ESQL/C, and Informix Client SDK are trademarks of Informix Corporation.

Microsoft, Windows, Windows NT, the Windows logo, and Access are trademarks of Microsoft Corporation in the United States and/or other countries.

MMX and Pentium are trademarks of Intel Corporation in the United States and/or other countries. (For a complete list of Intel trademarks see www.intel.com/dradmarx.htm)

Oracle, Oracle8, SQL*Net, Net8, SQL*Loader, and PL/SQL are trademarks of Oracle Corporation.

Sybase, Open Server, Open Client, SQL Server, System11, and Transact-SQL are trademarks of Sybase Inc.

UNIX is a registered trademark in the United States and/or other countries licensed exclusively through X/Open Company Limited.

SET and the SET logo are trademarks owned by SET Secure Electronic Transaction LLC.

Other company, product, and service names may be trademarks or service marks of others.
Appendix H. Related Publications

The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.
H.1 International Technical Support Organization Publications

For information on ordering these ITSO publications see "How to Get ITSO Redbooks" on page 401.

• DPROPR Planning and Design Guide, SG24-4771
• WOW! DRDA Supports TCP/IP: DB2 Server for OS/390 and DB2 Universal Database, SG24-2212
• Lotus Solutions for the Enterprise, Volume 5 NotesPump: The Enterprise Data Mover, SG24-5255
• Migrating and Managing Data on RS/6000 SP with DB2 Parallel Edition, SG24-4658
H.2 Redbooks on CD-ROMs

Redbooks are also available on CD-ROMs. Order a subscription and receive updates 2-4 times a year at significant savings.

CD-ROM Title                                            Subscription   Collection Kit
                                                        Number         Number
System/390 Redbooks Collection                          SBOF-7201      SK2T-2177
Networking and Systems Management Redbooks Collection   SBOF-7370      SK2T-6022
Transaction Processing and Data Management Redbook      SBOF-7240      SK2T-8038
Lotus Redbooks Collection                               SBOF-6899      SK2T-8039
Tivoli Redbooks Collection                              SBOF-6898      SK2T-8044
AS/400 Redbooks Collection                              SBOF-7270      SK2T-2849
RS/6000 Redbooks Collection (HTML, BkMgr)               SBOF-7230      SK2T-8040
RS/6000 Redbooks Collection (PostScript)                SBOF-7205      SK2T-8041
RS/6000 Redbooks Collection (PDF Format)                SBOF-8700      SK2T-8043
Application Development Redbooks Collection             SBOF-7290      SK2T-8037
H.3 Other Publications

These publications are also relevant as further information sources:
• DataJoiner for AIX Systems Planning, Installation, and Configuration Guide, SC26-9145
• DataJoiner for Windows NT Systems Planning, Installation, and Configuration Guide, SC26-9150
• DataJoiner Application Programming and SQL Reference Supplement, SC26-9148
• DataJoiner Administration Supplement, SC26-9146
• DB2 Replication Guide and Reference, S95H-0999
• DB2 for OS/390 V5 SQL Reference, SC26-8966
• Oracle Net8 Administrator's Guide, A58230-01
• Oracle8 Administrator's Guide, A58397-01
• Administrator's Guide for Informix Dynamic Server, Part No. 000-4354
• Oracle8 Utilities, A58244-01
H.4 Hot Web Sites

IBM Software Homepage
  http://www.software.ibm.com
IBM Database Management Homepage
  http://www.software.ibm.com/data
IBM DProp Homepage
  http://www.software.ibm.com/data/dpropr
IBM DataJoiner Homepage
  http://www.software.ibm.com/data/datajoiner
IBM Data Management Performance Reports
  http://www.software.ibm.com/data/db2/performance
DataPropagator Relational Performance Measurement Series
  http://www.software.ibm.com/data/db2/performance/dprperf.htm
How to Get ITSO Redbooks

This section explains how both customers and IBM employees can find out about ITSO redbooks, CD-ROMs, workshops, and residencies. This information was current at the time of publication, but is continually subject to change. The latest information may be found at http://www.redbooks.ibm.com/.
How IBM Employees Can Get ITSO Redbooks

Employees may request ITSO deliverables (redbooks, BookManager BOOKs, and CD-ROMs) and information about redbooks, workshops, and residencies in the following ways:

• Redbooks Web Site on the World Wide Web
  http://w3.itso.ibm.com/
• PUBORDER – to order hardcopies in the United States
• Tools Disks
  To get LIST3820s of redbooks, type one of the following commands:
    TOOLCAT REDPRINT
    TOOLS SENDTO EHONE4 TOOLS2 REDPRINT GET SG24xxxx PACKAGE
    TOOLS SENDTO CANVM2 TOOLS REDPRINT GET SG24xxxx PACKAGE (Canadian users only)
  To get BookManager BOOKs of redbooks, type the following command:
    TOOLCAT REDBOOKS
  To get lists of redbooks, type the following command:
    TOOLS SENDTO USDIST MKTTOOLS MKTTOOLS GET ITSOCAT TXT
  To register for information on workshops, residencies, and redbooks, type the following command:
    TOOLS SENDTO WTSCPOK TOOLS ZDISK GET ITSOREGI 1998
• REDBOOKS Category on INEWS
• Online – send orders to: USIB6FPL at IBMMAIL or DKIBMBSH at IBMMAIL

Redpieces

For information so current it is still in the process of being written, look at "Redpieces" on the Redbooks Web Site (http://www.redbooks.ibm.com/redpieces.html). Redpieces are redbooks in progress; not all redbooks become redpieces, and sometimes just a few chapters will be published this way. The intent is to get the information out much quicker than the formal publishing process allows.
How Customers Can Get ITSO Redbooks

Customers may request ITSO deliverables (redbooks, BookManager BOOKs, and CD-ROMs) and information about redbooks, workshops, and residencies in the following ways:

• Online Orders – send orders to:
                              IBMMAIL                 Internet
  In United States            usib6fpl at ibmmail     [email protected]
  In Canada                   caibmbkz at ibmmail     [email protected]
  Outside North America       dkibmbsh at ibmmail     [email protected]

• Telephone Orders
  United States (toll free)   1-800-879-2755
  Canada (toll free)          1-800-IBM-4YOU
  Outside North America (long distance charges apply):
    (+45) 4810-1320 - Danish     (+45) 4810-1020 - German
    (+45) 4810-1420 - Dutch      (+45) 4810-1620 - Italian
    (+45) 4810-1540 - English    (+45) 4810-1270 - Norwegian
    (+45) 4810-1670 - Finnish    (+45) 4810-1120 - Spanish
    (+45) 4810-1220 - French     (+45) 4810-1170 - Swedish

• Mail Orders – send orders to:
  IBM Publications, Publications Customer Support, P.O. Box 29570, Raleigh, NC 27626-0570, USA
  IBM Publications, 144-4th Avenue, S.W., Calgary, Alberta T2P 3N5, Canada
  IBM Direct Services, Sortemosevej 21, DK-3450 Allerød, Denmark

• Fax – send orders to:
  United States (toll free)   1-800-445-9269
  Canada                      1-800-267-4455
  Outside North America       (+45) 48 14 2207 (long distance charge)

• 1-800-IBM-4FAX (United States) or (+1) 408 256 5422 (Outside USA) – ask for:
  Index # 4421 Abstracts of new redbooks
  Index # 4422 IBM redbooks
  Index # 4420 Redbooks for last six months

• On the World Wide Web
  Redbooks Web Site                http://www.redbooks.ibm.com
  IBM Direct Publications Catalog  http://www.elink.ibmlink.ibm.com/pbl/pbl

Redpieces

For information so current it is still in the process of being written, look at "Redpieces" on the Redbooks Web Site (http://www.redbooks.ibm.com/redpieces.html). Redpieces are redbooks in progress; not all redbooks become redpieces, and sometimes just a few chapters will be published this way. The intent is to get the information out much quicker than the formal publishing process allows.
List of Abbreviations

CAE       DB2 Client Application Enabler
CCD       Consistent change data table
CD        Change data table
CPU       Central Processing Unit
DAO       Data Access Objects
DB2 UDB   DB2 Universal Database
DCS       Database Connection Services
DDCS      Distributed Database Connection Services
DDF       Distributed Data Facility (component of DB2 for OS/390)
DDL       Data Definition Language (part of SQL)
DJRA      DataJoiner Replication Administration
DML       Data Manipulation Language (part of SQL)
DProp     DB2 DataPropagator
DRDA      Distributed Relational Database Architecture
EPOS      Electronic Point of Sales
FTP       File Transfer Program
GUI       Graphical User Interface
IBM       International Business Machines Corporation
ITSO      International Technical Support Organization
JCL       Job Control Language
ODBC      Open Database Connectivity
OLTP      Online Transaction Processing
PIT       Point in Time table
RDBMS     Relational database management system
REXX      Restructured Extended Executor
RI        Referential Integrity
SNA       Systems Network Architecture
SQL       Structured Query Language
SQLCA     SQL Communication Area
TCP/IP    Transmission Control Protocol/Internet Protocol
UOW       unit-of-work table
Index

A
abbreviations 405
acronyms 405
Apply
  ASNDONE user exit 111
  automatic full refresh 86
  bind 75, 192
  changes from non-IBM sources 169
  CPU utilization 24
  enabling block fetch 126
  general introduction 8
  monitoring 106
  placement 39
  pull mode 39, 125
  push mode 39
  replication timing 49, 51
  setup 71, 190
  starting on OS/390 162
  starting on Windows NT 200
  TRCFLOW option 100
  using multiple parallel processes 125
APPLYTRAIL table
  looking for details 109
  pruning 93
ASNDONE user exit 111
ASNJET 276, 277

B
block fetch 126
blocking factor 54
business requirements
  gathering 16
  questions 16

C
Capture
  bind 74
  commit interval 119
  CPU utilization 24
  detecting errors 102
  dropping unnecessary triggers for non-IBM sources 133
  execution priority 118
  general introduction 8
  lag 102
  lag limit 119
  monitoring 101
  NOPRUNE start option 92, 127
  PRUNE command 92
  pruning interval 119
  resolving a gap 103
  retention limit 120
  setup 71, 190
  starting 85
  TRACE option 100
  triggers 166
  triggers, impact 55
  tuning parameters 118
change data tables
  indexes 121
  pruning 92
  tablespace usage 120
control tables
  create at control servers 74, 191
  create at source servers 74, 191
  general introduction 8
  placement 46
  reorganizing 91

D
data consolidation scenario
  configuration tasks 151
  DataJoiner placement 44
  defining replication subscriptions 161
  moving from test to production 164
  placement of the system components 142
  replication design 145, 159
  system design 142
  system topology 150
  target site union 145
data distribution scenario
  configuration tasks 188
  DataJoiner placement 41
  replication design 181, 192
  system design 177
  system topology 186
data volumes
  estimation 20
  log space 22
  spill file 24
  staging tables 22
  unit-of-work table 22
data warehouse scenario
  configuration tasks 214
  initial load 261
  maintaining a base aggregate table from a change aggregate subscription 257
  maintaining history information 211, 220, 235, 244, 250
  pushing down the replication status 259
  replication design 210, 217
  star join example 270
  system design 209
  system topology 213
  using a CCD target table for the facts 245
  using source site joins for data denormalization 237
  using target site views for data denormalization 228
DataJoiner
  data access modules 66
  database 68
  general introduction 28
  instance 67
  monitoring 116
  nicknames 9, 34, 79, 159, 186
  placement 40
  server mappings 69
  server options 70
  setup 65
  user mappings 70
DB2 UDB Control Center 39
denormalization
  using source site joins 237
  using target site views 228
DJRA
  connectivity 38
  general introduction 8, 28
  setup 72, 191, 217
  user exits 73
DProp
  components 8
  features 30
  general introduction 28
  open monitoring interface 99

F
full refresh
  allow for certain subscriptions 130
  disable for all subscriptions 129
  forcing for a certain subscription set 132
  forcing for all sets reading from a source server 133
  forcing for all sets reading from a source table 133
  initial automatic 86, 306
  manual 89
  preventing selectively 129
  techniques 125
  using DataJoiner's EXPORT/IMPORT utilities 263
  using DSNTIAUL and Oracle's SQL*Loader utility 264
  using SQL INSERT/SELECT from DataJoiner 262

H
handshake between Capture and Apply 87
heterogeneous replication
  architectures 33
  features 31

L
lock rules 121
log space
  volume estimation 22

P
password file
  create 196, 299
performance tuning 117
project
  implementation checklist 63
  implementation tasks 65
  planning 13
  staffing 14
pruning 91
  APPLYTRAIL table 93
  change data tables 92
  consistent change data tables 92
  deferring pruning for non-IBM sources 127
  tuning considerations 127
  unit-of-work table 92

R
reorganization
  of change data tables 91
  of the unit-of-work table 91
replication sources
  deferring pruning for non-IBM 127
  defining non-IBM tables as 79
  dropping unnecessary capture triggers for non-IBM 133
  high-level picture 18
  non-IBM source 34, 166
  register 77, 159
  replication status 259
replication targets
  defining non-IBM tables as 77
  high-level picture 18
  invoking stored procedures 185
  non-IBM target 33
  referential integrity considerations 49
  target table types 49
RUNSTATS 122

S
scheduling
  event based 52
  relative timing 52
server mappings 69
server options 70
skills
  application specialists 15
  data replication professional 14
  database administrator 15
  network specialists 15
  system specialists 15
source site joins 237
source-to-target mapping 19
spill file
  using memory rather than disk 126
  volume estimation 24
SPUFI 158
staffing 14
staging tables
  volume estimation 20, 21, 22
stored procedure
  add to subscription set 199
stored procedures
  invoking at target database 185
subscription set
  add stored procedure to 199
  changing Apply qualifier or set name 134
  deactivating 129
  detecting errors 107
  how to make use of 122
  identifying members 109
  latency 108
  monitoring the status 106

T
target site views 228

U
unit-of-work table
  pruning 92
  tablespace usage 120
  volume estimation 21, 22
update anywhere scenario
  administration and maintenance 316
  configuration tasks 279
  conflict detection 313
  creating source views to enable subsetting 287
  major pitfalls 300
  monitoring and problem determination 316
  operational implications 314
  replicating updates 311
  replication design 277, 287
  replication results 301
  system design 276
  system topology 277
user mappings 70
utility operations
  LOAD 95
  pseudo-ALTER 96
  RECOVER 96
  REORG 97