Informatica User Group
PowerCenter: Differences Between v7 & v8
Mark Murray - Senior Sales Consultant
October 19th, 2006
Informatica confidential. For discussion purposes only. 1
Goals for New Architecture
• Enterprise Deployment
  • Improved Service Orientation
  • High Availability
  • Grid Deployments
• Centralized Services
  • Administration
  • Logging & Auditing
• Single Point of Administration
  • Traditional Configuration
  • HA Configuration
  • Grid Configuration
What do customers want?
• High Availability and Failover was a top 10 request in the 2004 User Group surveys
• Database Pushdown Optimization was 10th out of 66 features in the 2005 surveys
• Improved logging capabilities ranked 2nd out of over 60 feature requests in the 2004 surveys
• Looping support within the Designer
Informatica Data Integration Platform: Continually Raising the Bar
• PowerCenter 7: Advanced Edition (One Product, Single Install)
• PowerCenter 8.1.1 (Now): Mission-Critical Enterprise Deployment
• Hercules (2007): On-Demand Platform for the Enterprise
Informatica Delivers Continuous Innovation
“With PowerCenter continually leapfrogging on performance and scalability, we are never concerned about our ability to handle increasingly large data volumes in our data integration environment.”
--- Kevin Smith, CRM Strategies Manager, AAA Carolina
[Chart: 1 TB Transform and Load Test (HR:Min) by release; plotted values 6:35, 3:36, 0:37 and <18 min]
Features by release (each release carries forward the features of earlier releases):
• V4.x: Pipelining, ERP Connectivity, UNICODE
• V5.x: Partitioning, Debugger, XML, Metadata connectivity
• V6.x: Realtime, Workflow, Data quality, 3-tier architecture, Enterprise metadata
• V7.x: SOA Web services, Grid, 64-bit, Team development, Enterprise security, Mainframe Data Server and CDC, Impact analysis
• V8.x: Session On Grid, Adaptive Load Balancing, High Availability, Dynamic Partitioning, Pushdown Optimization, Unstructured Data, Data Federation
What else is in the Informatica product family?
• PowerCenter 8 Standard Edition
• PowerCenter 8 Advanced Edition: Metadata Manager, Data Analyzer, Team Based Development
• PowerCenter Options: Data Cleanse and Match, Data Federation (EII), Enterprise Grid, High Availability, Pushdown Optimization, Unstructured Data, Mapping Generation, Data Profiling, Partitioning, Real-Time, PowerCenter Connects, Metadata Exchange
[Diagram legend: New, Updated, Broader]
PowerCenter 8 Base Improvements: Delivering Value for Installed Base Customers
PowerCenter Advanced Edition: Metadata Manager, Data Analyzer, Team Based Development
PowerCenter Standard Edition
Reduce Time To Results
• Java transformation support
• User defined functions
• Extended expression library
• Mapping generation and templates
• Improved Data Profiling
Cost Effectively Scale
• Centralized administration web-based console
• Extended recovery options
• Connection resilience (RDBMS, Network, PC)
• Flat File Performance Optimization
• Enhanced, centralized logging
• Enhanced Team-Based Development
• Unicode repository option
PowerCenter 8 Release Themes
• Service Oriented Architecture
• 24x7 Availability of PowerCenter services
• Order of magnitude performance improvements
• Unlimited scalability
• Improved developer productivity
PowerCenter 8.x Update – Setting the Standard for Data Integration across the Enterprise
• Infrastructure and Server Enhancements
  • Services based Architecture
  • High Availability
  • Grid Enhancements
  • Easy Grid Configuration
  • Centralized administration web-based console
  • Centralized configuration
  • Connection resilience (RDBMS, Network, PC)
• Performance Enhancements
  • Pushdown Optimization
  • Flat Files
  • Partitioning
  • Auto Cache
• Developer Enhancements
  • Functions and Expressions
  • User Defined Functions
  • Java Transformation
  • Dynamic Target Creation
  • Visio Template – mapping generation and templates
  • Upgrade Wizard
• Expand the definition of universal data access
  • Data Federation Option
  • Unstructured Data Option
  • Data Quality Option – Extended PowerExchange
PowerCenter 8 Architecture
PowerCenter 6 and 7 Architecture
• Client Tools: Repository Manager, Designer, Workflow Manager, Workflow Monitor, Repository Server Admin Console
• Web Services Hub
• Repository Server
• Repository Database
• Data Servers (pmserver)
• PowerCenter Connects
• PowerExchange
[Diagram: components grouped by machine]
PowerCenter 8 Architecture
• Client Tools: Repository Manager, Designer, Workflow Manager, Workflow Monitor
• Administration Console
• Application Services: Integration Service, Repository Service, Web Services Hub, SAP BW Service
• Core Services (Domain/Gateway Services): Log Service, Administration & Authorization, Configuration, Domain Licensing
• Node & Domain
• Repository Database
• PowerCenter Connects
• PowerExchange
PowerCenter 8 Terminology
• Services
  • A service is a resource that provides specialized functions.
  • PowerCenter has two types of services: Application Services and Core Services.
  • PowerCenter Application Services represent server-based functions such as the Repository, Integration, SAP BW, and Web Services Hub services.
  • PowerCenter Core Services represent functions that manage and maintain the environment in which PowerCenter operates.
Introducing PowerCenter 8 Terminology
• Node
  • A node is a logical representation of a physical machine. It has physical attributes such as a hostname and port number.
  • Each node runs a Service Manager, which is responsible for the application and core services.
  • The Service Manager is started when you start “Informatica Services”.
• Domain
  • A domain is the fundamental unit of PowerCenter Services administration.
  • A domain is a logical collection or set of nodes and services that you can group in a “folder like” deployment.
PowerCenter 8 Terminology
• Service Manager
  • On the gateway node, the Service Manager is responsible for:
    • Controlling the domain
    • Managing services running on the domain
    • Providing service lookup
  • On all nodes, the Service Manager controls the core services and application services
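The terminology above (domain, node, gateway, service lookup) can be pictured as a small data model. The Python sketch below is illustrative only: the class names `Node` and `Domain`, the `gateway` and `lookup` methods, and the host names and port numbers are assumptions made for this example and are not PowerCenter APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Logical representation of a physical machine (hypothetical model)."""
    name: str
    host: str
    port: int
    is_gateway: bool = False
    services: list = field(default_factory=list)  # names of services on this node


@dataclass
class Domain:
    """Logical collection of nodes and services (hypothetical model)."""
    name: str
    nodes: list = field(default_factory=list)

    def gateway(self):
        # The Service Manager on the gateway node controls the domain.
        return next(n for n in self.nodes if n.is_gateway)

    def lookup(self, service_name):
        # Service lookup: names of all nodes currently running the service.
        return [n.name for n in self.nodes if service_name in n.services]


domain = Domain("Domain_Example", [
    Node("node01", "host01", 6001, is_gateway=True, services=["Repository Service"]),
    Node("node02", "host02", 6001, services=["Integration Service"]),
])
```

Calling `domain.lookup("Integration Service")` on this model returns the nodes that host the service, which mirrors the lookup role the slide assigns to the gateway's Service Manager.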
PowerCenter Services Framework
[Diagram: PowerCenter Domain]
• Client Tools: Designer, Repository Manager, Workflow Manager, Monitor
• Administration Console
• Master Gateway (Domain Controller): Logs, Domain Metadata
• Repository Service and Repository Database
• Integration Service
• Check point
High Availability (HA)
High Availability in PC8
• Failover
  • Restart for data integration, repository and other services
  • Primary and backup servers
• Recovery
  • Workflows and sessions are recovered on running servers in the grid after a server failure
  • Checkpoint recovery
  • Repository recovery
• Resilience
  • PowerCenter jobs withstand transient failures
    • Network errors
    • DB connection failures
Resilience
• DB Connection Resilience
  • When connecting to or disconnecting from a DB
  • Oracle, DB2, Sybase, SQL Server and Teradata
  • Retry interval based on timeout setting
• FTP Resilience
  • For connections to an FTP server
  • Read/write will recover if the connection is lost, based on the timeout parameter
• Internal Resilience
  • PowerCenter components (Integration Service, clients, etc.) are resilient to Repository Service failure
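The resilience behaviour described above amounts to retrying a connection until a timeout expires. The sketch below shows that general pattern in Python; it is not PowerCenter code. The function names `connect_with_resilience` and `make_flaky_connect` and the parameter names `timeout_s` and `retry_interval_s` are hypothetical.

```python
import time


def connect_with_resilience(connect, timeout_s, retry_interval_s,
                            sleep=time.sleep, clock=time.monotonic):
    """Retry `connect` until it succeeds or the resilience timeout would be exceeded.

    Returns a tuple (connection, number_of_attempts).
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return connect(), attempts
        except ConnectionError:
            if clock() + retry_interval_s > deadline:
                raise  # timeout exhausted: surface the failure to the caller
            sleep(retry_interval_s)


def make_flaky_connect(failures):
    """Hypothetical connection factory: fails `failures` times, then succeeds."""
    state = {"left": failures}

    def connect():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("transient failure")
        return "connected"

    return connect
```

For example, `connect_with_resilience(make_flaky_connect(2), timeout_s=5, retry_interval_s=1)` rides out two transient failures and succeeds on the third attempt.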
Simple High Availability/Failover Scenario
• Simple environment
• 1 domain, which consists of:
  • 2 nodes for Integration Services
    • node01 (Int_Svc01) - Primary
    • node02 (Int_Svc02) - Backup
  • 1 server for the repository (Repository DB)
Simple High Availability/Failover Scenario
• The node01 Integration Service goes down: component failure (HW/SW)
• The node01 Integration Service “fails over” to node02
• Automatic failover, restart, and recovery
[Diagram: node01 (Int_Svc01), node02 (Int_Svc02), Repository DB]
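The primary/backup scenario above can be reduced to one rule: run on the first node in preference order that is still up. A minimal Python sketch of that rule follows; the function name `pick_running_node` and the status values are assumptions for illustration, and the sketch does not cover restart or recovery of in-flight work.

```python
def pick_running_node(preferred_order, node_status):
    """Return the first node in preference order that is up, or None."""
    for node in preferred_order:
        if node_status.get(node) == "up":
            return node
    return None


order = ["node01", "node02"]               # node01 = primary, node02 = backup
status = {"node01": "up", "node02": "up"}
before = pick_running_node(order, status)  # service runs on the primary

status["node01"] = "down"                  # component failure (HW/SW) on node01
after = pick_running_node(order, status)   # service fails over to the backup
```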
Grid Enhancements
Domain Overview Dashboard: Simplified, Web-based Administration
Services Configuration: Remember pmserver config file?
[Screenshot callouts: Domain Example, Primary & Backup Repository Service, Nodes, Services]
Mission-critical Enterprise Deployment
Cost-effective Scalability with PowerCenter on a Grid
• Automatically recover, restart on live server
• Distributed processing of sessions
[Diagram labels: PowerCenter Domain Controller, PowerCenter Domain on Server Grid, Failed Hardware Server]
Grid Enhancements
• Grid Object
  • Configured from admin console
  • Services can be assigned to grid
  • Workflows are assigned to be run by services
• Workflow distributed on Grid (WOnG)
  • Same as version 7
  • Distribute sessions of a workflow across multiple nodes
• Session distributed on Grid (SOnG)
  • New in version 8
  • Can partition sessions to run on multiple nodes
• Dynamic Partitioning
  • # of partitions dynamically determined at runtime
  • Less configuration for users
• Resource Maps
  • Configure available resources on nodes in grid through admin console
  • Load balancer dispatches jobs based on resource availability on nodes
Grid – PC 7 vs. PC 8
PowerCenter 7
• ServerGrid is a collection of pmservers
• Work is directed to individual pmservers
• Work is distributed across the grid in a round-robin manner
• Session/task is the lowest unit of work
Grid Capabilities in 7.x vs. 8.x
7.x
• ServerGrid object
  • Collection of pmservers
• Workflows explicitly assigned to pmservers
• Pmservers belonging to a ServerGrid will dispatch to other pmservers
• Pmservers could fail, causing workflows to fail
• Can’t split sessions across multiple nodes
• Load balancer is round robin only
8.x
• Grid object
  • Collection of nodes
• Workflows assigned to Integration Service
• Integration Service assigned to Grid (can run on any node in grid)
• If one node fails, another Integration Service process on another node in the grid takes over running the workflow
• A session can be partitioned across nodes
• Load balancer takes into account resource availability on nodes and resource requirements of sessions for dispatch
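The difference between the two load-balancing styles above can be sketched in a few lines. This Python example is a simplified illustration under stated assumptions: "capacity" is an abstract unit invented for the example, the session and node names are hypothetical, and the greedy rule used here (pick the node with the most free capacity that fits) is a stand-in, not the dispatch algorithm PowerCenter actually uses.

```python
from itertools import cycle


def round_robin_dispatch(tasks, nodes):
    """7.x-style sketch: hand tasks to nodes in turn, ignoring load."""
    turn = cycle(nodes)
    return {task: next(turn) for task in tasks}


def resource_aware_dispatch(tasks, capacity):
    """8.x-style sketch: place each task on the node with the most free
    capacity that can still fit the task's requirement."""
    free = dict(capacity)
    placement = {}
    for task, need in tasks.items():
        candidates = [n for n, avail in free.items() if avail >= need]
        if not candidates:
            placement[task] = None  # no node can take the task right now
            continue
        best = max(candidates, key=lambda n: free[n])
        free[best] -= need
        placement[task] = best
    return placement


# Hypothetical sessions with abstract resource requirements.
tasks = {"s_load_orders": 6, "s_load_items": 3, "s_load_customers": 2}
rr = round_robin_dispatch(tasks, ["node01", "node02"])
ra = resource_aware_dispatch(tasks, {"node01": 4, "node02": 8})
```

Round robin sends the largest session to node01 even though that node has too little capacity for it; the resource-aware version places it on node02.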
Performance Improvements
Pushdown Optimization
Introduction
• What is pushdown optimization?
  • Push transformation processing to data sources and targets without moving data out
• Benefits
  • Reduce movement of data when source and target are the same database instance
  • Utilize database-specific processing that may be more optimal
  • Maintain metadata and lineage in PowerCenter
Pushdown Optimization
• Full Pushdown:
  • Source and target are in the same RDBMS
  • All transformations can be processed in database
• Partial Source:
  • One or more transformations can be processed in source database
• Partial Target:
  • One or more transformations can be processed in target database
• Generated SQL:
  • INSERT INTO t (…) VALUES (?+1, SOUNDEX(?))
[Diagram: Extract (Source DB), Transform, Load (Target DB)]
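The idea behind full pushdown can be shown with any SQL database. The sketch below uses Python's built-in `sqlite3` module as a stand-in; the table names and the `id + 1` / `UPPER(name)` logic are invented for illustration, and the SQL is hand-written rather than generated by PowerCenter. It runs the same transformation twice: once by moving rows out of the database and back, and once as a single statement executed inside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, name TEXT);
    CREATE TABLE tgt_rowwise (id INTEGER, name TEXT);
    CREATE TABLE tgt_pushdown (id INTEGER, name TEXT);
    INSERT INTO src VALUES (1, 'alpha'), (2, 'beta');
""")

# Without pushdown: extract rows, transform them outside the database, load them back.
rows = conn.execute("SELECT id, name FROM src").fetchall()
transformed = [(row_id + 1, name.upper()) for row_id, name in rows]
conn.executemany("INSERT INTO tgt_rowwise VALUES (?, ?)", transformed)

# Full pushdown: the same logic as one statement that the database runs itself,
# so no rows leave the database.
conn.execute("INSERT INTO tgt_pushdown SELECT id + 1, UPPER(name) FROM src")

rowwise = conn.execute("SELECT * FROM tgt_rowwise ORDER BY id").fetchall()
pushdown = conn.execute("SELECT * FROM tgt_pushdown ORDER BY id").fetchall()
```

Both targets end up with identical rows; the difference is where the work happens and how much data crosses the network.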
Example – Full Pushdown
SQL & Business Logic Maintained in Repository
Flat File Performance & Parameter and Variable Enhancements
Flat file enhancements
• FF Reader and Writer have been rewritten to optimize for performance
  • Delimited files with lots of decimal data will see the most significant performance improvements
  • Out-of-the-box performance improvements should be between 30% and 300%
• Append to flat file targets
  • Session output can be appended to an existing flat file
• Flat file source/target command support
  • Sources: use a command to generate source data or a file list that references multiple source files.
  • Targets: use a command to process the target data or process data for all partitioned targets in a session.
Parameters and Variables Enhancements
• Parameter Enhancements
  • Table owner name for relational sources/targets
  • E-mail address
  • FTP remote file name
• Global section specification in parameter files for use across different workflows / sessions
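A global section lets one value serve many workflows while still allowing a specific workflow to override it. The Python sketch below illustrates that lookup order on an INI-like text. Everything in it is an assumption made for the example: the `[Global]` heading spelling, the `[Sales.WF:wf_daily_load]` heading, the `$$` parameter names, the `;` comment marker, and the rule that a specific section wins over the global one. Consult the product documentation for the actual file syntax and precedence.

```python
def parse_parameter_file(text):
    """Parse INI-like text into {section: {name: value}}."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments (comment syntax assumed)
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            current[name.strip()] = value.strip()
    return sections


def resolve(sections, section, name):
    """Look in the specific section first, then fall back to [Global]."""
    for scope in (section, "Global"):
        if name in sections.get(scope, {}):
            return sections[scope][name]
    raise KeyError(name)


# Hypothetical parameter file contents.
PARAMS = """
[Global]
$$OwnerName=DW_OWNER
$$NotifyEmail=etl-team@example.com

[Sales.WF:wf_daily_load]
$$OwnerName=SALES_OWNER
"""

sections = parse_parameter_file(PARAMS)
```

With this file, the sales workflow sees its own `$$OwnerName` but inherits `$$NotifyEmail` from the global section; any other workflow gets both values from `[Global]`.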
Partitioning Enhancements
Partitioning Enhancements
• Flat File Partitioning
  • FF targets can now be partitioned
  • All partitions can write to a single file; alternatively, a merge file can be created, or a file list that contains the names of the individual files that were written
• Database Partitioning
  • Partitioned Oracle and DB2 sources can be read in parallel
  • No changes to targets. DB2 can be written to in parallel.
• Dynamic Partitioning
  • Based on # of partitions in database
  • Based on the # of nodes in a Grid
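Partitioned flat-file targets with a file list, combined with a partition count taken from the grid size, can be sketched as follows. This Python example is an illustration only: the function name `write_partitioned`, the file naming scheme, the `.lst` extension, and the round-robin row assignment are all assumptions, not PowerCenter behaviour.

```python
import os
import tempfile


def write_partitioned(rows, partition_count, out_dir, base="target"):
    """Split rows round-robin across partitions, write one file per partition,
    and return the path of a file list that names the partition files."""
    buckets = [[] for _ in range(partition_count)]
    for i, row in enumerate(rows):
        buckets[i % partition_count].append(row)

    paths = []
    for p, bucket in enumerate(buckets):
        path = os.path.join(out_dir, f"{base}_{p + 1}.out")
        with open(path, "w", encoding="utf-8") as fh:
            fh.writelines(line + "\n" for line in bucket)
        paths.append(path)

    list_path = os.path.join(out_dir, f"{base}.lst")
    with open(list_path, "w", encoding="utf-8") as fh:
        fh.writelines(p + "\n" for p in paths)
    return list_path


grid_nodes = ["node01", "node02", "node03"]  # partition count taken from grid size
with tempfile.TemporaryDirectory() as tmp:
    file_list = write_partitioned(["r1", "r2", "r3", "r4"], len(grid_nodes), tmp)
    with open(file_list, encoding="utf-8") as fh:
        listed = [os.path.basename(line.strip()) for line in fh]
```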
Auto Cache
© Informatica Corporation, 2006. All rights reserved.
AutoCache Overview
• Cache in PowerCenter v7
  • Default cache settings not adequate for all situations.
  • Default settings can underestimate new chip technologies.
  • Sometimes necessary to hand tune individual transformations.
  • Development settings did not always scale when deployed to different production machines.
• Auto Cache in PowerCenter v8.x
  • Automatically distribute session memory to transformations.
  • Automatically scale memory usage based on resources available.
  • Automatically scale memory usage based on mapping complexity.
Memory Attributes
• PowerCenter has two types of memory attributes:
  • Transformation Memory Attributes
  • Session Memory Attributes
• Transformation Memory Attributes are for individual transformations:
  • Lookup, Aggregator, Rank, Joiner
    • Index and Data Cache Size
  • Sorter Cache Size
  • XML Target Cache Size
• Session Memory Attributes are for the session:
  • Default Buffer Block Size
  • DTM Buffer Size
New Memory Attribute Specification
• Previously, only integer byte values were allowed for Memory Attributes, e.g. 1000000 or 2000000.
• Now shortcuts are also allowed: “KB”, “MB”, and “GB”, e.g. 100MB
• The value “Auto” is also allowed
  • This indicates that the user wants PowerCenter to automatically find a good value for that memory attribute
  • “Auto” is supported for both session (e.g. DTM buffers/buffer block size) and transformation memory attributes (e.g. lookup caches)
AutoCache
• Allows the user to leave the calculations to PowerCenter
• User specifies total amount of memory AutoCache is allowed to use
• Automatically computes a value for ALL memory attributes that have the value “Auto”
• Will NOT affect any memory attributes where the value is not “Auto”
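The rules above (byte values, KB/MB/GB shortcuts, and “Auto” values filled from a total budget) can be sketched in Python. This is a simplified illustration, not the product's algorithm: it assumes 1 KB = 1024 bytes and splits the budget equally among “Auto” attributes, whereas the slides say the real calculation also considers available resources and mapping complexity. The function and attribute names are hypothetical.

```python
UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}  # 1024-based units assumed


def parse_memory_attribute(value):
    """Return a byte count, or None when the value is "Auto"."""
    text = str(value).strip()
    if text.lower() == "auto":
        return None
    for suffix, factor in UNITS.items():
        if text.upper().endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)  # plain integer byte value, as in earlier versions


def resolve_auto(attributes, auto_budget_bytes):
    """Give every "Auto" attribute an equal share of the budget and leave
    explicitly sized attributes untouched."""
    parsed = {name: parse_memory_attribute(v) for name, v in attributes.items()}
    auto_names = [name for name, size in parsed.items() if size is None]
    share = auto_budget_bytes // len(auto_names) if auto_names else 0
    return {name: share if size is None else size for name, size in parsed.items()}


resolved = resolve_auto(
    {"lookup_cache": "Auto", "sorter_cache": "Auto", "dtm_buffer": "100MB"},
    auto_budget_bytes=parse_memory_attribute("512MB"),
)
```

Here the two “Auto” caches each receive half of the 512MB budget, while the explicitly sized `dtm_buffer` keeps its 100MB.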
Cache Calculator
• Click drop down
• Calculate based on the number of rows and the ports going into the object
• Value is propagated into the Cache value
Developer Improvements
Functions and Expressions
Function Enhancements
• Over 20 new functions added in the 8.x release
  • Financial Functions, Regular Expression parsing/match, IN(), Compression, Encryption, CRC, MD5 and more
• Custom Functions
  • Extend the functionality of the Expression Transformation via a C API
  • All 20+ functions above were added via this API
Function Enhancements
• User Defined Functions (UDF)
  • Ability for Designer users to create reusable functions entirely within the Expression Language
  • UDFs are folder level objects
  • Can use any valid functions (except aggregation functions) as well as other UDFs (in the same folder)
Java & SQL Transformations
Java Transformation Use Cases
• Looping over data
• Walking data hierarchies
• Calling third-party APIs (Java based)
• Calling RMI/EJB etc.
• Other Java Packages
• Calling expression/UDF/unconnected widget (like lookup) from Custom Transformation
• Simple “Custom Transformation”
Improved Developer Productivity
Java Inline Coding Sample
SQL Transformation Use Cases
• New SQL Transformation
  • Allows PowerCenter developers to execute SQL statements midstream in a mapping.
  • You can insert, delete, update, and retrieve rows from a database; the transformation returns database errors.
  • The SQL that is executed can be static, or it can be dynamic, where the SQL statement is itself created on a row-by-row basis.
  • The SQL transformation can also be used to execute SQL scripts from within a mapping, e.g. to leverage SQL scripts that already exist
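The static versus dynamic distinction above can be illustrated with Python's built-in `sqlite3` module. This is a conceptual sketch, not the SQL transformation itself: the table, the row layout, and the function names `run_static` and `run_dynamic` are invented for the example. In both cases database errors are collected and returned instead of stopping the run, mirroring the "returns database errors" point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "new"), (2, "new")])


def run_static(conn, rows):
    """Static SQL: one fixed statement, row values bound as parameters."""
    errors = []
    for row in rows:
        try:
            conn.execute("UPDATE customers SET status = ? WHERE id = ?",
                         (row["status"], row["id"]))
        except sqlite3.Error as exc:
            errors.append(str(exc))
    return errors


def run_dynamic(conn, rows):
    """Dynamic SQL: each input row carries the full statement to execute."""
    errors = []
    for row in rows:
        try:
            conn.execute(row["sql"])
        except sqlite3.Error as exc:
            errors.append(str(exc))  # database errors are returned, not raised
    return errors


static_errors = run_static(conn, [{"id": 1, "status": "active"}])
dynamic_errors = run_dynamic(conn, [
    {"sql": "DELETE FROM customers WHERE id = 2"},
    {"sql": "DELETE FROM no_such_table"},
])
remaining = conn.execute("SELECT id, status FROM customers").fetchall()
```

Dynamic statements built from row data should only come from trusted input, since the database executes them exactly as written.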
XML
XML Enhancements
• Filter data with query predicate
• Create a default namespace
• Import part of an XML schema
• Use anySimpleType
Metadata Enhancements
Metadata Exchange Enhancements
• New Data Model Support
  • Sybase Power Designer – bi-directional
  • Oracle Designer – bi-directional
  • ER Studio Design Tool – uni-directional (same as before)
  • CA Erwin – bi-directional
• Business Intelligence Support
  • Business Objects (bi-directional) – added 6.5 & XI & XI R2 XConnects
  • Cognos ReportNet Framework Manager (bi-directional) – added 2.0
  • Microstrategy (bi-directional) – added 8.0
Dynamic Target Creation
Dynamic Target creation
• Ability to dynamically create a target based on a transformation in the workspace or navigator
• Right-click on a transformation in the workspace and select Create and Add Target
• Drag a transformation and drop it in the Target folder
• Has same port definitions as transformation from which it was created
• Target type is same as repository you are using
• Can edit the target definition to change type or ports
• Creation dialog will be added in an upcoming release
Improved Developer Productivity: Target Generation
• Simply right-click on an object…
• …Target is created! All you need to do is Auto link and you are ready to go
Mapping Generation Option Visio Client for PowerCenter
Mapping Generation Option
• Bi-directional “engine” for automatically generating mappings from Visio templates or reverse engineering PowerCenter mappings into Visio templates
• Leverages the Informatica Data Stencil and Velocity templates for Visio
Visio Client for PowerCenter
[Screenshot labels: Mapping Template, Template Inputs]
Upgrade Wizard
PowerCenter Upgrade to 8.1
• A new Upgrade wizard in Admin Console
  • Integrated UI that takes the user through the various steps in the upgrade
  • Provides a detailed upgrade summary report at the end
  • Allows the user to switch in and out of the Upgrade UI to perform any other administrative activities
  • Can handle multiple repositories (global/local) and multiple PowerCenter Servers in one shot
  • Live feedback during repository upgrade as the user goes through the upgrade process
• A new post-upgrade reference guide
Summary
Summary - PC 7 vs. PC 8
PC 7.x
• 3 Tier Architecture
• Basic Grid Deployment
• Introduction to Profiling
• Added Transformations
  • Union
  • XML
• Web Services
• Team Based Development
PC 8.x
• Services Oriented Architecture
• Enhanced Grid Deployment
  • High Availability
  • Session on Grid
  • Resilience
• Enhanced Profiling
• Added Transformations
  • Java
  • SQL
• Enhanced Productivity
  • Mapping Generation
  • User Defined Functions
Thank You
Questions at the break