Filter Transformation Drops rows conditionally
Active Transformation Connected Ports • All input / output Specify a Filter condition Usage • Filter rows from flat file sources • Single pass source(s) into multiple targets 1
Aggregator Transformation Performs aggregate calculations
Active Transformation Connected Ports • Mixed • Variables allowed • Group By allowed Create expressions in output or variable ports Usage • Standard aggregations 2
Informatica Functions Aggregate Functions AVG COUNT FIRST LAST MAX MEDIAN MIN PERCENTILE STDDEV SUM VARIANCE
Return summary values for non-null data in selected ports Use only in Aggregator transformations Use in output ports only Calculate a single value (and row) for all records in a group Only one aggregate function can be nested within an aggregate function Conditional statements can be used with these functions 3
Aggregate Expressions Aggregate functions are supported only in the Aggregator Transformation
Conditional Aggregate expressions are supported
Conditional SUM format: SUM(value, condition) 4
Aggregator Properties Sorted Input Property
Instructs the Aggregator to expect the data to be sorted
Set Aggregator cache sizes (on Informatica Server machine) 5
Sorted Data The Aggregator can handle sorted or unsorted data • Sorted data can be aggregated more efficiently, decreasing total processing time
The Server will cache data from each group and release the cached data -- upon reaching the first record of the next group Data must be sorted according to the order of the Aggregator “Group By” ports Performance gain will depend upon varying factors
6
Incremental Aggregation Trigger in Session Properties, Performance Tab
MTD calculation
Cache is saved into $PMCacheDir: aggregatorname.DAT aggregatorname.IDX Upon next run, files are overwritten with new cache information Example: When triggered, PowerCenter Server will save new MTD totals. Upon next run (new totals), Server will subtract old totals; difference will be passed forward Best Practice is to copy these files in case a rerun of data is ever required. Reinitialize when no longer needed, e.g. – at the beginning new month processing
7
Joiner Transformation By the end of this section you will be familiar with: When to use a Joiner Transformation Homogeneous Joins Heterogeneous Joins Joiner properties Joiner Conditions Nested joins
8
Homogeneous Joins Joins that can be performed with a SQL SELECT statement: Source Qualifier contains a SQL join
Tables on same database server (or are synonyms) Database server does the join “work” Multiple homogenous tables can be joined
9
Heterogeneous Joins Joins that cannot be done with a SQL statement: An Oracle table and a Sybase table
Two Informix tables on different database servers Two flat files A flat file and a database table
10
Joiner Transformation Performs heterogeneous joins on records from different databases or flat file sources Active Transformation Connected Ports • All input or input / output • “M” denotes port comes from master source Specify the Join condition Usage • Join two flat files • Join two tables from different databases • Join a flat file with a relational table
11
Joiner Conditions
Multiple join conditions are supported
12
Joiner Properties Join types: • “Normal” (inner) • Master outer • Detail outer • Full outer Set Joiner Cache
Joiner can accept sorted data (configure the join condition to use the sort origin ports)
13
Mid-Mapping Join The Joiner does not accept input in the following situations:
Both input pipelines begin with the same Source Qualifier Both input pipelines begin with the same Normalizer Both input pipelines begin with the same Joiner Either input pipeline contains an Update Strategy
14
Sorter Transformation Can sort data from relational tables or flat files Sort takes place on the Informatica Server machine Multiple sort keys are supported The Sorter transformation is often more efficient than a sort performed on a database with an ORDER BY clause
15
Lookup Transformation By the end of this section you will be familiar with: Lookup principles Lookup properties Lookup conditions Lookup techniques Caching considerations
16
How a Lookup Transformation Works For each Mapping row, one or more port values are looked up in a database table If a match is found, one or more table values are returned to the Mapping. If no match is found, NULL is returned Lookup value(s)
Lookup transformation Return value(s)
17
Lookup Transformation Looks up values in a database table and provides data to other components in a Mapping Passive Transformation Connected / Unconnected Ports • Mixed • “L” denotes Lookup port • “R” denotes port used as a return value (unconnected Lookup only) Specify the Lookup Condition Usage • Get related values • Verify if records exists or if data has changed
18
Lookup Properties Override Lookup SQL option Toggle caching Native Database Connection Object name
19
Additional Lookup Properties Set cache directory
Make cache persistent
Set Lookup cache sizes
20
Lookup Conditions Multiple conditions are supported
21
To Cache or not to Cache? Caching can significantly impact performance Cached • Lookup table data is cached locally on the Server • Mapping rows are looked up against the cache • Only one SQL SELECT is needed
Uncached • Each Mapping row needs one SQL SELECT
Rule Of Thumb: Cache if the number (and size) of records in the Lookup table is small relative to the number of mapping rows requiring lookup 22
Target Options By the end of this section you will be familiar with: Row type indicators Row operations at load time Constraint-based loading considerations Rejected row handling options
23
Target Properties Session Task Select target instance Row loading operations Error handling
Properties Tab 24
Constraint-based Loading Maintains referential integrity in the Targets pk1 fk1, pk2
fk2
pk1 fk1, pk2 fk2
Example 1 With only One Active source, rows for Targets 1-3 will be loaded properly and maintain referential integrity
Example 2 With Two Active sources, it is not possible to control whether rows for Target 3 will be loaded before or after those for Target 2
The following transformations are ‘Active sources’: Advanced External Procedure, Source Qualifier, Normalizer, Aggregator, Sorter, Joiner, Rank, Mapplet (containing any of the previous transformations)
25
Update Strategy Transformation By the end of this section you will be familiar with: Update Strategy functionality Update Strategy expressions Refresh strategies Smart aggregation
26
Update Strategy Transformation Used to specify how each individual row will be used to update target tables (insert, update, delete, reject) Active Transformation Connected Ports • All input / output Specify the Update Strategy Expression Usage • Updating Slowly Changing Dimensions • IIF or DECODE logic determines how to handle the record 27
Target Refresh Strategies Single snapshot: Target truncated, new records
inserted
Sequential snapshot: new records inserted
Incremental: Only new records are inserted.
Records already present in the target are ignored
Incremental with Update: Only new records are
inserted. Records already present in the target are updated
28
Router Transformation Rows sent to multiple filter conditions
Active Transformation Connected Ports • All input/output • Specify filter conditions for each Group Usage • Link source data in one pass to multiple filter conditions
29
Router Transformation in a Mapping
30
Parameters and Variables By the end of this section you will understand: System Variables Creating Parameters and Variables Features and advantages Establishing values for Parameters and Variables
31
System Variables SYSDATE
Provides current datetime on the Informatica Server machine • Not a static value
$$$SessStartTime
Returns the system date value as a string. Uses system clock on machine hosting Informatica Server • format of the string is database type dependent • Used in SQL override • Has a constant value
SESSSTARTTIME
Returns the system date value on the Informatica Server • Used with any function that accepts transformation date/time data types • Not to be used in a SQL override • Has a constant value
32
Mapping Parameters and Variables Apply to all transformations within one Mapping Represent declared values Variables can change in value during run-time Parameters remain constant during run-time Provide increased development flexibility Defined in Mapping menu Format is $$VariableName or $$ParameterName
33
Mapping Parameters and Variables Sample declarations Userdefined names
Set the appropriate aggregation type
Set optional Initial Value
Declare Variables and Parameters in the Designer Mappings menu
34
Functions to Set Mapping Variables SetCountVariable -- Counts the number of evaluated rows and increments or decrements a mapping variable for each row SetMaxVariable -- Evaluates the value of a mapping variable to the higher of two values SetMinVariable -- Evaluates the value of a mapping variable to the lower of two values SetVariable -- Sets the value of a mapping variable to a specified value
35
Unconnected Lookup Will be physically “unconnected” from other transformations • There can be NO data flow arrows leading to or from an unconnected Lookup Lookup function can be set within any transformation that supports expressions Lookup data is called from the point in the Mapping that needs it
Function in the Aggregator calls the unconnected Lookup 36
Conditional Lookup Technique Two requirements: Must be Unconnected (or “function mode”) Lookup Lookup function used within a conditional statement Condition
Row keys (passed to Lookup)
IIF ( ISNULL(customer_id),:lkp.MYLOOKUP(order_no)) Lookup function
Conditional statement is evaluated for each row Lookup function is called only under the pre-defined condition
37
Conditional Lookup Advantage Data lookup is performed only for those rows which require it. Substantial performance can be gained EXAMPLE: A Mapping will process 500,000 rows. For two percent of those rows (10,000) the item_id value is NULL. Item_ID can be derived from the SKU_NUMB.
IIF ( ISNULL(item_id), :lkp.MYLOOKUP (sku_numb))
Condition (true for 2 percent of all rows)
Lookup (called only when condition is true)
Net savings = 490,000 lookups 38
Connected vs. Unconnected Lookups CONNECTED LOOKUP
UNCONNECTED LOOKUP
Part of the mapping data flow
Separate from the mapping data flow
Returns multiple values (by linking output ports to another transformation)
Returns one value (by checking the Return (R) port option for the output port that provides the return value)
Executed for every record passing through the transformation
Only executed when the lookup function is called
More visible, shows where the lookup values are used
Less visible, as the lookup is called from an expression within another transformation
Default values are used
Default values are ignored 39
Heterogeneous Targets By the end of this section you will be familiar with: Heterogeneous target types Heterogeneous target limitations Target conversions
40
Definition: Heterogeneous Targets Supported target definition types: Relational database Flat file XML ERP (SAP BW, PeopleSoft, etc.) A heterogeneous target is where the target types are different or the target database connections are different within a single Session Task 41
Step One: Identify Different Target Types
Oracle table
Tables are EITHER in two different databases, or require different (schemaspecific) connect strings One target is a flatfile load
Oracle table
Flat file
42
Step Two: Different Database Connections
The two database connections WILL differ Flatfile requires separate location information
43
Target Type Override (Conversion) Example: Mapping has SQL Server target definitions. Session Task can be set to load Oracle tables instead, using an Oracle database connection.
Only the following overrides are supported: Relational target to flat file target Relational target to any other relational database type SAP BW target to a flat file target
CAUTION: If target definition datatypes are not compatible with datatypes in newly selected database type, modify the target definition
44
Mapplet Designer
Mapplet Designer Tool
Mapplet Transformation Icons
Mapplet Output Transformation 45
Mapplet Advantages Useful for repetitive tasks / logic Represents a set of transformations Mapplets are reusable Use an ‘instance’ of a Mapplet in a Mapping Changes to a Mapplet are inherited by all instances Server expands the Mapplet at runtime
46
Active and Passive Mapplets Passive Mapplets contain only passive transformations Active Mapplets contain one or more active transformations
CAUTION: changing a passive Mapplet into an active Mapplet may invalidate Mappings which use that Mapplet • Do an impact analysis in Repository Manager first
47
Using Active and Passive Mapplets
Passive
Active
Multiple Passive Mapplets can populate the same target instance
Multiple Active Mapplets or Active and Passive Mapplets cannot populate the same target instance
48
Reusable Transformations By the end of this section you will be familiar with: Reusable transformation advantages Reusable transformation rules Promoting transformations to reusable Copying reusable transformations
49
Reusable Transformations Define once - reuse many times Reusable Transformations • • • • •
Can be a copy or a shortcut Edit Ports only in Transformation Developer Can edit Properties in the mapping Instances dynamically inherit changes Be careful: It is possible to invalidate mappings by changing reusable transformations
Transformations that cannot be made reusable • Source Qualifier • ERP Source Qualifier • Normalizer used to read a Cobol data source
50
Promoting a Transformation to Reusable
Place a check in the “Make reusable” box
This action is not reversible
51
Sequence Generator Transformation Generates unique keys for any port on a row
Passive Transformation Connected Ports • Two predefined output ports, NEXTVAL and CURRVAL • No input ports allowed Usage • Generate sequence numbers • Shareable across mappings 52
Sequence Generator Properties
Number of Cached Values
53
Dynamic Lookup By the end of this section you will be familiar with: Dynamic lookup theory Dynamic lookup advantages Dynamic lookup rules
54
Additional Lookup Cache Options
Make cache persistent Cache File Name Prefix • Reuse cache by name for another similar business purpose
Recache from Database • Overrides other settings and Lookup data is refreshed
Dynamic Lookup Cache • Allows a row to know about the handling of a previous row 55
Persistent Caches By default, Lookup caches are not persistent When Session completes, cache is erased Cache can be made persistent with the Lookup properties When Session completes, the persistent cache is stored on server hard disk files The next time Session runs, cached data is loaded fully or partially into RAM and reused Can improve performance, but “stale” data may pose a problem 56
Dynamic Lookup Cache Advantages When the target table is also the Lookup table, cache is changed dynamically as the target load rows are processed in the mapping New rows to be inserted into the target or for update to the target will affect the dynamic Lookup cache as they are processed Subsequent rows will know the handling of previous rows Dynamic Lookup cache and target load rows remain synchronized throughout the Session run 57
Update Dynamic Lookup Cache NewLookupRow port values • 0 – static lookup, cache is not changed • 1 – insert row to Lookup cache • 2 – update row in Lookup cache
Does NOT change row type Use the Update Strategy transformation before or after Lookup, to flag rows for insert or update to the target Ignore NULL Property • Per port • Ignore NULL values from input row and update the cache using only with non-NULL values from input 58
Example: Dynamic Lookup Configuration Router Group Filter Condition should be: NewLookupRow = 1
This allows isolation of insert rows from update rows 59
Concurrent and Sequential Workflows By the end of this section you will be familiar with: Concurrent Workflows Sequential Workflows Scheduling Workflows Stopping, aborting, and suspending Tasks and Workflows
60
Multi-Task Workflows - Sequential Tasks can be run sequentially:
Tasks shows are all Sessions, but they can also be other Tasks, such as Commands, Timer or Email Tasks
61
Multi-Task Workflows - Concurrent Tasks can be run concurrently:
Tasks shows are all Sessions, but they can also be other Tasks such as Commands, Timer or Email Tasks. 62
Multi-Task Workflows - Combined Tasks can be run in a combination concurrent and sequential pattern within one Workflow:
Tasks shows are all Sessions, but they can also be other Tasks such as Commands, Timer or Email Tasks 63
Additional Transformations By the end of this section you will be familiar with: The Rank transformation The Normalizer transformation The Stored Procedure transformation The External Procedure transformation The Advanced External Procedure transformation
64
Rank Transformation Filters the top or bottom range of records
Active Transformation Connected Ports • Mixed • One pre-defined output port RANKINDEX • Variables allowed • Group By allowed Usage • Select top/bottom • Number of records 65
Normalizer Transformation Normalizes records from relational or VSAM sources
Active Transformation Connected Ports • Input / output or output Usage • Required for VSAM Source definitions • Normalize flat file or relational source definitions • Generate multiple records from one record
66
Normalizer Transformation Turn one row YEAR,ACCOUNT,MONTH1,MONTH2,MONTH3, … MONTH12 1997,Salaries,21000,21000,22000,19000,23000,26000,29000,29000,34000,34000,40000,4500 0 1997,Benefits,4200,4200,4400,3800,4600,5200,5800,5800,6800,6800,8000,9000 1997,Expenses,10500,4000,5000,6500,3000,7000,9000,4500,7500,8000,8500,8250
Into multiple rows
67
Stored Procedure Transformation Calls a database stored procedure
Passive Transformation Connected/Unconnected Ports • Mixed • “R” denotes port will return a value from the stored function to the next transformation Usage • Perform transformation logic outside PowerMart / PowerCenter 68
External Procedure Transformation (TX) Calls a passive procedure defined in a dynamic linked library (DLL) or shared library
Passive Transformation Connected/Unconnected Ports • Mixed • “R” designates return value port of an unconnected transformation Usage • Perform transformation logic outside PowerMart / PowerCenter
Option to allow partitioning
69
Advanced TX Transformation Calls an active procedure defined in a dynamic linked library (DLL) or shared library
Active Transformation Connected Mode only Ports • Mixed Usage • Perform transformation logic outside PowerMart / PowerCenter • Sorting, Aggregation Option to allow partitioning
70
Transaction Control Transformation Allows custom commit types (source- or targetbased) and user-defined conditional commits
Passive Transformation Connected Mode Only Ports • Input and Output Properties • Continue • Commit Before • Commit After • Rollback Before • Rollback After 71
Transaction Control Functionality Commit Types • Target Based Commit Commit Based on “approximate” number of records written to target • Source Based Commit – Ensures that a source record is committed in all targets • User Defined Commit – Uses Transaction Control Transform to specify commits and rollbacks in the mapping based on conditions Set the Commit Type (and other specifications) in the Transaction Control Condition 72
Versioning View Object Version Properties Track Changes to an Object Check objects “in” and “out” Delete or Purge Object version Apply Labels and Run queries Deployment Groups
73
Informatica Business Analytics Suite Modular Plug-&-Play Approach
Packaged Analytic Solutions
Custom Built Analytic Solutions 74
Informatica Warehouses / Marts Informatica Warehouse™ Customer Relationship
Finance
Human Resources
Supply Chain
Sales
G/L
Compensation
Planning
Marketing
Receivables
Scorecard
Sourcing
Service
Payables
Inventory
Web
Profitability
Quality
Common Dimensions Customer
Product
Organization
Supplier Time
Geography
Employee 75
Inside the Informatica Warehouse Business Intelligence
Business Adapters™ (Extract)
• •
Analytic Bus™ (Transform)
• •
Transform
Warehouse Loader™ Analytic Bus™
Extract
Load
Informatica Warehouse™ Advanced Analytic Calculation Data Model Engine
Business Adapters™
ORCL
i2
SEBL
PSFT Custom
Industry Best Practice Metrics Process-centric model & conformed dimensions
Advanced Calculation Engine
• SAP
Type I, II slowly changing dimensions History and changed record tracking
Analytic Data Model
• •
Transaction consolidation and standardization Source independent interface
Warehouse Loader (Load)
• •
Data Source Connectivity with Minimal Load Structural/Functional Knowledge of Sources
•
Pre-aggregations for rapid query response Complex calculation metrics (e.g. statistical)
76
PowerConnect Products Family of enterprise software products that allow companies to directly source and integrate ERP, CRM, real-time message queue, mainframe, AS/400, remote data and metadata with other enterprise data
PowerConnect for MQSeries (real time) PowerConnect for TIBCO (real time) PowerConnect for PeopleSoft PowerConnect for SAP R/3 PowerConnect for SAP BW PowerConnect for Siebel PowerConnect for Mainframe PowerConnect for AS/400 PowerConnect for Remote Data PowerConnect SDK 77
78