Transformations

Source Qualifier
Active, Connected
• Represents the rows read from homogeneous sources
Properties
• SQL Query (SQL override)
• User-Defined Join
• Source Filter
• Number of Sorted Ports
• Tracing Level
• Select Distinct
Optimization
• Use the Source Qualifier to filter: the Source Qualifier limits the row set extracted from the source, whereas the Filter limits the row set sent to a target
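A minimal sketch of an SQL override that pushes filtering and sorting into the source query; the ORDERS table and its columns are hypothetical, not from these notes:

    -- Hypothetical SQL override; ORDERS and its columns are illustrative.
    SELECT ORDERS.ORDER_ID,
           ORDERS.CUSTOMER_ID,
           ORDERS.ORDER_TOTAL
    FROM   ORDERS
    WHERE  ORDERS.ORDER_TOTAL > 0      -- Source Filter: rows never leave the database
    ORDER  BY ORDERS.ORDER_ID          -- corresponds to Number of Sorted Ports = 1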
Filter
Active, Connected
• Limits the row set sent to the target; works with any source type
Properties
• Filter Condition – drops data that does not match the condition
• Tracing Level
Optimization
• Use the Filter transformation early in the mapping
• The filter condition is case-sensitive; queries in some databases do not take this into account
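For example, a sketch of a filter condition that works around case sensitivity (STATUS is a hypothetical port name):

    -- UPPER() normalizes case so 'active', 'Active', and 'ACTIVE' all pass.
    UPPER(STATUS) = 'ACTIVE'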
Aggregator
Active, Connected
• Performs calculations on groups, unlike the Expression transformation, which works on a row-by-row basis
• Can use conditional clauses to filter records, providing more flexibility than the SQL language
Properties
• Cache Directory
• Tracing Level
• Sorted Input
Components
• Aggregate expression
• Group By port
• Sorted Input option
• Aggregate cache
Ports
Input, Output, and Variable
Optimization
• Filter before aggregating
• Use Sorted Input to decrease the use of aggregate caches
Sorted Input cannot be used if any of the following conditions is true:
• The aggregate expression contains nested aggregate functions
• The session uses incremental aggregation
• Source data is data-driven
• The mapping was upgraded from PowerMart 3.5
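As an illustration of a conditional clause, a sketch of an aggregate expression (SALES and QUANTITY are hypothetical ports):

    -- Only rows where QUANTITY > 0 contribute to the sum.
    SUM(SALES, QUANTITY > 0)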
Expression
Passive, Connected
• Performs row-by-row calculations
• Multiple expressions can be entered in a single Expression transformation
Properties
• Tracing Level
Ports
Input, Output, and Variable
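A minimal sketch of an output-port expression (FIRST_NAME and LAST_NAME are hypothetical ports):

    -- Fall back to FIRST_NAME alone when LAST_NAME is null.
    IIF(ISNULL(LAST_NAME), FIRST_NAME, FIRST_NAME || ' ' || LAST_NAME)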
Router
Active, Connected
• Processes the incoming data only once, passes the output to multiple groups, and routes rows that do not meet any group condition to the default output group
Properties
• Tracing Level
Components
• Input and Output groups
• Input and Output ports
• Group filter conditions
Tips
• One group can be connected to one transformation or target
• One output port in a group can be connected to multiple transformations or targets
• Multiple output ports in one group can be connected to multiple transformations or targets
• More than one group cannot be connected to one transformation or target
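For example, hypothetical group filter conditions for two user-defined groups (REGION is an illustrative port; rows matching neither condition go to the default group):

    -- Group DOMESTIC
    REGION = 'US'
    -- Group INTERNATIONAL
    REGION != 'US'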
Sequence Generator
Passive, Connected
• Creates unique primary key values
• Replaces missing values
• Cycles through a sequential range of numbers
Properties
• Start Value
• Increment By
• End Value
• Current Value
• Cycle
• Number of Cached Values
• Reset
• Tracing Level
Optimization
• Use a reusable Sequence Generator if the same sequence is to be used in more than one session
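A sketch of how the Cycle property behaves, with hypothetical property values:

    -- Hypothetical settings: Start Value = 1, Increment By = 1,
    --                        End Value = 5, Cycle = enabled
    -- NEXTVAL then produces: 1, 2, 3, 4, 5, 1, 2, 3, ...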
Rank
Active, Connected
• Selects only the top or bottom rank of data
Properties
• Cache Directory
• Top/Bottom
• Number of Ranks
• Case-Sensitive String Comparison
• Tracing Level
Ports
• Input – minimum of one
• Output – minimum of one
• Variable – stores values or calculations to use in an expression
• Rank – only one
Update Strategy
Active, Connected
• Set at two levels:
  Within a session – treat all records in the same way (for example, treat all records as inserts)
  Within a mapping – flag records for insert, update, delete, or reject (data-driven)
Properties
• Update Strategy Expression
• Forward Rejected Rows
• Tracing Level
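A minimal sketch of an update strategy expression (NEW_FLAG is a hypothetical port; DD_INSERT and DD_UPDATE are the built-in flagging constants):

    -- Flag new rows for insert, existing rows for update.
    IIF(NEW_FLAG = 1, DD_INSERT, DD_UPDATE)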
Joiner
Active, Connected
• Joins data from heterogeneous sources
• Only two sources per Joiner
Properties
• Case-Sensitive String Comparison
• Cache Directory
• Join Condition
• Join Type (Normal, Master Outer, Detail Outer, Full Outer)
• Null Ordering in Master
• Null Ordering in Detail
• Tracing Level
Ports
Input, Output, M (to switch the master/detail relationship of the sources)
The following types of sources can be used in a Joiner:
• Two relational tables in separate databases
• Two flat files in potentially different file systems
• Two different ODBC sources
• Two instances of the same XML source
• A relational table and a flat file source
• A relational table and an XML source
A Joiner cannot be used when:
• Both pipelines begin with the same original data source
• Both input pipelines originate from the same Source Qualifier transformation
• Both input pipelines originate from the same Normalizer transformation
• Both input pipelines originate from the same Joiner transformation
• Either input pipeline contains an Update Strategy transformation
• Either input pipeline contains a connected or unconnected Sequence Generator transformation
Lookup
Passive, Connected / Unconnected
• Gets a related value
• Performs a calculation
• Updates slowly changing dimension tables
Properties
• SQL Override
• Lookup Table Name
• Lookup Caching Enabled
• Lookup Policy on Multiple Match
• Lookup Condition
• Location Information
• Source Type
• Recache if Stale
• Tracing Level
• Lookup Cache Directory Name
• Lookup Cache Initialize
Ports
• Input – minimum of one
• Output – minimum of one
• Lookup – minimum of one
• Return – only one
Components
• Lookup table
• Ports
• Properties
• Condition
Lookup Caches
• Persistent Cache
• Recache from Database
• Static Cache
• Dynamic Cache
• Shared Cache
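A minimal sketch of calling an unconnected Lookup from an expression; lkp_CUSTOMER and IN_CUST_ID are hypothetical names, and the call returns the value of the lookup's return port:

    -- :LKP invokes the unconnected lookup once per row.
    :LKP.lkp_CUSTOMER(IN_CUST_ID)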
Stored Procedure
Passive, Connected / Unconnected
A Stored Procedure transformation can be used to:
• Drop and recreate indexes
• Check the status of the target database before moving records into it
• Determine if enough space exists in a database
• Perform a specialized calculation
Options for running a Stored Procedure transformation:
• Normal
• Pre-load of the Source
• Post-load of the Source
• Pre-load of the Target
• Post-load of the Target
Properties
• Stored Procedure Name
• Connection Information
• Call Text
• Stored Procedure Type
• Execution Order
• Tracing Level
Ports
Input, Output, and Return
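A minimal sketch of calling an unconnected Stored Procedure transformation from an expression; sp_check_space and IN_DB_NAME are hypothetical names, and PROC_RESULT captures the return value:

    -- :SP invokes the unconnected Stored Procedure transformation.
    :SP.sp_check_space(IN_DB_NAME, PROC_RESULT)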
Normalizer
Active, Connected
• Designed to handle data read from COBOL sources
• Can also be used to denormalize data from any type of source
• Can also handle multiple levels of denormalization in the same record
Properties
• Reset
• Restart
• Tracing Level

SCD (Slowly Changing Dimensions)
• Type I – no history; overwrite existing dimensions
• Type II – keep versions, by:
  • Version number
  • Flag
  • Date range
• Type III – store current and previous values

Tracing Levels
• Normal – logs initialization, errors, stopped rows, and summarized session results, but not individual rows
• Terse – logs initialization information and error notification of rejected data
• Verbose Init – Normal plus additional initialization detail: names of index and data files, detailed transformation statistics
• Verbose Data – Verbose Init plus logs each row that passes into the mapping and default transformation statistics; truncates string data to column size
Directories of Server Variables (inside the root directory)
• SessLog
• BadFiles
• Cache
• SrcFiles
• TgtFiles
• Temp
• ExtProc
Output Files
• Server log
• Session log
• Session detail
• Reject file
• Control file
• Output file
• Cache file
• Post-session / pre-session email

Load Manager & DTM

LM (Load Manager):
• Locks and reads sessions, parameters, and variables
• Verifies permissions and privileges
• Creates the session log file
• Creates the DTM process
• Sends post-session email

DTM:
• Allocates DTM process memory
• Initializes sessions and fetches the mapping
• Executes pre-session commands
• Creates threads
• Writes to the repository

DTM threads:
• Master thread
• Mapping thread – one thread for each session
• Pre- and post-session threads
• Writer thread
• Transformation thread

Transformations that use caches:
• Aggregator
• Joiner
• Lookup
• Rank

Three parameters the LM allocates to sessions:
• Max Sessions
• LM Shared Memory
• Keep repeating session stored in memory (disabled by default)
PMCMD
pmcmd start <Username> <Password> TCP/IP:<port> <Sessionname>[:pf=<parameter file>] <Session flag> <Wait flag>
Session flag: 1 – session, 0 – batch
Wait flag: 1 – wait until the session completes, 0 – return once the request is sent
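An illustrative invocation following the syntax above; the credentials, port, session name, and parameter file are all hypothetical:

    pmcmd start admin mypassword TCP/IP:4001 s_load_orders:pf=orders.par 1 1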
Mapping Optimizations
• Reduce the number of transformations
• Reduce the number of expressions
• Delete unnecessary links between transformations
• Filter before the Aggregator
• For transformations that use a cache, reduce the number of connected ports to reduce the data held in the data cache
• Optimize datatype conversions
Session Optimizations
• Run concurrent batches
• Partition sessions
• Reduce error tracing
• Remove staging areas
• Tune session parameters

Target Database Optimization
• Drop indexes and key constraints
• Increase checkpoint intervals
• Use bulk loading
• Use external loading
• Turn off recovery
• Increase database network packet size
Session Properties Tabs
• General Tab
  • General Options
  • Source Options
  • Target Options
  • Session Commands
  • Email
  • Performance Options
  • Commit Interval
  • Event-based Scheduling
• Time Tab
• Error Handling
• Transformations
  • Session-level overrides
  • Aggregate behavior
Performance Tuning

Optimizing the Source Database
• Flat file source
  By default, the Informatica Server reads 1024 bytes per line. Session performance can be improved by tuning the number of bytes the Informatica Server reads per line.
• Relational source
  Suggestions to improve session performance:
  • Optimize the query using hints and GROUP BY / ORDER BY clauses, and use parallel queries (see the sketch after this list)
  • Create tempdb as an in-memory database
  • Use conditional filters
  • Increase the database network packet size, allowing larger packets of data to cross the network at one time; for Oracle, increase the packet size in listener.ora and tnsnames.ora
  • Connect to Oracle databases using the IPC protocol; set up the Oracle database connection in listener.ora and tnsnames.ora
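A sketch of an optimized source query using an Oracle parallel hint; the table, columns, and degree of parallelism are hypothetical:

    -- Hypothetical Oracle query with a parallel hint and explicit ordering.
    SELECT /*+ PARALLEL(ORDERS, 4) */
           ORDER_ID, CUSTOMER_ID, ORDER_TOTAL
    FROM   ORDERS
    WHERE  ORDER_TOTAL > 0
    ORDER  BY ORDER_ID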
Optimizing the Target Database
• Drop indexes and key constraints
• Increase checkpoint intervals
• Use bulk loading (for Sybase and MS SQL Server)
• Use external loading (for Teradata, Oracle, and Sybase IQ)
• Turn off recovery
• Increase database network packet size
• Optimize an Oracle target database by checking the storage clause, space allocation, and rollback segments
Optimizing the Mapping
• Reduce the number of transformations in mappings and delete unnecessary links between transformations
• Limit the connected input/output or output ports for transformations that use a data cache (Aggregator, Joiner, Lookup, and Rank); limiting the number of input/output or output ports reduces the amount of data the transformation stores in the data cache
• Configure single-pass reading
• Optimize datatype conversions
  Eliminate unnecessary datatype conversions from the mapping; use integer values in place of other datatypes when performing comparisons using Lookup and Filter transformations
• Eliminate transformation errors
  These occur when the Informatica Server encounters a conversion error, conflicting mapping logic, or any condition set up as an error, such as null input
• Optimize Lookup transformations (see the sketch after this item)
  Reduce the number of cached rows by using a WHERE clause in the Lookup SQL Override option
  Optimize the lookup condition by placing conditions with an equal sign first when there is more than one condition
  Index the columns in the lookup ORDER BY and the columns in the condition
  Use a shared or persistent cache
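A sketch of a Lookup SQL override that caches only the rows needed; CUSTOMER_DIM and its columns are hypothetical:

    -- Cache only active rows instead of the whole table.
    SELECT CUSTOMER_ID, CUSTOMER_NAME, STATUS
    FROM   CUSTOMER_DIM
    WHERE  STATUS = 'ACTIVE'
    ORDER  BY CUSTOMER_ID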
• Optimize Filter transformations
  Use the Filter transformation early in the mapping; avoid complex expressions and instead use simple integer or true/false expressions in the filter condition
• Optimize Aggregator transformations
  Group by simple columns such as numbers instead of strings or dates
  Use sorted input
  Use incremental aggregation
• Optimize the Joiner transformation
  The Joiner transformation needs a data cache to hold the master table rows and an index cache to hold the join columns from the master table, so make sure the system has enough memory to hold the data and index caches
  Normal joins are faster than outer joins and result in fewer rows; when possible, use database joins for homogeneous sources
• Optimize the Sequence Generator transformation
  Create a reusable Sequence Generator and use it in multiple mappings simultaneously
  The Number of Cached Values should be greater than 1000
• Optimize expressions (see the sketch after this list)
  Factor out common logic
  Minimize aggregate function calls: for example, use SUM(A + B) instead of SUM(A) + SUM(B)
  Replace common subexpressions with local variables
  Choose numeric over string operations
  Choose the DECODE function over the LOOKUP function
  Use operators instead of functions: for example, use Customers.First_Name || ' ' || Customers.Last_Name instead of CONCAT(CONCAT(Customers.First_Name, ' '), Customers.Last_Name)
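As an illustration of choosing DECODE over LOOKUP, a sketch in which STATE_CODE and the code/name pairs are hypothetical:

    -- A small static translation handled inline with DECODE.
    DECODE(STATE_CODE,
           'CA', 'California',
           'NY', 'New York',
           'Unknown')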
Optimizing the Session
• Run concurrent batches
  If there are independent sessions that use separate sources and separate mappings to populate different targets, place them in a concurrent batch and run them simultaneously
  If there is a complex mapping with multiple sources, separate the mapping into several simpler mappings with separate sources, then create a session for each mapping and place the sessions in a concurrent batch to run in parallel
• Partition sessions
  When a session is partitioned, the PowerCenter Server opens multiple connections to the source and target databases and extracts, transforms, and loads data in parallel pipelines
• Allocate buffer memory
  By default a session has enough buffer blocks for 83 sources and targets; if a session contains more than 83, the number of available memory blocks can be increased by adjusting the following session parameters:
  Increase the DTM buffer pool size (default 12,000,000 bytes)
  Decrease the buffer block size (default 64,000 bytes)
  To configure these settings, first determine the number of memory blocks the Informatica Server requires to initialize the session, then configure the DTM buffer pool size and the buffer block size.
  For example, a non-partitioned session using a mapping with 50 sources and 50 targets requires 200 memory blocks:
  (total number of sources + total number of targets) * 2 = session buffer blocks
  (50 + 50) * 2 = 200
  Next, based on the default settings, either the DTM buffer pool size can be raised to 15,000,000 or the buffer block size lowered to 54,000:
  session buffer blocks = 0.9 * (DTM buffer pool size) / (buffer block size) * (number of partitions)
  200 = 0.9 * 14,222,222 / 64,000 * 1
  or
  200 = 0.9 * 12,000,000 / 54,000 * 1
• Increase the cache sizes
  If the allocated data or index cache is not large enough to store the data, the Informatica Server stores the data in a temporary disk file as it processes the session. Each time the Informatica Server pages to the temporary file, performance slows
• Increase the commit interval
  The smaller the commit interval, the more often the Informatica Server writes to the target database, and the slower the overall performance
  The commit interval does not affect the performance of bulk load sessions; when the session runs in bulk mode, the target database commits as necessary, ignoring commit commands sent by the Informatica Server
• Turn off session recovery
  The Informatica Server writes recovery information to the OPB_SRVR_RECOVERY table during each commit. This can decrease performance, so session recovery should be turned off in the Informatica Server setup
• Disable decimal arithmetic
• Reduce error tracing
Optimizing the System
• Improve network speed
  If a flat file is used as a source or target, move the file onto the Informatica Server system to improve performance. When a flat file is stored on a machine other than the Informatica Server, session performance depends on the performance of the network connections; moving the files onto the Informatica Server system and adding disk space might improve performance
  When using relational source or target databases, try to minimize the number of network hops between the source and target databases and the Informatica Server; moving the target database onto the Informatica Server system might improve performance
• Use multiple PowerCenter Servers
  Run multiple PowerCenter Servers on separate systems against the same repository; distributing the session load across separate PowerCenter Server systems increases performance
• Use additional CPUs