Performance Tuning of Informatica Workflows

This covers areas like:
1) Code enhancements - redesign or re-develop the identified ETL mappings or sessions
2) Sessions
3) Mapping components
4) Performance tuning of DB parameters - DB changes like schema changes, table alters and new table creations
5) Reconfigure Informatica settings for an optimal data load process
6) Lay out standard ETL development practices for the ongoing project
7) Performance tuning of the hardware and network

Identify Potential Performance Improvements of Sessions

Analysis of the jobs: This involves analysing the session logs of each workflow. Analysis of a session log includes:
- Analysis of the session log before and after tuning
- Analysis of the Reader, Transformation and Writer busy % for a given session
- Verify whether Unicode (data movement mode) is set for the Informatica server
- Check the DTM buffer pool size and block size for an optimal setup
- Check whether the cache size is sufficient for each of the transformations, e.g. Lookup, Aggregator etc.
- Understand whether High Precision is enabled at the session property level
- Analysis of the session log for transformation errors and warnings
- Analysis of the time taken by the Source Qualifier and Filter transformations
- Analysis of rejected rows and SQL errors
- Inputs and analysis from team members who have worked on performance tuning

Let me elaborate on all the components one by one.

Performance Tuning Methodology

The project team will follow best practices in performance tuning developed over time by Satyam Computers. The proposed way to tune session performance is to eliminate one performance bottleneck, then identify and eliminate the next, and so on. The process continues until the ETL sessions reach an optimal level of performance, or until further tuning is no longer cost-effective. The project team will change one variable at a time, and time the session both before and after the change. If session performance does not improve, we change the variable back to the original configuration.
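A note on the Reader/Transformation/Writer busy percentages mentioned above: the session log reports a busy percentage for each thread, computed roughly as

Busy Percentage = (run time - idle time) / run time x 100

So, as an illustration, if the reader thread shows 90% busy while the transformation and writer threads sit at 20-25%, the source side is the likely bottleneck and is where the tuning effort should start.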
The following tasks will be undertaken to improve session performance:
- Optimize the target
- Optimize the source
- Optimize the mapping
- Optimize the transformations
- Optimize the session
- Partition the sessions

1) Optimize the Target

To identify a target (i.e. write) bottleneck, the team will copy the session and change the target to a flat file instead of a relational database. If the session performance increases significantly when writing to a flat file, we conclude that the target is the performance bottleneck.

2) Dropping Indexes and Key Constraints

Writes slow down if the target table has indexes or key constraints. All indexes and key constraints will be dropped before running the session. Pre-session SQL and post-session SQL will be used to drop and recreate the indexes and key constraints, as in the sketch below.
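As an illustration, a pre-/post-session SQL pair might look like the following. This is only a sketch; the table, index and constraint names are hypothetical.

```sql
-- Pre-session SQL (hypothetical names): drop the key constraint and index before the load
ALTER TABLE SALES_FACT DROP CONSTRAINT PK_SALES_FACT;
DROP INDEX IDX_SALES_FACT_DATE;

-- Post-session SQL: recreate them once the load completes
ALTER TABLE SALES_FACT ADD CONSTRAINT PK_SALES_FACT PRIMARY KEY (SALE_ID);
CREATE INDEX IDX_SALES_FACT_DATE ON SALES_FACT (SALE_DATE);
```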
3) Increasing Database Checkpoint Intervals

To increase performance, the team will increase the database checkpoint interval / commit interval based on the volume of data.

4) Using Bulk Loads

The project team will explore the possibility of using Oracle bulk load to improve the performance of a session that inserts a large amount of data into an Oracle database. Note that we may not be able to perform recovery after a job failure if we use the bulk load option.

5) Increasing Database Network Packet Size

To improve performance, the project DBA will study the possibility of increasing the network packet size to allow larger packets of data to cross the network at one time.

6) Optimize the Source

The project will use the methods below to identify source bottlenecks.

7) Using a Filter Transformation

We will add a Filter transformation in the mapping after each source qualifier and set the filter condition to false, so that no data is processed past the Filter transformation. If there is no significant performance improvement, we conclude that the source is the performance bottleneck.

8) Using a Read Test Mapping

We will connect the source qualifiers to a file target without any transformations and run this read test mapping. If its performance is similar to that of the original session, the source is the performance bottleneck.

9) Using a Database Query

The project team will copy the read query directly from the session log and execute it against the source database with a query tool such as TOAD or Oracle SQL Developer. The project DBA will measure the query execution time and the time it takes for the query to return the first row. If there is a long delay between the two measurements, we can use an optimizer hint to eliminate the source bottleneck.

10) Optimizing the Query

The project team will check whether indexes exist for the columns used in the ORDER BY, GROUP BY and WHERE clauses. Once the query is optimized, the ETL team will apply these modifications through the SQL override option.

11) Using Conditional Filters

We need not load the entire table into the transformation pipeline, provided the business logic permits. In that case, we will filter the source data on the source database, as in the sketch below.
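A hypothetical source-qualifier SQL override combining both ideas: a conditional filter applied at the source plus an optimizer hint. The table, index and column names are made up for illustration only.

```sql
-- Hypothetical SQL override: filter at the source and hint the optimizer
SELECT /*+ INDEX(O IDX_ORDERS_DATE) */
       O.ORDER_ID, O.CUSTOMER_ID, O.ORDER_DATE, O.AMOUNT
FROM   SRC_ORDERS O
WHERE  O.ORDER_DATE >= TRUNC(SYSDATE) - 7   -- load only the last 7 days
```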
12) Optimize the Mapping

The ETL team will add a Filter transformation before each target definition and set the filter condition to false, so that no data is loaded into the target tables. If the time it takes to run the new session is the same as the original session, the mapping is the bottleneck.

13) Configuring Single-Pass Reading

Single-pass reading allows us to populate multiple targets with one source qualifier. If multiple sessions use the same sources, the ETL team will explore the possibility of combining the transformation logic of each mapping into one mapping, with one source qualifier per source.

14) Optimizing Filters

We can filter the data using the methods below:
- Use the Source Qualifier transformation to filter rows from relational sources.
- Use the Filter transformation to filter data within a mapping.
- Use a Router transformation to drop rejected rows from an Update Strategy transformation if we do not need to keep rejected rows.

15) Optimizing Data Type Conversions

We can increase performance by eliminating unnecessary data type conversions. We will use the following conversions to improve system performance:
- The ETL team will explore the possibility of using integer values in place of other data types when performing comparisons using Lookup and Filter transformations.
- The ETL team will explore the possibility of converting source dates to strings through port-to-port conversions to increase session performance.

Optimize the Transformations

We can optimize the transformations below for better performance.

16) Optimizing Aggregator Transformations

The ETL team will use the techniques below to optimize the performance of an Aggregator transformation:
- Group by simple columns.
- Use sorted input.
- Use incremental aggregation (if possible).
- Filter data before aggregating it.
- Limit port connections.

17) Optimizing Joiner Transformations

The ETL team will follow the steps below to optimize the Joiner transformation:
- Use database joins when possible. Performing a join in the database is faster than performing it in the session.
- Use sorted data for the join if possible. We can improve session performance by configuring the Joiner transformation to use sorted input.
- Designate the source with the fewer rows as the master source.

18) Optimizing Lookup Transformations

We can optimize the following to improve lookup performance:
- Cache lookup tables.
- Index the lookup table.
- Optimize multiple lookups.

19) Caching Lookup Tables

If a mapping contains Lookup transformations, we will enable lookup caching wherever it is not already enabled. In addition:
- We will use a persistent cache to improve lookup performance whenever possible.
- We will explore the possibility of using concurrent caches to improve session performance.
- We will use the Lookup SQL Override option to add a WHERE clause to the default SQL statement if one is not defined.
- We will add an ORDER BY clause to the lookup SQL statement if none is defined.
- We will use the SQL override to suppress the default ORDER BY statement and enter an override ORDER BY with fewer columns, as in the sketch below.
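As a sketch (the table and column names are hypothetical), a lookup SQL override might restrict the cached rows with a WHERE clause and replace the default ORDER BY with a shorter one. Appending "--" after the override ORDER BY is a common way to comment out the ORDER BY that the Integration Service generates, though this behaviour should be verified for the Informatica version in use.

```sql
-- Hypothetical lookup SQL override
SELECT PRODUCT_ID, PRODUCT_NAME
FROM   DIM_PRODUCT
WHERE  ACTIVE_FLAG = 'Y'   -- cache only the rows we will actually look up
ORDER BY PRODUCT_ID --
```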
20) Indexing the Lookup Table

We can improve performance for the following types of lookups:
- For cached lookups, we will index the lookup table using the columns in the lookup ORDER BY statement.
- For uncached lookups, we will index the lookup table using the columns in the lookup WHERE condition.

21) Optimizing Multiple Lookups

We will tune the Lookup transformations that query the largest amounts of data to improve overall performance. By doing that, we can reduce the number of lookups on the same table.

22) Optimizing Sequence Generator Transformations

We can optimize Sequence Generator transformations by creating a reusable Sequence Generator and using it in multiple mappings simultaneously.

23) Optimizing Sorter Transformations

We will use the following formula to determine the size of the incoming data:

incoming data size ~= number of input rows x (sum of column sizes + 16)

For example, 1,000,000 rows at 100 bytes per row work out to roughly 1,000,000 x (100 + 16) = 116 MB, which indicates how much sorter cache the session needs to avoid paging to disk.

24) Optimizing Source Qualifier Transformations

We will use the Select Distinct option on the Source Qualifier transformation to select unique values from a source whenever possible.

25) Eliminating Transformation Errors

Transformation errors slow down performance: Informatica pauses to determine the cause of each error, removes the row causing the error from the data flow, and then writes the row to the session log file. The ETL team will fix transformation errors to improve session performance.

26) Optimize the Session

We observed that most of the sessions have small cache sizes, low buffer memory and small commit intervals, which cause session bottlenecks.

27) Run Concurrent Sessions and Workflows

The project team will identify the number of concurrent sessions currently set in the production environment. We will study the possibility of increasing the number of concurrent sessions and the Load Manager Shared Memory (LMSM). We will calculate the LMSM with the formula below and set it accordingly:

Load Manager Shared Memory = 200 KB x number of concurrent sessions

28) Allocating Buffer Memory (DTM)

Unicode data may require a larger buffer pool. We will use the formulas below to size the DTM buffer pool and DTM block size:

session buffer blocks required = (total number of sources + total number of targets) x 2
session buffer blocks = 0.9 x (DTM Buffer Size) / (Default Buffer Block Size) x (number of partitions)

A worked example follows below.
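As a worked illustration (the session counts are assumed, and the 64 KB default buffer block size should be confirmed for the installation): with 10 concurrent sessions, Load Manager Shared Memory = 200 KB x 10 = 2,000 KB. For a session with 2 sources and 3 targets, the required buffer blocks are (2 + 3) x 2 = 10; solving the second formula for a single partition gives DTM Buffer Size = 10 x 64 KB / 0.9, or roughly 711 KB, so the default DTM buffer size would already be comfortable for such a session.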
29) Disabling High Precision

If the business logic does not require high precision, we will disable high precision to improve session performance.

30) Reducing Error Tracing

To improve performance, we will make sure the error tracing level is set to 'Normal'.

31) Partition the Session

We can use multiple pipeline partitions and database partitions. However, to improve performance, we have to ensure that the number of pipeline partitions equals the number of database partitions. When the number of partitions is increased, the DTM buffer size should be increased accordingly: if the session contains n partitions, increase the DTM buffer size to at least n times the value used for the session with one partition (for example, a session tuned to a 12 MB DTM buffer with one partition should get at least 48 MB with four partitions). This option requires an optional Informatica license to implement partitioning, so it will be the last option for improving performance, as it requires more time and more effort.

In the above posts I used the term 'we' many times. Please note that this is the initial scope of work defined by me while submitting the SOW to the client.