Testing Data Warehouse Applications: the ETL Perspective

Abstract: This paper takes a look at the different strategies for testing a data warehouse application. It suggests various approaches that can be beneficial while testing the ETL process in a DW. A data warehouse is a critical business application, and defects in it result in business losses that cannot be accounted for. Here we walk through some of the basic phases and strategies to minimize defects.

Introduction: This is an era of global competition, and ignorance is one of the greatest threats to modern business. Organizations across the globe are therefore relying on IT services for strategic decision-making, and a data warehouse implementation is one such tool that comes to the rescue. Given the criticality of a DW application, a defect-free DW implementation is a dream come true for any organization. As QA and testing personnel, our role is to ensure this, thereby leading to maximized profits, better decisions and greater customer satisfaction. A bug traced at a later stage not only increases the cost of rework but also means that incorrect data may already have been used for strategic decisions. Hence, defects should be detected before implementation. In light of the above, let us take a look at the various strategies involved in the testing cycle of a DW application.

The DW Testing Life Cycle: As with any other piece of software, a DW implementation goes through the natural cycle of unit testing, system testing, regression testing, integration testing and acceptance testing. Unlike other software, however, there are no off-the-shelf testing products available for a DW.

Unit testing: Traditionally this has been the task of the developer. It is white-box testing to ensure that the module or component is coded as per the agreed-upon design specifications. The developer should focus on the following:

a) All inbound and outbound directory structures are created properly, with appropriate permissions and sufficient disk space, and all tables used during the ETL are present with the necessary privileges.

b) The ETL routines give expected results:
i. All transformation logic works as designed from source to target
ii. Boundary conditions are satisfied, e.g. date fields with leap-year dates
iii. Surrogate keys have been generated properly
iv. NULL values have been populated where expected
v. Rejects have occurred where expected, and the reject log is created with sufficient detail
vi. Error recovery methods work as designed
vii. Auditing is done properly
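
A minimal sketch of how a developer might automate a few of these unit checks, assuming the loaded target rows are available as plain Python dictionaries; the table columns (customer_sk, customer_id, effective_date) are hypothetical:

```python
import datetime

# Hypothetical sample of rows loaded into a target dimension table.
target_rows = [
    {"customer_sk": 1, "customer_id": "C001", "effective_date": datetime.date(2020, 2, 29)},
    {"customer_sk": 2, "customer_id": "C002", "effective_date": datetime.date(2019, 12, 31)},
]

def check_surrogate_keys(rows, key="customer_sk"):
    """Surrogate keys must be populated and unique."""
    keys = [r[key] for r in rows]
    assert all(k is not None for k in keys), "NULL surrogate key found"
    assert len(keys) == len(set(keys)), "duplicate surrogate keys found"

def check_not_null(rows, column):
    """Columns expected to be populated must not contain NULLs."""
    assert all(r[column] is not None for r in rows), f"NULL found in {column}"

def check_leap_year_boundary(rows, column="effective_date"):
    """Boundary condition: a 29 February leap-year date should survive the load intact."""
    assert any(r[column].month == 2 and r[column].day == 29 for r in rows), \
        "leap-year boundary row missing from target"

check_surrogate_keys(target_rows)
check_not_null(target_rows, "customer_id")
check_leap_year_boundary(target_rows)
print("unit checks passed")
```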

c) The data loaded into the target is complete:
i. All source data that is expected to be loaded into the target actually gets loaded; compare counts between source and target and use data profiling tools
ii. All fields are loaded with their full contents, i.e. no data field is truncated during transformation
iii. No duplicates are loaded
iv. Aggregations take place in the target properly
v. Data integrity constraints are properly taken care of

System testing: Generally the QA team owns this responsibility. For them the design document is the bible, and the entire set of test cases is based directly upon it. Here we test the functionality of the application, mostly as black-box testing. The major challenge is the preparation of test data: an intelligently designed input dataset brings out the flaws in the application more quickly. Wherever possible use production-like data; you may also use data generation tools, or customized tools of your own, to create test data. We must test all the relevant combinations of input and specifically check the errors and exceptions. An unbiased approach is required to ensure maximum efficiency, and knowledge of the business process is an added advantage, since we must be able to interpret the results functionally and not just code-wise. The QA team must test for the following (an automated completeness check is sketched after the list):
i. Data completeness: match source to target counts
ii. Data aggregations: match aggregated data against staging tables and/or the ODS
iii. Granularity of data is as per specifications
iv. Error logs and audit tables are generated and populated properly
v. Notifications to IT and/or business are generated in the proper format
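
A minimal sketch of a source-to-target completeness check. It uses an in-memory SQLite database with hypothetical src_orders/tgt_orders tables standing in for the real source and target connections:

```python
import sqlite3

# Toy in-memory tables standing in for the source system and the warehouse target.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, customer_name TEXT);
    CREATE TABLE tgt_orders (order_id INTEGER, customer_name TEXT);
    INSERT INTO src_orders VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO tgt_orders VALUES (1, 'Alice'), (2, 'Bob');
""")

# i. Data completeness: source and target row counts must match.
src_count = cur.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
tgt_count = cur.execute("SELECT COUNT(*) FROM tgt_orders").fetchone()[0]
assert src_count == tgt_count, f"count mismatch: source={src_count}, target={tgt_count}"

# iii. No duplicates loaded into the target on the business key.
dupes = cur.execute(
    "SELECT order_id, COUNT(*) FROM tgt_orders GROUP BY order_id HAVING COUNT(*) > 1"
).fetchall()
assert not dupes, f"duplicate keys in target: {dupes}"

# ii. No truncation or alteration: every source row must appear in the target in full.
missing = cur.execute("""
    SELECT s.order_id FROM src_orders s
    LEFT JOIN tgt_orders t ON s.order_id = t.order_id AND s.customer_name = t.customer_name
    WHERE t.order_id IS NULL
""").fetchall()
assert not missing, f"rows missing or altered in target: {missing}"

print("completeness checks passed")
```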

Integration testing: This is done to ensure that the application works from an end-to-end perspective. Here we must consider the compatibility of the DW application with upstream and downstream flows and ensure data integrity across the whole flow. Our test strategy should include testing for the following (a small scheduling harness is sketched below):
i. The sequence of jobs to be executed, with job dependencies and scheduling
ii. Restartability of jobs in case of failures
iii. Generation of error logs
iv. Cleanup scripts for the environment, including the database
This activity is a combined responsibility, and participation of experts from all related applications is a must in order to avoid misinterpretation of results.
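
One way to exercise the job-sequencing and restartability requirements is a small harness that runs jobs in dependency order and checkpoints progress, so a failed run can be restarted without repeating completed jobs. A minimal sketch, with hypothetical job names and checkpoint file:

```python
import json
import os

# Hypothetical job sequence with dependencies; in practice this comes from the scheduler.
JOBS = {
    "extract_orders": [],
    "load_staging":   ["extract_orders"],
    "load_warehouse": ["load_staging"],
}
CHECKPOINT = "etl_checkpoint.json"   # hypothetical restart file

def load_done():
    if not os.path.exists(CHECKPOINT):
        return set()
    with open(CHECKPOINT) as f:
        return set(json.load(f))

def mark_done(done):
    with open(CHECKPOINT, "w") as f:
        json.dump(sorted(done), f)

def run_job(name):
    print(f"running {name}")          # placeholder for the real ETL job call

def run_pipeline():
    done = load_done()                # jobs finished in a previous, failed run are skipped
    for job, deps in JOBS.items():
        if job in done:
            continue
        assert all(d in done for d in deps), f"dependency not met for {job}"
        run_job(job)
        done.add(job)
        mark_done(done)               # checkpoint after each job so a rerun restarts here

run_pipeline()
```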

Acceptance testing: This is the most critical part, because here the actual users validate your output datasets. They are the best judges of whether the application works as they expect. However, business users may not have deep ETL knowledge, so the development and test teams should be ready to answer questions about how the ETL process populated the data. The test team must have sufficient business knowledge to translate the results in terms of the business. The load windows, the refresh period for the DW and the views created should also be signed off by the users.

Performance testing: In addition to the above tests, a DW must also go through a performance testing phase. Any DW application is designed to be scalable and robust, so it should not cause performance problems when it moves into the production environment. Here we must test the system with large volumes of data and ensure that the load window is met even under such volumes. This phase should involve the DBA team, an ETL expert and others who can review and validate the code for optimization.
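
A rough sketch of a load-window check: generate a production-like volume of synthetic rows, time the load, and fail the test if the agreed window is exceeded. The window, row count and fact table used here are hypothetical, and the in-memory SQLite database stands in for the real target:

```python
import sqlite3
import time

LOAD_WINDOW_SECONDS = 60 * 60 * 4   # hypothetical 4-hour load window agreed with the business
ROW_COUNT = 1_000_000               # volume chosen to approximate production data

conn = sqlite3.connect(":memory:")  # stands in for the real target database
conn.execute("CREATE TABLE fact_sales (sale_id INTEGER, amount REAL)")

start = time.monotonic()
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    ((i, float(i % 100)) for i in range(ROW_COUNT)),   # synthetic high-volume test data
)
conn.commit()
elapsed = time.monotonic() - start

print(f"loaded {ROW_COUNT} rows in {elapsed:.1f}s")
assert elapsed < LOAD_WINDOW_SECONDS, "load window exceeded under test volume"
```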

Regression testing: A DW application is not a one-time solution; it is perhaps the best example of an incremental design, where requirements are enhanced and refined quite often based on business needs and feedback. In such a situation it is critical to verify that the existing functionality of the DW application is not broken whenever an enhancement is made. Generally this is done by re-running all functional tests for the existing code whenever a new piece of code is introduced. A better strategy, however, is to preserve earlier test input data and result sets, run them again, and compare the new results against the old ones to confirm that existing functionality is intact.
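
The "preserve earlier inputs and results" strategy can be as simple as diffing the new extract against the saved one. A minimal sketch, assuming both result sets have been exported to CSV; the file names are hypothetical:

```python
import csv
import os

BASELINE = "baseline_customer_dim.csv"   # result set preserved from an earlier, accepted run
CURRENT = "current_customer_dim.csv"     # result set produced by the enhanced code

def read_rows(path):
    with open(path, newline="") as f:
        return sorted(tuple(row) for row in csv.reader(f))

def regression_check():
    if not os.path.exists(BASELINE):
        raise SystemExit("no baseline preserved; save the current run as the baseline first")
    old, new = read_rows(BASELINE), read_rows(CURRENT)
    missing = set(old) - set(new)   # rows the enhancement silently dropped
    added = set(new) - set(old)     # rows the enhancement introduced
    assert not missing and not added, f"regression: {len(missing)} missing, {len(added)} added"
    print("existing functionality preserved")

regression_check()
```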

Finally, a few words of caution to end with. Testing a DW application should be done with a sense of utmost responsibility: a bug in a DW traced at a later stage results in unpredictable losses, and the task is all the more difficult in the absence of a single end-to-end testing tool. The strategies for testing should therefore be methodically developed, refined and streamlined, especially because the requirements of a DW often change dynamically. Under such circumstances, repeated discussions with the development team and the users are of utmost importance to the test team. Another area of concern is test coverage, which has to be reviewed multiple times to ensure completeness of testing. Always remember: a DW tester must go the extra mile to ensure a near defect-free solution.

Acronyms:
1. DW: Data Warehouse
2. QA: Quality Assurance
3. ETL: Extraction, Transformation and Loading
4. ODS: Operational Data Store


