Data Quality Remediation

A DataFlux White Paper
Prepared by: David Loshin

DataFlux: Leader in Data Quality and Data Integration
www.dataflux.com
877-846-FLUX
International +44 (0) 1753 272 020

Introduction

The policies and procedures of data governance are valuable to the organization because they ensure that the quality of enterprise data is maintained at the levels needed to support successful business activities. The operational procedures are often spelled out within a data quality service level agreement (DQ SLA), an agreement between data providers and data consumers about the expected performance levels for data quality. The DQ SLA details the business data quality requirements along all the processing stages in a business process flow, along with assertions that can be used to validate the data. However, when errors in the data are identified, the data stewards responsible for the data must take action. This paper reviews the pieces of that immediate action plan: the triage and analysis tasks performed by data quality analysts or data stewards when an issue is identified and logged in the data quality incident tracking system. This includes:

• Evaluating and assessing the issue and determining the scope and extent of the problem, from both a business impact perspective and an operational perspective

• Reviewing the information process map to determine the likely locations where the problem was introduced

• Determining strategies for correcting the problem

• Researching strategies for eliminating its root cause

• Planning and applying operational aspects, including data correction, monitoring, and prevention

Evaluating criticality, assessing the frequency and severity of discovered issues, and prioritizing tasks for remediation are all part of the data steward’s role. Formalizing the tasks to perform when issues of different levels of criticality occur reduces the effort of remediation while shortening the time to resolution.

The Data Quality Service Level Agreement

An emerging trend in the data quality arena is the DQ SLA, which provides a valuable link between the IT and business sides throughout a data quality or data governance effort. A DQ SLA is a contract between a data provider and a data consumer that specifies the data provider’s responsibilities with respect to measurable aspects of what is being provided, such as availability, performance, and response time for problems, as well as reasonable expectations for response and remediation when data errors and flaws are identified.


What Composes the DQ SLA?

Within any business process, the DQ SLA lists the expectations regarding measurable aspects relating to one or more dimensions of data quality (such as accuracy, completeness, consistency, and timeliness), along with the specifications regarding conformance to those expectations. The DQ SLA also describes the processes to be initiated when those expectations are not met, especially those related to evaluating the issue, diagnosing its cause, and determining how to solve the problem. A DQ SLA is valuable because it formalizes the processes the organization puts in place for dealing with emerging data issues. These agreements also suggest ways to track data issue resolution progress as a way of internalizing lessons learned. If data instances are determined not to meet the defined expectations, a data quality incident event is generated and the appropriate staff members are notified to diagnose and mitigate the issue.
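To make the idea of a machine-checkable SLA assertion concrete, here is a minimal sketch in Python (not from the paper; the completeness rule, the 95% threshold, and the record layout are all hypothetical) of evaluating one expectation against a batch of records and generating an incident when the acceptability threshold is missed:

```python
# Hypothetical sketch: evaluate a DQ SLA completeness assertion on a batch
# of records; a failure generates an incident for the data steward.

def completeness(records, field):
    """Fraction of records with a non-empty value for `field`."""
    if not records:
        return 1.0
    populated = sum(1 for r in records if r.get(field) not in (None, ""))
    return populated / len(records)

def check_sla(records, field, threshold):
    """Return an incident record if the assertion fails, else None."""
    observed = completeness(records, field)
    if observed < threshold:
        return {"rule": f"completeness({field})",
                "observed": round(observed, 3),
                "threshold": threshold}
    return None

customers = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": ""}]
incident = check_sla(customers, "email", threshold=0.95)
if incident:
    print("DQ incident generated:", incident)  # triggers steward notification
```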

DQ SLAs and Data Issue Severity

Because a DQ SLA defines data quality expectations in the context of business impacts, it can guide the data steward when data issues are reported. Acceptability levels for measured data quality rules can be based on the corresponding financial impacts and the organization’s degree of tolerance for the errors causing those impacts. Acceptability thresholds become the barometer by which the severity of issues is measured, and the process of determining those thresholds also contributes to the basis for determining severity during issue evaluation and assessment.

Triage: Evaluation and Assessment of the Data Quality Issue

There will always be a backlog of issues for review and consideration, created as a byproduct of weighing the feasibility and cost effectiveness of a solution against the recognized business impact of the issue. When a data issue has been identified, the evaluation process takes into account these aspects of the identified issue:

• Criticality – the degree to which business processes are impaired by the existence of the issue

• Frequency – how often the issue has appeared

• Feasibility of correction – the likelihood that the organization will expend the effort to correct the results of the failure

• Feasibility of prevention – the likelihood that the organization will expend the effort to eliminate the root cause or institute continuous monitoring to detect the issue

The triage process is performed to understand these aspects in terms of the business impact, the size of the problem, and the number of individuals or systems affected. Triage enables the data steward to review the general characteristics of the problem and its business impacts in preparation for assigning a level of severity and priority.


The Prioritization Matrix

By its very nature, the triage process must employ protocols for the immediate assessment of any issue that has been identified, as well as for prioritizing those issues in the context of existing ones. A prioritization matrix is a tool that can help provide clarity for deciding relative importance, getting agreement on priorities, and then determining the actions that are likely to provide the best results within appropriate time frames. Collecting data about the issue’s criticality, frequency, and the feasibility of the corrective and preventative actions enables a more confident decision-making process for prioritization. In the example shown in Table 1, the columns of the matrix show the criteria, with one row for each issue. Weights are assigned to the criteria based on the degree to which each score contributes to the overall prioritization; in this example, the highest weight is assigned to criticality. The data steward gathers information as input to the scoring process, each criterion’s weighted score is calculated, and the weighted scores are summed into the total.

Issue | Criticality (Weight = 4) | Frequency (Weight = 1)   | Correction Feasibility (Weight = 1) | Prevention Feasibility (Weight = 2) | Total
      | Score | Weighted score   | Score | Weighted score   | Score | Weighted score              | Score | Weighted score              | Weighted score

Table 1: Example Prioritization Matrix

The weights must be determined in relation to the business context and the expectations directed by the agreements within the DQ SLA. In addition, the organization’s level of maturity in data quality and data governance may factor into the determination of scoring protocols as well as weightings.
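The arithmetic behind the matrix is straightforward; the short Python sketch below (illustrative only, using the weights from Table 1 and a hypothetical 1-to-5 raw scoring scale) shows how an issue’s total weighted score would be computed:

```python
# Sketch of the Table 1 calculation: multiply each raw score by its weight,
# then sum the weighted scores into the issue's total.

WEIGHTS = {"criticality": 4, "frequency": 1,
           "correction_feasibility": 1, "prevention_feasibility": 2}

def total_weighted_score(raw_scores):
    """raw_scores: hypothetical 1-5 score assigned per criterion."""
    return sum(WEIGHTS[criterion] * score
               for criterion, score in raw_scores.items())

issue = {"criticality": 5, "frequency": 2,
         "correction_feasibility": 4, "prevention_feasibility": 3}
print(total_weighted_score(issue))  # 4*5 + 1*2 + 1*4 + 2*3 = 32
```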

Gathering Knowledge

When an issue is reported, each of the criteria is scored using guidance from the DQ SLA, which may suggest the assignment of points based on the answers to a sequence of questions, such as:

• How many business processes/activities are impacted by the data issue?

• What business processes have failed as a result of the data issue?

• How many business processes have failed?

• How many individuals are affected?

• How many systems are affected?

• What types of systems are affected?

• How many records are affected?

• How many times has this issue been reported? Within what time frame?

• How long has this been an issue?

Then, based on the list of individuals and systems affected, the data steward must review business impacts within the context of known issues as well as newly discovered issues, asking questions such as these:

• What are the potential business impacts?

• Is this an issue specifically discussed in the DQ SLA?

• Has this introduced delays or halts in processing that must be performed within the constraints of the SLA?

The next step is to evaluate which data sets have been affected and whether these data sets need to be recreated, modified, or corrected, using these types of questions:

• Are there short-term corrective measures that can be taken to restart halted processes?

• Are there long-term measures that can be put in place to identify the issue in the event it occurs in the future?

• Are there system modifications that can be performed to eliminate the issue’s occurrence altogether?

Assigning Criticality

Having collected knowledge about the issue, the data steward can synthesize what is directed in the DQ SLA with what has been learned during the triage process to determine the level of severity and assign a priority for resolution. The collected information can be used to populate the prioritization matrix, assign scores, and apply weights. Issues can then be assigned a priority score based on the results of the weightings applied in the prioritization matrix. In turn, each issue can be prioritized from both a relative standpoint (i.e., which issues take precedence over others) and an absolute standpoint (i.e., whether a specific issue is high or low priority). Data issue priority will be defined by the members of the various data governance groups. As an example, an organization may define four levels of priority:

• Business critical – the existence of the problem prevents necessary business activities from completing, and the issue must be resolved before those activities can continue

• Serious – there are measurably high impacts to the business, but the issue does not prevent critical business processes from completing

• Tolerable – there are measurable impacts to the business, but additional research is required to determine whether correction and elimination are economically feasible and consequently desirable

• Acknowledged – the issue is recognized and documented, but the scale of the business impact does not warrant additional investment in remediation


Depending on the scoring process, the weighting, and the assessment, any newly reported issues can be evaluated and assigned a priority that should direct the initiation of specific actions as specified by the DQ SLA.
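One plausible way to wire the matrix total to these four levels is a simple threshold mapping; the cutoff values in this sketch are hypothetical stand-ins for values the data governance groups would set:

```python
# Hypothetical mapping from a total weighted score to a priority level.
PRIORITY_THRESHOLDS = [(28, "Business critical"),
                       (20, "Serious"),
                       (12, "Tolerable"),
                       (0,  "Acknowledged")]

def assign_priority(total_score):
    """Return the highest priority level whose cutoff the score meets."""
    for cutoff, level in PRIORITY_THRESHOLDS:
        if total_score >= cutoff:
            return level
    return "Acknowledged"

print(assign_priority(32))  # "Business critical"
```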

Preparation for Action

Once the data steward has reviewed the criticality of the issue, the data steward must decide on the sequence of actions to initiate. The first task is to consult the DQ SLA for the specific directives associated with issues of the assigned priority; a good DQ SLA will provide a full mapping of directives and response times for each priority level, as in Table 2.

                      | Business Critical | Serious | Tolerable | Acknowledged
Individuals to notify |                   |         |           |
Maximum response time |                   |         |           |
Critical tasks        |                   |         |           |
Escalation chain      |                   |         |           |
Level of effort       |                   |         |           |

Table 2: Sample DQ SLA directives by priority type

Once the DQ SLA has been consulted, the data steward is expected to notify the right people and then perform these tasks:

• Evaluate impacted systems and data sets

• Perform root cause analysis

• Determine data correction requirements

• Determine mitigation strategies

• Evaluate those mitigation strategies in the context of priority

• Make a decision and plan to execute

Evaluate Impacted Systems and Data Sets

When a data error occurs, downstream computations, calculations, business processes, and reports may be affected by the error. The first task is to assess the landscape and identify the impacted systems and data sets. Given a complete information flow mapping that details data dependency chains, the data steward can review which systems and data sets may have been affected and quickly configure tests or queries to check for any deviations from expected results. The steward can then document any data sets that may need to be corrected and any business processes that need to be rewound and restarted.
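With a machine-readable flow mapping, identifying the affected data sets is a graph reachability problem. The sketch below (the dependency map and data set names are invented for illustration) walks the chain from the data set where the error surfaced:

```python
# Sketch: find every data set downstream of the one where the error was
# detected, given an information flow map expressed as an adjacency list.
from collections import deque

FLOW = {"crm_extract": ["customer_master"],            # hypothetical chain
        "customer_master": ["billing", "marketing_mart"],
        "billing": ["revenue_report"]}

def downstream_of(dataset):
    affected, queue = set(), deque([dataset])
    while queue:
        for consumer in FLOW.get(queue.popleft(), []):
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected

print(downstream_of("customer_master"))
# {'billing', 'marketing_mart', 'revenue_report'}: candidates for correction
```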


Root Cause Analysis

To identify mitigation strategies, it is necessary to understand where the issues originated and where the best places are for fixing and eliminating the root cause. Alternatively, there may be a place in the business process where the introduction of the issue caused system failures. Reviewing the business process model and traversing the processes helps determine the root cause and provides input for recommendations to address the issue. This step involves reviewing the business process models that map the information flow prior to the point at which the data error was reported. By finding the processing stage at which the data is valid on entry and invalid on exit, the data steward can narrow down the location within the information chain where the error is introduced. This isolation process can be repeated at a finer granularity until the data steward – together with the necessary system analysts and programmers – is able to determine exactly where the error is introduced.
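The isolation step ("valid on entry, invalid on exit") can be sketched as a walk along the processing stages, checking the validity assertion after each one. The stage names, snapshots, and assertion below form a toy harness, not an actual implementation:

```python
# Sketch of root cause isolation: report the first stage whose output fails
# the validity assertion; the error is introduced at (or within) that stage.

def first_failing_stage(stages, snapshot_after, is_valid):
    for stage in stages:
        if not is_valid(snapshot_after(stage)):
            return stage
    return None

# Toy harness: pretend the "merge" stage blanks out the customer name.
SNAPSHOTS = {"ingest":      {"name": "Jane Doe"},
             "standardize": {"name": "JANE DOE"},
             "merge":       {"name": ""},
             "load":        {"name": ""}}

print(first_failing_stage(["ingest", "standardize", "merge", "load"],
                          SNAPSHOTS.get,
                          lambda record: bool(record["name"])))  # -> merge
```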

Data Correction Requirements

If data errors introduced earlier in the process flow have cascaded through other data sets, it is necessary to review those data sets and assess the “damage.” Improper data changes will need to be backed out, and any dependent processing stages may need to be rolled back and restarted. The time frame and urgency of data correction are set according to the criteria in the DQ SLA. Because of the sensitivity of accessing data through “unblessed” channels, both of these tasks must be performed under strict scrutiny and must be documented and reported through the incident reporting workflow. One-off programs intended to perform mass data corrections must be announced to all relevant stakeholders and scheduled to minimize impact on operations.

Mitigation Strategies

There may be different approaches for addressing both the root cause of an issue and the side effects caused by the issue. At this point, the data steward’s job is to identify alternatives for eliminating the root cause and to assess the feasibility of doing so. If it is not feasible to eliminate the source of the problems, the data stewards should identify sentinel measures or assertions that inspection routines can use to generate alerts. This way, the data governance team can introduce an inspection or monitoring routine to detect the issue if it is introduced again in the future.


There are essentially two tacks to take: root cause elimination, and monitoring and prevention.

Root Cause Elimination – If the data stewards and system analysts have determined the specific location and root cause of the error, and there are options for correcting the process to eliminate that root cause, they can:

• Evaluate the level of effort

• Determine the time frame for the fix

• Provide a development plan

• Provide a test plan

If the level of effort and the associated costs are reasonable and the resources are available, then eliminating the root cause of the issue is a good idea.

Monitoring and Prevention – If the level of effort to eliminate the root cause exceeds the organization’s ability or desire to invest, the next plan of action is to institute inspection and monitoring processes. When the inspection routines determine that the error has occurred, the data stewards can be notified immediately. As noted in the DQ SLA, the steward can then take the appropriate actions to delay or halt the business process until the identified error is reviewed and, if necessary, the offending data is removed to allow normal processing to continue.
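A sentinel inspection routine of the kind described here can be very small; this sketch (the rule and notification hook are hypothetical) re-checks the assertion on each incoming batch and alerts the steward when the known error reappears:

```python
# Sketch of a sentinel inspection routine: re-check the assertion on each
# batch and alert the data steward when the known error pattern reappears.

def inspect_batch(records, assertion, notify):
    failures = [r for r in records if not assertion(r)]
    if failures:
        notify(f"{len(failures)} record(s) failed inspection; "
               "holding batch for steward review")
    return failures

batch = [{"id": 1, "country": "US"}, {"id": 2, "country": ""}]
inspect_batch(batch, lambda r: bool(r["country"]), notify=print)
```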

Evaluate and Execute

Given the options to eliminate the root cause, institute inspection processes, or take any other potential approach to addressing the issue, the next step is deciding how to move forward. As with all business activities, it is critical to make sure that the steps to be taken are properly planned so that progress and success in alleviating the pain introduced by the data issue can be measured.

Tracking Workflow

The issue and incident tracking system logs the decisions made at each point along the way during issue assessment and remediation. Because the tasks performed are guided by the requirements specified in the DQ SLA, the tracking system can also provide performance reporting, including mean time to resolve issues, frequency of occurrence of issues, types of issues, sources of issues, and common approaches for correcting or eliminating problems. Since the incident management system is a reference source for current and historic issues and the remediation steps taken (as well as their success ratios), it will also guide activities moving forward based on best practices developed within the organization.
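For instance, mean time to resolve falls directly out of the incident log’s open and close timestamps; a minimal sketch, assuming a hypothetical log format:

```python
# Sketch: compute mean time to resolve from incident tracking records.
from datetime import datetime, timedelta

incidents = [  # hypothetical closed incidents from the tracking system
    {"opened": "2010-03-01 09:00", "resolved": "2010-03-01 17:00"},
    {"opened": "2010-03-02 10:00", "resolved": "2010-03-03 10:00"},
]

def mean_time_to_resolve(log):
    fmt = "%Y-%m-%d %H:%M"
    durations = [datetime.strptime(i["resolved"], fmt)
                 - datetime.strptime(i["opened"], fmt) for i in log]
    return sum(durations, timedelta()) / len(durations)

print(mean_time_to_resolve(incidents))  # 16:00:00 (average of 8h and 24h)
```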


The data quality issue tracking system provides a number of benefits:

• Information and knowledge sharing improves decision-making and staff performance under pressure

• Sharing baseline knowledge and status helps reduce duplication of effort

• Issues can be analyzed to identify common error patterns

Tracking issues from data transmission through customer support problem reporting supports the data management lifecycle, making sure that as data issues appear, they are identified and reviewed, and that the plan of action is recorded as each step is taken. Updating the status report according to the directives in the DQ SLA provides current information to the managers of any downstream business processes and data sets to help inform their actions as well.

Summary

The data quality service level agreement serves operational data governance in two ways. As a contractual agreement between data provider and data consumer, its value lies in being the central location for documenting organizational data quality expectations. Moreover, the DQ SLA acts as a run book that guides the data steward in the steps to take when a data issue is reported. A prioritization matrix supports the evaluation of priority, using an assessment of the criticality and frequency of the issue and estimates of the costs associated with different approaches to remediation. In turn, the DQ SLA specifies the actions to take based on the priority classification assigned to the identified problem. Next, the data steward examines the scope of the error, determines whether corrections need to be applied to specific data sets, and assembles a plan for either eliminating the root cause or instituting additional inspections and monitoring. Finally, the data quality incident management system is used both to manage the workflow and to act as a knowledge repository for the issue. Carefully managing this process results in a more streamlined reaction to emerging data problems and reduces the time to resolution. As more organizations move to a data governance model, this type of remediation, coordinating staff across functional areas, IT applications, and data sources, is critical to success.

