© 2001 Giga Information Group Copyright and Material Usage Guidelines

August 1, 2001

Data Quality Project Cost Drivers
Lou Agosta

Catalyst: A client inquiry

Question: What is the cost difference between cleansing older data and cleansing newer data? Is there a scaling factor used in the industry today that we can use as a guide?

Answer: It would indeed be nice to have a simple answer that said if the data is two years old, it costs $X thousand to validate, whereas if the data is four years old, it costs $2X thousand. Alas, there is no such standard, nor will there ever be one, because such a standard would not make sense. Data that is old is not in itself less accurate or of more dubious quality; it simply has a different, older date and timestamp on it. There is no correlation between data age and data quality. In fact, data systems that have been in production for a number of years tend to have fewer bugs and data discrepancies, because these will have been worked out and improved over time. On the other hand, if a system has been neglected, left unused or otherwise abused, then it may very well be a source of data quality problems. However, this is a function of poor system documentation or maintenance practices, not of age. In general, the cost and effort of a data quality verification and improvement project will be a function of the following factors:

• The volume of data (the amount)
• How well the data is structured (unstructured data may require manual intervention or special-purpose technology and so cost more)
• The number of data elements
• The rules by which the data elements are related and the complexity of the interrelations
• Rules of validation or verification of the permitted data element values by which to evaluate the content (a brief sketch of such rules appears after this list)
• Special-purpose hardware, software and professional consulting services required to process the data
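To make the validation-rule factor concrete, the following is a minimal Python sketch of what rules for permitted data element values can look like when they exist in automated form. The field names, permitted codes and date rules are hypothetical illustrations, not drawn from any particular system.

    # Minimal sketch of automated validation rules for permitted data element
    # values; the field names and permitted codes are hypothetical examples.
    from datetime import datetime

    PERMITTED_STATUS_CODES = {"A", "I", "P"}  # e.g., Active, Inactive, Pending

    def parse_date(value):
        """Return a datetime for a YYYY-MM-DD string, or None if it does not parse."""
        try:
            return datetime.strptime(value, "%Y-%m-%d")
        except (TypeError, ValueError):
            return None

    def validate_record(record):
        """Return a list of rule violations found in a single record (a dict)."""
        problems = []

        # Rule: the status code must be one of the permitted values.
        if record.get("status_code") not in PERMITTED_STATUS_CODES:
            problems.append("status_code not in permitted set")

        # Rule: the order date must be a valid date and must not lie in the future.
        order_date = parse_date(record.get("order_date"))
        if order_date is None:
            problems.append("order_date is not a valid YYYY-MM-DD date")
        elif order_date > datetime.now():
            problems.append("order_date is in the future")

        # Interrelation rule: the ship date, when present, must not precede the order date.
        ship_date = parse_date(record.get("ship_date"))
        if ship_date is not None and order_date is not None and ship_date < order_date:
            problems.append("ship_date precedes order_date")

        return problems

    # Example with a deliberately flawed record:
    print(validate_record({"status_code": "X", "order_date": "2001-08-01",
                           "ship_date": "2001-07-15"}))
    # -> ['status_code not in permitted set', 'ship_date precedes order_date']

If rules of this kind already exist in automated form, the bulk of the cost shifts to data preparation and problem resolution; if they do not, they must first be discovered and formulated, which is the risk discussed below.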

Experienced project teams are quite good at estimating volumes, structure, data elements and the usual system and consulting costs. Data preparation is a major cost, and it closely tracks data volume, structure and the number of data elements. The significant unknown, which can represent a substantial project risk, is the understanding and availability of the rules by which the data elements are interrelated and validated. If you or your vendor has the rules in automated form, then the main costs will be in preparing the data for automated processing and in problem resolution. If your firm does not know the rules, say because of poor system documentation practices, then the cost will also include the effort to discover and formulate them. This can result in a necessary but expensive exercise in data and system archeology as the semantics of the data elements are disentangled from their context in a legacy or enterprise resource planning (ERP) system.
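Where the rules must first be discovered, even a simple profiling pass can focus the archeology. The following minimal Python sketch (the file name and the distinct-value threshold are hypothetical) tallies the distinct values of each data element so that candidate permitted-value rules can be proposed and then confirmed with the business.

    # Minimal profiling sketch for rule discovery ("data archeology"):
    # tally distinct values per data element so that candidate permitted-value
    # rules can be proposed and then confirmed with the business owners.
    import csv
    from collections import Counter, defaultdict

    def profile(path, max_distinct=25):
        """Print columns whose small value sets suggest candidate code lists."""
        tallies = defaultdict(Counter)
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                for column, value in row.items():
                    tallies[column][(value or "").strip()] += 1

        for column, tally in tallies.items():
            if len(tally) <= max_distinct:
                print(f"{column}: candidate permitted values {sorted(tally)}")
            else:
                print(f"{column}: {len(tally)} distinct values (probably free-form)")

    # Example usage (hypothetical extract file):
    # profile("legacy_extract.csv")

Profiling of this kind proposes rules; it does not confirm them. The candidate value sets still have to be validated against business knowledge before they are applied.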

A vendor will not necessarily know this unless you clearly communicate the scope and limits of the data stores to be analyzed and qualified. This will give you a cost, though a word of caution is needed: it will not necessarily give you data quality, unless you have a defined process and standard against which to evaluate the quality of the data and apply them as part of a conscientious program of data quality. In order to get a handle on a data quality project (and so on its costs), the project leader will need to do some homework. The following Giga research will be useful:

• Planning Assumption, Data Quality Methodologies and Technologies, Lou Agosta
• Planning Assumption, Vendor ScoreCard: Data Quality, Part 1, Lou Agosta
• Planning Assumption, Vendor ScoreCard: Data Quality, Part 2, Lou Agosta
