© 2001 Giga Information Group Copyright and Material Usage Guidelines
June 26, 2001
Data Quality Market Segments
Lou Agosta
Catalyst: A client inquiry
Question: Can you please more precisely define the market segments identified in Planning Assumption, Market Overview: Data Quality, Lou Agosta?
Answer: The data quality market is diverse and dynamic. Rather than dwelling on methodological assumptions, Giga's approach is to describe the market and abstract the diverse categories within it rather than attempting to impose categories or distinctions from a formal or academic point of view.

"Data quality extenders" are applications that leverage data quality technologies, such as searching, matching and de-duplication, in the domains of operational and analytic customer relationship management (CRM), product management and catalog standardization. Since market multiples are more favorable in CRM than in moving and manipulating dumb data, many vendors have shifted their technologies, products and offerings in this direction. The key differentiator here is the extension of a technology from data quality in the narrow sense to CRM, for example, product or catalog standardization.

"Name and address processing, householding and geocoding" is data quality technology that focuses on the issues of leveraging name and address data. It is relevant to direct marketers whose mailing expenses are substantial and who have called upon automation to produce a specific output, namely, a name and address suitable for printing on an envelope in a national postal system. This is a very particular market segment: optimizing mailing and other marketing initiatives based on geographic data. Geocoding is closely related, addressing the problem of physically locating the individual on the (postal) map and describing that location in terms of relevant demographic data. Name and address preparation for the US Postal Service (or any national postal system) is one of those dull-as-dirt applications that can cost a small fortune if a firm gets it wrong; thus, a specialized application is of the essence here.

"Identification and searching" technologies address the particular computer science problem of knowing whether entities are properly identified in the context of searching. For example, "Church" can refer to a street, a person's name or a place of worship, depending on whether the context is 212 Church Street, Alonzo Church or The First Church of Gainesville. Though closely related to the operation of matching, identification is in fact a distinct function and is the presupposition for matching, which becomes a special case of finding something again after having indexed and stored it.

"Matching and de-duplication" is the application of functions and algorithms to the specific problem of eliminating ambiguity, that is, of determining whether two names refer to the same entity (person, place or thing). When an organization has two data stores that refer to the same, possibly overlapping (but not identical) population, whether of people, products or transactions, the problems presented include building a single master file (database) or extending the amount of information available by aggregating the two sources in a meaningful way. Government organizations conducting censuses and related surveys have faced this problem for a long time. The result is the need for de-duplication: knowing how to make a determination that eliminates duplicates. In fact, this category is closely related to identification and searching, and in subsequent Giga research reference is sometimes made to matching and searching.
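To make the matching and de-duplication segment more concrete, the sketch below compares two small, overlapping customer stores and flags record pairs whose names are similar enough to be probable duplicates. It is a minimal illustration, not any vendor's method: the record layout, the crude normalization and the 0.8 similarity threshold are assumptions chosen for the example, and commercial products rely on far richer parsing, phonetic and probabilistic matching, plus blocking so that not every record is compared against every other.

    import difflib

    def normalize(name: str) -> str:
        # Crude normalization for the example: lowercase, drop punctuation, collapse spaces.
        cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
        return " ".join(cleaned.split())

    def probable_duplicates(store_a, store_b, threshold=0.8):
        # Brute-force comparison of every record pair; real de-duplication engines
        # use blocking or indexing to avoid the full cross-comparison.
        matches = []
        for a in store_a:
            for b in store_b:
                score = difflib.SequenceMatcher(
                    None, normalize(a["name"]), normalize(b["name"])
                ).ratio()
                if score >= threshold:
                    matches.append((a["id"], b["id"], round(score, 2)))
        return matches

    if __name__ == "__main__":
        store_1 = [{"id": "A1", "name": "Jonathan Q. Smith"},
                   {"id": "A2", "name": "Acme Widget Corp."}]
        store_2 = [{"id": "B7", "name": "Jonathon Q Smith"},
                   {"id": "B9", "name": "ACME Widget Corporation"}]
        # Prints the two cross-store pairs that likely refer to the same entity.
        print(probable_duplicates(store_1, store_2))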
"Data profiling/metadata/analytics" is the description of the structure, data elements and content of data stores, including the validity and semantics of that content. This includes information engineering and reengineering. Profiling defined codes, statuses and permissible business values against the actual content of data stores is especially useful if the results are captured in a central metadata repository for leveraged reuse (a rough illustration appears at the end of this answer). Data quality is relative to allowable values, that is, to data standards. The idea of a data dictionary is not new. What is new is the possibility of capturing this information in an automated way in a local metadata repository as the data is mapped from the transactional system of record to the decision-support data store.

"Standardization/scrubbing/correction/parsing" is the modification and enhancement of the quality and content of data against a specified set of rules, canons or standards that indicate the proper structure and semantic content of the data. At least one data quality vendor, SSA, makes a strong case that standardization is unrelated to data quality. But even SSA acknowledges that standardization is necessary in order to qualify for discounts on direct mail through the national postal service.

"Data augmentation" is the correlation of demographic information with basic customer or product data. The large credit reporting agencies and data aggregators specialize in this area. These are not software products, and the same infrastructures may also be involved in the delivery of the content. As discussed in previous research, they include Acxiom, Equifax, Experian, CACI, Claritas, Harte-Hanks, Polk and TransUnion (see Planning Assumption, Market Overview: Data Quality, Lou Agosta).

When data quality services, such as those described in any of the segments above, are offered as a centralized service to a variety of clients from a single source, a service bureau model is being invoked. An application service provider (ASP) is a modern approach to the service bureau. Data quality vendors trying to generate revenues using the service bureau or ASP model include Firstlogic (eDataQualityService ASP), Harte-Hanks (in a variety of service forms) and Group1 (dataQuality.net).
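As a rough illustration of the data profiling segment described above, the sketch below checks the actual content of a handful of records against a hand-maintained dictionary of permissible business values and summarizes the result. The column names, allowed codes and sample rows are assumptions made up for the example; a commercial profiling tool would infer structure and value frequencies across entire data stores and write its findings to a shared metadata repository rather than printing them.

    from collections import Counter

    # Hypothetical data standard: permissible business values per column.
    DOMAIN_RULES = {
        "status": {"active", "inactive", "pending"},
        "region": {"NA", "EMEA", "APAC"},
    }

    def profile(rows):
        """Summarize column content and count values that violate the standard."""
        summary = {}
        for column, allowed in DOMAIN_RULES.items():
            frequencies = Counter(row.get(column, "") for row in rows)
            violations = {value: count for value, count in frequencies.items()
                          if value not in allowed}
            summary[column] = {
                "distinct_values": len(frequencies),
                "most_common": frequencies.most_common(3),
                "violation_count": sum(violations.values()),
                "violating_values": violations,
            }
        return summary  # In practice, captured in a metadata repository for reuse.

    if __name__ == "__main__":
        sample = [
            {"status": "active", "region": "NA"},
            {"status": "Actve", "region": "EMEA"},     # typo violates the status standard
            {"status": "pending", "region": "LATAM"},  # code not in the region dictionary
        ]
        for column, stats in profile(sample).items():
            print(column, stats)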
For additional research regarding this topic, see the following:

• Planning Assumption, Data Quality Methodologies and Technologies, Lou Agosta
• Planning Assumption, Vendor ScoreCard: Data Quality, Part 1, Lou Agosta
• Planning Assumption, Vendor ScoreCard: Data Quality, Part 2, Lou Agosta