© 2002 Giga Information Group Copyright and Material Usage Guidelines
June 20, 2002
‘Data Quality’ Is a Misnomer Lou Agosta
Catalyst A client inquiry
Question What is the definition and value of data as opposed to information?
Answer “Data quality” is a misnomer. Data in itself is meaningless. Data is what is given; it is basic raw material. Whether the content is unstructured or structured, it is data. If a person asks the value of data, the answer is easy: data by itself is worthless. It is what you do with the data that has value. Data is the content, and when it is structured in such a way as to reduce uncertainty, it has value as information. Thus, data plus structure produces information.

Information provides differences and distinctions that reduce uncertainty. A simple example is the attribute of gender, which tells us something about a customer. If I am confident that the customer is either male or female but I am not sure which, then I have not reduced my uncertainty one bit. I do not have any more information than when I started. If, however, I have the male/female distinction and, literally, the bit of information that the customer is male, then I will plan on selling that customer a Father’s Day gift rather than one for Mother’s Day. The data without the structure is meaningless; the structure without the data is empty. The structure (the simple male/female distinction) is not in itself information. Applying the structure to the data yields information and reduces uncertainty. Sell the individual a tie, not flowers.

Giga’s working definition of information, and of how to transform dumb data into quality information, is depicted in the figure below. As the attributes of the data are structured according to a defined process for transforming the data along the three high-level dimensions of objectivity, usability and trustworthiness, the information quality improves in precisely those dimensions. In particular, information = objective(data) + usable(data) + trustworthy(data).

Knowledge is not on the same continuum as data and information. The commitment needed might be represented as a point in one of the quadrants or as a circle encompassing the entire diagram. Knowledge = commitment(information).
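The male/female example can be made concrete with a small calculation. The sketch below uses Shannon entropy as the measure of uncertainty. That choice is an assumption: the note speaks of a literal "bit" but names no formula. The equal 50/50 split between male and female customers is also assumed for illustration, and `entropy_bits` is a hypothetical helper name.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Before: the customer is male or female, but we do not know which.
# Assumption: both outcomes are equally likely.
uncertainty_before = entropy_bits([0.5, 0.5])

# After: the gender attribute is populated and says "male".
uncertainty_after = entropy_bits([1.0, 0.0])

# The reduction in uncertainty is the information gained.
information_gained = uncertainty_before - uncertainty_after

print(f"before: {uncertainty_before} bit(s)")
print(f"after:  {uncertainty_after} bit(s)")
print(f"gained: {information_gained} bit(s)")
```

Under these assumptions, knowing only that the customer is "male or female" leaves the full one bit of uncertainty in place, while the populated attribute removes it.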
IdeaByte ♦ ‘Data Quality’ Is a Misnomer RIB-062002-00123 © 2002 Giga Information Group All rights reserved. Reproduction or redistribution in any form without the prior permission of Giga Information Group is expressly prohibited. This information is provided on an “as is” basis and without express or implied warranties. Although this information is believed to be accurate at the time of publication, Giga Information Group cannot and does not warrant the accuracy, completeness or suitability of this information or that the information is correct.
Figure: From Data to Information. Axes: Subjective to Objective; Hard to use to Ease of use; Uncertain to Trustworthy. Key: Improved information quality. Source: Giga Information Group
From a business perspective, knowledge is qualitatively different from information. There is a gap separating information, no matter how high its quality, from knowledge. The “best available information” never results in knowledge without something special mixed in to strengthen it. That something is commitment: commitment to goals relevant to the business enterprise, such as customer service, launching a new product or attaining operational excellence. (Knowledge = commitment(information).)

Data, information and knowledge are overlapping categories that describe different aspects of the world of business. They are different ways of describing the same phenomena. One person’s data may be another’s information and vice versa. Yet the distinctions are valid; otherwise, why would they exist in the first place? Data is what is given: subjective, uncertain and unclear in its use or interpretation. Add structure to data in the interest of reducing uncertainty and the result is information. (Information = structure(data).)

Information is built out of data by applying structure, categories and processes (including data models, functional transformations (ETL), queries and representation) in a process that generates increasing objectivity, usability and certainty. Each of these dimensions is further decomposed. Objectivity includes such aspects as accuracy, existence, causality, consistency, timeliness, completeness, unambiguousness and precision (not vague); usability includes ease of interpretation, availability and security; and trustworthiness includes credibility, believability and the accumulated lessons of experience. Thus, information quality is improved. However, no matter how much it is improved and how certain it is, information is still not knowledge. It is not as if the information were getting more and more certain and finally resulted in knowledge.
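The two formulas in the passage above, Information = structure(data) and Knowledge = commitment(information), can be read as function application. The following is a minimal sketch of that reading, not Giga's method. The `Customer` data model, the comma-separated record layout and the tie/flowers decision rule are hypothetical and chosen only to echo the gender example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Hypothetical data model: the structure applied to raw data."""
    customer_id: int
    gender: str  # "M" or "F"


def structure(raw: str) -> Customer:
    """Information = structure(data): parse and validate one raw record."""
    customer_id, gender = (field.strip() for field in raw.split(","))
    gender = gender.upper()
    if gender not in {"M", "F"}:
        raise ValueError(f"gender value outside the model: {gender!r}")
    return Customer(int(customer_id), gender)


def commitment(info: Customer) -> str:
    """Knowledge = commitment(information): commit to a business decision."""
    return "tie" if info.gender == "M" else "flowers"


raw_data = " 1042 , m "              # data: what is given
information = structure(raw_data)     # data plus structure
decision = commitment(information)    # information plus commitment

print(information)
print(decision)
```

However clean `structure` makes the record, nothing happens until `commitment` turns it into a decision, which is the gap between information and knowledge that the passage describes.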
To get knowledge from information, something else (a commitment to a business decision) must be added. Philosophers, marketing executives, linguists and scientists have struggled with the distinctions between data, information and knowledge for decades, if not centuries. Giga has taken a pragmatic approach to defining these distinctions. We know that lack of data quality costs money: misdirected mail is returned, effort is wasted, rework is incurred, sales are lost and inventory outages occur. Quality implies differences, differences imply distinctions of value, and distinctions of value imply market value. Market value implies dollar value.

Like so many things, information quality is a bootstrap operation requiring iteration, a process of learning from one’s mistakes and commitment to business results. Start by employing data profiling to build an inventory of data assets and evaluate the state of information quality within the enterprise on a system-by-system basis. Be prepared for “roll-up-the-sleeves” hard work. This is likely to be both a top-down and bottom-up task, since the impact on information quality of relations between systems can only be evaluated by including both sides of the interface.
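As a rough idea of what a first data profiling pass might look like, the sketch below counts rows, missing values and distinct values per column for a handful of records. The `profile` function, the column names and the sample rows are all hypothetical. A real profiling effort would read from the systems being inventoried and would cover many more checks.

```python
from collections import Counter


def profile(rows, columns):
    """Return per-column counts of rows, missing values and distinct values."""
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and the empty string as missing.
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        report[col] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0] if counts else None,
        }
    return report


# Hypothetical sample extracted from one system.
rows = [
    {"customer_id": "1042", "gender": "M", "postcode": "60601"},
    {"customer_id": "1043", "gender": "", "postcode": "60601"},
    {"customer_id": "1044", "gender": "F", "postcode": None},
    {"customer_id": "1044", "gender": "f", "postcode": "6060"},
]

report = profile(rows, ["customer_id", "gender", "postcode"])
for col, stats in report.items():
    print(col, stats)
```

Even this small sample surfaces the kinds of findings such an inventory is meant to catch: a repeated customer_id, a missing gender, inconsistent casing ("F" and "f") and a postcode that looks truncated.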