Data Quality and its Parameters

Data - facts and statistics collected together for reference or analysis. Data are the values of subjects with respect to qualitative or quantitative variables. Data and information are often used interchangeably; however, the extent to which a set of data is informative to someone depends on the extent to which it is unexpected by that person.

Data quality

Data quality is a perception or an assessment of data's fitness to serve its purpose in a given context. The quality of data is determined by factors such as accuracy, completeness, reliability, relevance and how up to date it is. As data has become more intricately linked with the operations of organizations, the emphasis on data quality has gained greater attention.

Why data quality is important

Poor-quality data is often pegged as the source of inaccurate reporting and ill-conceived strategies in a variety of companies, and some have attempted to quantify the damage done. Economic damage due to data quality problems can range from added miscellaneous expenses when packages are shipped to wrong addresses, all the way to steep regulatory compliance fines for improper financial reporting. An oft-cited estimate originating from IBM put the yearly cost of data quality issues in the U.S. during 2016 alone at about $3.1 trillion. Lack of trust by business managers in data quality is commonly cited among the chief impediments to decision-making. Poor data quality was particularly common in the early days of corporate computing, when most data was entered manually. Even as more automation took hold, data quality issues rose in prominence. For a number of years,

the image of deficient data quality was represented in stories of meetings at which department heads sorted through differing spreadsheet numbers that ostensibly described the same activity.

Determining data quality

Aspects, or dimensions, important to data quality include: accuracy, or correctness; completeness, which determines if data is missing or unusable; conformity, or adherence to a standard format; consistency, or lack of conflict with other data values; and duplication, or repeated records. As a first step toward data quality, organizations typically perform data asset inventories in which the relative value, uniqueness and validity of data can undergo baseline studies. Established baseline ratings for known good data sets are then used for comparison against data in the organization going forward. Methodologies for such data quality projects include the Data Quality Assessment Framework (DQAF), which was created by the International Monetary Fund (IMF) to provide a common method for assessing data quality. The DQAF provides guidelines for measuring data dimensions that include timeliness, in which actual times of data delivery are compared to anticipated data delivery schedules.
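To make these dimensions concrete, the following Python sketch shows one rough way such baseline measurements could be taken over a handful of records. The records, field names and email format rule are hypothetical examples and are not drawn from the DQAF or any particular tool.

    # Minimal, illustrative baseline checks for some of the dimensions listed
    # above (completeness, conformity and duplication). The records, field
    # names and email format rule are hypothetical examples.
    import re
    from collections import Counter

    records = [
        {"id": "1001", "email": "a.smith@example.com", "age": "34"},
        {"id": "1002", "email": "b.jones@example",     "age": ""},
        {"id": "1001", "email": "a.smith@example.com", "age": "34"},
    ]

    EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def completeness(rows, field):
        """Share of records in which the field is present and non-empty."""
        return sum(1 for r in rows if r.get(field, "").strip()) / len(rows)

    def conformity(rows, field, pattern):
        """Share of non-empty values that match the expected format."""
        values = [r[field] for r in rows if r.get(field, "").strip()]
        return sum(1 for v in values if pattern.match(v)) / len(values) if values else 1.0

    def duplication(rows, key):
        """Number of records whose key value occurs more than once."""
        counts = Counter(r[key] for r in rows)
        return sum(c for c in counts.values() if c > 1)

    print("age completeness: %.2f" % completeness(records, "age"))            # 0.67
    print("email conformity: %.2f" % conformity(records, "email", EMAIL_RE))  # 0.67
    print("duplicate id rows: %d" % duplication(records, "id"))               # 2

Scores like these, captured once for a known good data set, could serve as the baseline ratings against which later data is compared.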

Data quality management

Several steps typically mark data quality efforts. In a data quality management cycle identified by data expert David Loshin, data quality management begins with identifying and measuring the effect that bad data has on business outcomes. Rules are defined, performance targets are set, and quality improvement methods, as well as specific data cleansing (data scrubbing) and enhancement processes, are put in place. Results are then monitored as part of ongoing measurement of the use of the data in the organization. This virtuous cycle of data quality management is intended to ensure that consistent improvement of overall data quality continues after the initial data quality efforts are completed. Software tools specialized for data quality management match records, delete duplicates, establish remediation policies and identify personally identifiable data. Management consoles for data quality support the creation of rules for data handling to

maintain data integrity, the discovery of data relationships, and automated data transforms that may be part of quality control efforts. Collaborative views and workflow enablement tools have become more common, giving data stewards, who are charged with maintaining data quality, views into corporate data repositories. These tools and related processes are often closely linked with master data management (MDM) systems that have become part of many data governance efforts. Data quality management tools include IBM InfoSphere Information Server for Data Quality, Informatica Data Quality, Oracle Enterprise Data Quality, Pitney Bowes Spectrum Technology Platform, SAP Data Quality Management and Data Services, SAS DataFlux and others.
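As a loose illustration of the record-matching and de-duplication step such tools automate, the sketch below normalizes two hypothetical fields into a match key and keeps the first record seen for each key; commercial tools layer fuzzy matching and survivorship rules on top of this basic idea.

    # Rough sketch of record matching and de-duplication. The fields and
    # normalization rules are hypothetical; real tools add fuzzy matching
    # and survivorship logic.
    def match_key(record):
        """Build a match key from loosely standardized name and postcode."""
        name = " ".join(record["name"].lower().split())
        postcode = record["postcode"].replace(" ", "").upper()
        return (name, postcode)

    def deduplicate(records):
        """Keep the first record seen for each match key."""
        seen, kept = set(), []
        for record in records:
            key = match_key(record)
            if key not in seen:
                seen.add(key)
                kept.append(record)
        return kept

    customers = [
        {"name": "Ann  Smith", "postcode": "wd3 1rl"},
        {"name": "ann smith",  "postcode": "WD3 1RL"},
        {"name": "Bill Jones", "postcode": "WD18 3FX"},
    ]
    print(deduplicate(customers))  # the two "Ann Smith" records collapse into one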

Defining Quality Data

While many organizations boast of having good data or improving the quality of their data, the real challenge is defining what those qualities represent. What some consider good quality, others might view as poor. Judging the quality of data requires an examination of its characteristics and then weighing those characteristics according to what is most important to the organization and the application(s) for which the data is being used.

Accuracy – This characteristic refers to the exactness of the data. It cannot have any erroneous elements and must convey the correct message without being misleading. Accuracy and precision have a component that relates to the data's intended use. Without understanding how the data will be consumed, ensuring accuracy and precision could be off-target or more costly than necessary. For example, accuracy in healthcare might be more important than in another industry (which is to say, inaccurate data in healthcare could have more serious consequences) and, therefore, justifiably worth higher levels of investment.

Data should be sufficiently accurate for the intended use and should be captured only once, although it may have multiple uses. Data should be captured at the point of activity.

• Data is always captured at the point of activity. Performance data is directly input into PerformancePlus (P+) by the service manager or nominated data entry staff.

• Access to P+ for the purpose of data entry is restricted through secure password controls and limited access to appropriate data entry pages. Individual passwords can be changed by the user and should under no circumstances be used by anyone other than that user.

• Where appropriate, base data, i.e. denominators and numerators, will be input into the system, which will then calculate the result. These have been determined in accordance with published guidance or agreed locally. This will eliminate calculation errors at this stage of the process, as well as provide contextual information for the reader.

• Data used for multiple purposes, such as population and number of households, is input once by the system administrator.

Validity – Requirements governing data set the boundaries of this characteristic. For example, on surveys, items such as gender, ethnicity and nationality are typically limited to a set of options, and open answers are not permitted (a simple check of this kind is sketched below). Any answers other than these would not be considered valid or legitimate based on the survey's requirements. This is the case for most data and must be carefully considered when determining its quality. The people in each department of an organization understand what data is valid to them, so those requirements must be leveraged when evaluating data quality. Data should be recorded and used in compliance with relevant requirements, including the correct application of any rules or definitions. This will ensure

consistency between periods and with similar organisations, measuring what is intended to be measured.

• Relevant guidance and definitions are provided for all statutory performance indicators. Service Heads are informed of any revisions and amendments within 24 hours of receipt from the relevant government department. Local performance indicators comply with locally agreed guidance and definitions.

Reliability – Many systems in today's environments use and/or collect the same source data. Regardless of which source collected the data or where it resides, it cannot contradict a value residing in a different source or collected by a different system. There must be a stable and steady mechanism that collects and stores the data without contradiction or unwarranted variance. Data should reflect stable and consistent data collection processes across collection points and over time. Progress toward performance targets should reflect real changes rather than variations in data collection approaches or methods.

• Source data is clearly identified and readily available from manual, automated or other systems and records. Protocols exist where data is provided by a third party, such as Hertfordshire Constabulary and Hertfordshire County Council.
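The validity rule described above, where survey answers must come from an agreed set of options, can be expressed as a simple check. The fields and allowed options below are purely illustrative.

    # Illustrative validity check: values outside the agreed set of options
    # are flagged. The fields and option lists are made up for the example.
    ALLOWED_OPTIONS = {
        "gender": {"female", "male", "other", "prefer not to say"},
        "nationality": {"british", "irish", "other"},
    }

    def invalid_values(rows, field):
        """Return (row index, value) pairs whose value is not an allowed option."""
        allowed = ALLOWED_OPTIONS[field]
        return [(i, r.get(field, "")) for i, r in enumerate(rows)
                if r.get(field, "").strip().lower() not in allowed]

    responses = [
        {"gender": "Female", "nationality": "British"},
        {"gender": "F",      "nationality": "Irish"},
        {"gender": "male",   "nationality": "Martian"},
    ]
    print(invalid_values(responses, "gender"))       # [(1, 'F')]
    print(invalid_values(responses, "nationality"))  # [(2, 'Martian')]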

Timeliness – There must be a valid reason to collect the data to justify the effort required, which also means it has to be collected at the right moment in time. Data collected too soon or too late could misrepresent a situation and drive inaccurate decisions. Data should be captured as quickly as possible after the event or activity and must be available for the intended use within a reasonable time period. Data must be available quickly and frequently enough to support information needs and to influence service or management decisions (a minimal check of this kind is sketched after the example below).

• Performance data is requested to be available within one calendar month from the end of the previous quarter and is subsequently reported to the respective Policy and Scrutiny Panel on a quarterly basis. As a part of the ongoing development of PerformancePlus it is intended that performance information will be exported through custom reporting and made available via the Three Rivers DC website. This will improve access to information and eliminate delays in publishing information through traditional methods.
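A timeliness check of this kind can be sketched by comparing when data for a reporting period actually arrived against the agreed deadline; the indicator names, dates and the roughly one-month deadline below are invented for illustration.

    # Illustrative timeliness check: flag submissions received more than an
    # agreed number of days after the end of the reporting period. All values
    # here are hypothetical.
    from datetime import date

    DEADLINE_DAYS = 31  # roughly one calendar month after the quarter end

    submissions = [
        {"indicator": "LPI-01", "period_end": date(2019, 6, 30), "received": date(2019, 7, 15)},
        {"indicator": "LPI-02", "period_end": date(2019, 6, 30), "received": date(2019, 8, 20)},
    ]

    for s in submissions:
        delay = (s["received"] - s["period_end"]).days
        status = "on time" if delay <= DEADLINE_DAYS else "late by %d days" % (delay - DEADLINE_DAYS)
        print(s["indicator"], status)  # LPI-01 on time, LPI-02 late by 20 days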

Completeness – Incomplete data is as dangerous as inaccurate data. Gaps in data collection lead to a partial view of the overall picture. Without a complete picture of how operations are running, uninformed actions will occur. It is important to understand the complete set of requirements that constitute a comprehensive set of data, in order to determine whether or not those requirements are being fulfilled. Data requirements should be clearly specified based on the information needs of the organisation, and data collection processes should be matched to these requirements. Completeness also refers to the relationship between the objects in a database and the abstract universe of all such objects, including the selection criteria, definitions and other mapping rules used to create the database.
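A completeness check along these lines can be as simple as comparing each record against the set of fields the organisation has specified as required; the field names and records below are hypothetical.

    # Illustrative completeness check against a specified set of required
    # fields. The field names and records are made up for the example.
    REQUIRED_FIELDS = {"household_id", "population", "collection_date"}

    def missing_fields(record):
        """Return the required fields that are absent or empty in a record."""
        return {f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()}

    rows = [
        {"household_id": "H-17", "population": 4, "collection_date": "2019-10-01"},
        {"household_id": "H-18", "population": "", "collection_date": "2019-10-01"},
        {"household_id": "H-19"},
    ]

    for i, row in enumerate(rows):
        gaps = missing_fields(row)
        print(i, "complete" if not gaps else "missing: " + ", ".join(sorted(gaps)))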

Relevance – Data captured should be relevant to the purposes for which it is to be used. This will require a periodic review of requirements to reflect changing needs.

Availability and Accessibility – This characteristic can be tricky at times due to legal and regulatory constraints. Regardless of the challenge, though, individuals need the right level of access to the data in order to perform their jobs. This presumes that the data exists and is available for access to be granted.

Granularity and Uniqueness – The level of detail at which data is collected is important, because confusion and inaccurate decisions can otherwise occur. Aggregated, summarized and manipulated collections of data can convey a different meaning than the underlying data implies at a lower level. An appropriate level of granularity must be defined so that the data offers sufficient uniqueness and makes its distinctive properties visible. This is a requirement for operations to function effectively.

There are many elements that determine data quality, and each can be prioritized differently by different organizations. The prioritization can change depending on an organization's stage of growth or even its current business cycle. The key is to define what is most important for your organization when evaluating data, and then use these characteristics to define the criteria for high-quality, accurate data. Once these criteria are defined, you will have a better understanding of your data and be better positioned to achieve your goals.

Emerging data quality challenges

Over time, much of the data quality effort centered on the governance of relational data in organizations, but that began to change as web and cloud computing architectures came into prominence. Unstructured data, text, natural language processing and object data became part of the data quality mission. The variety of data was such that data experts began to assign different degrees of trust to various data sets, forgoing approaches that took a single, monolithic view of data quality. Also, the classic issues of garbage in/garbage out that drove data quality efforts in early computing resurfaced with artificial intelligence (AI) and machine learning applications, in which data preparation often became the most demanding use of data teams' resources. The higher volume and speed of arrival of new data also became a greater challenge for the data quality steward.

Expansion of data's use in digital commerce, along with ubiquitous online activity, has only intensified data quality concerns. While errors from rekeying data are a thing of the past, dirty data is still a common nuisance. Protecting the privacy of individuals' data became a mild concern for data quality teams beginning in the 1970s, growing to become a major issue with the spread of data acquired via social media in the 2010s. With the formal implementation of the General Data Protection Regulation (GDPR) in the European Union (EU) in 2018, the demands for data quality expertise were expanded yet again.

Fixing data quality issues

With GDPR and the risks of data breaches, many companies find themselves in a situation where they must fix data quality issues. The first step toward fixing data quality is identifying all the problem data. Software can be used to perform a data quality assessment to verify that data sources are accurate, determine how much data there is and gauge the potential impact of a data breach. From there, companies can build a data quality program with the help of data stewards, data protection officers or other data management professionals. These data management experts will help implement business processes that ensure future data collection and use meet regulatory guidelines and provide the value that businesses expect from the data they collect.
