Data Mining, An Introduction by Ruth Dilly 1995
Data mining problems/issues
Data mining systems rely on raw data drawn from different data sources. The following are the main problems or issues that a data analyst has to face while mining data [].
a. Limited Information
A database, data mart, or data warehouse is usually designed for running the day-to-day business and for producing some summary reports. As a result, it may hold only a limited number of the attributes that data mining needs to fulfill the stakeholder's objective. For example, malaria cannot be diagnosed from a patient database if that database does not contain the patients' red blood cell counts.
b. Noise and missing values
Data is entered into databases by operators or, in some cases, automatically, so it is vulnerable to errors and to missing values. Attributes whose values rely on subjective or measurement judgments can give rise to errors, and some records may even be misclassified. Errors in the values of attributes are called noise. For accurate data mining results, noise must be removed and missing values must be treated. Missing data can be treated in the following ways:
• Discard the missing values
• Discard the rows that have missing values
• Take general values from other records
• Assign a special value to the record
• Use an average value
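The five treatments listed above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: missing values are represented as None, a record is a plain dict, the "general value" taken from other records is assumed to be the most frequent one, and all function names and the "UNKNOWN" marker are hypothetical, not taken from the text.

```python
from collections import Counter
from statistics import mean

MISSING = None  # assumption: a missing value is stored as None


def drop_missing_values(values):
    """Discard the missing values of a single attribute."""
    return [v for v in values if v is not MISSING]


def drop_rows_with_missing(rows):
    """Discard every row (a dict) that has at least one missing value."""
    return [r for r in rows if all(v is not MISSING for v in r.values())]


def fill_with_most_common(values):
    """Take a general value from the other records (here: the most frequent)."""
    present = drop_missing_values(values)
    most_common = Counter(present).most_common(1)[0][0]
    return [most_common if v is MISSING else v for v in values]


def fill_with_special(values, special="UNKNOWN"):
    """Assign a special marker value wherever a value is missing."""
    return [special if v is MISSING else v for v in values]


def fill_with_average(values):
    """Replace missing numeric values with the average of the known ones."""
    avg = mean(drop_missing_values(values))
    return [avg if v is MISSING else v for v in values]
```

For instance, fill_with_average([20, None, 40]) fills the gap with the mean of the two known values. Which treatment is appropriate depends on the data: discarding rows loses information, while filling in values can hide the fact that the data was never recorded.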
c. Uncertainty
Uncertainty depends on how severe the noise in the data elements is. For example, there may be a great difference between the records in terms of their values.
d. Data cohesiveness
Cohesiveness means how relevant the data attributes are to each other.
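The text does not name a measure of cohesiveness. As one rough, assumed proxy for how closely two numeric attributes are related, the Pearson correlation coefficient can be computed; the sketch below is illustrative only, and the function name is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric attributes."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two attributes of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / sqrt(var_x * var_y)
```

A value near 1 or -1 suggests the two attributes are strongly linearly related, while a value near 0 suggests little linear relationship. It says nothing about non-linear or categorical relationships, so it is at best a first check.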