Lung Disease Prediction Using K-Means Clustering and Naïve Bayes Algorithm
Introduction Lung cancer accounts for more deaths than any other cancer in both men and women. Lung cancer has been the fifth leading cause of death in the world over the past 10 years (World Health Organization 2016). According to the World Health Organization (WHO) report, lung cancer is the leading cause of cancer death across the world, accounting for 1.58 million deaths, about 27% of all cancer deaths. The death rate began declining in 1991 in men and in 2003 in women. Early detection of lung cancer is essential to reducing loss of life, but early treatment depends on the ability to detect the disease in its early stages. Early diagnosis in turn requires an accurate and reliable diagnostic procedure that allows physicians to distinguish benign lung disease from malignant disease. The volume of health data is increasing rapidly worldwide. Because this data is very large and complex, processing it with traditional data processing techniques is very difficult. To simplify the analysis, machine learning techniques such as k-nearest neighbours (KNN), support vector machines (SVM), and decision trees (DT) have been used. Tools such as Python (pandas) and Weka are widely used in the data analytics field.
Objective To study different disease prediction algorithms and review the related literature. To design a system for lung disease prediction based on patient data. To design a system that achieves higher accuracy in lung disease prediction than existing systems. To implement the system using multiple algorithms for improved time efficiency.
Literature survey Rucha Shinde et al. (2015) observe that people now work at computers for hours on end and have little time to take care of themselves. Hectic schedules and the consumption of junk food affect people’s health, particularly the heart. The authors therefore implement a heart disease prediction system that combines two data mining techniques, naïve Bayes and k-means clustering, and the paper gives an overview of this system. It predicts heart disease from various patient attributes: the k-means algorithm groups the attributes, and the naïve Bayes algorithm makes the prediction.
V. Krishnaiah et al. (2013) proposed applying classification-based data mining techniques, such as rule-based, decision tree, and naïve Bayes classifiers, to massive volumes of healthcare data. The healthcare industry collects huge amounts of data which, unfortunately, are not mined to discover hidden information. For data preprocessing and effective decision making, the one-dependency augmented naïve Bayes classifier (ODANB) and the naïve credal classifier 2 (NCC2) are used. NCC2 is an extension of naïve Bayes to imprecise probabilities that aims to deliver robust classification even when dealing with small or incomplete data sets.
S. Sudha et al. (2013) define data mining as sifting through very large amounts of data for useful information. Some of the most important and popular data mining techniques are association rules, classification, clustering, prediction, and sequential patterns. Data mining techniques are used for a variety of applications, and in the healthcare industry they play an important role in predicting diseases. Detecting a disease normally requires a number of tests on the patient, but data mining techniques can reduce the number of tests needed. This reduction matters for both time and performance.
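For reference, the naïve Bayes classifiers discussed in this survey rest on the standard decision rule below. This is the textbook formulation rather than a formula taken from any of the cited papers, and the notation is assumed here: C is the set of classes (for example benign and malignant), and x_1, ..., x_n are the attribute values of one patient.

```latex
\hat{c} \;=\; \arg\max_{c \in C} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The product form follows from Bayes’ theorem under the "naïve" assumption that the attributes are conditionally independent given the class.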
T. Karthikeyan et al. (2014) presented a feature extraction algorithm used to improve the predictive accuracy of classification. The paper applies principal component analysis (PCA) as a feature evaluator with a ranker as the search method, and uses the naïve Bayes algorithm for classification. It analyzes hepatitis patient data from the UCI (University of California, Irvine) machine learning repository. The classification model is evaluated on accuracy and time. The paper concludes that the proposed PCA-NB algorithm performs better than other classification techniques for hepatitis patients.
Pallavi Mirajkar et al. (2011) note that cancer identification and prediction are a huge challenge to researchers. The use of various data mining techniques has revolutionized the whole process of cancer diagnosis and prognosis. The authors propose an integrated system based on a combination of data mining techniques, such as the analytic hierarchy process, rule-based association, and classification, that helps to predict the patient’s disease status. Cancer risk can be discovered by analyzing and identifying various factors and symptoms of the patient before recommending treatments. The main aim of their system is to help oncologists and medical practitioners diagnose the patient by analyzing available data and relevant information.
Priyanka D et al. (2014) note that lung cancer is one of the major causes of death in both genders compared with all other cancers, and that it has become one of the most hazardous types of cancer in the world. Early detection of lung cancer is essential to reducing loss of life. The paper presents lung disease prediction using the k-means algorithm. The project comprises three modules. The first is the admin module, where the administrator logs in and the patient’s details are generated; users are then authenticated based on their credentials. The second is the user module, where the patient enters a username and password to request a cancer prediction. The third is the cancer prediction module, in which the result is predicted at the last stage with the help of the k-means algorithm. K-means groups the input features into two classes of cancer type (benign and malignant). The project is implemented with Java as the front end and MySQL as the back end. It aims to provide effective lung cancer prediction so that the user can learn their cancer status. The authors infer that k-means is suitable for lung cancer prediction.
Research Methodology Data related to lung diseases will be analyzed for data mining using Weka. K-means clustering and naïve Bayes techniques will be used. The naïve Bayes algorithm will be used as the classification algorithm. K-means clustering can handle massive data sets and cluster them efficiently and quickly. A simple and straightforward iterative method will be used to partition the data set into k clusters.
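The two-stage approach described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the proposed Weka implementation: the proposal does not say how the two stages are coupled, so the sketch assumes the k-means cluster index is appended to each record as an extra feature before a Gaussian naïve Bayes classifier is trained. The function names, the smoothing value, the first-k-points seeding, and the toy records are all hypothetical.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )


def kmeans(points, k, max_iters=100):
    """Lloyd's iteration. Seeding with the first k points is a simplification."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def fit_gaussian_nb(rows, classes, smoothing=1e-3):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, cls in zip(rows, classes):
        grouped[cls].append(row)
    model = {}
    for cls, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # The smoothing term (an assumed value) avoids zero variances.
        variances = [
            sum((v - m) ** 2 for v in col) / n + smoothing
            for col, m in zip(columns, means)
        ]
        model[cls] = (n / len(rows), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for row."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


# Toy two-feature records invented for illustration; not patient data.
records = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9], [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]]
diagnoses = ["benign"] * 3 + ["malignant"] * 3

# Stage 1: group the records into k = 2 clusters.
labels, centroids = kmeans(records, k=2)

# Stage 2 (assumed coupling): append the cluster index as an extra feature
# and train naive Bayes on the augmented records.
augmented = [row + [float(lab)] for row, lab in zip(records, labels)]
model = fit_gaussian_nb(augmented, diagnoses)


def classify(record):
    """Cluster a new record, then classify it with the trained model."""
    cluster = nearest(record, centroids)
    return predict_gaussian_nb(model, record + [float(cluster)])


print(classify([4.7, 5.1]))
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied. Other couplings are equally plausible, for example training one classifier per cluster, and a real system would need a more careful centroid initialization than the first k records.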
Tentative Outcomes A lung disease prediction system will be developed by combining the naïve Bayes and k-means algorithms. Weka tools will be used to reduce the execution time of the algorithms. The prediction system may be faster and less computationally expensive, and may produce more accurate results. The proposed system will help doctors to predict lung diseases efficiently in the initial stages for better treatment.
References [1] World Health Organization (2011), The top ten causes of death. World Health Organization (2013), Deaths from coronary heart disease.
[2] V. Krishnaiah, G. Narsimha, N. Subhash Chandra, “Diagnosis of Lung Cancer Prediction System Using Data Mining Classification Techniques,” International Journal of Computer Science and Information Technologies, Vol. 4 (1), 2013, 39–45.
[3] Rucha Shinde, Sandhya Arjun, Priyanka Patil, “An intelligent heart disease prediction system using k-means clustering and naïve Bayes algorithm,” IJCSIT, Vol. 6 (1), 2015.
[4] S. Sudha, S. Vijayarani, “Disease Prediction in Data Mining Technique,” Vol. II, Issue I, January 2013 (ISSN: 2278-7720).
[5] T. Karthikeyan, P. Thangaraju, “PCA-NB Algorithm to Enhance the Predictive Accuracy,” IJET, Vol. 6 (1), 2014.
[6] Ankit Agrawal, Sanchit Misra, Ramanathan Narayanan, Lalith Polepeddi, Alok Choudhary, “A Lung Cancer Outcome Calculator Using Ensemble Data Mining on SEER Data,” BIOKDD 2011, August 2011, San Diego, CA, USA.
[7] S. S. Mohamed and M. M. A. Salama, “Computer-aided diagnosis for prostate cancer using support vector machine,” Proceedings SPIE Med. Imag., Vol. 5744, pp. 898–906, 2005.
[8] Ms. Mehdi Khundmir Iliyas, “Heart disease prediction using naïve Bayes and k-means techniques,” IJRPET, Vol. 3, Issue 6, June 2017, ISSN: 2454-7875.
[9] S. Vijayarani and S. Sudha, “An Efficient Clustering Algorithm for Predicting Diseases from Hemogram Blood Test Samples,” Vol. 8 (17), DOI: 10.17485/ijst/2015/v8i17/52123, August 2015.
[10] Priyanka D, Ms. S. Shehar Bano, “Prediction on lung disease using k-means algorithm,” IJERT, Vol. 1, Issue 11, 2014.
[11] Tanupriya Choudhury, Vivek Kumar, “Intelligent Classification & Clustering Of Lung & Oral Cancer through Decision Tree & Genetic Algorithm,” IJARCSSE, Vol. 5, Issue 12, December 2015, ISSN: 2277 128X.
[12] P. Ramachandran, N. Girija and T. Bhuvaneswari, “Early Detection and Prevention of Cancer using Data Mining Techniques,” IJCA, Vol. 97, No. 13, 2014.
[13] Ada, Rajneet Kaur, “A Study of Detection of Lung Cancer Using Data Mining Classification Techniques,” IJARCSSE, Vol. 3, Issue 3, 2013.