ABSTRACT – Early disease prediction is one of the core elements of the biomedical and healthcare communities, as it improves the quality of early diagnosis for fatal diseases such as congenital heart disease and cancer. Advanced data mining techniques can help to remedy such situations. Experimenting on structured medical data with data mining concepts such as classifiers and Association Rule Mining (ARM) techniques helps in detecting the occurrence of a particular disease. A medical data set obtained from an open source in the United Kingdom is processed and analysed for heart disease prediction, and the system then suggests a hospital for further treatment. An accuracy comparison between the classifier algorithms used is generated in R Studio.
These prediction results pave the way for proper diagnosis and early treatment of chronic diseases. They can be used to mitigate the rise in the death rate caused by fatal diseases being detected only at a critical stage.

Key Words – Data mining, Congenital Heart Disease, ARM.

I. INTRODUCTION

The healthcare industry collects huge amounts of reliable healthcare data which, unfortunately, are not “mined” to discover hidden information for effective decision making. Clinical decisions are often made based on doctors’ intuition and experience rather than on the knowledge-rich data hidden in the database.

I.1 Classification

There are two forms of data analysis that can be used to extract models describing important classes or to predict future data trends. These two forms are classification and prediction. Classification models predict categorical class labels, whereas prediction models predict continuous-valued functions. The two main classifiers implemented here are the Decision Tree and Naïve Bayes classification algorithms.

I.2 Association Rule Mining

Association means finding relationships between different data items in the same transaction in order to discover hidden patterns. For instance, if someone buys a desktop (A), then they also purchase a speaker (B) in 55% of the cases. This pairing of desktop and speaker occurs in 8.2% of all transactions.
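The two percentages in the desktop and speaker example are the usual confidence and support of a rule A → B. As a minimal sketch, assuming the standard definitions (support measured over all transactions, confidence measured over the transactions that contain A) and a hypothetical toy set of baskets that is not taken from the paper, they could be computed as follows:

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(transactions, antecedent | consequent) / base


# Hypothetical toy baskets, for illustration only.
transactions = [
    frozenset({"desktop", "speaker"}),
    frozenset({"desktop", "speaker", "mouse"}),
    frozenset({"desktop"}),
    frozenset({"mouse"}),
    frozenset({"speaker"}),
]

A = frozenset({"desktop"})
B = frozenset({"speaker"})

rule_support = support(transactions, A | B)       # 2 of 5 baskets hold both items
rule_confidence = confidence(transactions, A, B)  # 2 of the 3 desktop baskets
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

On these toy baskets the rule holds in two of five transactions and in two of the three desktop purchases; a real data set would of course give different figures.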
An association rule in this case can be written A → B, where 55% is the confidence factor (CF) and 8.2% is the support factor (SF). The Apriori algorithm, Pincer-Search and AprioriDP are efficient ARM algorithms in data mining.

II. LITERATURE SURVEY

A review is carried out of the different techniques used by researchers in disease prediction. Numerous data mining technologies are involved in the design of disease prediction models.

M. C. S. Geetha et al. (2017) proposed a system for “Analyzing the Suitability of Relevant Classification Techniques on Medical Data Set for Better Prediction”. As emphasized by the authors, it is difficult for medical practitioners to predict a heart attack, as this requires experience and knowledge. The health sector today contains concealed yet significant information for making decisions. Hence they applied and analysed the commonly used classification algorithms on a medical data set to help predict heart disease, which is the primary cause of death worldwide. The research results do not show a remarkable difference in prediction when different classification algorithms in data mining are used.
The experiment can serve as a significant tool for physicians to predict dangerous cases in practice and counsel patients accordingly. The representation given in the paper will be able to answer more difficult queries in forecasting heart attacks. The predictive accuracy obtained with the REPTree, J48 and BayesNet algorithms suggests that the parameters used are consistent indicators for predicting heart disease. In the future, more parameters can be considered for better prediction [1].

Sarath Babu et al. (2017) designed a system for “Heart Disease Diagnosis Using Data Mining Technique”.
Medical data mining has great potential for exploring the hidden patterns in the data sets of the medical domain. These patterns can be utilized for clinical diagnosis. The data need to be collected in a standardized form. Fourteen attributes extracted from the medical profiles, such as age, sex, blood pressure and blood sugar, can predict the likelihood of a patient getting heart disease. These attributes are fed into the K-means algorithm, the MAFIA algorithm and decision tree classification for heart disease prediction, applying data mining techniques to heart disease treatment. The decision tree achieves high efficiency using the fourteen attributes, after a genetic algorithm is applied to reduce the actual data size and obtain the optimal subset of attributes for heart disease prediction [2].

Ilham KADI et al. (2015) developed “A decision tree-based approach for cardiovascular dysautonomias diagnosis”. In this paper, a case study was performed in order to construct a cardiovascular dysautonomias prediction system using data mining techniques and a dataset collected from an ANS (Autonomic Nervous System) unit of the Moroccan university hospital Avicenne. The prediction system is a decision tree-based classifier that was developed using the C4.5 decision tree algorithm.
A comparison between the accuracy rates obtained using the C4.5 algorithm and the K-NN and Naïve Bayes (NB) classifiers was carried out in order to assess the performance of the system. The K-NN and NB classifiers achieved good performance and promisingly high accuracy rates, but these were still lower than those of the C4.5 algorithm. C4.5 is one of the well-known decision tree algorithms because of its efficiency and comprehensive features.
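C4.5 selects the attribute to split on by gain ratio, that is, information gain normalized by the split information of the attribute. The following is a minimal sketch of that criterion only, for categorical attributes; the attribute names and records are hypothetical and are not drawn from the Avicenne dataset, and a full C4.5 implementation would also handle continuous attributes, missing values and pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information gain of splitting `labels` by `values`, divided by split information."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, for illustration only.
chest_pain = ["yes", "yes", "no", "no"]
sex = ["m", "f", "m", "f"]
disease = ["pos", "pos", "neg", "neg"]

ratio_chest_pain = gain_ratio(chest_pain, disease)
ratio_sex = gain_ratio(sex, disease)
print(ratio_chest_pain, ratio_sex)
```

In this toy case the chest pain attribute separates the classes perfectly while sex carries no information, so a tree built with this criterion would split on chest pain first.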
The results were analysed based on three main goals, namely accuracy, interpretability and usability. The prototype was found to be highly accurate, interpretable, time saving and easy to use. The prediction system developed can automate the analysis of the ANS test results and make it easier for specialists. It can also provide decision support for cardiologists, helping them to make better clinical decisions or at least providing them with a second opinion [3].

Sandeep Kaur et al. (2016) proposed “Disease Prediction using Hybrid K-means and Support Vector Machine”. Predictive data mining in the field of medical diagnosis is an emerging research area. A hybrid of the K-means algorithm and the Support Vector Machine (SVM) algorithm for disease prediction is proposed in this paper to improve the efficiency and accuracy of prediction.
The hybrid K-means algorithm is applied for dimensionality reduction, to remove outliers and noisy data. SVMs help in minimizing errors and in examining the medical data in a shorter time. The reduced dataset is given as input to the Support Vector Machine classifier. The hybrid algorithm is developed by analysing various enhanced K-means algorithms and then selecting the two best enhanced algorithms based on their performance. The proposed work selects the initial centroids by partitioning the data into k equal parts.
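The survey does not say how the data are ordered before being partitioned, nor how a centroid is derived from each part. The sketch below therefore rests on two labeled assumptions: the records are split in their given order into k contiguous, near-equal parts, and the mean of each part is taken as its initial centroid. The function name `initial_centroids` and the sample points are hypothetical.

```python
def initial_centroids(points, k):
    """Split `points` into k contiguous, near-equal parts and return each part's mean.

    Assumption: records are partitioned in their given order; the surveyed
    paper may order or partition the data differently.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    centroids = []
    for i in range(k):
        # Integer arithmetic keeps the part sizes within one record of each other.
        part = points[i * n // k:(i + 1) * n // k]
        dims = len(part[0])
        centroids.append(tuple(sum(p[d] for p in part) / len(part) for d in range(dims)))
    return centroids


# Hypothetical two-dimensional records, for illustration only.
points = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (12.0, 14.0), (20.0, 22.0), (24.0, 26.0)]
centroids = initial_centroids(points, 3)
print(centroids)
```

Seeding K-means this way is deterministic, unlike random initialization, which is one plausible reason for preferring it; the resulting centroids would then be refined by the usual K-means iterations.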
The simulation was performed on a diabetes dataset in MATLAB. The final simulation results show that both the efficiency and the accuracy achieved by the proposed algorithm are better than those of the simple K-means algorithm. K-means achieved an accuracy of 82% and the hybrid algorithm achieved an accuracy of 92% on the same dataset. The proposed model can be applied to any dataset, including the Breast Cancer, Pima Diabetes, Surgery and Iris datasets [4].

Swaroopa Shastri et al. (2017) developed “Data Mining Techniques to Predict Diabetes Influenced Kidney Disease”. The objective is to provide a service that lets users have a check-up without leaving their location and obtain the likelihood of diabetic disease by entering their details into an application designed to give users appropriate outcomes. In this system the datasets are analysed using the Apriori algorithm to calculate the probability and generate the prediction. The application addresses in detail the correlation between diabetes and kidney disease and takes it into account in reaching a verdict. It helps doctors to suggest the best medications, and its forecasting utility makes users aware in advance of their chances of developing diabetic kidney disease [5].

Jagdeep Singh et al. (2016) developed “Prediction of Heart Diseases Using Associative Classification”.
In this paper, various association and classification methods are implemented on heart datasets to predict heart disease. Association algorithms such as Apriori and FP-Growth are used to find association rules among the heart dataset attributes. Classification algorithms such as J48, ZeroR, NaiveBayes, OneR and k-nearest neighbour are implemented on the training dataset, and the output of each algorithm is evaluated on the basis of correctly classified instances. The main contribution of the study is to attain high prediction accuracy for the early diagnosis of heart disease. The proposed hybrid associative classification is implemented in the Weka environment. The comparative results show that IBk (k-nearest neighbour) combined with the Apriori associative algorithm produces better results than the others. The experimental results show that a large number of the rules help in better detection of heart disease and even support heart specialists in their diagnostic judgements. Finally, an expert system is developed for the end user. On the basis of the better performance and correctly classified instances of the implemented algorithm, the IHDPS (Intelligent Heart Disease Prediction System) is proposed for the prediction of heart disease [6].
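To illustrate how Apriori finds the frequent attribute combinations from which such rules are built, the following is a minimal level-wise sketch. It is not the Weka implementation used in the surveyed study; the item names (`age>50`, `high_bp`, `disease`), the records and the 0.4 support threshold are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in sorted(items)]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count each candidate with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


# Hypothetical patient records encoded as item sets, for illustration only.
transactions = [
    frozenset({"age>50", "high_bp", "disease"}),
    frozenset({"age>50", "high_bp", "disease"}),
    frozenset({"age>50", "disease"}),
    frozenset({"high_bp"}),
    frozenset({"age>50"}),
]

frequent = apriori(transactions, min_support=0.4)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 2))
```

Rules such as {age>50, high_bp} → {disease} can then be derived from the frequent itemsets by dividing the support of the whole itemset by the support of the antecedent.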
Dao-I Lin et al. (2006) designed “Pincer-Search: An Efficient Algorithm for Discovering the Maximum Frequent Set”. In this paper, the authors presented a novel algorithm that can efficiently discover the maximum frequent set. The Pincer-Search algorithm can reduce both the number of times the database is read and the number of candidates considered. A very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. The authors evaluated the performance of the algorithm using well-known synthetic benchmark databases as well as real-life census and stock market databases. The improvement in performance can be up to several orders of magnitude compared with the best previous algorithms. The structural properties and basic discovery approaches used are the maximum frequent set, closure properties, the discovery of frequent itemsets, and the Apriori algorithm [7].

III. HOW DATA MINING IS USEFUL IN THE MEDICAL FIELD

Owing to the widespread use of computers in hospitals and by practising doctors, a large amount of information is gathered. In today’s scenario, medical institutions hold subtle yet sufficient information about their patients.
This huge set of data consists of relevant information about the patient along with a lot of other information, which is noise. The entire data set may be used by practitioners, but data miners have to extract only the specific information of concern, known as knowledge. Emerging research demands that the available technology be used to benefit society globally. With the available mining tools, it is possible to design a model that can be helpful to the healthcare industry. The tools can provide practitioners with the accurate and timely reports they need, so that the patient benefits.

IV. CONCLUSION

In this paper, a survey of work from 2002 to 2016 presents the different models available and the different data mining techniques used. With the growth of data mining in the biomedical and healthcare communities, accurate analysis of medical data benefits early disease detection, patient care and community services. However, the accuracy of the analysis is reduced when the medical data are incomplete. Moreover, different regions exhibit unique characteristics of certain regional diseases, which may weaken the prediction of disease outbreaks.
Thus, handling data with proper parameters and constraints using efficient algorithms will give promising results.

V. FUTURE ENHANCEMENT

The main objective is to identify patterns and features in the patient’s medical data by combining classifiers and Association Rule Mining techniques for the prediction of diseases. The other objective is to suggest hospitals suited to the predicted diseases for further diagnosis and surgical treatment.