HEART DISEASE PREDICTION USING K-MEANS CLUSTERING TECHNIQUE





ABSTRACT
Data mining algorithms are designed to discover hidden, previously unknown patterns and relationships in very large datasets. In this proposal, the K-means clustering algorithm is used to cluster patients' heart condition data in order to determine whether a patient's heart is normal, stressed, or highly stressed. When K-means clustering is applied to a complex dataset, the resulting clusters are difficult to interpret, and it is hard to retrieve the required results from them. A second data mining algorithm, the decision tree, is therefore used to interpret the clusters produced by K-means. This work aims to integrate the K-means clustering algorithm with the decision tree algorithm. In addition, two other learning techniques, Support Vector Machine (SVM) and logistic regression, are applied, and their heart disease prediction results are compared.

Keywords: K-means clustering; data mining; Support Vector Machine (SVM)




OBJECTIVES:
To cluster patients' heart rate data using a data mining technique (K-means).
To feed the clustering results into a decision tree algorithm so that the clusters can be interpreted accurately.
To provide a spreadsheet of patients, based on the clustering results, indicating whether each patient is normal, stressed, or highly stressed.
To apply SVM classification and a logistic regression model to predict heart disease severity.
To compare the prediction accuracy for each cluster in the K-means output.
To compare the accuracy of SVM and logistic regression.
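The last objective compares SVM and logistic regression. The sketch below is a minimal pure-Python illustration under stated assumptions, not the project's implementation: logistic regression is trained by batch gradient descent on the log-loss, and a linear SVM (no kernel) by batch subgradient descent on the regularized hinge loss. The helper names (`train_logistic_regression`, `train_linear_svm`, `predict_linear`), the learning rates, and the feature values are hypothetical. In practice a library such as scikit-learn (`LogisticRegression`, `SVC`) would normally be used, and accuracy should be measured on held-out data rather than on the training set, as is done here only for brevity.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; y holds 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi  # predicted probability minus label
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=500):
    """Batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss.
    The 0/1 labels are mapped to -1/+1 internally."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            s = 1 if yi == 1 else -1
            if s * (dot(w, xi) + b) < 1:  # point lies inside the margin
                grad_w = [g - s * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= s / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_linear(model, X):
    # Class 1 when the linear score is non-negative
    # (for logistic regression this equals probability >= 0.5)
    w, b = model
    return [1 if dot(w, xi) + b >= 0 else 0 for xi in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical, already standardized features: [resting heart rate, cholesterol]
X = [[-1.0, -0.8], [-0.7, -1.2], [-1.2, -0.5], [-0.5, -0.9],
     [0.9, 1.1], [1.2, 0.7], [0.6, 1.3], [1.0, 0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical: 1 = heart disease

lr_model = train_logistic_regression(X, y)
svm_model = train_linear_svm(X, y)
print("logistic regression accuracy:", accuracy(y, predict_linear(lr_model, X)))
print("linear SVM accuracy:", accuracy(y, predict_linear(svm_model, X)))
```

On this toy, linearly separable data both models should reach a training accuracy of 1.0; on real patient data the comparison only becomes meaningful with a separate test split.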

INTRODUCTION
Data mining (DM) is the extraction of useful information from large data sets in order to predict or describe the data, using techniques such as classification, clustering, and association. Data mining has found extensive applicability in the healthcare industry, for example in identifying optimal treatment methods, predicting disease risk factors, and finding efficient cost structures for patient care. Data mining models have been applied to diseases such as diabetes, asthma, cardiovascular diseases, and AIDS. Various data mining techniques, such as naïve Bayesian classification, artificial neural networks, support vector machines, decision trees, and logistic regression, have been used to develop models in healthcare research.
An estimated 17 million people die of cardiovascular diseases (CVD) every year. Although such diseases are controllable, early prognosis and evaluation of a patient's risk are necessary to curb the high mortality rates they cause. Common cardiovascular diseases include coronary heart disease, cardiomyopathy, hypertensive heart disease, and heart failure. Common causes of heart disease include smoking, diabetes, lack of physical activity, hypertension, and a high-cholesterol diet.
Research on cardiovascular diseases using data mining has been an ongoing effort, covering prediction, treatment, and risk score analysis with high levels of accuracy. Multiple CVD surveys have been conducted, the most prominent being the data set from the Cleveland Heart Clinic. As a result, the Cleveland Heart Disease Database (CHDD) has been considered the de facto database for heart disease research. Using the parameters from this database, this paper proposes a framework that applies logistic regression, support vector machines, and decision trees to obtain individual predictions, which are in turn used in rule-based algorithms. The result of each rule from this system is then compared on the basis of accuracy, sensitivity, and specificity.
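Accuracy, sensitivity, and specificity can all be read off a binary confusion matrix: accuracy = (TP + TN) / (TP + FP + TN + FN), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). A minimal Python sketch, assuming labels where 1 means heart disease is present; the labels below are hypothetical and only illustrate the arithmetic.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def evaluate(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Convention assumed here: report 0.0 when a rate is undefined
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,  # true-positive rate
        "specificity": tn / (tn + fp) if tn + fp else 0.0,  # true-negative rate
    }

# Hypothetical labels: 1 = heart disease present, 0 = absent
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
print(evaluate(y_true, y_pred))
# {'accuracy': 0.625, 'sensitivity': 0.5, 'specificity': 0.75}
```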
The methodology aims to accomplish two goals: first, to present a predictive framework for heart disease, and second, to compare the efficiency of combining the outcomes of multiple models against using a single model.
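The text does not specify how the outcomes of multiple models are combined. One simple, commonly used option, assumed here purely for illustration, is a per-patient majority vote over the predictions of logistic regression, SVM, and the decision tree. The predictions below are hypothetical and were constructed so that each model errs on a different patient.

```python
from collections import Counter

def majority_vote(*prediction_lists):
    """Combine per-patient predictions from several models by majority vote.
    Use an odd number of binary classifiers so that votes cannot tie."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*prediction_lists)]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical predictions (1 = heart disease) for five patients
y_true   = [1, 0, 1, 1, 0]
lr_pred  = [1, 0, 1, 0, 0]
svm_pred = [1, 1, 1, 1, 0]
dt_pred  = [0, 0, 1, 1, 0]

ensemble = majority_vote(lr_pred, svm_pred, dt_pred)
print(ensemble)  # [1, 0, 1, 1, 0]
for name, pred in [("LR", lr_pred), ("SVM", svm_pred), ("DT", dt_pred), ("vote", ensemble)]:
    print(name, accuracy(y_true, pred))
```

Majority voting only helps when the models' errors are not strongly correlated; if all three models misclassify the same patients, the vote inherits those errors.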

OVERVIEW
Clustering
Clustering is the process of grouping data objects so that objects within the same cluster are similar to one another, while dissimilar objects are placed in different clusters. In some applications it is also called data segmentation, because it divides a large data set into groups according to their similarities.
Requirements of clustering in data mining:
1) Ability to deal with different types of attributes.
2) Ability to deal with noisy data.
3) Minimal domain knowledge required to determine input parameters.
4) Usability.
5) Ability to handle high-dimensional data.
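K-means compares patients by Euclidean distance, so an attribute measured in large units (cholesterol in mg/dl) can outweigh one measured in small units (age in years) purely because of its scale. A common preprocessing step, assumed here rather than taken from the text, is z-score standardization of each numeric attribute before clustering. Categorical attributes would need a separate encoding, which this sketch does not cover. The patient values are hypothetical.

```python
import statistics

def standardize(rows):
    """Z-score each column: subtract its mean and divide by its population std dev."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [[(value - m) / s if s else 0.0  # constant column: map every value to 0
             for value, m, s in zip(row, means, stdevs)]
            for row in rows]

# Hypothetical patients: [age (years), resting heart rate (bpm), cholesterol (mg/dl)]
patients = [[45, 72, 210], [60, 95, 260], [38, 68, 180], [52, 110, 240]]
for row in standardize(patients):
    print([round(v, 2) for v in row])
```

After standardization every column has mean 0 and standard deviation 1, so each attribute contributes to the distance on a comparable scale.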

METHODOLOGY:
K-means clustering algorithm
K-means partitions a set of objects into k groups so that members of the same group are more similar to one another than to members of other groups. With k = 3, the patient data is clustered as normal, stressed, or highly stressed.
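A minimal pure-Python sketch of Lloyd's K-means algorithm with k = 3, followed by the per-patient spreadsheet mentioned in the objectives. Two choices here are assumptions not stated in the text: centroids are initialized deterministically from points spread across the sorted heart-rate values (k-means++ is a common alternative), and clusters are named by ranking their mean heart rate, so the lowest becomes "normal" and the highest "highly stressed". The patient values and the helpers `name_clusters` and `stress_report_csv` are hypothetical.

```python
import csv
import io

STRESS_LEVELS = ["normal", "stressed", "highly stressed"]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k=3, max_iter=100, feature=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Deterministic initialization (assumption): k points spread over the sorted feature
    ordered = sorted(points, key=lambda p: p[feature])
    step = (len(ordered) - 1) / max(k - 1, 1)
    centroids = [list(ordered[round(i * step)]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def name_clusters(centroids, feature=0):
    # Assumption: rank the k = 3 clusters by mean heart rate, lowest = "normal"
    order = sorted(range(len(centroids)), key=lambda c: centroids[c][feature])
    return {cluster: STRESS_LEVELS[rank] for rank, cluster in enumerate(order)}

def stress_report_csv(patient_ids, levels):
    """Return a CSV 'spreadsheet' with one stress level per patient."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["patient_id", "stress_level"])
    writer.writerows(zip(patient_ids, levels))
    return buffer.getvalue()

# Hypothetical patients: [resting heart rate (bpm), systolic blood pressure (mmHg)]
# (both features are on comparable scales here; otherwise standardize them first)
patients = [[68, 118], [72, 122], [75, 120], [96, 138], [100, 142],
            [98, 135], [128, 160], [132, 165], [125, 158]]
labels, centroids = kmeans(patients, k=3)
names = name_clusters(centroids)
levels = [names[lab] for lab in labels]
print(stress_report_csv(range(1, len(patients) + 1), levels))
```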
The decision tree algorithm is then run on each of these three clusters.
Decision Tree algorithm:
Each cluster is used as the input data for the decision tree algorithm, which outputs the decision rules for that cluster.
The decision tree builds a tree structure that classifies the data as yes (prone to heart disease) or no (not prone to heart disease).
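A minimal pure-Python sketch of a CART-style decision tree (binary threshold splits chosen by Gini impurity), trained on the records of a single cluster and printed as IF-THEN rules. This is a stand-in for whichever decision tree implementation the project uses; scikit-learn's `DecisionTreeClassifier` together with `sklearn.tree.export_text` is a typical library alternative. The features (age, cholesterol), the yes/no labels, and `max_depth=3` are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (weighted_gini, feature, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):  # thresholds between distinct values
            t = (lo + hi) / 2
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: predicted class
    split = best_split(rows, labels)
    if split is None:  # all rows identical, so no split is possible
        return majority
    _, f, t = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > t]
    return {"feature": f, "threshold": t,
            "left": build_tree([r for r, _ in left], [lab for _, lab in left],
                               depth + 1, max_depth),
            "right": build_tree([r for r, _ in right], [lab for _, lab in right],
                                depth + 1, max_depth)}

def extract_rules(node, feature_names, conditions=()):
    """Turn every root-to-leaf path into an IF ... THEN ... rule."""
    if not isinstance(node, dict):
        return ["IF " + (" AND ".join(conditions) or "TRUE") + " THEN " + node]
    name, t = feature_names[node["feature"]], node["threshold"]
    return (extract_rules(node["left"], feature_names, conditions + (f"{name} <= {t:g}",))
            + extract_rules(node["right"], feature_names, conditions + (f"{name} > {t:g}",)))

def predict(node, row):
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node

# Hypothetical records from one K-means cluster: [age, cholesterol]; label = prone to heart disease?
rows = [[45, 180], [50, 190], [62, 260], [58, 250], [40, 170], [65, 240]]
labels = ["no", "no", "yes", "yes", "no", "yes"]
tree = build_tree(rows, labels)
for rule in extract_rules(tree, ["age", "cholesterol"]):
    print(rule)
# IF age <= 54 THEN no
# IF age > 54 THEN yes
```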



HARDWARE REQUIREMENTS
Processor            : Any processor above 500 MHz
RAM                  : 4 GB
Hard disk            : 4 GB
Input device         : Standard keyboard and mouse
Output device        : VGA and high-resolution monitor

SOFTWARE SPECIFICATION
Operating system     : Windows 7 or higher
Programming language : Python 3.6 and related libraries


