HEART DISEASE PREDICTION USING K-MEANS CLUSTERING TECHNIQUE
ABSTRACT
Data mining algorithms uncover hidden, previously unknown patterns and
relationships in large datasets. In this proposal, the K-means clustering
algorithm is used to group patients by heart condition, indicating whether a
patient's heart is normal, stressed, or highly stressed. When K-means is
applied to a complex dataset, the resulting clusters are difficult to
interpret, and the required results are hard to retrieve from them. A second
data mining algorithm, the decision tree, is therefore used to interpret the
clusters produced by K-means. This work aims to integrate the K-means
clustering algorithm with the decision tree algorithm. Two further learning
techniques, Support Vector Machine (SVM) and logistic regression, are also
applied, and their heart disease prediction results are compared.
Keywords: K-means clustering; data mining; Support Vector Machine (SVM)
OBJECTIVE:
To cluster heart beat rates from patient data by applying data mining techniques (K-means).
To input the clustered results to the decision tree algorithm to compile the results accurately.
To provide a spreadsheet for each patient based on the clustering results, indicating whether the patient concerned is normal, stressed, or highly stressed.
To apply the SVM classification and logistic regression models to predict heart disease severity.
To compare the prediction accuracy of the K-means clustering output for every cluster.
To compare the accuracy of SVM and logistic regression.
INTRODUCTION
Data mining (DM) is the extraction of
useful information from large data sets that results in predicting or
describing the data using techniques such as classification, clustering,
association, etc. Data mining has found extensive applicability in the
healthcare industry such as in classifying optimum treatment methods,
predicting disease risk factors, and finding efficient cost structures of
patient care. Data mining models have been applied in research on diseases
such as diabetes, asthma, cardiovascular diseases, AIDS, etc. Various
techniques of data mining such as naïve Bayesian classification, artificial
neural networks, support vector machines, decision trees, logistic regression,
etc. have been used to develop models in healthcare research.
An estimated 17 million people die of
cardiovascular diseases (CVD) every year. Although such diseases are
controllable, early prognosis and an evaluation of each patient's risk are
necessary to curb the high mortality rates they cause. Common cardiovascular
diseases include coronary heart disease, cardiomyopathy, hypertensive heart
disease, heart failure, etc. Common causes of heart diseases include smoking,
diabetes, lack of physical activity, hypertension, high cholesterol diet, etc.
Research in the field of
cardiovascular diseases using data mining has been an ongoing effort involving
prediction, treatment, and risk score analysis with high levels of accuracy.
Multiple CVD surveys have been conducted with the most prominent one being the
data set from the Cleveland Heart Clinic. The Cleveland Heart Disease Database
(CHDD) as such has been considered the de facto database for heart disease research.
Using the parameters from this database, this paper proposes a framework
to apply logistic regression, support vector machines, and decision trees to
attain individual predictions, which are in turn used in rule-based algorithms.
The result of each rule from this system is then compared on the basis of
accuracy, sensitivity, and specificity.
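The three comparison measures can be computed from the counts of true and false positives and negatives. The following is a minimal sketch, not the project's actual evaluation code: the function names and the toy label lists are hypothetical, and 1 is assumed to mean "heart disease present".

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Safe division: returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(actual, predicted, positive=1):
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),  # all correct / all cases
        "sensitivity": ratio(tp, tp + fn),  # diseased patients correctly flagged
        "specificity": ratio(tn, tn + fp),  # healthy patients correctly cleared
    }


# Hypothetical labels for ten patients (1 = disease, 0 = no disease).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = evaluate(actual, predicted)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters for medical data, because a model can reach high accuracy on an imbalanced data set while still missing most diseased patients.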
The methodology aims to accomplish two goals: the first is to present a
predictive framework for heart disease, and the second is to compare the
efficiency of merging the outcomes of multiple models as opposed to using a
single model.
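One simple way to merge the outcomes of several models is a majority vote. The text does not say which merging rule is used, so the sketch below is only an assumed illustration: the function names are hypothetical, and the prediction lists are toy values built so that the three models err on different patients.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-patient predictions from several models by majority vote."""
    combined = []
    for votes in zip(*model_predictions):
        # most_common(1) returns [(label, count)] for the most frequent label.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(actual, predicted):
    """Fraction of patients whose predicted label matches the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Toy data for eight patients (1 = disease, 0 = no disease); not real results.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
logistic_pred = [1, 0, 0, 1, 0, 1, 1, 0]
svm_pred = [1, 0, 1, 0, 0, 0, 1, 1]
tree_pred = [0, 0, 1, 1, 1, 0, 1, 0]

merged = majority_vote(logistic_pred, svm_pred, tree_pred)
for name, pred in [("logistic", logistic_pred), ("svm", svm_pred),
                   ("tree", tree_pred), ("merged", merged)]:
    print(name, accuracy(actual, pred))
```

The merged vote beats every single model here only because the toy errors never overlap. Real models often make correlated errors, so the gain has to be measured on held-out data.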
OVERVIEW
Clustering
Clustering is the process of grouping data objects so that objects within a
cluster are similar to one another and dissimilar to objects in other
clusters. It is also called data segmentation in some applications because it
divides a large data set into groups according to their similarities.
Requirements of clustering in data mining:
1) Ability to deal with different types of attributes.
2) Ability to deal with noisy data.
3) Minimal knowledge required to determine input parameters.
4) Usability.
5) Ability to handle high dimensionality.
METHODOLOGY:
K-means clustering algorithm
K-means creates k groups from a set of objects so that the members of a group
are more similar to one another than to members of other groups. On this
basis, the data is clustered as normal, stressed, or highly stressed.
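The grouping step can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the project's implementation: it clusters a single feature (heart rate), the sample values are hypothetical, the centroids are initialised from evenly spaced sorted values, and the cluster with the lowest centroid is assumed to be "normal". In practice a library such as scikit-learn would normally be used.

```python
def kmeans_1d(values, k=3, max_iterations=100):
    """Cluster a list of numbers with Lloyd's k-means; needs k >= 2."""
    ordered = sorted(values)
    # Assumed initialisation: k evenly spaced order statistics (min ... max).
    centroids = [float(ordered[i * (len(ordered) - 1) // (k - 1)])
                 for i in range(k)]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_assignments == assignments:
            break  # no value changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


def label_clusters(centroids, assignments,
                   names=("normal", "stressed", "highly stressed")):
    """Name clusters by ascending centroid: lowest heart rate -> 'normal'."""
    order = sorted(range(len(centroids)), key=lambda c: centroids[c])
    name_of = {cluster: names[rank] for rank, cluster in enumerate(order)}
    return [name_of[a] for a in assignments]


# Hypothetical heart rates in beats per minute.
heart_rates = [72, 135, 98, 65, 140, 101, 62, 130, 95, 70, 142, 104, 68]
centroids, assignments = kmeans_1d(heart_rates)
labels = label_clusters(centroids, assignments)
for rate, label in sorted(zip(heart_rates, labels)):
    print(rate, label)
```

K-means itself only returns numbered groups. Mapping them to "normal", "stressed", and "highly stressed" is a separate interpretation step, which is why the naming is done in its own function.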
The decision tree algorithm is then run on these 3 clusters.
Decision Tree algorithm:
The decision tree produces decision rules as its output.
The clusters are the input data for the decision tree algorithm, which
produces the decision rules for each cluster.
The decision tree builds a tree structure that classifies the data as
yes (prone to heart disease) or no (not prone to heart disease).
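How a decision tree turns cluster labels into readable rules can be sketched as follows. This is a minimal sketch under assumptions: a small Gini-based tree written with the standard library only, hypothetical patient rows of (age, heart_rate), and hard-coded cluster names standing in for the clustering output. The same code would work with yes/no targets. It is not the project's implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Build a tree of (feature, threshold, left, right) tuples; leaves are labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))


def rules(tree, names, path=()):
    """Flatten the tree into (condition, label) decision rules."""
    if not isinstance(tree, tuple):
        return [(" and ".join(path) or "always", tree)]
    f, t, left, right = tree
    return (rules(left, names, path + ("{} <= {}".format(names[f], t),))
            + rules(right, names, path + ("{} > {}".format(names[f], t),)))


def predict(tree, row):
    """Follow the splits down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical patients: (age, heart_rate) with assumed cluster labels.
rows = [(45, 62), (52, 65), (38, 70), (61, 95), (47, 101),
        (55, 104), (66, 130), (43, 140), (58, 142)]
labels = ["normal"] * 3 + ["stressed"] * 3 + ["highly stressed"] * 3
tree = build_tree(rows, labels)
for condition, label in rules(tree, ["age", "heart_rate"]):
    print(condition, "->", label)
```

On this toy data the tree ignores age and splits only on heart rate, because heart rate alone separates the three assumed clusters.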
HARDWARE REQUIREMENTS
Processor     : Any processor above 500 MHz
RAM           : 4 GB
Hard Disk     : 4 GB
Input device  : Standard keyboard and mouse
Output device : VGA and high-resolution monitor
SOFTWARE SPECIFICATION
Operating System : Windows 7 or higher
Programming      : Python 3.6 and related libraries