An Efficient Feature Selection Algorithm for Health Care Data Analysis

R Mythily, Aisha Banu.W, Dinesh Mavaluru


Health is a primary concern for everyone, and being able to predict a disease before it occurs could save thousands of lives. There are many different types of disease, and one in particular, present in at least one out of five people in the world, is diabetes. It is a silent killer that slowly kills a person if it goes undetected. Existing systems that use the F-score method and K-means clustering to check whether a person has diabetes are not 100% accurate. In the medical field anything short of 100% accuracy is unacceptable, as errors could cost many lives. Our proposed system takes some of the best features of existing algorithms for predicting diabetes and combines them into a novel algorithm that aims to be 100% accurate in its predictions. With recent technological advances, data mining can be used to predict when a person is likely to be diagnosed with diabetes. Specifically, we analyze the best features of the Chi-Square algorithm and the Advanced Clustering Algorithm (ACA). This work uses the Pima Indian Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification methods, we consider factors such as age, BMI and blood pressure, weigh the overall importance of each attribute, single out the most important attributes, and use them to predict diabetes.
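To show how a chi-square criterion can single out the attributes most associated with a diabetes diagnosis, the sketch below computes Pearson's chi-square statistic between each categorical attribute and the class label and ranks the attributes by that score. This is a generic illustration under stated assumptions, not the authors' algorithm. It assumes continuous attributes have already been binned into categories. The names `bmi_band` and `age_band`, their values, and the eight labelled records are hypothetical toy data, not drawn from the Pima Indian Diabetes dataset. The helper names `chi_square_score` and `rank_features` are also illustrative.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic for independence between a categorical
    attribute and a class label, computed from their contingency table."""
    if len(feature_values) != len(labels):
        raise ValueError("feature_values and labels must have the same length")
    n = len(labels)
    observed = Counter(zip(feature_values, labels))  # joint counts O_ij
    row_totals = Counter(feature_values)              # records per attribute category
    col_totals = Counter(labels)                      # records per class
    score = 0.0
    for category, row_total in row_totals.items():
        for cls, col_total in col_totals.items():
            expected = row_total * col_total / n      # E_ij if attribute and class were independent
            diff = observed[(category, cls)] - expected
            score += diff * diff / expected
    return score


def rank_features(columns, labels):
    """Return (attribute, score) pairs, most class-dependent attribute first."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, pre-binned toy records (1 = diabetic, 0 = not diabetic).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "bmi_band": ["high", "high", "high", "high", "normal", "normal", "normal", "normal"],
    "age_band": ["older", "younger", "older", "younger", "older", "younger", "older", "younger"],
}

ranking = rank_features(columns, labels)
print(ranking)  # [('bmi_band', 8.0), ('age_band', 0.0)]
```

In this toy data `bmi_band` separates the two classes perfectly, so it scores highest. `age_band` is split evenly across both classes, so it scores zero and would be a candidate for removal. For a library route, scikit-learn's `SelectKBest` with the `chi2` score function can be used instead. Note that it treats non-negative feature values as frequencies rather than building a contingency table from binned categories.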



Bulletin of EEI