A missing data imputation method based on salp swarm algorithm for diabetes disease

Geehan Sabah Hassan, Noora Jamal Ali, Asma Khazaal Abdulsahib, Farah Jasim Mohammed, Hassan Muwafaq Gheni

Abstract


Most of the medical datasets suffer from missing data, due to the expense of some tests or human faults while recording these tests. This issue affects the performance of the machine learning models because the values of some features will be missing. Therefore, there is a need for a specific type of methods for imputing these missing data. In this research, the salp swarm algorithm (SSA) is used for generating and imputing the missing values in the pain in my ass (also known Pima) Indian diabetes disease (PIDD) dataset, the proposed algorithm is called (ISSA). The obtained results showed that the classification performance of three different classifiers which are support vector machine (SVM), K-nearest neighbour (KNN), and Naïve Bayesian classifier (NBC) have been enhanced as compared to the dataset before applying the proposed method. Moreover, the results indicated that issa was performed better than the statistical imputation techniques such as deleting the samples with missing values, replacing the missing values with zeros, mean, or random values.

Keywords


Classification; Diabetes disease; Machine learning; Missing values; Salp swarm algorithm

Full Text:

PDF


DOI: https://doi.org/10.11591/eei.v12i3.4528

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Bulletin of EEI Stats

Bulletin of Electrical Engineering and Informatics (BEEI)
ISSN: 2089-3191, e-ISSN: 2302-9285
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).