Detecting spam using Harris Hawks optimizer as a feature selection algorithm

Mosleh M. Abualhaj, Ahmad Adel Abu-Shareha, Sumaya Nabil Alkhatib, Qusai Y. Shambour, Adeeb M. Alsaaidah

Abstract


This study uses Harris Hawks optimization (HHO) to improve spam detection. The HHO metaheuristic serves as a feature selection technique that retains only the features with a high influence on spam detection. The selected features were evaluated on the ISCX-URL2016 dataset, whose 72 features HHO reduces to just 10. Extra tree (ET), extreme gradient boosting (XGBoost), and k-nearest neighbor (KNN) techniques perform the classification task. ET attains 99.81% accuracy, XGBoost 99.60%, and KNN 98.74%. With the HHO-selected features, all three classifiers therefore achieve accuracy above 98%, and ET outperforms both XGBoost and KNN. ET's advantage is attributed to its robust ensemble approach, which benefits from the diverse and relevant feature subset selected by HHO. HHO's effective reduction of noisy or redundant features enhances ET's ability to generalize and avoid overfitting, making the pair a highly efficient combination for spam detection. Combining ET for classification with HHO for feature selection is thus a promising approach to combating spam emails.

Keywords


Feature selection; Harris Hawks algorithm; ISCX-URL2016 dataset; Machine learning; Spam

DOI: https://doi.org/10.11591/eei.v14i3.9198

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Bulletin of Electrical Engineering and Informatics (BEEI)
ISSN: 2089-3191, e-ISSN: 2302-9285
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).