Ensemble learning classifiers hybrid feature selection for enhancing performance of intrusion detection system

ABSTRACT


INTRODUCTION
Currently, electronic technologies are being adopted in every field and industrial operation, which drives steady growth in global connectivity in terms of both the number of connected machines and the volume of communications. In this situation, communication networks and systems are constantly targeted by intruders. Therefore, inspecting and discovering network attacks plays a key role in sustaining critical security functions such as availability, confidentiality, privacy, and integrity. Two recent attacks are worth noting, VPNFilter [1] and distributed denial of service (DDoS) [2], [3], in which thousands of computer devices were hacked, causing huge financial losses in addition to human costs.
In this regard, an intrusion detection system (IDS) is the most extensively used line of defense in communication and information technology for inspecting and discovering network attacks, acting as a strong instrument against various forms of network attacks [4]. An IDS can be classified as a network-IDS or a host-IDS. A host-IDS analyzes Syslog files from end systems, whereas a network-IDS inspects the traffic that passes through the network by using packet filtering [5]. Based on the detection mechanism, an IDS may also be categorized as signature-based, anomaly-based, or hybrid [6]. A signature-based IDS compares observed activity with a predefined set of signatures (patterns of intrusion). One of its drawbacks is that it cannot efficiently identify unknown intrusions, because of outdated signature databases and zero-day attacks. An anomaly-based IDS reveals intrusions by learning the behavior of normal and anomalous network traffic and has good detection ability for unknown intrusions. However, because of class imbalance and redundant attributes in IDS datasets, this approach may suffer from a high false positive rate. Machine learning (ML) (e.g., ensemble learning) and statistical approaches are commonly applied in anomaly-based IDSs. A hybrid IDS integrates the anomaly-based and signature-based approaches [7].
ISSN: 2302-9285 Bulletin of Electr Eng & Inf, Vol. 13, No. 1, February 2024: 665-676
Ensemble learning has therefore been broadly used in IDSs owing to its ability to learn and detect intrusion patterns in network traffic via statistical techniques and algorithms [8]. Ensemble learning integrates the outputs of two or more separately trained ML classifiers to achieve better performance than the individual classifiers. There are various ways to combine them, including voting ensembles (hard voting and soft voting) and stacked models. In this work, we propose ensemble learning classifiers built with each of these combination strategies.
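To make the difference between the two voting rules concrete, the following minimal Python sketch combines hypothetical class-probability outputs of three base classifiers for a single sample. The probabilities are invented for illustration and are not taken from the experiments; breaking hard-voting ties toward the lowest class index is an assumption of this sketch.

```python
import numpy as np

# Hypothetical class-probability outputs of three base classifiers for ONE sample
# over three classes (e.g., normal, DoS, probe). Values are illustrative only.
probas = np.array([
    [0.40, 0.35, 0.25],  # classifier 1
    [0.40, 0.30, 0.30],  # classifier 2
    [0.05, 0.90, 0.05],  # classifier 3
])

def hard_vote(probas: np.ndarray) -> int:
    """Majority vote over each classifier's predicted label (ties -> lowest index)."""
    labels = probas.argmax(axis=1)
    counts = np.bincount(labels, minlength=probas.shape[1])
    return int(counts.argmax())

def soft_vote(probas: np.ndarray) -> int:
    """Class with the highest mean predicted probability across classifiers."""
    return int(probas.mean(axis=0).argmax())

print(hard_vote(probas))  # classifiers 1 and 2 vote for class 0 -> 0
print(soft_vote(probas))  # mean probabilities ~[0.283, 0.517, 0.200] -> 1
```

Here the two rules disagree: hard voting counts only each model's top label, while soft voting lets the confident third model outweigh the two hesitant ones.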
Network traffic data contain many irrelevant and redundant attributes (features). Inspecting all attributes takes more time (processing cost) and degrades classification performance, so it is not suitable for an IDS to use all of them. Consequently, applying feature selection (FS) in the preprocessing stage has great potential to enhance the performance of ensemble learning when combined with an IDS. The benefits of FS include data reduction, better data understanding, lower processing cost, and reduced storage requirements. FS approaches can be classified into filter-based, wrapper-based, and embedded methods. Many studies have used diverse FS approaches to help IDSs improve performance and reduce false alarm rates [9]-[11]. In this research, we propose a new IDS based on ensemble learning classifiers and a hybrid FS approach that fuses three different FS approaches, which mitigates the drawbacks and inherent biases each approach has when used individually. The key contributions of this work are as follows:
a. We propose a new methodology that integrates the advantages of ensemble learning classifiers and FS approaches to obtain an accurate and effective IDS.
b. In the FS stage, we propose a hybrid FS approach consisting of two techniques, hard voting and mean. In the hard-voting technique, three different FS approaches are fused to reduce the number of features. Because each remaining feature then has three different weights (one from each approach), the mean technique assigns a single weight to each feature, yielding an optimal subset of just 10 features.
c. Through data preprocessing, we address the typical class-imbalance problem of the dataset to avoid degrading the model's performance.

RELATED WORK
Because intrusion is regarded as one of the most difficult risks in network security, intrusion detection, treated as a classification problem, has become a very prominent research area, and several solutions have been presented to enhance IDS performance. In this part, we review ML-based IDS works that fall into three groups:
a. On feature reduction (or selection) approaches
Feature reduction, which can be used as a preprocessing stage in ML techniques, aims to reduce computation time, improve the performance of IDSs, and exclude useless features [12], in order to obtain an effective and more reliable classifier. Hota and Shrivas [13] proposed a model that used diverse FS approaches to exclude unessential features; their results demonstrate that C4.5 with mutual information (MI) achieves the highest accuracy on the NSL-KDD dataset with just 17 features. Ustebay et al. [14] used recursive feature elimination (RFE) with RF on the CICIDS-2017 dataset, which contains more than 80 features. They used RFE to assess the outcomes of choosing from 1 to 81 features, and the most important features selected were Src-port, flow-packets, flow-IAT-Std, and flow-IAT-mean. An MLP classifier for the IDS then achieved an accuracy of 0.89; because of the small size of the dataset used to train the model, this performance is inadequate. To find important features for network-IDS, Khammassi and Krichen [15] implemented a wrapper-based FS that relies on genetic algorithms, with logistic regression used for classification. Their strategy achieves its reported detection rates with just 20 and 18 features for the UNSW-NB15 and KDD-Cup99 datasets, respectively. Alazzam et al. [10] proposed a new FS method based on the pigeon-inspired optimizer. The method binarizes continuous variables using the cosine similarity measure and is compared with the standard swarm algorithm, which uses a sigmoid function. The authors evaluated their method on the UNSW-NB15, KDD-Cup99, and NSL-KDD datasets, where it outperformed various well-known FS methods in terms of false positive rate (FPR), true positive rate (TPR), F1-score, and accuracy.
b. On ensemble learning classifiers
Ensemble learning methods are standard ML techniques that combine various base learner models to minimize the FPR and provide more reliable results than an individual model. Hsu et al. [16] used an ensemble classifier for network-IDS built from a support vector machine, autoencoder models, and RF, and showed that the ensemble reduces the false alarm rate (FAR) and improves classification accuracy. Jabbar et al. [17] suggested a cluster-based ensemble classifier for IDS that uses the k-nearest neighbor (KNN) algorithm and an alternating decision tree, and showed that it outperforms other existing methods in terms of detection rate and accuracy. Kumar et al. [11] introduced an ensemble model based on the tree-based models chi-squared automatic interaction detection (CHAID), C5, quick unbiased efficient statistical tree (QUEST), and classification and regression tree (CART). They used the UNSW dataset for training and to evaluate their work against unseen attacks. The authors reduced the number of features to only 13 using MI-based FS and then used the reduced features to classify network traffic into probe, DoS, exploit, normal, and generic classes; the proposed model achieved an accuracy of 83.4%. Zhang et al. [18] combined the ReliefF algorithm and information gain (IG) FS techniques with RF for IDS and performed three experiments on the NSL-KDD dataset: they first investigated the performance of ReliefF and IG independently and then compared it with their integrated method, ReliefF-IG. ReliefF-IG first uses IG to reduce the number of features and then ranks their importance with ReliefF, which reduces the computational complexity and the time needed for FS. The results indicate that ReliefF-IG obtains better accuracy than ReliefF or IG alone. Megantara and Ahmad [19] used a hybrid FS combining RFE and mean decrease in impurity (MDI) on the NSL-KDD dataset. MDI is used as a filter approach to rank the features, and RFE then reduces the feature dimension using a decision tree classifier. With the decision tree classifier, the U2R, R2L, probe, and DoS categories reach accuracies of 99.4%, 81.3%, 91.2%, and 89.1%, respectively.
c. On hybrid methods
Various hybrid methods employing both FS and ensemble approaches have recently been developed to enhance IDS performance. Kasongo and Sun [20] applied an FS method based on the ensemble XGBoost algorithm to IDS and evaluated performance on the UNSW-NB15 dataset with several ML approaches. They selected 19 of 42 features using XGBoost, and with decision trees the detection accuracy improved by 1.9% relative to the benchmark using all features. To build a model with high accuracy and low FPR, Malik et al. [21] proposed a hybrid of particle swarm optimization (PSO) and RF that improves model accuracy by choosing the most important features for each class. Pham et al. [22] proposed a hybrid approach that uses gain ratio (GR) for feature reduction and bagging to combine tree-based classification models; bagging with J48 as the classifier and 35 features from the NSL-KDD dataset gave the highest performance in their experiments. Tama et al. [23] proposed a new IDS based on integrated FS and a two-stage classifier ensemble, which substantially improves recall on the UNSW-NB15 and NSL-KDD datasets.

Feature selection
FS is one of the crucial stages in ML techniques and IDSs. Choosing an appropriate FS approach and how it is applied both affect, and can improve, the performance of the IDS. FS also reduces the operational load, as it reduces the number of features in the dataset and creates new relationships between features [24]. However, there is no single best technique for FS [25]. FS approaches can be classified into filter-based, wrapper-based, and embedded methods. Filter-based approaches evaluate the significance of the features and select them (or attributes) on the basis of statistics. Wrapper methods, on the other hand, use the prediction performance of a learning model to evaluate candidate feature subsets. Embedded methods, which integrate feature selection into the learning procedure, are computationally less costly [26]. The appropriate technique may differ depending on the form of the dataset. The major issue in FS is choosing the features that can effectively discriminate between classes, and different FS approaches may suit different datasets. Up-to-date intrusion datasets usually include many duplicate and useless features. Thus, the first stage in this research is to choose meaningful features and reduce the dimensionality of the dataset. In this research, a hybrid approach that merges two techniques, hard voting and mean, is proposed to boost the efficiency of the FS operation and improve classification accuracy. The key role of this strategy is to assess the redundancy and importance of the selected feature subset, exploring the search space to approach an optimal solution.

Mutual information
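Mutual information (MI) quantifies how much knowing one attribute reduces the uncertainty about another. In other words, MI is a useful metric for capturing the dependence between attributes X and Y, and it is defined as [27]:
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \qquad (1)$$
where p(x,y) is the joint probability distribution of X and Y, and p(x) and p(y) are their marginal distributions.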
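Equation (1) can be checked numerically for discrete attributes. The sketch below is a plug-in estimate written for illustration (natural logarithm, so the result is in nats); the function name mutual_information and the toy sequences are hypothetical. Continuous flow features are usually handled with a dedicated estimator instead, for example scikit-learn's nearest-neighbour-based mutual_info_classif; the text does not state which estimator was used.

```python
import numpy as np
from collections import Counter
from sklearn.metrics import mutual_info_score

def mutual_information(x, y) -> float:
    """Plug-in estimate of Eq. (1) for two discrete sequences, in nats."""
    n = len(x)
    joint = Counter(zip(x, y))        # counts of (x, y) pairs -> p(x, y)
    px, py = Counter(x), Counter(y)   # marginal counts -> p(x), p(y)
    mi = 0.0
    for (xi, yi), count in joint.items():
        pxy = count / n
        mi += pxy * np.log(pxy / ((px[xi] / n) * (py[yi] / n)))
    return float(mi)

protocol = ["tcp", "tcp", "udp", "udp", "icmp", "icmp"]
label    = ["dos", "dos", "normal", "normal", "probe", "probe"]  # determined by protocol
noise    = ["a", "b", "a", "b", "a", "b"]                        # independent of protocol

print(mutual_information(protocol, label))  # log(3), about 1.0986
print(mutual_information(protocol, noise))  # -> 0.0
print(mutual_info_score(protocol, label))   # scikit-learn's value agrees
```

A fully dependent pair reaches the entropy of the attribute (log 3 for three equiprobable values), while an independent pair gives zero, which is why MI can rank features by their relevance to the class label.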

ExtraTreesClassifier
An ExtraTreesClassifier (extremely randomized trees) is an ensemble learning approach that constructs numerous randomized decision trees to model the data. It is designed to mitigate overfitting by introducing randomness into the splitting process. Traditional decision trees search for the best split point of each feature using criteria such as entropy or Gini impurity [28]; the ExtraTreesClassifier instead draws split thresholds at random for the candidate features and keeps the best of these random splits, and by default grows each tree on the whole training set rather than on a bootstrap sample. This encourages diversity among the constituent trees, ultimately producing a more robust model that is less prone to overfitting.
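As a hedged illustration of how extra tree classifier importance (ETCI) scores can be obtained and thresholded, the sketch below fits scikit-learn's ExtraTreesClassifier on synthetic data (a stand-in for the preprocessed InSDN features, which are not loaded here) and keeps features whose impurity-based importance is at least 0.01, the ETCI threshold reported later in the text. The number of trees and the synthetic dataset are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the preprocessed flow features (InSDN is not loaded here).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=2, n_classes=3, random_state=0)

# Many extremely randomized trees; bootstrap=False is the scikit-learn default.
etc = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = etc.feature_importances_          # impurity-based, non-negative, sums to 1

selected = np.flatnonzero(importances >= 0.01)  # ETCI threshold reported in the text
ranking = np.argsort(importances)[::-1]         # most to least important
print("selected:", selected.tolist())
print("top 5:", ranking[:5].tolist())
```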

Recursive feature elimination
RFE is a wrapper-based method. It recursively eliminates predictors (features) and builds a model on the remaining predictors, using model performance (here, accuracy) to determine which predictors contribute most to predicting the target. RFE requires the number of predictors to keep to be specified, although the optimal number is usually not known beforehand [29]. In this work, the KNN algorithm is used with the RFE method to select the predictors.
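By default, scikit-learn's RFE class ranks features through the fitted estimator's coef_ or feature_importances_ attribute, which KNeighborsClassifier does not provide. A performance-driven elimination matching the description above (drop features and score the remaining subset by accuracy) can instead be sketched with SequentialFeatureSelector in backward mode. This is a swapped-in technique on synthetic data, not necessarily the procedure the authors ran; the target of 5 features is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           n_redundant=2, random_state=1)

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3, the value reported in the text

# Backward elimination: start with all 10 features and, at each step, drop the
# feature whose removal gives the best cross-validated accuracy, until 5 remain.
selector = SequentialFeatureSelector(knn, n_features_to_select=5, direction="backward",
                                     scoring="accuracy", cv=3)
selector.fit(X, y)
mask = selector.get_support()  # boolean mask over the original features
acc = cross_val_score(knn, X[:, mask], y, cv=3, scoring="accuracy").mean()
print("kept features:", mask.nonzero()[0], f"CV accuracy = {acc:.3f}")
```

Having to pass n_features_to_select mirrors the limitation noted above; for estimators that do expose importances, scikit-learn's RFECV chooses the count by cross-validation instead.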

Base (single) learner in ensemble learning classifier
The two biggest challenges for any IDS are the false alarm rate (FAR) and classification accuracy. The FAR reflects the proportion of normal instances detected as attacks (anomalies), whereas accuracy indicates the proportion of correctly classified instances. The base learners of an ensemble learning model are therefore selected not only to decrease the FAR but also to improve the classification accuracy of the IDS. In this work, RF, XGBoost, and MLP are used as the base classifiers.
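The two quantities above can be computed directly from predicted labels. The sketch below follows the definition in the text, counting a false alarm whenever a truly normal instance is assigned to any attack class; treating FAR this way in the multi-class setting is an assumption, since the exact formulas are deferred to [35]. The labels are toy values.

```python
import numpy as np

def accuracy(y_true, y_pred) -> float:
    """Proportion of correctly classified instances."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def false_alarm_rate(y_true, y_pred, normal_label="normal") -> float:
    """Normal instances flagged as any attack / all normal instances,
    i.e. FP / (FP + TN) with 'normal' as the negative class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    normal = y_true == normal_label
    false_alarms = np.sum(normal & (y_pred != normal_label))
    return float(false_alarms / np.sum(normal))

y_true = ["normal", "normal", "normal", "normal", "dos", "probe", "ddos", "dos"]
y_pred = ["normal", "dos",    "normal", "normal", "dos", "dos",   "ddos", "dos"]
print(accuracy(y_true, y_pred))          # 6 of 8 correct -> 0.75
print(false_alarm_rate(y_true, y_pred))  # 1 of 4 normal flagged -> 0.25
```

Note that the probe instance misclassified as DoS lowers accuracy but is not a false alarm under this definition, because it is still flagged as an attack.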

Random forest
RF is a machine-learning technique based on an ensemble of many decision trees. It first specifies how many decision trees are to be constructed and then uses the bootstrap method to randomly draw a sample of the data for each tree. RF builds its component trees so as to reduce the correlation between individual trees. The performance gains of RF come from the randomness in selecting the candidate features at each split, rather than from randomizing the split points of the selected features [30].
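A minimal configuration sketch of the two sources of randomness described above, using scikit-learn's RandomForestClassifier on synthetic data. The hyperparameter values are illustrative assumptions, since the text does not report the ones used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees, fixed before training
    bootstrap=True,       # each tree is grown on a bootstrap sample of the rows
    max_features="sqrt",  # random candidate-feature subset at every split (3 of 10 here)
    random_state=0,
)
rf.fit(X_tr, y_tr)
test_acc = rf.score(X_te, y_te)
print(f"{len(rf.estimators_)} trees, test accuracy = {test_acc:.3f}")
```

Setting max_features=None while keeping bootstrap=True would give plain bagged trees; the per-split feature subsampling is the randomness the paragraph above credits for RF's gains.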

Extreme gradient boosting
Extreme gradient boosting (XGBoost) is a popular and effective gradient boosting algorithm. Gradient boosting is a supervised learning approach that combines the estimates of many weaker, simpler models to accurately predict a target variable. XGBoost performs well in ML challenges owing to its robust handling of diverse distributions, feature relationships, and a large variety of data types. Moreover, it exposes a wide range of hyperparameters that can be fine-tuned. XGBoost can address ranking, regression, and classification problems [28].
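The core idea of gradient boosting, adding weak learners that each correct the current ensemble, can be sketched in a few lines. This is a deliberately simplified, hand-written version for regression with squared error, using depth-1 scikit-learn regression trees (stumps) as the weak learners; the function names are hypothetical. It is not XGBoost, which additionally uses second-order gradient information, regularized tree construction, and a log-loss objective for classification.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.1, max_depth=1):
    """Squared-error boosting: each weak tree is fit to the current residuals
    (the negative gradient of 0.5 * (y - F)^2) and added with shrinkage."""
    f0 = float(np.mean(y))            # initial constant prediction
    pred = np.full(len(y), f0)
    trees, losses = [], []
    for _ in range(n_rounds):
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
        losses.append(float(np.mean((y - pred) ** 2)))  # training MSE after this round
    return f0, trees, losses

def predict(f0, trees, X, learning_rate=0.1):
    """Sum of the initial constant and all shrunken weak-learner outputs."""
    return f0 + learning_rate * sum(t.predict(X) for t in trees)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
f0, trees, losses = fit_gradient_boosting(X, y)
print(f"training MSE after 1 round: {losses[0]:.3f}, after 50 rounds: {losses[-1]:.3f}")
```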

Multilayer perceptron
MLP is a feedforward neural network consisting of an input layer, an output layer, and one or more hidden layers [31], as shown in Figure 1. For classification problems, the number of nodes in the output layer equals the number of classes, and the number of nodes in the input layer equals the number of features. The layers between the input and output layers are usually dense (fully connected) and are trained through back-propagation. For an MLP with one hidden layer, the output is:
$$f(x) = s\left(b^{(2)} + W^{(2)}\, g\left(b^{(1)} + W^{(1)} x\right)\right) \qquad (2)$$
where f(x) denotes the output, W^{(1)} and W^{(2)} are the weight matrices, b^{(1)} and b^{(2)} are the bias vectors, and g and s are the transfer (activation) functions. In our case, we use ReLU as the transfer function of the hidden layers, which maps values less than 0 to 0, and the softmax function in the output layer, which converts the outputs into class probabilities so that the most probable class can be taken as the prediction.
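The following NumPy sketch evaluates the one-hidden-layer MLP equation above for a batch of samples, with g = ReLU and s = softmax as stated. The layer sizes are hypothetical: 10 inputs to match the 10 selected features, 16 hidden units, and 8 outputs assuming the normal class plus the seven attack types listed for InSDN. The weights are random, so only the shapes and the probability property are meaningful; training by back-propagation is not shown.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)  # g: negative values become 0

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)  # s: each row sums to 1

def mlp_forward(X, W1, b1, W2, b2):
    """f(x) = s(b2 + W2 g(b1 + W1 x)) applied to a batch; rows of X are samples."""
    hidden = relu(X @ W1.T + b1)
    return softmax(hidden @ W2.T + b2)

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 10, 16, 8  # hypothetical layer sizes
W1 = rng.normal(scale=0.1, size=(n_hidden, n_features))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_classes, n_hidden))
b2 = np.zeros(n_classes)

probs = mlp_forward(rng.normal(size=(4, n_features)), W1, b1, W2, b2)
print(probs.shape, probs.argmax(axis=1))  # (4, 8) and one predicted class per sample
```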
Figure 1. The proposed hybrid FS approach and ensemble learning classifiers

Proposed method
To improve the detection capability of the IDS and protect networks and service providers from attacks, we propose ensemble learning classifiers combined with a hybrid FS approach. In our experiments, we split the dataset into training and test sets to verify the efficiency of the models in identifying normal traffic and diverse kinds of attacks. Figure 1 shows the proposed hybrid FS and ensemble learning classifiers, which consist of the following four major stages:

Data pre-processing
The first stage converts the raw data into a format suitable for analysis by applying data cleaning, undersampling, oversampling, label encoding, and normalization to the InSDN dataset. The data preprocessing steps are described in detail in section 4.2.

Feature selection
Selecting the appropriate features and the appropriate number of features helps the classification task reach its goal and also helps overcome the curse of dimensionality. The feature reduction phase is an important stage of model classification that can be carried out using domain knowledge or different intrinsic processes. In this work, the feature reduction stage is divided into two steps. In the first step, we apply three different FS approaches separately (extra tree classifier importance, RFE, and MI). Then, a hybrid approach is used to reduce the dimensionality of the dataset and pick the most meaningful features for the different kinds of attacks. The hybrid FS approach is explained in detail in section 4.3.

Classification phase
In this stage, we use the reduced subset of features obtained with the mean technique to train three individual base models: XGBoost, RF, and MLP classifiers. Ensemble learning classifiers, namely hard voting, soft voting, and stacked models, are then constructed from these base models to enhance the classification accuracy of the IDS. The classification phase is described in more detail in section 4.4.

Evaluation results
Using the ensemble learning classifiers, we can effectively identify and categorize various types of attacks as well as normal network traffic. These approaches achieve a low FAR and high classification accuracy, providing a robust defense mechanism for the system. The evaluation results are presented in detail in section 4.5.

Dataset description
One of the widespread problems for ML-based IDSs is the unavailability of datasets. Legal and privacy issues are the primary reasons for the lack of datasets in the IDS area: network traffic includes sensitive information whose exposure might disclose company and client secrets. To fill this gap, several authors generate their own data to avoid such sensitivity issues. In this paper, we assess the proposed ensemble learning classifiers with hybrid FS using the recently released InSDN dataset [32]. The InSDN dataset contains recent popular attack types such as DDoS, probe, DoS, botnet, password guessing, web, and exploitation attacks. Moreover, the normal traffic in the dataset includes common applications such as hypertext transfer protocol (HTTP), hypertext transfer protocol secure (HTTPS), electronic mail (Email), domain name system (DNS), secure shell (SSH), and file transfer protocol (FTP). The InSDN dataset contains 361,317 observations of attack and normal traffic, of which 292,893 are attack observations and 68,424 are normal observations. Table 1 shows how these observations are distributed.

Data pre-processing
To train the model, the dataset must first be prepared, because the data can be duplicated, noisy, inconsistent, and incomplete. The InSDN dataset is provided in flow-based form, with over 80 features extracted using the CICFlowMeter tool [33]. In this research, the preprocessing stage consists of data cleaning, undersampling, oversampling, label encoding, and data normalization:
a. Cleaning data: the dataset includes socket information such as Src-IP, Dst-IP, and flow-ID. All these features are removed to avoid overfitting, and also because socket features may vary from one network to another.
b. Undersampling data: unequal class proportions make the dataset imbalanced, which is a serious issue that affects ensemble learning and degrades the model's performance. Therefore, we remove randomly chosen instances from the normal, DDoS, probe, and DoS classes.
c. Oversampling data: the statistics of the InSDN dataset in Table 1 show that the number of observations for the web attack, botnet, and exploitation classes is small, and most ML approaches will ignore these classes, causing poor performance. Therefore, we used the SMOTE technique to synthesize new instances for the minority classes. The new statistics of the InSDN dataset after undersampling and oversampling are depicted in Table 2.
d. Label encoding: the dataset includes continuous, binary, and symbolic values. For example, the attribute 'protocol' in the InSDN dataset contains symbolic values such as "udp", "icmp", and "tcp". Because many models accept only numerical inputs, this conversion step is crucial and has an important effect on IDS classification accuracy. In this research, we replace each distinct symbolic value with an integer to handle the non-numerical features.
e. Normalization: different value ranges across features can degrade ensemble learning classifiers; for instance, a feature that takes large integer values, such as 'Flow Duration', can dominate the classification. Thus, we used a fast and simple normalization method, the min-max method [34], to map the feature values into the 0-1 range using (3):
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \qquad (3)$$
where x_{max} and x_{min} denote the maximum and minimum values of the feature, and x' is the normalized value.
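Steps b to e can be sketched in NumPy as follows. This is a simplified illustration on random toy data, not the authors' pipeline: the helper names (random_undersample, smote_like, min_max) are hypothetical, the SMOTE variant is a compact re-implementation of the interpolation idea (in practice one would typically use the SMOTE class from the imbalanced-learn package), and constant columns are mapped to 0 to avoid dividing by zero in Eq. (3), a choice the text does not discuss.

```python
import numpy as np

def random_undersample(X, y, label, n_keep, rng):
    """Randomly drop rows of class `label` until n_keep remain; other classes untouched."""
    idx = np.flatnonzero(y == label)
    drop = rng.choice(idx, size=len(idx) - n_keep, replace=False)
    keep = np.setdiff1d(np.arange(len(y)), drop)
    return X[keep], y[keep]

def smote_like(X_min, n_new, k, rng):
    """SMOTE-style oversampling: each synthetic row lies on the segment between a
    random minority row and one of its k nearest minority neighbours."""
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                # a row is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest minority rows per row
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

def min_max(X):
    """Eq. (3) column-wise: (x - x_min) / (x_max - x_min); constant columns map to 0."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span

# Label encoding: each distinct symbolic value becomes an integer (sorted order).
protocols = np.array(["tcp", "udp", "icmp", "tcp"])
categories, codes = np.unique(protocols, return_inverse=True)  # codes -> [1 2 0 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = np.array([0] * 100 + [1] * 20)  # toy majority/minority labels
X, y = random_undersample(X, y, label=0, n_keep=50, rng=rng)
X_minority = X[y == 1]
X_synth = smote_like(X_minority, n_new=30, k=5, rng=rng)
X = np.vstack([X, X_synth])
y = np.concatenate([y, np.ones(len(X_synth), dtype=int)])
X_scaled = min_max(X)
print(np.bincount(y), X_scaled.min(), X_scaled.max())  # -> [50 50] 0.0 1.0
```

Unlike plain duplication, every synthetic row is a new point inside the region spanned by existing minority samples, which reduces the risk of the classifier memorizing repeated rows.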

Feature reduction or selection phase
Selecting the most meaningful features is crucial for achieving a low FAR, a high detection rate, and low computational time. In this research, feature reduction (FR) is divided into two steps. First, three FR approaches (described in section 4) were applied separately to the InSDN dataset. Depending on its threshold value, each produces a different subset of features, and every feature that does not improve classifier performance is removed. Then, a hybrid FR method is proposed to exclude irrelevant and inconsistent features and provide a reduced dataset (an optimal subset of features) with which the ensemble learning classifiers can produce good results on various classification problems. The hybrid method contains two techniques. First, a hard-voting technique reduces the feature set from 77 to 15: each feature with only one vote is removed (for instance, the 'Fwd Pkts/s' feature is selected only by the MI method), whereas each feature with two or three votes is kept. Second, because each remaining feature has three weights from the previous step, the mean technique combines the outputs of the three FR approaches into a single weight per feature. This technique reduces the number of features from 15 to 10.

Classification phase
Once an optimal subset of features has been picked by the proposed hybrid FS approach, it is fed into the classification phase (training stage), which has two steps. In the first step, three classifier models (RF, XGBoost, and MLP) are trained individually. In the second step, to achieve better classification accuracy, these base classifiers are integrated into ensemble models. By constructing several separate classifiers for the same problem and merging their predictions, ensemble models produce more stable and more accurate predictions.
There are three reasons for using ensemble learning classifiers to improve performance: a generalization reason, a statistical reason, and a computational reason. First, an individual classifier may not be sufficient to achieve optimal generalization over the hypothesis space, so individual models need to be combined to boost performance. Second, when the original dataset is not adequate to train a classifier, the result may be a false or weak hypothesis. Finally, a single classifier might need a large amount of computation to find an appropriate hypothesis, in which case it is more likely to run into difficulties.
Hard voting, soft voting, and stacking classifiers are the most common ensemble learning schemes. They typically achieve better results in classification tasks and are extensively used to build ensemble learning models. Furthermore, ensemble approaches have been shown to improve accuracy in several applications, such as IDSs. Ensemble learning classifiers also provide security experts with a means of inspecting similarities to previously known malicious or normal samples. Among tree-based methods, RF and XGBoost have been extensively used in anomaly detection owing to their simple parameters and high efficiency; in this research, they are combined with MLP to build the ensemble for multi-class IDS.
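A compact scikit-learn sketch of the three combination schemes on synthetic data. Assumptions: GradientBoostingClassifier stands in for XGBoost so the example runs without the xgboost package (xgboost.XGBClassifier follows the scikit-learn estimator interface and could be substituted); the hyperparameters and the logistic-regression meta-learner of the stacked model are illustrative, since the text does not specify them. Soft voting requires every base learner to provide predict_proba, which all three do.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

def base_learners():
    """Fresh RF, gradient boosting (XGBoost stand-in), and MLP base models."""
    return [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), activation="relu",
                              max_iter=1000, random_state=0)),
    ]

models = {
    "hard voting": VotingClassifier(base_learners(), voting="hard"),  # majority of labels
    "soft voting": VotingClassifier(base_learners(), voting="soft"),  # mean probabilities
    # Meta-learner trained on out-of-fold base predictions; its choice is an assumption.
    "stacking": StackingClassifier(base_learners(),
                                   final_estimator=LogisticRegression(max_iter=1000), cv=5),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```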

Analysis of results
The proposed approach is assessed by its ability to classify network traffic into the correct class. To assess the efficiency of the hard voting, soft voting, and stacking models, six metrics were used to evaluate the final classification performance; their mathematical definitions are given in [35]. Before applying the proposed hybrid FS approach, we preprocessed the InSDN dataset samples to avoid overfitting. Then, we applied the extra tree classifier importance (ETCI), RFE, and MI methods to the training set to obtain the importance ranking of the 77 features, as shown in Table 3. Some features receive the lowest rankings in these methods and may reduce the model's performance. We chose 0.01 and 0.7 as the thresholds for the ETCI and MI approaches, respectively, to exclude insignificant features, while RFE with the KNN algorithm excludes unwanted features recursively according to the scoring metric (accuracy). Thus, ETCI kept the features with an importance value greater than or equal to 0.01; RFE with KNN (k=3) kept 29 features, which achieved an accuracy of 0.98; and MI kept the features with a value greater than or equal to 0.7. After the insignificant features were eliminated by each of the three methods, three subsets of features were obtained.

Two hybrid techniques, hard voting and mean, were then used to combine these subsets. In hard voting, 15 features were kept based on majority voting: a feature is selected if and only if it receives two or three votes. Then, because each of these features has three weights, the mean technique averages them (sums the three weights and divides by three) to obtain a single uniform weight. We then chose 0.11 as the threshold for the mean technique and obtained an optimal subset of just 10 features. After applying the proposed hybrid feature reduction approach to the InSDN dataset, as shown in Algorithm 1, these 10 significant features were finally chosen for the classification phase. Table 3 details the optimal subset of features for the InSDN dataset. Finally, to further enhance the IDS performance, a hard voting classifier, a soft voting classifier, and a stacked model are proposed.
Algorithm 1. The proposed hybrid FS approach
Input: list1-ExtraaTreeImp-sorted, list2-RFE-sorted, list3-mutualInf-sorted
Output: Opt-sorted-list // an optimal subset of features
Begin
    Hard-voting-dic = {}, mean-dic = {}
    For each feature
        If that feature is found in at least two lists then // hard-voting technique
            Save the feature in hard-voting-dic
    End // the output of this step is a reduced subset of features
    For each feature in hard-voting-dic
        Get its three weights from list1-ExtraaTreeImp-sorted, list2-RFE-sorted, and list3-mutualInf-sorted, compute their mean, and save it in mean-dic
    End
    For each feature in mean-dic
        If the weight of that feature exceeds a threshold value then // mean technique
            Save it in Opt-sorted-list
    End
    Sort Opt-sorted-list from most weighted to least weighted
    Return Opt-sorted-list // the optimal subset of features used to train the proposed models
End
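Algorithm 1 can be expressed as a short runnable Python function. This is a sketch under stated assumptions: the function name and toy feature names A to E are hypothetical, the weights are invented (not the Table 3 values), and the three methods' weights are assumed to be already on a comparable scale (for example normalized to [0, 1]), because averaging raw RFE ranks, MI values, and tree importances directly would mix different units and the text does not say how this was handled. The defaults follow the text: at least two votes, mean threshold 0.11.

```python
from typing import Dict, List, Set

def hybrid_feature_selection(
    scores: Dict[str, Dict[str, float]],  # method -> {feature: weight}, assumed comparable
    selected: Dict[str, Set[str]],        # method -> features kept by its own threshold
    mean_threshold: float = 0.11,
    min_votes: int = 2,
) -> List[str]:
    """Hard voting over the per-method subsets, then mean-weight filtering."""
    features = set().union(*selected.values())
    # Hard voting: keep a feature if at least `min_votes` methods selected it.
    voted = [f for f in features if sum(f in s for s in selected.values()) >= min_votes]
    # Mean technique: one weight per surviving feature, averaged over all methods.
    mean_w = {f: sum(scores[m][f] for m in scores) / len(scores) for f in voted}
    # Keep features whose mean weight exceeds the threshold, heaviest first.
    opt = [f for f, w in mean_w.items() if w > mean_threshold]
    return sorted(opt, key=lambda f: mean_w[f], reverse=True)

# Illustrative weights only (hypothetical features, not the paper's data).
scores = {
    "etci": {"A": 0.30, "B": 0.05, "C": 0.12, "D": 0.02, "E": 0.05},
    "rfe":  {"A": 0.50, "B": 0.10, "C": 0.20, "D": 0.00, "E": 0.10},
    "mi":   {"A": 0.40, "B": 0.09, "C": 0.10, "D": 0.80, "E": 0.15},
}
selected = {"etci": {"A", "B", "C"}, "rfe": {"A", "C", "E"}, "mi": {"A", "D", "E"}}
print(hybrid_feature_selection(scores, selected))  # -> ['A', 'C']
```

In this toy run, B and D are dropped by hard voting (one vote each), and E survives voting but is removed by the mean step because its average weight (0.10) does not exceed 0.11.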

Performance comparison between our proposed FS approach and no FS
To assess the efficacy of the proposed IDS approach, we compared the results without FS and with the proposed hybrid FS for identifying normal instances and attacks. Thanks to the selection of important features (an optimal subset of features) via the proposed hybrid FS approach, the average values of measures such as accuracy, detection rate, precision, F1-score, and FAR improved significantly. Table 4 summarizes the classification performance on the InSDN dataset without FS, including the results of the single-learner classifiers and the ensemble learning models. It shows that, without an FS approach, the ensemble learning models do not perform well enough on many measures. In contrast, Table 5 indicates that the proposed hybrid FS with all ensemble learning classifiers achieves good results on the InSDN dataset. In particular, the ensemble soft voting model yields the best results, with the highest accuracy of 0.999, a precision (positive predictive value) of 0.998, a sensitivity (recall) of 0.998, an F1 score of 0.998 (a suitable measure of each classifier's performance in the multi-class task), and the minimum FAR of 0.02 on the InSDN dataset. In addition, every single classifier using the optimal subset of features achieves higher detection accuracy, precision, recall, and F-score, and a lower FAR, than the same classifier with the full feature set, as shown in Tables 4 and 5, which confirms the importance of the proposed hybrid FS approach.
Table 6 shows the building (training) and testing times of all single and ensemble classifiers on the InSDN dataset. Because fewer features are used, the proposed hybrid FS approach sharply decreases the time overhead of all single and ensemble classifiers. The results also show that the ensemble stacked model and the single classifiers are faster than the ensemble soft voting classifier, although this extra cost is acceptable given its improved performance. The aim here is to strike a balance between accuracy and speed to obtain the best possible outcome. Overall, the results show that the number of selected features affects the time required to build and test all classifiers. Finally, Tables 4-6 show that the individual and ensemble classifiers using the optimal subset of features outperform the classifiers using all features in terms of all metrics. Thus, the proposed hybrid FS approach with all proposed ensemble learning classifiers can be crucial for distinguishing between normal and attack instances.

CONCLUSION
The objective of this work is to show the significance of hybridizing FS and ensemble learning classifiers to enhance IDS performance. We propose a new IDS approach based on hybrid FS and ensemble learning classifiers that reduces high dimensionality and handles imbalanced network data with a low FAR, while improving accuracy and F1-score. The proposed hybrid FS approach fuses three different FS approaches using hard-voting and mean techniques. First, the hard-voting technique over extra tree classifier importance, RFE, and MI yields 15 relevant features. Because each of these features has three different weights, one from each method, the mean technique is then used to assign a single weight to each feature and obtain an optimal subset of just 10 features. Next, ensemble learning classifiers based on RF, XGBoost, and MLP, combined by hard voting, soft voting, and a stacked model, are introduced to build the prediction model. Finally, the proposed IDS is validated on the recent InSDN dataset.
Experimental results indicated that all the proposed ensemble learning classifiers had a low FAR and high accuracy, and that the best classifier in terms of accuracy, precision, recall, F1 score, and FAR is the ensemble (RF+XGBoost+MLP) combined by soft voting using the subset of features (10 features) obtained via our proposed hybrid FS approach. This classifier achieves an accuracy of 0.999, a precision (positive predictive value) of 0.998, a sensitivity (recall) of 0.998, an F1 score of 0.998 (a suitable measure of each classifier's performance in the multi-class task), and the minimum FAR of 0.02. Since the ensemble soft voting classifier achieved higher accuracy, a higher F1 score, and a lower FAR, and its training (building) time decreased significantly from 462.483 s to 232.502 s, it is very effective at detecting and classifying normal traffic and attack types. Compared with using the full feature set (77 features), that is, without an FS approach, it also improves performance on diverse measures. Since the ensemble soft voting classifier had the best performance, the proposed hybrid FS with the ensemble soft voting classifier is the most efficient and accurate model among the single and ensemble classifiers trained in this work.
In the data preprocessing stage, we addressed the class-imbalance issue to avoid degrading the model's performance by performing both undersampling of the majority classes (normal, DDoS, probe, and DoS) and oversampling (the SMOTE technique) of the minority classes (web attack, botnet, and exploitation). In the future, we plan to apply the proposed FS approach to other IDS datasets with different ML models.

Table 1. The proportion of classes in the InSDN dataset

Table 2. The new statistics of classes in the InSDN dataset
Ensemble learning classifiers hybrid feature selection for enhancing performance … (Hasanain Ali Al Essa)

Table 4. Performance comparison of individual and ensemble classifiers without FS approaches

Table 5. Performance comparison of individual and ensemble classifiers using our proposed hybrid FS approach

Table 6. Summary of building time and test time comparison