A tree growth based forward feature selection algorithm for intrusion detection system on convolutional neural network

ABSTRACT


INTRODUCTION
Computer networks play an important part in today's society, but they are subject to intrusions. As a result, the best available methods to defend these systems are required [1]. Any action performed to compromise a system's integrity, confidentiality, or availability is referred to as an intrusion. A few intrusion prevention solutions can be used as a key security feature; however, they are incapable of covering the entire demand for preventing security breaches. Because of design and programming flaws, as well as numerous penetration tactics, there will always be exploitable weaknesses in systems as they become more complicated [2]. As a result, an intrusion detection system (IDS) is required as a second stage to protect the system against such flaws. The IDS is useful not only for detecting existing intrusions, but also for preventing future intrusions [3]-[5]. The DARPA'98 intrusion detection datasets are frequently used to assess IDS efficacy [6]. Attacks are grouped into four categories (DoS, U2R, R2L, and probe), which are further separated into 22 different types. In a denial-of-service (DoS) attack, intruders use methods such as smurf, neptune, back, teardrop, pod, and land to prevent authorized clients from using the service. In a probe attack, the intruders collect information about the host.
Large amounts of data that imitate real network data increase the temporal complexity of training and validation in intrusion detection systems; this is known as the curse of dimensionality. Large data also consumes resources and can lead to decreased detection stability. It is practical to delete data that does not contribute to detection before processing. This leads to the creation of an effective feature extraction and reduction policy that not only helps to cut training time but also improves accuracy and protects against unforeseen threats. The training set is categorized into smaller, simpler subsets using fuzzy clustering. This enables the artificial neural networks (ANN) to examine each subset considerably more quickly and with better accuracy. The experimental findings reveal that the developed FC-ANN achieves an average accuracy of 96.71% [7].
Bulletin of Electr Eng & Inf ISSN: 2302-9285
One study used the grid-search (GS) and support vector machine (SVM) algorithms to detect anomalous behaviours [8]. The goal is to improve the detection rate while lowering the false alarm rate. The paper employs the knowledge discovery and data mining tools competition (KDD CUP 1999) dataset to test the program's functionality. The results demonstrate that this strategy boosted the detection rate and decreased the number of false alarms [9].
Another work devised an IDS utilising a support vector machine and provided certain feature selection strategies [10]. The authors investigated features using two feature selection methods: the proposed forward feature selection algorithm (FFSA) and a linear-correlation-based approach, the suggested modified-mutual information feature selection algorithm (MMIFS).
For better performance, a network IDS was created. The first step is to model intrusion detection using a deep neural network (DNN). The system is built using several open-source technologies, including tcpdump for packet capture, bro for traffic analysis, and tensorflow for machine learning. The results of repeated operations with the DARPA'98 datasets and the resulting network flow test reveal that the proposed IDS-DNN system improves accuracy as well as attack detection [10]-[13].
The long-standing problem in IDS is degradation of performance as the amount of data in the network increases. Deep learning is not suggested for use in small-data environments; it requires large to massive data generation environments to perform efficient learning [14]-[16]. Despite these advantages, false positive predictions degrade the accuracy of deep learning as the data grows. Huge data volume and input space reduce the quality of intrusion classification in the network; the misclassified data increases false positive results and degrades the performance of the IDS. A feature selection algorithm is therefore needed to identify the relevant features, with classification finally performed on those features. This sequence helps to improve classification performance in the IDS scenario. Feature engineering is a dominant and interesting field in machine learning [17]-[20]. Feature selection methods are classified into filter, wrapper, and hybrid models; some use intelligent algorithms to improve the performance of the system. In this paper we use wrapper-based intelligent techniques to improve intrusion detection.

BACKGROUND OF VARIOUS MACHINE LEARNING CLASSIFIERS
2.1. Support vector machine classifier
The SVM is widely used among machine learning applications due to its simple processing style. Recently, cloud computing, big data, and distributed environments have used SVM in their data optimization and classification models. Supervised learning is the base for SVM, which classifies various data in the network. Linear as well as nonlinear problems are easily handled by the SVM classifier. SVM first generates a hyperplane (one-to-many) in the high-dimensional data space. The hyperplane separates the data with an optimal splitting strategy according to the class each data point belongs to, and performs the classification [21]-[24].
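As an illustration of the hyperplane decision rule described above, here is a minimal sketch in Python; the weight vector and bias are hypothetical values, not ones learned from data:

```python
# Sketch of the SVM decision rule f(x) = sign(w.x + b): once training has
# found the separating hyperplane (w, b), classification is a dot product.
def svm_predict(w, b, x):
    """Classify x as +1 or -1 by which side of the hyperplane it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical hyperplane separating two traffic classes
w, b = [0.8, -0.5], 0.1
print(svm_predict(w, b, [1.0, 0.2]))   # point on the positive side -> 1
print(svm_predict(w, b, [-1.0, 1.0]))  # point on the negative side -> -1
```

Training chooses (w, b) to maximise the margin; prediction itself reduces to the sign of a linear function, which is why SVM inference is cheap even in high dimensions.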

K-nearest neighbor
K-nearest neighbor (KNN) is a traditional and well-known classification model in machine learning [25]-[27]. It computes the Euclidean distance for classifying data in space. Let p and q be data points in space S; then the distance d(p, q) is derived as d(p, q) = sqrt(Σ_{i=1}^{n} (p_i − q_i)²), where n is the number of dimensions of the space. The KNN uses the Euclidean distance for selecting data in space: when the distance is minimum, the data point is nearest and optimal for selection.
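The distance computation and nearest-neighbour vote above can be sketched as follows; the toy training points and labels are invented for illustration:

```python
import math

def euclidean(p, q):
    """d(p, q) = sqrt(sum_i (p_i - q_i)^2) over the feature dimensions."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(train, labels, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    ranked = sorted(range(len(train)), key=lambda i: euclidean(train[i], x))
    votes = [labels[i] for i in ranked[:k]]
    return max(set(votes), key=votes.count)

# Toy data: two 'normal' points and two 'attack' points (hypothetical)
train = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
labels = ["normal", "normal", "attack", "attack"]
print(knn_predict(train, labels, [0.2, 0.1], k=3))  # -> normal
```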

Naïve Bayes
Naïve Bayes (NB) is a classical classifier based on Bayes' theorem [28]-[30]. Features in the dataset are assumed to be independent (hence "naïve"). Let A be an object with K features, classified via the vector representation X = (x1, …, xn). For class M(k), NB computes the posterior P(M(k)|X) ∝ P(M(k)) Π_i P(x_i|M(k)), and X is assigned to the class with the largest posterior.
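A minimal sketch of the NB decision under the independence assumption; the conditional probability tables and class priors below are hypothetical, not estimated from any dataset:

```python
def naive_bayes_score(x, class_stats, prior):
    """P(M_k) * prod_i P(x_i | M_k) under the independence assumption.
    class_stats[i] maps a feature value to its conditional probability."""
    p = prior
    for i, xi in enumerate(x):
        p *= class_stats[i].get(xi, 1e-6)  # tiny floor for unseen values
    return p

# Hypothetical conditional probabilities for two classes of connections
normal = [{"tcp": 0.9, "udp": 0.1}, {"short": 0.8, "long": 0.2}]
attack = [{"tcp": 0.4, "udp": 0.6}, {"short": 0.3, "long": 0.7}]

x = ["tcp", "short"]
s_norm = naive_bayes_score(x, normal, prior=0.7)
s_att = naive_bayes_score(x, attack, prior=0.3)
print("normal" if s_norm > s_att else "attack")  # -> normal
```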

Feed forward deep neural networks
Deep learning is widely accepted and has been proven on complex problems. Neural networks come in various types that compute using artificial neurons. These ANNs are a reflection of biological neurons in the brain. In a feed-forward neural network, the ANN forwards the information received at the input layer through the network. Feed-forward ANNs are used for real-time problems that require learning and approximation in nonlinear circumstances [24]. Here the rectified linear unit (ReLU) is used as the activation function.
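The forward pass with ReLU activations can be sketched as follows, assuming a small hypothetical 2-3-1 network with fixed random weights (the architecture is illustrative, not the one used in the paper):

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z) applied element-wise."""
    return np.maximum(0.0, z)

def forward(x, layers):
    """One forward pass: each layer is (W, b); ReLU after each layer."""
    a = x
    for W, b in layers:
        a = relu(W @ a + b)
    return a

# Hypothetical 2-3-1 network with fixed weights
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 2)), np.zeros(3)),
          (rng.standard_normal((1, 3)), np.zeros(1))]
out = forward(np.array([0.5, -1.0]), layers)
print(out.shape)  # (1,)
```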

Decision tree with random forest
Many complex data mining strategies use decision tree (DT) feature extraction techniques. The DT algorithm produces a predictive model in the shape of a tree [31]. There are three main node types: the root (one node), internal nodes, and category (leaf) nodes. DT classification proceeds top-down until the optimal leaf node is found. Executing multiple decision trees is termed random forest classification. Table 1 summarises the surveyed feature selection models.
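The majority-vote idea behind random forests can be sketched with depth-1 trees (decision stumps); the feature thresholds and labels below are hypothetical:

```python
def stump_predict(stump, x):
    """A depth-1 decision tree: a threshold test on a single feature."""
    feat, thresh, left, right = stump
    return left if x[feat] <= thresh else right

def forest_predict(stumps, x):
    """Random forest idea in miniature: majority vote over many trees."""
    votes = [stump_predict(s, x) for s in stumps]
    return max(set(votes), key=votes.count)

# Three hypothetical stumps, as if trained on different feature subsets
stumps = [(0, 0.5, "normal", "attack"),
          (1, 1.0, "normal", "attack"),
          (0, 2.0, "normal", "attack")]
print(forest_predict(stumps, [0.8, 2.5]))  # votes: attack, attack, normal -> attack
```

A real random forest grows each tree on a bootstrap sample with random feature subsets, but the aggregation step is exactly this vote.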

Table 1. Survey of various feature selection models
Reference | Feature selection | Dataset | Technique
[15] | Grey wolf optimizer-based feature selection | UNSW-NB15 | Machine learning
[16] | Nature-inspired optimization framework with J48 (C4.5) | UNSW-NB15 | Machine learning
[17] | EvoloPy-FS optimization framework | UNSW-NB15 | Machine learning
[18] | Firefly algorithm and support vector machine | NSL-KDD | Machine learning
[19] | J48 (C4.5 decision tree) classifier | NSL-KDD | Machine learning
[20] | Deep Boltzmann machines | NSL-KDD | Deep learning
[21] | SVM, extreme learning machine (ELM) and random forest (RF) | NSL-KDD | Machine learning
[22] | Deep auto-encoder based | NSL-KDD and KDDCup 99 | Deep learning
[23] | Recurrent neural networks | NSL-KDD and KDDCup 99 | Deep learning
[24] | GA-LR wrapper approach | KDD Cup 99 | Evolutionary algorithms
[25] | Ant colony optimization (ACO) | KDD Cup 99 | Machine learning
[26] | Flooding DoS attacks | KDD Cup 99 | Machine learning

An independent misuse identification method combines a self-taught approach with the advantages of learning algorithms. Within the MAPE-K platform, while completing the plan activities, a sparse auto-encoder was employed as the unsupervised technique to discover relevant attributes. The KDD Cup'99 and NSL-KDD databases were used in the research. The fundamental flaw is that the user-to-root and remote-to-local attack modules have a low discovery rate [32].
A two-phase strategy was built on a deep layered auto-encoder that is both efficient and effective. The dataset was first categorized into the attack and normal classes with probability values. These probabilities were then submitted to the final judgment step as an extra feature for regular and multi-class attack categorization. The suggested model's performance was analyzed using the KDD Cup'99 and UNSW-NB15 databases. A separate technique was used for these datasets to alleviate challenges presented by class imbalance. Downsampling was used on KDD Cup'99 to eradicate duplicate records, and SMOTE was used to upsample UNSW-NB15 to equalize the distribution of entries. The detection rate of attack classes with few training samples improved considerably as a result of this processing of the database [33].
Three IDS models were studied: fully connected systems, variational AE, and Seq2Seq. Recordings from KyotoHoneypot, IDS2017, MAWILab, NSL-KDD, and UNSW-NB15 were all used to test these models. When evaluating the different methods in terms of detection rate across all datasets, the Seq2Seq model trained with two recurrent neural networks (RNNs) fared the best [34]. Another work developed an instance-based ID model, an adversarial variational AE with regularisation and DNN (SAVER-DNN). The model's performance was evaluated using benchmark data from NSL-KDD and UNSW-NB15; its usefulness in recognizing low-occurrence and fresh threats has been confirmed by experimental evidence [35]. A multistage model comprising 1D convolution operations and two stacks of fully connected layers was developed using the principle of AE. To reconstruct the data in the unsupervised step, two auto-encoders were trained and evaluated using normal and attack streams. The reconstructed items were then used in the supervised step to make new, enhanced datasets that were fed into a 1D convolution layer; the outcome of the convolution operation was flattened and supplied to a softmax layer for final classification. Experiments were carried out on the KDD Cup'99, CICIDS2017, and UNSW-NB15 databases, and the proposed method outperformed other DL models in comparison. However, the authors have not shown how this model works for other groups, and a further weakness is that it does not disclose any details about the attacks' attributes [36].
Intrusion detection system solutions are not able to keep pace with advancements in the field of high-speed networking. Significant improvements are necessary before the IDS products now used in gigabit networks can provide comprehensive protection against attacks; despite their extensive documentation, intrusion detection solutions on the market can only identify 50% of all threats. As a result, the present focus of this work is to create a network-based IDS that can identify attacks in a big, high-speed, high-volume enterprise network. Research by Antunes et al. [37] offers a scenario in which an attacker attacks a specific system without authorization while avoiding harming other networks, with the necessity to look for and send hidden and sensitive data. Experts can carry out such an assault to conceal the entire attack and avoid detection. Detection by a large corporate IDS is also impossible since a single connection is targeted. The most critical aspect of the assault is that existing systems are incapable of detecting such attacks, and the hacker can erase all traces of his or her activity before exiting the system. In the case of a complete invader, with the exception of worms, imitation does not make it more difficult to recognize the destination by diminishing its functionality. Beyond being able to identify denial-of-service intrusions, the IDS should be able to detect the attacks as well as the intruder's motives. These attacks are network-specific, and the attacker's goal is financial gain. As a result, the severity of attacks has risen, as has the focus on particular networks. Such attacks are difficult to detect with a basic IDS, necessitating the use of an improved IDS to identify severe attacks. Let us look at the history of network security to see how we might identify such attacks and learn more about security.

IMPROVED DEEP CONVOLUTIONAL NEURAL NETWORK
3.1. Data preprocessing
Step 1: data normalization. Normalizing the data is the first step to reduce noise. Removing noise from the dataset helps processing proceed without interruption. PCA is computed after normalization. Data is normalized by subtracting the mean value of each column from every entry in that column. Consider two dimensions P and Q: all P become p and all Q become q, so the computed mean of each dimension becomes zero.
Step 2: computing the covariance matrix. For a dataset with 2D information, the covariance matrix is C = [[cov(P, P), cov(P, Q)], [cov(Q, P), cov(Q, Q)]], where cov(·,·) is the covariance between two dimensions. Step 3: calculating the eigenvalues and eigenvectors.
Here we compute the eigenvectors and eigenvalues of the covariance matrix. For a square matrix, the eigenvalues and eigenvectors are easily computed. The eigenvalue λ for a given matrix A is obtained by solving det(A − λI) = 0 (5), where I is the identity matrix with the same dimensions as A, det is the matrix determinant, and λ is the eigenvalue; the corresponding eigenvector M is then obtained from (A − λI)M = 0. Step 4: the components are chosen and feature vectors are formed.
Here dimensionality reduction is carried out on the features. The eigenvector with the highest eigenvalue is taken as the principal component of the input dataset. Dimensions are reduced by choosing the first x eigenvalues and ignoring the rest. Note that if too few eigenvalues are kept, needed features will be lost; therefore the number of eigenvalues is selected based on the computed covariance.
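Steps 1-4 of the preprocessing pipeline can be sketched together; the toy 2-D dataset is illustrative only:

```python
import numpy as np

def pca(X, n_components):
    """Steps 1-4 from the text: mean-centre, covariance, eigendecompose,
    keep the eigenvectors with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)              # step 1: zero-mean data
    C = np.cov(Xc, rowvar=False)         # step 2: covariance matrix
    vals, vecs = np.linalg.eigh(C)       # step 3: solve det(C - lambda*I) = 0
    order = np.argsort(vals)[::-1]       # largest eigenvalue first
    W = vecs[:, order[:n_components]]    # step 4: form the feature vector
    return Xc @ W                        # project onto chosen components

# Toy 2-D dataset (hypothetical values)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
Y = pca(X, 1)
print(Y.shape)  # (5, 1)
```

`np.linalg.eigh` is used because a covariance matrix is symmetric, which guarantees real eigenvalues.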

Tree growth based FFSA
3.2.1. Binary TGA algorithm
For feature selection we use the binary tree growth algorithm (TGA) [1]. Binary TGA is explained in the following section. TGA performs well on continuous optimization problems. The major concern is that the TG algorithm cannot be applied directly to the feature selection process. Agarwal et al. [26] suggested a novel enhanced TGA, called improved TGA, with a grey wolf optimizer strategy, but the grey wolf optimizer does not compute TGA efficiently for IDS.
In our proposed model we use the firefly optimizer with an improved binary TGA. The firefly fitness value is embedded in the improved binary TGA to obtain an optimal detection solution. In the first phase, binary TGA for feature selection is computed. In the second phase, a linear mechanism is used for computation. In the third phase, we embed the firefly fitness with binary TGA to increase the performance of the exploration and exploitation stages of the solution. Feature selection is considered an NP-hard problem due to the large and growing number of features. TGA works well, having proved its superiority on continuous problems, and its various advantages encourage us to use it for the feature selection problem. TGA has to be modified to an n-dimensional Boolean space using the values 0 and 1. A transfer function is used in TGA so that it works as a binary function. To change continuous values to discrete ones, all values in the dataset are converted using (7),
where n represents the number of dimensions, i.e., the total number of features in the dataset. The solution is obtained by (8),
where s(i) represents the selected feature in the i-th position. If s(i) = 1 the feature is selected; otherwise it is not. For example, with S = (1, 0, 1, 1, 1, 0, …), the first, third, fourth, and fifth features are selected. The fitness function must be designed so that it yields the most accurate features for both feature selection and the classification model. In this model we use a fitness function that evaluates subset selection for high classification accuracy.
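A minimal sketch of how a binary solution vector selects features and how a wrapper fitness might score a subset, assuming the weighting α = 0.9 (a common but here hypothetical choice):

```python
def select_features(row, mask):
    """Keep feature i of a record only where the solution bit s_i = 1."""
    return [v for v, s in zip(row, mask) if s == 1]

def fitness(error_rate, mask, alpha=0.9):
    """Wrapper fitness: alpha * classification error + (1 - alpha) * |p|/|T|,
    so low error and small feature subsets are both rewarded."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (sum(mask) / len(mask))

mask = [1, 0, 1, 1, 1, 0]                    # S = (1, 0, 1, 1, 1, 0)
row = ["f1", "f2", "f3", "f4", "f5", "f6"]
print(select_features(row, mask))            # ['f1', 'f3', 'f4', 'f5']
print(round(fitness(0.05, mask), 4))         # 0.9*0.05 + 0.1*(4/6)
```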
The fitness is computed as fitness = α·γ + β·(|p|/|T|), where γ is the classification error rate, |p| denotes the number of selected features, |T| denotes the total number of features in the dataset, and α and β are the weighting parameters, with α ∈ [0, 1] and β = 1 − α as used in [3]. SVMs are learning techniques that employ a hypothesis space of linear functions in a high-dimensional space. SVMs are built on the concept of linear separability, sometimes known as a hyperplane classifier. Consider N training pieces of information {(x1, y1), (x2, y2), (x3, y3), …, (xn, yn)}, where xi ∈ Rn and yi ∈ {+1, −1}. Consider a hyperplane defined by (w, b), where w is a weight vector and b is a bias; see Figure 1. We can classify a new object x with f(x) = sign(w·x + b) = sign(Σi αi yi (xi·x) + b).

Remember that the training vectors xi enter only as a linear combination; there is a Lagrange multiplier αi for each training point, and the multipliers αi represent the significance of each point. When the maximum-margin hyperplane is found, only points near it have αi > 0; all of these are support vectors. Figure 1 shows that αi = 0 for all other locations. This means that only points close to the hyperplane constitute the solution, optimised using min ||w||²/2 subject to yi(xi·w + b) ≥ 1.
These key pieces of information act as support vectors. Figure 2 displays two classes together with their margins, or borders. Support vectors are drawn solid whereas non-support vectors are empty. The margins are influenced solely by the support vectors; removing or adding empty entities will not modify them. However, any modification to the solid objects, such as their addition or removal, might alter the margins. The results of including items in the margin region are shown in Figure 1. As can be seen, adding or removing items that are far from the boundaries has no impact on the margins. However, eliminating or adding points that are close to the margins results in wider margins. Under this configuration, a grouping of trees is regularly created.
One of the hierarchical trees grown by this method is shown in Figure 3. The bold nodes are support vector nodes. Nodes 1, 2, 3, 5, 6, and 9 may grow since they are support vector nodes. Since nodes 4, 8, 7, and 10 are not support vector nodes, we halt their growth in the interim. Growing the tree increases the information set's points for more accurate performance. The tree is grown to a specific size/level using the set values (+3, +4, +5, +6, +7, 3, 4, 5, 6, 7). The dashed (--) nodes indicate groups without support vectors; the support vector links are shown by the prominent nodes. We add +4's and +7's offspring to the training set. The SVM nodes 5 and 7 are the same. The updated database is (3, 4, 10, 11, 6, 12, 13), (+3, +11, +12, +5, +6, +14, +15). Notice that +4 and +7 are omitted since their offspring are in the training set. Extending vertices to the boundary area has already been fixed, and the improved boundary (dotted line) is much more precise. By doing this, the classifier is correctly adjusted, increasing the accuracy level. RNNs show excellent correctness in pattern identification, a key section of IDS; compared to feed-forward and Elman recurrent neural network models, RNN fared better.
A tree growth based forward feature selection algorithm for intrusion … (Mathiyalagan Ramasamy)

Feature selection utilizing the improved inertia weight-based dragonfly optimizer (IIW-DFO)
The dragonfly optimizer imitates the activities of the dragonfly during migration or foraging. Swarming comes in two flavours: dynamic and static. In a static swarm, narrow groups of dragonflies chase other swarms in a limited zone, causing significant local activity changes. In a dynamic swarm, many dragonflies fly in one direction over a great range. Exploration and exploitation of static and dynamic activity are key to swarm-based methods. To enhance local and global searching, five parameters are tuned, determined as follows: a. Separation: the aversion to additional agents in the district, to distinguish an individual from them. It is computed as in (10): S_i = −Σ_{j=1}^{N} (X − X_j),
where X is the position of the current individual, N is the number of neighbours of the dragonfly, S_i is the separation movement of the i-th individual, and X_j is the position of the j-th neighbour. b. Alignment: synchronizing an individual's velocity to that of neighbouring individuals; it is the agent's matching of velocity with the neighbouring dragonflies' velocity vectors. It is computed as in (11): A_i = (Σ_{j=1}^{N} V_j)/N, where A_i is the alignment motion for the i-th term and V_j is the velocity of the j-th neighbour. c. Cohesion: measuring an individual's distance from the neighbourhood's centre, written as in (12): C_i = (Σ_{j=1}^{N} X_j)/N − X, where C_i is the cohesion of the i-th term and N is the neighbourhood size.

d. Attraction towards food: the motion of a dragonfly toward the lure of food is depicted in (13): F_i = X⁺ − X, where F_i is the food attraction of the i-th term and X⁺ is the location of the food source. e. Distraction from enemies: dragonflies avoid adversaries, as shown in (14): E_i = X⁻ + X, where E_i is the distraction and X⁻ is the enemy position. The position vector X is used to maintain the location of a particular dragonfly. The step vector is much like the PSO velocity vector of Dhanabal et al. [34], and is specified as in Yin et al. [22] in (15): ΔX_{t+1} = (sS_i + aA_i + cC_i + fF_i + eE_i) + wΔX_t,
where w is the inertia weight and t is the iteration counter. The adjustment of the step vector's inertia weight has been enhanced in this suggested study, as in (16): w = w_max − t·(w_max − w_min)/t_max,
where w_max and w_min are the initial and final values of the inertia weight, t_max is the maximum number of iterations, and t is the current iteration. As the number of iterations increases, the weight value decreases, improving the global retrieval accuracy. After the step vector calculation, the position vector is updated as in (17): X_{t+1} = X_t + ΔX_{t+1}.
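The five behaviours and the inertia-weighted step update in (10)-(17) can be sketched as follows; the swarm state, behaviour coefficients, and weight range are hypothetical values chosen only to exercise the update:

```python
import numpy as np

def inertia(t, t_max, w_max=0.9, w_min=0.4):
    """Linearly decaying inertia weight: w = w_max - t*(w_max - w_min)/t_max."""
    return w_max - t * (w_max - w_min) / t_max

def dfo_step(X, dX, neighbors, V, food, enemy, coeffs, w):
    """One dragonfly update combining separation, alignment, cohesion,
    food attraction, and enemy distraction, as in (10)-(17)."""
    s, a, c, f, e = coeffs
    S = -np.sum(X - neighbors, axis=0)   # (10) separation
    A = V.mean(axis=0)                   # (11) alignment
    C = neighbors.mean(axis=0) - X       # (12) cohesion
    F = food - X                         # (13) attraction to food
    E = enemy + X                        # (14) distraction from enemy
    dX_new = s * S + a * A + c * C + f * F + e * E + w * dX   # (15)
    return X + dX_new, dX_new            # (17) position update

# Hypothetical 2-D swarm state
X, dX = np.array([0.5, 0.5]), np.zeros(2)
neighbors = np.array([[0.4, 0.6], [0.6, 0.4]])
V = np.array([[0.1, 0.0], [0.0, 0.1]])
X_new, dX_new = dfo_step(X, dX, neighbors, V,
                         food=np.array([1.0, 1.0]),
                         enemy=np.array([-1.0, -1.0]),
                         coeffs=(0.1, 0.1, 0.7, 1.0, 1.0),
                         w=inertia(t=10, t_max=100))
print(X_new.shape)  # (2,)
```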

Pseudo code of IIW-DFO
The five swarming weights, the step vector, and the position matrices are updated with the increased inertia weight until the stopping condition is reached [37]. If the dragonfly has any neighbours, its position is also changed. The customized attributes are then put into a deep learning-based classification scheme called DBCNN for intrusion detection.
Input: number of dragonflies and step vectors Xi, i = (1, 2, …, n)
1: repeat for as many iterations as feasible (t_max)
2: the objective values are evaluated
3: refresh the food source and the enemy
4: update the five weighted components S, A, C, F, and E using (10)-(14)
5: update the neighbourhood radius
6: if the dragonfly has a companion
7: update the step vector using (15)
8: increase the inertia weight using (16)
9: update the position and velocity using (17)
10: else
11: the position vector is updated using Levy flight
12: end if
13: the new location of the dragonfly is determined by checking the boundaries
14: end
Output: return the chosen features

RESULTS AND EXPERIMENTS
MATLAB is used to conduct the experiments. The DARPA'98 dataset is a performance benchmark for intrusion detection methods; it is the most well-known IDS benchmark with extensive ID data sets [27]. It is classified into five categories: normal and four forms of invasion (i.e., DoS, Probe, U2R and R2L) [10]. It is made up of both training and test data. There are about 5 million connection records in the training data, and over 2 million in the testing data. All of the data is classified as either normal or intrusive. Real data correctly measured as real is given by the true positive rate, TPR = TP/(TP + FN). Intrusion data correctly identified as an intrusion in the network by the proposed TGFF algorithm is calculated as the true negative rate, TNR = TN/(TN + FP). Intrusions in the network that are sometimes judged as good features are measured by the false negative rate, FNR = FN/(FN + TP). Normal real features that are sometimes judged as intrusion features are measured by the false positive rate, FPR = FP/(FP + TN). Next we calculate the accuracy of intrusion detection in the network as a percentage. Accuracy is defined as the fraction of intrusions correctly predicted as intrusion features and real features correctly detected as real by the proposed TGFF algorithm: accuracy = (TP + TN)/(TP + TN + FP + FN), where: a. Accuracy: described as the design's overall appropriateness, measured as the ratio of correct classifications. The accuracy comparison of FFSA, DNN, scale-hybrid-IDS-AlertNet (SHIA), and the improved deep convolutional neural network (IDCNN) is shown in Figure 4. Systems are measured on the x-axis, and accuracy on the y-axis. Kalman filtering is a method used in IDCNN-based attacker identification. The accuracy is calculated as in Figure 4.
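The detection rates and accuracy defined above can be computed directly from the confusion counts; the counts below are hypothetical, not results from the paper's experiments:

```python
def detection_metrics(tp, tn, fp, fn):
    """Standard IDS rates from the confusion counts: TPR = TP/(TP+FN),
    TNR = TN/(TN+FP), FPR = FP/(FP+TN), FNR = FN/(FN+TP),
    accuracy = (TP+TN)/(TP+TN+FP+FN)."""
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical confusion counts for one test run
m = detection_metrics(tp=900, tn=950, fp=50, fn=100)
print(round(m["accuracy"], 3))  # 0.925
```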
b. Precision: assessed by examining the false positive and true positive instances. Figure 5 shows a precision comparison of the SHIA, DNN, FFSA, and IDCNN algorithms. Methods are on the x-axis, and precision on the y-axis. Kernel random vectors are used in IDCNN to compute the relevance vectors and the number of samples. High precision in intrusion detection is achieved by considering the large-likelihood samples using IDCNN. According to the experimental data, the IDCNN achieves a precision of 90%, whereas other approaches such as KMIE, multilayer Bayesian, DNN, and LSSVM achieve 78%, 81.5%, 87.89%, and 86.78%, respectively.

CONCLUSION
Finding network intrusions is a difficult area of research. An incursion offers capabilities that allow an attacker to use the network as a real user. The effectiveness of detection in an IDS is affected by a number of features. IDS is a widely used network security method that protects the integrity and accessibility of critical resources in secured systems. Machine learning is used to improve the efficiency of IDS, and there are varieties of supervised and unsupervised methodologies. In this article, we looked at various methods for detecting intruders, such as SHIA, FFSA, DNN, and the improved DCNN. However, based on our observations, current intrusion detection algorithms still need improvement in order to obtain appropriate results.

Figure 1. SVM's linear separation
Bulletin of Electr Eng & Inf, Vol. 12, No. 1, February 2023: 472-482

Precision is otherwise described as the ratio of predictions that are correct, calculated as precision = TP/(TP + FP). Sensitivity of the detection is calculated as all true positive evaluations divided by all positive evaluations: sensitivity = TP/(TP + FN). Accuracy is otherwise tested using the F-measure score, which balances precision and sensitivity in detecting real intrusions: F = 2 × precision × sensitivity / (precision + sensitivity).