A novel classification and clustering algorithms for intrusion detection system on convolutional neural network

At present, data transmission widely relies on wireless network frameworks for transmitting large volumes of data. This creates numerous security and privacy problems, which paved the way for the development of intrusion detection systems (IDS). An IDS acts as a preventive technique for securing computer networks. Numerous metaheuristic and deep learning algorithms have previously been used in IDS for detecting threats. Some are affected by the dynamic growth of feature spaces, and others degrade in performance during threat detection. A fine-grained model for intrusion detection can be developed by selecting accurate features and testing them with intelligent algorithms. Based on these explorations, this research implements an IDS with intelligence from preprocessing through feature classification. In the first stage, data preprocessing is done using the binning concept to reduce noise. In the second stage, feature selection is done dynamically using a dynamic tree growth algorithm with firefly optimization techniques. Finally, these features are processed using a dynamic tree based feed forward neural network (DTB-FFNN) for detecting anomalies accurately. The DTB-FFNN is evaluated with the popular KDD dataset. Our proposed convolutional neural network (CNN) classification model is compared with existing intelligent techniques (feed forward deep neural network, support vector machine, and decision tree), and CNN-clustering is compared with k-means and density-based spatial clustering of applications with noise (DBSCAN). The experimental outcome shows that the dynamic tree based FFNN and CNN-clustering produce higher accuracy than the existing models.


INTRODUCTION
An intrusion detection system (IDS) is a device or system that monitors network traffic to detect threatening activity. It is designed as a software application that monitors harmful features entering the network and infringing network policies. If malicious data enters the network and tries to gain access, it is detected by the IDS, and the malicious activity is reported to the administrator [1]-[3]. Wireless computer networks are highly prone to numerous malicious attacks. Wireless networks are open by nature, with flexibility and mobility in the medium [4], [5]. These networks are secured using an intelligent intrusion detection system [6]. IDS can be classified into five types: host IDS, network IDS, protocol IDS, application protocol IDS, and hybrid IDS [7]. Further, these IDS use three types of detection techniques: signature IDS, anomaly IDS, and hybrid IDS [8], [9]. An IDS based on anomaly detection monitors and analyzes the network against its normal behavior, and it is considered an effective detection technique for malware and unknown entries in the network. Owing to its high detection performance, such an IDS achieves low false positive rates and high accuracy in classifying intruders' features [10]. The crucial task is to analyze machine learning techniques for IDS that help decrease the false detection rate. Because most data processing environments communicate via wireless networks, in this research we design intrusion detection for wireless networks using well-defined deep learning techniques. The main role of an IDS is to watch the network with eagle eyes to detect malware entering the network to steal data, so our techniques must likewise detect intruders with high accuracy. The network intrusion detection system (NIDS) is capable of detecting malware entry in the whole network through traffic features. Not all features are needed for detecting intruders in the network. When we choose consequential features, execution time is minimized and the malware detection rate increases. These weighted features favor deep learning algorithms by increasing the accuracy of feature learning. The major advantages of using feature selection and deep learning techniques are as follows: i) overfitting is prevented; ii) noise resistance is improved; iii) detection executes in less time; iv) prediction performance is improved. Popular machine learning techniques in IDS include k-nearest neighbor (KNN), decision tree, support vector machine (SVM), random forest, naïve Bayes, and multilayer perceptrons, which operate alongside deep learning algorithms. Huge data volumes in the IDS degrade the performance of machine learning techniques. One important way to overcome this problem is to choose suitable algorithms and classification models to improve the efficiency of the IDS. This research tries to improve the deep learning strategy with good feature selection models. Deep learning was introduced by LeCun et al. [11] as an advanced machine learning methodology for complex data computation. Multiple machine learning structures are presented and discussed in [12]. Advanced fields such as language processing, image processing, and medical research have become more successful by using deep learning techniques [13]-[15]. This motivates the inclusion of deep learning in IDS classification models. The main contributions of this article are:
a. Feature noise is reduced with high accuracy using normalized PCA. Feature selection is then performed by the tree growth algorithm with firefly optimization techniques.
b. The selected features are deeply analyzed using a feed forward neural network to determine whether a selected feature belongs to an outsider or intruder in the network.
c. High prediction accuracy and a low false positive rate are obtained by the proposed technique.
The rest of the paper is organized as follows: section 2 reviews existing articles on intrusion detection; section 3 describes the proposed methodology and working principle; section 4 describes the experimental setup and results; section 5 concludes the article and outlines its future scope.

LITERATURE SURVEY
In this section we discuss various feature selection models, as well as classification using well-established AI algorithms. Deep learning algorithms combined with feature selection provide optimal solutions in various applications. An IDS using a non-symmetric deep auto-encoder (NDAE), with feature classification by stacked NDAEs, provides good accuracy. This IDS concept was evaluated using the NSL-KDD and knowledge discovery in databases (KDD) Cup 99 datasets. Across the two datasets, it yielded an accuracy of 97.85% on the KDD Cup 99 dataset, higher than its accuracy on the NSL-KDD dataset [16]-[22].
Mondal et al. [23] proposed a feature selection process based on a labeled multi-objective algorithm using the concept of mutual information (MOMI). MOMI was evaluated experimentally using the WEKA tool with SVM and naive Bayes (NB) classifiers [24]. For feature selection, a framework of multilayer perceptron using controlled redundancy (FSMLP-CoR) was proposed in [25]. The FSMLP-CoR architecture has an input layer, an output layer, and multiple hidden layers, and it is used for regression, classification, and prediction in several domains [26]-[28]. On the KDD Cup 99 dataset, the ant colony optimization (ACO) algorithm has been used for feature selection in intrusion detection. This dataset has 41 features, and the ACO technique models how ants remember their path using pheromones. It was evaluated with the binary SVM classifier library in WEKA [29]. A hybrid model of an SVM classifier with a genetic algorithm has also been applied in intrusion detection systems. The advantage of this hybrid model is that it reduces the feature set and operates using 10 features [30]. Reducing the number of features increases the detection rate and performance of network intrusion detection.

PROPOSED METHOD FOR IDS
A network intrusion detection system is an important component of networks for detecting and stopping anonymous entry into the network. While accessing the network, system features and personal features are identified by routers and networking devices. Every entry is monitored by the IDS device, and if any intrusion feature is detected, it informs the network administrator. In this research we implement the IDS in three phases; Figure 1 shows the general working architecture of our proposed model. In the first phase, data is preprocessed by removing high-level noise. In the second phase, the tree growth-based firefly algorithm is used to detect anonymous features. In the third phase, we use SVM and FFNN classifiers for classifying intrusions from the selected features.
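The noise-reduction phase can be illustrated with a minimal sketch of smoothing by bin means, assuming the binning concept mentioned in the abstract refers to equal-frequency binning. The function name `smooth_by_bin_means`, the bin count, and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean

def smooth_by_bin_means(values, n_bins):
    """Equal-frequency binning: sort the values, split them into n_bins
    groups of (nearly) equal size, and replace each value by its bin mean."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    smoothed = [0.0] * len(values)
    size, extra = divmod(len(values), n_bins)
    start = 0
    for b in range(n_bins):
        # The first `extra` bins take one additional value each.
        end = start + size + (1 if b < extra else 0)
        idx = order[start:end]
        if idx:
            bin_mean = mean([values[i] for i in idx])
            for i in idx:
                smoothed[i] = bin_mean
        start = end
    return smoothed

# Hypothetical values of one numeric traffic feature.
durations = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(durations, n_bins=3)
```

Each value is pulled toward its neighbours in sorted order, which damps isolated spikes while keeping the overall ordering of the feature.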

Basic tree growth ideology
Tree growth (TG) is a swarm intelligence-based metaheuristic approach inspired by the growing behavior of trees in a jungle [3]. Initially, the tree growth algorithm (TGA) uses a candidate set for constructing the trees. The tree population is then divided into groups using the fitness value. The trees with the best fitness are allocated to the first group, where they grow further. In the second group, trees compete and move toward the nearest best trees at various angles to receive light. In the third group, the weakest trees are replaced with new ones, and in the fourth group the best trees multiply to produce new ones. The basic TG algorithm has four steps, as follows.
Step 1: The tree population T1 is initially generated at random. The fitness value is used to select the best first-group trees using (1), where i indexes the trees in the population, θ represents the power reduction due to factors such as aging, growth, and reduced light (food), and 'ran' represents the random tree selection parameter. The random numbers take values between 0 and 1, and the roots move with the corresponding growth rate. If the new tree attains a better value than the existing one, the new tree replaces the old tree. Tuning θ is the core part of TGA.
Step 2: The next group of trees, T2, moves toward the best trees at various angles. Each tree in the N2 set uses (2) and (9) for computing the distance between trees, where t2 represents the current tree in the i-th population. Two outcomes, x1 and x2, are chosen within the minimum distance dis(i) by using (4).
Step 3: The worst trees, T3, are eliminated. The size of the population is T = T1 + T2 + T3.
Step 4: A masking operation is performed and T4 is generated.
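The four steps above can be sketched in code. Because the numbered equations are not reproduced in this text, the sketch assumes the commonly published form of the tree growth algorithm: a local-search update of tree / θ + r · tree for the first group, and a move toward a linear combination of the two nearest trees for the second group. The sphere objective, the group sizes, and the names `local_search`, `move_toward_nearest`, and `tga_step` are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

def fitness(tree):
    # Hypothetical objective (sphere function); lower is better.
    return sum(x * x for x in tree)

def local_search(tree, theta, rng):
    # Step 1: candidate = tree / theta + r * tree, with r uniform in [0, 1).
    r = rng.random()
    candidate = [x / theta + r * x for x in tree]
    # Keep the new tree only if it improves on the existing one.
    return candidate if fitness(candidate) < fitness(tree) else tree

def move_toward_nearest(tree, population, lam, rng):
    # Step 2: pick the two nearest other trees (Euclidean distance),
    # combine them linearly, and move toward the combination.
    others = sorted((t for t in population if t is not tree),
                    key=lambda t: math.dist(t, tree))
    x1, x2 = others[0], others[1]
    y = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    alpha = rng.random()  # movement angle factor in [0, 1)
    return [x + alpha * yi for x, yi in zip(tree, y)]

def tga_step(population, n1, n2, n3, n4, theta, lam, bounds, rng):
    low, high = bounds
    dim = len(population[0])
    ranked = sorted(population, key=fitness)
    group1 = [local_search(t, theta, rng) for t in ranked[:n1]]
    group2 = [move_toward_nearest(t, ranked, lam, rng)
              for t in ranked[n1:n1 + n2]]
    # Step 3: the worst n3 trees are dropped and replaced by random ones.
    group3 = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n3)]
    # Step 4: new trees are masked with a randomly chosen best tree.
    group4 = []
    for _ in range(n4):
        new = [rng.uniform(low, high) for _ in range(dim)]
        donor = rng.choice(group1)
        group4.append([d if rng.random() < 0.5 else x
                       for d, x in zip(donor, new)])
    merged = sorted(group1 + group2 + group3 + group4, key=fitness)
    return merged[:len(population)]

rng = random.Random(0)
population = [[rng.uniform(-5, 5) for _ in range(4)] for _ in range(10)]
initial_best = min(fitness(t) for t in population)
for _ in range(20):
    population = tga_step(population, n1=3, n2=4, n3=3, n4=3,
                          theta=1.2, lam=0.5, bounds=(-5, 5), rng=rng)
final_best = fitness(population[0])
```

For feature selection, the continuous positions would still have to be mapped to a binary feature mask and the objective replaced by a classifier-based score; both are left out of this sketch.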

Tuning parameter
The factor θ is an important parameter used to adjust and select the best tree in the tree population. In the literature, [3] tunes θ before the tree simulation; it has also been tuned after simulation [3]. These two scenarios tune θ only at extreme conditions, and such tuning does not appear to provide a reasonable tree growth strategy. In this article we propose tuning θ from start to end, at each iteration. As discussed for TGA, θ is defined as the reduction power of a tree. The growth rate of a tree cannot be constant over time; in a jungle, it depends on soil nutrients. Based on this slow-growing strategy, we tune θ linearly in an increasing pattern using (6),
where c is the current iteration and total(iter) denotes the total number of iterations the tree algorithm computes. This linear method increases the tree's reduction power with its age and nutrients. The proposed approach improves the feature selection model.
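A linearly increasing schedule of this kind can be sketched as follows. Equation (6) is not reproduced in this text, so the interpolation form and the bounds `theta_min` and `theta_max` are assumptions chosen only for illustration.

```python
def linear_theta(c, total_iter, theta_min=1.1, theta_max=1.5):
    """Increase theta linearly from theta_min (c = 0) to theta_max
    (c = total_iter). The bounds are hypothetical defaults."""
    if total_iter <= 0:
        raise ValueError("total_iter must be positive")
    # Clamp the progress fraction so out-of-range iterations stay in bounds.
    fraction = min(max(c / total_iter, 0.0), 1.0)
    return theta_min + (theta_max - theta_min) * fraction

# Theta recomputed at every iteration rather than fixed before or after the run.
schedule = [linear_theta(c, total_iter=100) for c in range(101)]
```

Because θ divides the current tree in the local-search update, a growing θ shrinks the step that older trees can take, which matches the stated intent of reduction power increasing with age.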

Embedding TGA with fire fly strategy
The firefly (FF) algorithm is a swarm intelligence-based metaheuristic search tool, used in this article to search for optimal features among a large feature set. The FF algorithm is based on the social behavior of fireflies, with a mathematical model of their lighting [3]. It works using light intensity variation and attractiveness. For optimal feature selection, an objective function is designed, and the firefly light intensity (l) is proportional to this objective function. The contraction of light by fireflies using a Gaussian distance procedure is given in [3]; based on this idea and the attractiveness parameter, a new parameter is defined in the enhanced tree growth firefly (TGFF) algorithm. The best initial solution cannot always produce the best final output [36], because after each iteration the features may be best, better, or merely good. This leaves exploration and exploitation imbalanced in most metaheuristic techniques. The proposed enhanced TGFF algorithm is given in Algorithm 1.
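The attractiveness mechanism can be illustrated with a minimal sketch of the standard firefly update, assuming the usual Gaussian form in which attractiveness decays as exp(-γ r²) with distance r. The new parameter defined for the enhanced TGFF algorithm is not given in this text and is not modelled here; `beta0`, `gamma`, and `alpha` are the conventional names for the base attractiveness, the light absorption coefficient, and the random step size.

```python
import math
import random

def attractiveness(beta0, gamma, distance):
    # Gaussian decay of attractiveness with distance.
    return beta0 * math.exp(-gamma * distance ** 2)

def move_firefly(x_i, x_j, beta0, gamma, alpha, rng):
    """Move firefly x_i toward the brighter firefly x_j, with a small
    random perturbation scaled by alpha."""
    r = math.dist(x_i, x_j)
    beta = attractiveness(beta0, gamma, r)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5)
            for a, b in zip(x_i, x_j)]

rng = random.Random(1)
dim_firefly = [0.0, 0.0]
bright_firefly = [1.0, 2.0]
# With gamma = 0 and alpha = 0 the move is a full, deterministic jump.
jumped = move_firefly(dim_firefly, bright_firefly,
                      beta0=1.0, gamma=0.0, alpha=0.0, rng=rng)
# With absorption, distant fireflies attract less than near ones.
near = attractiveness(1.0, 1.0, 0.5)
far = attractiveness(1.0, 1.0, 2.0)
```

A larger `gamma` makes the search more local, while `alpha` controls how much random exploration is mixed into each move.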

Convolutional neural network-based clustering
Figure 1 shows the technical architecture proposed in this research for KDD based on CNN (KDD CNN), which consists of two parts: a similarity measuring technique and a two-stage clustering process. In the similarity measurement procedure, the similarity between data is obtained by training a CNN on KDD. The KDD dataset is divided into four categories based on similarity, which in turn are used as the training set for the CNN in two-step clustering; this process is known as preliminary clustering. Then, in the target clustering phase, the CNN is trained using the preliminary clustering findings, and the test set is separated into three groups. These two parts are described in full below.

Similarity measurement algorithm
Figure 2 depicts the construction of the CNN. The convolution portion is made up of three convolution blocks, with sizes of 256, 512, and 256. Each block is made up of a layer of 1-D kernels without striding, followed by a batch normalization (BN) layer, which is subsequently activated by the ReLU function [17]. The kernel sizes in the blocks are 16, 10, and 6. The last convolution block is connected to the global average pooling layer. The linear layer provides the output of the similarity measurement.
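To make the layer dimensions concrete, the following sketch traces the sequence length and parameter count through the three blocks. It assumes that the block sizes 256, 512, and 256 are filter counts, that the convolutions use stride 1 with no padding, and that the input is a single-channel vector of 41 features (the KDD Cup 99 feature count cited earlier). None of these choices is confirmed by the text.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0):
    # Standard output-length formula for a 1-D convolution.
    return (length + 2 * padding - kernel) // stride + 1

def trace_blocks(input_len, in_channels, blocks):
    """Return (filters, kernel, output_length, parameters) per block.
    Parameters count conv weights and biases plus BN scale and shift."""
    rows = []
    length, channels = input_len, in_channels
    for filters, kernel in blocks:
        length = conv1d_out_len(length, kernel)
        conv_params = channels * filters * kernel + filters
        bn_params = 2 * filters
        rows.append((filters, kernel, length, conv_params + bn_params))
        channels = filters
    return rows

# (filters, kernel size) for the three convolution blocks.
blocks = [(256, 16), (512, 10), (256, 6)]
rows = trace_blocks(input_len=41, in_channels=1, blocks=blocks)
lengths = [row[2] for row in rows]
# Global average pooling collapses the length axis, leaving one value per filter.
pooled_size = rows[-1][0]
```

Under these assumptions the linear layer receives a 256-dimensional vector regardless of the input length, which is what allows the same convolution stack to be reused with a different final layer.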

Two-stage clustering
The suggested two-step clustering algorithm consists of the following two steps. In the first step, the cluster construction approach is used to combine partial data in the dataset that have higher similarity. The cluster construction process depends on the data selection criterion, that is, which data should be clustered first. In this algorithm, both the similarity value and the similarity ranking with respect to the current data are used to make this judgment. Using only the similarity value to decide whether to cluster may collect too much data, especially when the similarity differences between data are small, which lengthens training time. Selecting only a small quantity of data based on similarity ranking may collect data with low similarity, which limits clustering accuracy. In the second step, the clusters created in the first stage are used as the network's training set, and the KDD dataset is divided to form the network's test set. The construction of the CNN in the two-step clustering phase is the same as in Algorithm 1, with the exception that the last layer is Softmax rather than linear.
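The combined selection criterion of the first step can be sketched as follows: a pair of samples is merged only if the partner is among the current sample's `top_k` most similar items and the similarity also reaches `threshold`. The function `build_clusters`, both parameters, and the similarity matrix are hypothetical; the paper's cluster construction algorithm is not reproduced in this text.

```python
def build_clusters(sim, threshold, top_k):
    """Group samples using both similarity value and similarity ranking.
    Samples that satisfy neither criterion with anyone stay unclustered."""
    n = len(sim)
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        for j in ranked[:top_k]:        # ranking criterion
            if sim[i][j] >= threshold:  # value criterion
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Only groups with at least two members form a preliminary cluster.
    return [members for members in groups.values() if len(members) > 1]

# Hypothetical pairwise similarities for five samples.
sim = [
    [1.00, 0.90, 0.80, 0.10, 0.20],
    [0.90, 1.00, 0.85, 0.20, 0.10],
    [0.80, 0.85, 1.00, 0.30, 0.20],
    [0.10, 0.20, 0.30, 1.00, 0.95],
    [0.20, 0.10, 0.20, 0.95, 1.00],
]
clusters = build_clusters(sim, threshold=0.8, top_k=1)
```

The resulting clusters would then serve as labelled training data for the second-stage network, with the unclustered samples left for it to assign.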

RESULTS AND EXPERIMENTS

Dataset description
In this research work we use the UNSW-NB15 dataset. This dataset was developed for network security. It is a hybrid dataset that contains both current real data and attack data.

Experimental evaluation
Our proposed model is evaluated with two classifiers: SVM and FFDNN. The other two classifiers do not provide the best result among the four. The K-NN algorithm produces 97.32% accuracy in detection. Figure 3 presents the outcome of the SVM classifier with various IDS feature detection methodologies. The sensitivity results in Figure 3 make clear that PSO and our proposed technique, TGFFA, provide the best results, with PSO performing slightly better. Figure 3 also depicts the precision of the various detection algorithms; the proposed TGFFA shows high precision in detecting intrusions. Figure 3 further shows the accuracy of IDS detection: the proposed TGFF performs better and reaches a good accuracy level compared with the other methods. In the F-measure analysis of Figure 3, the proposed TGFF and PSO+GW+GA+FA perform almost equally in predicting the actual true and actual false features in IDS.
Table 6 presents the results of the proposed FFNN with the TGFF algorithm. Figure 4 presents the outcome of the FDNN classifier with various IDS feature detection methodologies. The sensitivity results in Figure 4 make clear that GWO, PSO, and our proposed technique, TGFFA, provide the best results, with the proposed technique performing slightly better. Figure 4 also depicts the precision of the various detection algorithms; the proposed TGFFA shows high precision in detecting intrusions. Figure 4 further shows the accuracy of IDS detection: the proposed TGFF performs better and reaches a good accuracy level compared with the other methods. In the F-measure analysis of Figure 4, the proposed TGFF performs almost equally in predicting the actual true and actual false features in IDS.
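For reference, the measures discussed above follow directly from the confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not results from this paper, and all denominators are assumed to be non-zero.

```python
def detection_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics for a binary intrusion detector."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        "tnr": tn / (tn + fp),     # true negative rate
        "fpr": fp / (fp + tn),     # false positive rate
        "fnr": fn / (fn + tp),     # false negative rate
    }

# Hypothetical counts: 100 attack flows and 100 normal flows.
metrics = detection_metrics(tp=90, tn=80, fp=20, fn=10)
```

Sensitivity and FNR sum to one, as do TNR and FPR, so reporting one of each pair is sufficient when comparing feature selection methods.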

CONCLUSION
Detecting the intrusions present in networks is a challenging research problem. An intrusion can exhibit many features that let it act as a real user in the network. The number of features affects detection performance in an IDS. The key contribution of our research work is choosing the best feature selection technique for tackling intelligent intrusions in the network. In this IDS we use the UNSW-NB15 dataset. Our proposed model, using the tree growth based firefly algorithm, selects features more efficiently than the other methods. The selected features are processed using a feed forward neural network, and the classification of the IDS is analysed. We use TPR, TNR, FPR, and FNR for performance analysis. Across the evaluation results, our proposed model achieves 95.65%. PSO selects only 26 features from the dataset, GA selects 23 features, and the proposed TGFFA selects 36 features with high efficiency and accuracy. In the future, artificial intelligence with blockchain technology can be used in monitoring network intrusion.


ISSN: 2302-9285. Bulletin of Electr Eng & Inf, Vol. 11, No. 5, October 2022: 2845-2855
detect intrusion deviations in the network. An IDS based on signatures uses predefined keywords of malware to pinpoint intrusions in networks. Here, databases are updated manually by system administrators.

Figure 1. General architecture of the proposed model

Figure 3. SVM classifier-based result evaluation with multiple methods

Figure 4. FDNN classifier-based result evaluation with multiple methods

Table 1 describes the survey of various feature selection models.
Table 1. Survey on feature selection in IDS

The IXIA PerfectStorm tool was used to create the dataset [4]. It is composed of nine families of attacks and real data, and it consists of 49 features in the network flow. Most of the features are listed in Table 2. Our proposed result is compared with the feature selection models in [3], [4], and the result is discussed.
Table 2. Dataset features
A novel classification and clustering algorithms for intrusion detection system (Mathiyalagan Ramasamy)

Table 4. Number of selected features

Table 5 presents various results of existing techniques.
Table 5. The result obtained using SVM classifier

Table 6. The result obtained using FFDNN classifier