Automated diagnosis of attacks in internet of things using machine learning and frequency distribution techniques

,


INTRODUCTION
In our digital world, security is of the utmost importance. The idea of the internet of things (IoT) is based on connecting objects around us to make our everyday lives more efficient and thus provide more comfort and productivity in our business and personal life. But these connections also expose sensitive data. It is evident that the requirement for security is undoubtable. The internet of things displays new usage challenges. As the quantity of associated devices in our lives develops and the measure of information (data) that is collected everyday skyrockets, security a progressively crucial.
The objective of MQTT is to give a lightweight and simple-to-use communication protocol for internet of things. The protocol itself indicates just a few security mechanisms. At the network level, the intrusion detection system (IDS) is used to detect various anomalies and protect our IoT systems. The objective of this research is to adopt machine learning techniques to enhance network level security in power constrained devices that use the lightweight MQTT protocol for their functioning. This work is interested in testing various machine learning techniques that could help improve intrusion detection systems. The multiclass classification models are provided with data containing frames under various types of attacks and normal frames labelled respectively. The pertained model can then be utilized to identify and thus prevent unwanted attacks or intrusion in the IoT system. In this research, we were also able to classify the DDoS attack even if the there is a low rate of attack.

RELATED WORK
To detect the anomalies in traditional types of networks there are many approaches in machine learning. The dataset used for that are KSL-KDD and KDD99 [1]. This data set contains different attacks followed on TCP protocols (traffic is analysed). Balancing the dataset was indeed an important task to increase the accuracy of the model, which was done by balancing classes [2][3][4][5]. Fuzziness based on supervised learning has also been implemented to improve the accuracy of recognition of attack by [6] whereas sequential extreme learning has improved the machine learning detection of attack by [7]. Thus, the obtained results clearly depict that these algorithms (machine learning approaches) are the appropriate approaches to enhance the detection of intrusion at the network layer. There are also approaches in deep learning used to detect the intrusion and any anomalies [8]. One way of doing that is to apply DBM (deep belief network), for selecting various features on the KDD dataset and then applying SVM over that. This predicts the type of attack [9][10]. Another approach is adopted to find the fisher score using deep learning approaches. In this case, a classical statistic approach is used along with auto encoder to decrease the number of features and exact the features of the highest importance [11]. Deep learning is not only used for intrusion detection, but it is also used for classifiers. As the Temporal data sequence of intrusion detection is also useful, so LSTM (long short term memory) has also been used to find the attack (using the KDD dataset) [12,13]. With respect to the IoT (internet of things) and IDS (intrusion detection system), there are many approaches that use edge computing and fog computing. Using fog computing and edge computing, a simulation is created for the NSL-KDD dataset. This approach to detecting IDS has exhibited good time dependence performance and good accuracy [14]. Over the KDD dataset, the rules of IDS detection are also modified using machine learning approaches such as SVM and KNN [15]. As IoT as a field is still developing, the given solutions cover fewer aspects of IoT attacks [16]. There are other datasets such as AWID [17], in which TCP frames for the WLAN network is collected and analysed for attacks on 802.11. In this case, study of Wi-Fi intrusion was done by a neural network classifier [18]. For training purposes, the CICIDS dataset [19] is used to validate the intrusion detection algorithm with recurrent neural networks [20,21]. The present research paper is mainly based on IoT scenarios and help in detecting vulnerability concerning the IoT. The IoT data traffic is analysed, which uses MQTT protocol for communication between the publisher and various clients.
Here we have used the KDD dataset, but the problem with the KDD data set is that it was not clean (not pre-processed) and has redundant values. Second, the parameters (variables) of the experiment were not clear. Third, the important variables, which clearly help in spotting the attack, were not used in the KDD dataset. In order to overcome these problems, we have created our scenario, as mentioned in the KDD data set links. We even generated the data set and found the parameters, such as sequence number, MAC address, socket number and others, which greatly helps in finding the attack, and even reduced the dimension of the KDD dataset (which was not important). This paper then used the combined data set of KDD and the scenario data that we have generated, which has ultimately given us better results for this given research.

PROBLEM STATEMENT
While there is no doubt that many kinds of security are a requirement, including information security in our day-to-day life, we face a lot of implementation challenges when it comes to the Internet of Things, which demands high usability. Security has always presented a trade-off off between the degree of insurance and the level of ease of use. This trade-off gets significantly more intriguing with the Internet of Things. Normally, IoT devices create very low memory capacities and little computing power. High-security cryptographic algorithms require considerably more assets than small IoT devices can have. So, until we find that Holy Grail of compact energy sources, we must search for different lightweight ways to provide highlevel security to our tiny but crucial IoT devices that make our lives easier, more productive and ergonomic.
Security approaches that depend vigorously on encryption are not a solid match for these constrained devices, since they are not equipped to perform complex encryption and decryption rapidly enough to have the option to transmit information safely, progressively and securely in real-time [22][23][24][25]. IoT frameworks systems should make use of multiple layers of safeguard and defence to make up for these device impediments. Applying "security intelligence" for detecting, recognizing and mitigating attacks as they occur is an IoT security challenge. One approach to this, as used in our work, is to use multiclass classification of frames using standard machine learning techniques to correlate frame characteristics with frame type. A longer-term goal may include unfolding threats by applying AI to predict adaptively modified security method, applied depending on the viability of past activities and previous actions. As can be seen in the preceding Figure 1, the setup uses a. There are three levels in the hierarchy-level 1, 2, and 3. Each component in the different levels plays a different role. The base units work at the base level and collect data from the sensors and appliances and transfer it to the intermediary node. This transferred data, received from all leaf nodes, and is aggregated at the intermediary node. From there, it is forwarded to the server via MQTT protocol. The Server collects data from all the intermediary nodes and then processes and analyzes it. MQTT is indeed an application layer protocol, whereas at transport layer TCP is used to increase the reliability in communication. At network layer we have IP and at data link layer the sensors works, whereas, physical layer used to send the data in binary form. But, as we can see that there is no security aspect in IoT architecture so we need ways to protect the IoT network. Too overcome this problem the machine learning algorithm has been embedded in the application layer of the IoT architecture.

Figure 1. Hybrid star tree topology
Security approaches that depend vigorously on encryption are not a solid match for these constrained devices, since they are not equipped of performing complex encryption and decryption rapidly enough to have the option to transmit information safely progressively and securely in real-time. IoT frameworks systems should make utilize of multiple layers of safeguard and defence to make up for these device impediments. Applying "security intelligence" for detecting, recognizing and mitigating attacks as they occur is an IoT security challenge. One approach to this as used in our work is to use Multiclass classification of frames using standard machine learning techniques to correlate frame characteristics with frame type. A longer-term goal may include unfolding threats by applying AI to predict adaptively modify security method applied dependent on the viability of past activities and previous actions

EXPERIMENTS
Various software tools that are used are Low Orbit Ion Cannon (LOIC), Ettercap, and Wireshark. Wireshark is used to store the traffic of data moving from my system to the internet and also data coming from the internet to our system. For attacking, the system LOIC and Ettercap tools are used. Once the data is stored, we have applied a machine learning algorithm using python. As far as hardware is concerned, this research paper has used other os ar9271 Wi-Fi adapter to capture the packets in monitor mode. We have used different types of tools in order to perform different types of attacks. These tools are LOIC (for performing DoS attack) and Ettercap (for performing MitM attack) [9,14], as shown in Figure 2 Ettercap [20,26], which is depicted in Figure 2, is mainly used in Kali Linux Devices. It is used to perform different types of attacks on our device that is running the Ettercap, used as a malicious node to perform the man-in-the-middle attack. First, we need to have all the target sources and the malicious node in the same network, then the malicious node opens Ettercap and clicks on scan for hosts that will scan and give the results of available nodes/devices in that network. While performing the attack, we need to collect the data for our analysis. We are using the Wireshark data from the victim's device. That data is converted to CSV format and labelled according to the attack type. Now, after completing the above process, we have three datasets, which are the DDoS, MitM and the normal dataset. These datasets need to be cleaned and then combined to form one dataset, then used for the classification process. With DDoS, we will follow a different approach, as it is hard to distinguish from usual traffic. DDoS is a type of denial of service attack where multiple compromised nodes that are distributed over the globe attack a server that offers service. Mainly, DDoS attacks are one of two types. High volume attacks are also known as Brute Force attacks. They can be easily detected because of the sudden high traffic. The other attack type is vulnerability attacks that attack weaknesses in the protocol. As the traffic graph is similar to normal traffic, they are difficult to detect. Also, high volume attacks are no longer viable for attackers even if the servers with cloud providers are now in a large quantity. Additionally, since multiple tenants share a single cloud server, it can impact multiple services in a cascading manner. Low rate DDoS attacks are instead periodic in nature and come from multiple machines, trying to exploit deficiencies in different protocols in a computer network. Due to them not being easily distinguishable from normal traffic, they are more difficult to detect and can be more harmful. Thus, to find DDoS we will follow a different approach given in a later section.

DataSet
We have generated the datasets as mentioned in the section above, but those are unorganized. Now comes the pre-processing that needs to be performed before applying the model. After completion of the preprocessing of the dataset, it is clean and organized. We need to build a model that performs the classifications based on the label given in the dataset. We need to prepare our training and testing data using sklearn.model_selection() we used train_test_split() to split our data into 70% training and 30% testing.

Binary classification
Binary classification of the three datasets as shown in Figure 3, was done as a base to view distinctions between normal frames and frames under attack. Initially the data sets are shown in the figure below:   The fast algorithm of XGBoost was used to develop the trees for gradient boosting. In this tree, growing method just a subset of conceivable split values is considered. The classifiers classify 5 samples incorrectly out of a total of 5000 samples in the DoS dataset, whereas it misclassifies 24 samples out of 1400 samples in MitM, and intrusion dataset. The binary classification, although very good, is not of much use as using different filters will increase load on the devices, slowing down real-time communication. Thus, a multiclass classifier needs to be developed. For better finding DDoS attacks, we have used the following ways to gather the data set: As a very first step in our process, we will be grouping data by frequencies / 10 milliseconds. Our current dataset has the timestamp and the address. We need to combine all those timestamps and group them by 10 seconds. This will allow us to construct a frequency chart. When we use the frequencies over a grouped interval of 10 milliseconds, and create a power spectrum distribution of them, using normal fast Fourier transform (FFT), we get the following for a normal Non DDoS scenario and a DDoS scenario respectively. This will be done only after splicing the different types of data. After this, we will be using Matplotlib to plot the data.

Multiclass classification (ensemble methods)
Combined data sets look as shown in Figure 5. The data is imbalanced, as can be seen in the bar graph of Figure 5. Different numbers of samples and imbalance in individual datasets makes the combined dataset imbalanced. The combined data contains (X and Y axis) 17500, 10000, 2500, and 5000 samples of frames with normal, DoS, intrusion and MitM labels respectively. This can be dealt with using oversampling techniques, like SMOTE and random oversampling, if needed. For the time being, however, the research leaves it to the model to perform under this imbalance. We take care of this fact in the coming sections as needed. This paper used the standard parameters, which are provided by Sklearn's random forests, for this result. A grid search was performed later to decide upon the best parameters of the model. Vanilla neural networks don't give very good results when applied to images because they lose the spatial relation between pixels, which is not only preserved but also enhanced by computer vision algorithms. However, in this moment, the researcher tends to adopt a different method. Tabular row data is converted into an image to process and thus use the benefits of CNN, as depicted in Figure 6. The attacks are either high rate or low rate. High rate attacks are easy to detect due to their high volume, while low rate attacks are difficult to detect as the traffic spikes are not observed. Low rate attacks normally attack deficiencies in the low level protocols. In the coming sections we will also see how we can predict whether it is a DDoS attack in a low traffic attack. For finding the DDoS attack, we have used the algorithm showed in Figure 7. In this implementation, as shown in Figure 7, we will first analyze multiple PSD transformations on attack data on different protocols. Further, we will be analyzing which transformations are more susceptible to a DDoS attack and thus are better to detect one as it happens. We will be using multiple PSDs and seeing the periodicity observed in them with respect to normal traffic and traffic under a DDoS attack. Additionally, we will be analyzing the time taken to process an input to analyze computational efficiency.

RESULTS AND DISCUSSION
The training followed by testing data led to the following outcomes.

Ensemble methods
Both random forest and XGBoost perform quite similarly. XGBoost results are shown in Table 1. Even though the classification accuracy is 98%, the classification isn't as good as desired because of the accuracy paradox described above. The imbalance of the dataset leads us to this problem. The confusion matrix proves to be a good metric to test classification correctness. Other measures, Precision and recall provide good insight into the correctness of classification. These are determined as entireties and proportions of various pieces of a confusion matrix, as depicted in Figure 8.  The confusion matrix points out that MitM and intrusion data frames were quite frequently misclassified as normal samples. 320 and 520 samples were misclassified out of 810 and 1490 samples of intrusion and MitM datasets, respectively. This indicates inefficient classification. Multiclass classification problems tend to be more complex than binary problems, which makes getting better results more difficult for these problems. Although the imbalance was dealt with, there were enormous contrasts between classes. This may have influenced the precision in a few models negatively.

Deep learning methods
The CNN model outperforms both ensemble methods after enough training. The training curves for the model can be seen in Figure 9. With increasing of epochs (iteration over the defined batch size) the training and testing accuracy of the model increases, i.e. it learns the parameters (improve over initial random prediction basically through back propagation) that describes our data perfectly and helps in better classification. The logarithmic loss (multiclass log loss) plot shows a gradual decrease of epochs both for training and testing, signifying that the model is making less and less errors in classifying training and testing samples shown by the blue and orange curve respectively. The CNN model performed better than ensemble methods on all parameters, as mentioned in Table 2. The reason for that lies that CNN resulted in low variance as compared to ensemble methods. Even the parameters such as Accuracy, f1_score, recall score of CNN model has outperformed ensemble methods. As shown in Figure 10, again, the confusion matrix is used to see how well the samples labelled MitM and intrusion (minority samples) are classified by our new model, because, as discussed in previous sections, this data faces the accuracy paradox due to imbalance. The confusion matrix shows that the deep learning model outperforms ensemble models when classifying minority samples. The following confusion matrix is obtained:   Table 3 summarizes the model that gave the above confusion matrix:

Result comparison
The XGBoost model is compared in terms of Model logarithmic loss in Figure 11, which clearly shows that the data is balanced. This work achieved a log loss of 0.0193, 0.0184 improving upon the previous works log loss of 0.079, 0.0753 for the validation set and training sets respectively, as shown in Figure 12.  Based on DFT or discrete wavelet transform the normal TCP flow possesses periodic property, this is not the same for attack flows. Periodicity can be estimated using the power spectral density. But as a downside even if the TCP flow is not periodic due to legitimate reasons, it can be marked as DDoS attack. To overcome this, frequency domain can be used as depicted in Figures 13-15 respectively; these attacks have high energy in low frequency bands. Other than relying on frequency or power domains, collaborative filtering using routers or template matching can be used. These allow using the previous attack characteristics so as to find what might be a DDoS attack. Special hardware systems such as FPGA PSD converters can be used. Also, as eventually a DDoS attack leads to congestion in the network, the traffic with a higher congestion participation rate can be termed as malicious traffic. So, given below are few algorithms which clearly shows how we can detect malicious attack by change in frequency.
Here in Figure 13, it clearly shows the change in frequency is sharp as soon as attack happens, while protocol used here is LDAP (lightweight dictionary access protocol), sharp rise in frequency can be observed during the duration of the attack. In Figure 14, it clearly shows the change in frequency is sharp as soon as attack happens, while protocol used here is UDP (user datagram protocol), sharp rise in frequency can be observed during the duration of the attack. In Figure 15, it clearly shows the change in frequency is sharp as soon as attack happens, while protocol used here is MySQL. The graph always looks uniform as the attack is low rate, and thus frequency techniques are not sufficient to detect the presence of an attack But anomalies also exist in some types of data like, which results in randomness, as shown in Figure 16. and it is sometimes hard to find i.e no specific pattern is observable in any subset of the data due to the inherent randomness of genuine traffic. Figure 17, depicts that in an attack scenario, the PSD is showing periodic properties, while no periodicity is observed in a normal scenario, as shown in Figure 16. This is the way the attack is programmed; the traffic tends to be periodic. While that is not true for normal traffic, where the frequencies might increase or decrease. Here, a strong repetition of pattern is observed and thus, an attack scenario is created and hence, is difficult to be completely random. In Figure 18, similar graphs are observed for other PSDs such as correlogram, covariance, and Yule-Walker to predict the DDoS attack scenarios. Here it is clearly observed that while the degree of periodicity might be different, each of these algorithms does show uniformity in an attack scenario. While correlogram is observed to be depicting periodicity here as well, as shown in Figure 19, none of the other PSD algorithms show any uniformity for a non-attack scenario, validating our claim. But, as a few still show periodicities, it cannot be taken as a general claim that PSD will never be uniform for non-attack cases. Figure 18. Plots for several different power spectral distribution algorithms like correlogram, covariance, and Yule-Walker for the DDoS attack scenario of a single protocol Figure 19. Plots for several different power spectral distribution algorithms like correlogram, covariance, and Yule-Walker for the non-attack scenario