Bulletin of Electrical Engineering and Informatics

Received Jun 9, 2022; Revised Aug 30, 2022; Accepted Sep 29, 2022

One of the most pressing issues in wireless sensor networks (WSNs) is energy efficiency. WSNs use sensor nodes (SNs) to gather and send data. Cluster-based hierarchical routing techniques are widely considered for lowering the energy consumption of WSNs. Because SNs are battery-powered, they face significant energy constraints, which makes designing energy-efficient protocols challenging. Clustering algorithms drastically reduce the energy consumption of each SN. The low-energy adaptive clustering hierarchy (LEACH) is considered a promising application-specific protocol architecture for WSNs. To extend the network's lifetime, the SNs must save as much energy as possible. The proposed developed cluster-based load-balanced protocol (DCLP) considers the ideal number of cluster heads (CHs) and prevents nodes near base stations (BSs) from joining clusters, which reduces the energy consumed by the sensors. DCLP is analyzed in MATLAB and compared with LEACH, a well-known cluster-based protocol, and its modified variant, distributed energy-efficient clustering (DEEC). The simulation results demonstrate that network performance, energy usage, and network lifetime have all improved significantly. They also demonstrate that cluster-based routing protocols can reduce sensor network energy consumption while increasing the amount of data transferred, thereby extending network lifetime.


INTRODUCTION
The private cloud is one of the most important computing services. A cloud service can be delivered publicly over the internet or privately over an internal network. Institutions and service companies choose private cloud services because they offer many of the benefits of public cloud services, such as self-service, scalable resources, and flexible data handling, in addition to greater control and management over resources within the network infrastructure. Moreover, a private cloud provides a very high level of security and privacy by relying on firewall services and internal hosting, which ensures that operations and data are handled only by authorized parties and that unauthorized persons cannot access these services [1]. Many related research directions are reviewed in this work.
A threat classification scheme for detecting security threats was proposed in [2], based on three criteria: the ML algorithm (supervised or unsupervised learning), the input model features, and the type of threat (network-specific or cloud-specific). He et al. [3] proposed a system that detects denial-of-service (DoS) attacks on the source side of a cloud network using specific ML techniques. The system exploits statistical information from both the cloud server's hypervisor and the virtual machines to provide a control rule that prevents particular network packets from being sent out to the external network.
A data protection model for cloud computing was proposed in [4]; it uses the Rivest-Shamir-Adleman (RSA) public-key algorithm to handle private data in the cloud. In addition, the challenge handshake authentication protocol (CHAP) was used to improve data authentication, providing secure authentication between the communicating parties within the network, and the results showed that the proposed model is safe and practical. Kolli et al. [5] proposed a two-factor system for improving data protection within the cloud network: the sender sends an encrypted message to the recipient through the cloud environment, and the message can only be retrieved with two factors, the first being a unique, independent security device connected to the computer system and the second being a secret private key stored in the computer system. The results show that the system preserves the confidentiality of cloud data and supports revocability.
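The CHAP exchange mentioned for [4] can be illustrated with a short sketch. It follows the response computation defined in RFC 1994 (an MD5 hash over the one-octet identifier, the shared secret, and the challenge). The secret, identifier, and function names are illustrative assumptions, and this is not the implementation used in [4]; MD5 appears only because RFC 1994 specifies it.

```python
import hashlib
import hmac
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """Peer side: Response = MD5(identifier || secret || challenge), as in RFC 1994."""
    if not 0 <= identifier <= 255:
        raise ValueError("identifier must fit in one octet")
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def verify_chap(identifier: int, secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Authenticator side: recompute the expected value and compare in constant time."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


# The authenticator sends a fresh random challenge; the secret itself never crosses the network.
shared_secret = b"example-shared-secret"  # hypothetical secret known to both parties
challenge = os.urandom(16)
response = chap_response(1, shared_secret, challenge)
print(verify_chap(1, shared_secret, challenge, response))    # True
print(verify_chap(1, b"wrong-secret", challenge, response))  # False
```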
Mašetić et al. [6] proposed an algorithm based on the support vector machine (SVM). The evaluation consists of several stages aimed at distinguishing DoS attacks from normal network behavior: simulating the attack, collecting the data, selecting features, and classification. The results demonstrate the ability of the proposed system to detect DoS attacks within the cloud computing environment.
The method proposed in [7] relies on an intrusion detection system (IDS). It acts as a network intrusion detection system (NIDS) that detects anomalies by monitoring and analyzing data traffic targeting the cloud environment. The system handles four attack types (DoS, Probe, R2L, U2R) and uses an SVM to classify network connections. The results show that it detects intrusive connections with high accuracy and low false alarm rates (FARs). Virupakshar et al. [8] addressed DDoS attacks, which can affect many types of networks, especially cloud networks, by stopping the services they provide and leaving the entire network weakened and disrupted. Their system was implemented with OpenStack and uses decision tree (DT), K-nearest neighbor (KNN), Naive Bayes, and deep neural network (DNN) algorithms to train a model that detects DDoS attacks and notifies the administrator of the private cloud. The results indicate that the proposed system is a secure model that provides integrated protection for monitoring data traffic.
A Tor Hammer attack mechanism and an IDS were proposed in [9] and implemented in a CloudStack environment with a novel dataset. The dataset was analyzed with different ML algorithms, including k-means, DT, random forest (RF), Naïve Bayes, SVM, and C4.5, and the best network performance and intrusion detection accuracy were obtained with the C4.5 and SVM algorithms [10]. Each of the reviewed works has limitations, such as encoding issues in preprocessing, classifier choice in the modeling stage, limited use of multiple classifiers, and, in most studies, outdated datasets. Security is a key requirement for the cloud. These risks motivated us to design a solution that protects data stored in a private cloud, and the proposed system addresses the limitations identified in the related works.
ML is increasingly used for IDS and other applications. An IDS monitors and analyzes data to detect intrusions in a system or in network requests [11]. Combined with an IDS, ML algorithms can address security issues and manage data efficiently in the cloud by training a model to classify data traffic as normal or abnormal. In this paper, the security of a private cloud is improved using ML and cryptography. The paper is organized as follows: section 1 is the introduction, section 2 presents the proposed algorithm, section 3 describes the method, section 4 presents the results and discussion, and section 5 concludes the paper.

THE PROPOSED CLOUD SECURITY ALGORITHM
To secure the private cloud, we propose a model that works as a third party between users and the private cloud. Figure 1 shows a general view of the proposed work. The model is built on ML techniques and works in two phases with two datasets. In the first phase, user requests are classified as normal or abnormal (malicious attacks); normal requests are passed and malicious ones are dropped. The UNSW-NB15 dataset, which is widely recommended for configuring such systems, is adopted to train the classifier in this phase. In the second phase, the data carried by users' normal requests is classified, and finally the data is encrypted according to its classification.
Figure 1. General view of the proposal
AES, 3DES, and RC4 encryption algorithms are used. Finding a dataset whose class labels reflect a degree of confidentiality was a challenge, so the BBC News dataset was chosen to train the classifier on its different classes: business, entertainment, politics, sport, and technology. Figure 2 shows the proposed approach for handling an unclassified request through the two phases of request classification and data classification.
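The two-phase gateway described above can be summarized as control flow. The sketch below is a minimal illustration under stated assumptions: the two classifiers are passed in as plain functions (the toy rules are hypothetical stand-ins, not the trained RF and SGD models), encryption itself is not performed, and the pairing of sensitivity levels with AES, 3DES, and RC4 is inferred from the order in which the paper lists them.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# ASSUMPTION: pairing follows the order in which the paper lists the sensitivity
# levels (top secret, confidential, standard) and the ciphers (AES, 3DES, RC4).
CIPHER_FOR_LEVEL = {
    "top_secret": "AES",
    "confidential": "3DES",
    "standard": "RC4",
}


@dataclass
class Request:
    features: dict  # request features fed to the phase-1 IDS classifier
    payload: str    # user data to be stored in the private cloud


def handle_request(
    req: Request,
    is_malicious: Callable[[dict], bool],
    sensitivity_of: Callable[[str], str],
) -> Optional[Tuple[str, str]]:
    """Gateway logic: drop malicious requests, otherwise choose a cipher for the data."""
    # Phase 1: intrusion detection on the incoming request.
    if is_malicious(req.features):
        return None  # malicious request is dropped
    # Phase 2: classify the data by sensitivity, then select the encryption algorithm.
    level = sensitivity_of(req.payload)
    return level, CIPHER_FOR_LEVEL[level]


# Toy stand-ins for the trained classifiers (hypothetical rules, not the paper's models).
def toy_ids(features: dict) -> bool:
    return features.get("sbytes", 0) > 10_000


def toy_sensitivity(text: str) -> str:
    return "top_secret" if "budget" in text else "standard"


print(handle_request(Request({"sbytes": 500}, "quarterly budget"), toy_ids, toy_sensitivity))
# ('top_secret', 'AES')
print(handle_request(Request({"sbytes": 50_000}, "hello"), toy_ids, toy_sensitivity))
# None
```

Passing the classifiers as functions keeps the gateway logic independent of how each phase is trained.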

Intrusion detection system
The first stage of the proposed system is an IDS, which distinguishes normal from malicious behavior in incoming requests to access the databases. UNSW-NB15 is one of the most important datasets for data analysis in IDS research; it was built from a large set of captured raw network traffic using network tools such as the IXIA PerfectStorm tool and tcpdump. Its features were created using the Argus and Bro-IDS tools and 12 developed algorithms. The number of features is given as 29, classified into five groups, starting with flow features, basic features, and content features. The primary focus here is binary classification. The UNSW-NB15 samples were preprocessed with integer encoding: each unique category value was assigned an integer value. For example, in the protocol type column, "TCP" is 1, "UDP" is 2, and "OSPF" is 3. This is also known as label encoding, and in some cases it is sufficient. The integer values have a natural ordered relationship, and ML algorithms may be able to understand and benefit from such a relationship; label encoding is therefore sufficient for ordinal variables. The RF algorithm was adopted in this stage because it achieved the highest accuracy compared with the other algorithms. Figure 3 shows the proposed method for discovering malicious behavior. We also tried several feature selection methods in the first (IDS) phase after preprocessing with integer encoding, but the results decreased; therefore, the proposed system does not apply feature selection in this phase.
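The integer (label) encoding step can be sketched in a few lines. This minimal version assigns codes in order of first appearance so that it reproduces the paper's example (TCP = 1, UDP = 2, OSPF = 3); the function names, the reserved code 0 for categories unseen during training, and the "ICMP" value are illustrative assumptions. Note that scikit-learn's `LabelEncoder` and `OrdinalEncoder` instead assign 0-based codes in sorted order, which would give OSPF = 0, TCP = 1, UDP = 2.

```python
def fit_integer_encoding(values, start=1):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = start + len(mapping)
    return mapping


def apply_integer_encoding(values, mapping, unknown=0):
    """Encode values with a mapping fitted on training data; unseen values -> `unknown`."""
    return [mapping.get(v, unknown) for v in values]


train_proto = ["TCP", "UDP", "TCP", "OSPF", "UDP"]
mapping = fit_integer_encoding(train_proto)
print(mapping)                                           # {'TCP': 1, 'UDP': 2, 'OSPF': 3}
print(apply_integer_encoding(train_proto, mapping))      # [1, 2, 1, 3, 2]
print(apply_integer_encoding(["UDP", "ICMP"], mapping))  # [2, 0]
```

Fitting the mapping on the training split only and reusing it for the test split keeps the codes consistent between the two.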

Data classification with machine learning
In the second phase, the proposed system processes the text data in the BBC News dataset in four sub-phases: i) loading the BBC dataset, ii) preprocessing, iii) feature extraction, and iv) classification. The British Broadcasting Corporation (BBC) News dataset ("https://www.bbc.com/news") consists of raw text documents: 2,225 text files obtained from the BBC News website, corresponding to stories in five topical areas from 2004 to 2005. The documents are arranged into five folders named after the class labels (entertainment, business, sport, politics, and tech), each containing news articles related to that label.
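The loading sub-phase amounts to walking one folder per class and attaching a label id to every file. The sketch below is a minimal illustration assuming the raw dataset has been unpacked into a root directory whose sub-folders hold `.txt` files; the root path and the function name `load_bbc` are hypothetical, and the ids follow the business:0 ... tech:4 mapping the paper assigns during preprocessing.

```python
from pathlib import Path

# Category ids as assigned in the paper's preprocessing step.
CATEGORY_IDS = {"business": 0, "entertainment": 1, "politics": 2, "sport": 3, "tech": 4}


def load_bbc(root):
    """Read one-folder-per-class text files and return (texts, label_ids)."""
    texts, labels = [], []
    for name, label in CATEGORY_IDS.items():
        folder = Path(root) / name
        # A missing folder simply yields no files.
        for path in sorted(folder.glob("*.txt")):
            # Assumption: plain-text files; errors="replace" tolerates stray non-UTF-8 bytes.
            texts.append(path.read_text(encoding="utf-8", errors="replace"))
            labels.append(label)
    return texts, labels


# Usage (hypothetical path): texts, labels = load_bbc("bbc")
```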

Pre-processing
Preprocessing is an important step for obtaining better-quality input; it is performed by tokenization and stop-word removal. Its main advantage is that it cleans and arranges the text to be classified. In the BBC dataset, each folder contains the documents of one category, and the category is taken from the directory name (for example, business articles are inside the folder named 'business'). Each category is then assigned an id (business:0, entertainment:1, politics:2, sport:3, tech:4). Table 1 shows the distribution of categories in this dataset. The classification phase consists of the steps shown in Figure 4: data preprocessing, feature extraction, and training and testing of the ML classifier. Preprocessing is applied for two reasons: 1) to reduce the dataset size for more efficient data analysis, and 2) to make the dataset better suited to analysis and feature selection. The goal of the feature extraction step is to reduce the dimensionality of the dataset by omitting characteristics that are not related to the categorization [12].
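The term weights are computed with the term frequency-inverse document frequency (TF-IDF) scheme:
$$w_{x,y} = tf_{x,y} \times \log\left(\frac{c}{df_x}\right)$$
Here, $tf_{x,y}$ is the frequency of term x in document y, c is the number of documents in the text collection, and $df_x$ is the number of documents in which term x appears.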
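The weighting these definitions describe (the frequency of term x in document y, scaled by the logarithm of c over the number of documents containing x) is the classic TF-IDF weight. The following is a minimal sketch of it, assuming documents are already tokenized; the function name `tf_idf` and the toy documents are illustrative. Library implementations differ in detail: scikit-learn's `TfidfVectorizer`, for example, uses a smoothed idf, ln((1 + n)/(1 + df)) + 1, and L2-normalizes each document vector by default.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document,
    with weight = tf[x, y] * log(c / df[x])."""
    c = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)    # raw count of term x in document y
        weights.append({x: tf[x] * math.log(c / df[x]) for x in tf})
    return weights


docs = [["shares", "rose"], ["shares", "match"], ["match", "match", "goal"]]
for w in tf_idf(docs):
    print({t: round(v, 3) for t, v in w.items()})
# {'shares': 0.405, 'rose': 1.099}
# {'shares': 0.405, 'match': 0.405}
# {'match': 0.811, 'goal': 1.099}
```

A term that occurs in every document gets weight 0, which is how the scheme discounts uninformative words.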
As mentioned before, classification is performed in two phases: a. The request validation phase, which checks whether a request is normal or abnormal. b. The classification phase, a significant stage whose goal is to assign unseen news articles to their respective classes. Data preprocessing uses the following strategies: a. Tokenization: the text is divided into suitable units (words, letters, or expressions), called tokens. b. Stop-word removal: an efficient natural language processing (NLP) preprocessing step that removes meaningless, uninformative words to obtain a more useful text document. Removing stop words reduces the dataset dimensions, and the remaining keywords can then be identified with automatic feature extraction methods. c. Normalization: all words are converted to lower case, and tokens such as numbers are removed. d. Stemming: the proposed system strips suffixes to reduce each word to its origin or root; the Snowball stemmer is used.
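The preprocessing strategies can be chained into one function. This is a minimal sketch: the stop-word list is a tiny illustrative subset, and `crude_stem` is a deliberately simple suffix stripper shown only to convey the idea of stemming; it is not the Snowball algorithm the paper uses (an implementation of that is available as NLTK's `SnowballStemmer`). Lower-casing is done first so that the stop-word lookup matches capitalized words, and the regular-expression tokenizer drops digits and punctuation, which covers the removal of numbers.

```python
import re

# Tiny illustrative stop-word list; real pipelines use a fuller list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "for", "on", "at", "by"}

# Crude suffix stripping for illustration only; NOT the Snowball stemmer.
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    text = text.lower()                                  # normalization: lower case
    tokens = re.findall(r"[a-z]+", text)                 # tokenization; numbers dropped
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    return [crude_stem(t) for t in tokens]               # stemming (crude stand-in)


print(preprocess("The markets rallied in 2005, and shares gained 3%"))
# ['market', 'ralli', 'share', 'gain']
```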

METHOD
ML techniques are used in both phases of the proposed approach. The first phase classifies requests destined for the cloud: it builds an intrusion detection model for the private cloud and analyzes whether each request shows normal or malicious behavior. The classifier for this phase was trained on the UNSW-NB15 dataset. The second phase classifies the data to be stored in the cloud by importance (sensitivity) into three levels: top secret (highly confidential), confidential, and standard (basic) data. These three types are then encrypted on the server with three algorithms (AES, 3DES, and RC4). The proposed system is based on the following two ML algorithms:

Random forest
The main steps of RF are: i) selecting random samples from the dataset, ii) building a DT for each sample and obtaining a prediction from each DT, iii) voting over the predicted results, and iv) selecting the prediction with the most votes as the final prediction.
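The four RF steps can be mirrored directly in code. The sketch below is a simplified random forest in which every tree is a depth-1 decision stump chosen by misclassification count, so that the bagging, random feature subsets, and majority vote stay visible; it is not the scikit-learn `RandomForestClassifier` presumably used in the experiments, which grows full trees with an impurity criterion. All function names and parameters are illustrative.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: best (feature, threshold) split by misclassification count."""
    overall = majority(y)
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            l_lab = majority(left) if left else overall
            r_lab = majority(right) if right else overall
            errors = sum(yi != l_lab for yi in left) + sum(yi != r_lab for yi in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # i) random sample with replacement
        feats = rng.sample(range(d), n_features)     # random subset of features
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        trees.append(fit_stump(Xs, ys, feats))       # ii) one tree per sample
    return trees


def predict_forest(trees, row):
    votes = [tree(row) for tree in trees]            # iii) every tree votes
    return majority(votes)                           # iv) most votes wins
```

With stumps there is only one split per tree, so drawing the feature subset per tree is equivalent to drawing it per split as a full random forest does.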

Stochastic gradient descent learning
As its second method, the proposed system uses stochastic gradient descent (SGD) learning, which replaces the actual gradient computed from the entire dataset with an estimate computed from a randomly selected subset of the data. In high-dimensional optimization problems, this reduces the computational burden (the resources required to run the algorithm) and achieves faster iterations in exchange for a lower convergence rate.
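To make the "random subset instead of the full gradient" idea concrete, the sketch below trains a logistic regression classifier with plain SGD, updating the weights after every single randomly ordered sample (a subset of size one). The loss, learning rate, epoch count, and toy data are illustrative assumptions; the paper does not state its SGD settings. In recent scikit-learn versions, `SGDClassifier` uses the hinge loss by default and `loss="log_loss"` for logistic regression.

```python
import math
import random


def sigmoid(z):
    z = max(-500.0, min(500.0, z))  # clip to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def sgd_logistic(X, y, lr=0.1, epochs=50, seed=0):
    """Logistic regression trained with plain SGD: each update uses the gradient of
    one randomly ordered sample instead of the gradient over the full dataset."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)                  # new random sample order each epoch
        for i in order:
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            g = sigmoid(z) - y[i]           # d(log-loss)/dz for this single sample
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def predict(w, b, x):
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0)


X = [[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = sgd_logistic(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1, 1, 1]
```

Each update touches one sample, so its cost does not grow with the dataset size; averaging the gradient over a small random mini-batch is the usual middle ground.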

RESULTS AND DISCUSSION
The proposed approach configures a server that acts as a third party between the user and the cloud to provide protection between the two parties. It uses ML and encryption techniques to protect the cloud and the data stored within the private cloud. In the first phase, the classifier is trained on the UNSW-NB15 dataset to detect attacks; in the second phase, the BBC News dataset is used to categorize the data. In the first phase, the Naive Bayes, SGD, logistic regression (LR), KNN, RF, and DT algorithms were tested.
Data classification with the proposed ML algorithms is divided into two main processes: training and testing. The performance comparison parameters are precision, accuracy, recall, and F1-score, which are based on the following quantities [13]: a. True positive (TP): positive examples that are correctly classified [14]. b. False negative (FN): positive examples that are incorrectly classified. c. False positive (FP): negative examples that are incorrectly classified. d. True negative (TN): negative examples that are correctly classified [15]. The evaluation metrics are defined based on the confusion matrix, as shown in (1) to (5). a. Precision is the number of TP divided by the sum of TP and FP, as computed in (1) [16]:
$$Precision = \frac{TP}{TP + FP} \quad (1)$$
b. Accuracy is the number of correct predictions divided by the total number of predictions, as computed in (2) [17]:
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (2)$$
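c. Recall is the number of TP divided by the sum of TP and FN, as shown in (3) [18]:
$$Recall = \frac{TP}{TP + FN} \quad (3)$$
d. F1-score is the harmonic mean of precision and recall, as shown in (4):
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \quad (4)$$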
e. Detection rate (DR) is defined as the ratio of correct positive predictions to the total number of positive predictions, as shown in (5) [20], [21].
f. False alarm rate (FAR), also called the false positive rate (FPR), is the proportion of actual negative (normal) instances that are wrongly predicted as positive (anomalous); lower values are better. It is given in (6) [22], [23]:
$$FAR = \frac{FP}{FP + TN} \quad (6)$$
g. Error rate is the number of all wrong predictions divided by the total number of samples in the dataset [24], [25]:
$$ERR = \frac{b + c}{a + b + c + d}$$
where b and c count the wrong predictions (FP and FN) and a and d count the correct predictions (TP and TN), so that ERR = (FP + FN)/(TP + TN + FP + FN).
The results of the algorithms used are shown in Table 2 and Figure 5. The RF and DT algorithms give better results than the other algorithms in terms of accuracy, detection rate, model building, and so on, and RF is considered the best in terms of speed because it relies on randomness. SGD and LR are the most efficient on the analyzed datasets because they are easy to implement and fast to train. SGD updates the model using randomly selected subsets of the training data rather than the whole dataset, so its execution time is lower, whereas LR processes the entire dataset at each step, so its execution time is larger; the accuracy of the two is the same. Figure 5 shows the results of phase 2.
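The evaluation metrics defined above (precision, accuracy, recall, F1-score, FAR, and error rate) all derive from the four confusion-matrix counts. The following is a minimal sketch that counts TP, FN, FP, and TN from label lists and computes those metrics; treating label 1 as the positive (attack) class, returning 0.0 for an empty denominator, and the toy labels are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN, treating `positive` (e.g. attack) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fn, fp, tn


def safe_div(num, den):
    return num / den if den else 0.0


def evaluation_metrics(tp, fn, fp, tn):
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    total = tp + fn + fp + tn
    return {
        "precision": precision,
        "accuracy": safe_div(tp + tn, total),
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "far": safe_div(fp, fp + tn),            # false alarm rate = false positive rate
        "error_rate": safe_div(fp + fn, total),  # wrong predictions / all samples
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (3, 1, 1, 3)
print(evaluation_metrics(*counts))
# {'precision': 0.75, 'accuracy': 0.75, 'recall': 0.75, 'f1': 0.75, 'far': 0.25, 'error_rate': 0.25}
```

Accuracy and error rate always sum to 1, which is a quick sanity check on any reported results.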

CONCLUSION
In this paper, a secure private cloud using ML and cryptography is proposed. The experiments were performed on two datasets for different purposes: UNSW-NB15, a dataset of normal and attack traffic, and the BBC News classification dataset. The proposed method encrypts non-attack data securely and reduces excessive processing of different data by applying multiple encryption levels according to the importance of the data.
In phase one, which distinguishes normal from attack flows, the RF algorithm, selected from among several ML algorithms, gives the best results, with an accuracy of 100%. In the second phase, the SGD learning algorithm achieves the highest accuracy, 98%, in classifying the different data categories.
Bulletin of Electr Eng & Inf ISSN: 2302-9285 Security of private cloud using machine learning and cryptography (Ali Abdulsattar Jabbar)
In the second phase, the data stored in the cloud is classified according to its importance (high confidentiality, confidentiality, basic).

Figure 2. The proposed system approaches


ISSN: 2302-9285 Bulletin of Electr Eng & Inf, Vol. 12, No. 1, February 2023: 561-569
The remaining feature groups of UNSW-NB15 are time features and additional generated features. The dataset also includes nine attack types: backdoor, DoS, generic, reconnaissance, analysis, fuzzers, exploits, shellcode, and worms. UNSW-NB15 comes with predefined splits: a training set of 175,341 samples and a testing set of 82,332 samples. However, the publicly available training and testing sets each contain only 44 features: 42 attributes and 2 class attributes. In this paper, only the UNSW-NB15 training set is used, for both training and testing.

Figure 3. The proposed model for classifying normal and malicious behavior

Figure 4. Block diagram of the data classification phase

Figure 5. The results of the algorithms used in phase 2

Table 1. The proposed distribution of categories in the dataset

Table 2. The results of the algorithms used in phase 1