Electricity-theft detection in smart grids based on deep learning

Received Feb 16, 2021; Revised May 2, 2021; Accepted Jun 9, 2021

Electricity theft is a major concern for utilities. The smart grid (SG) infrastructure generates a massive amount of data, including the power consumption of individual users. Using this data, machine learning and deep learning techniques can accurately identify users who steal electricity. This paper presents a convolutional neural network (CNN) model for automatic electricity theft detection. Experiments were carried out to find the best configuration of the sequential model (SM) for classifying and identifying electricity theft. The best performance was obtained with two layers, where the first layer consists of 128 nodes and the second of 64 nodes, with accuracy reaching 0.92. This enables the design of high-performance electricity signal classifiers that can be used in several applications. The classifiers are designed using a CNN and the data extracted from the electricity consumption dataset using an SM. In addition, the blue monkey (BM) algorithm is used to reduce the features in the dataset. The focus of this work is therefore on reducing the features in the dataset to obtain high-performance electricity signal classifier models.

Electricity theft is a significant non-technical loss. A large body of research addresses its detection. Traditional methods of detecting electricity theft include physically inspecting suspect meter installations or configurations, comparing irregular meter readings with regular ones, and examining the transmission line for bypassed power. These methods are ineffective, extremely time-consuming, and costly. The advent of SGs brings opportunities to address electricity theft. SGs comprise conventional power networks, communication networks linking smart devices (for example, smart sensors and meters), and computing services that sense and control the networks [10]. Information and energy flows in smart networks connect service companies and users. In this way, smart sensors or meters can collect various data, including network status information, electricity usage, financial information, and the cost of electricity [11]. An SG is an electricity network that can intelligently integrate the actions of all users connected to it (producers, consumers, and those that do both) to efficiently deliver sustainable, economic, and secure electricity supplies [12].
Recently, privacy and security have been the subject of extensive research, since the national economy, public security, and safety depend significantly on energy grids. Unfortunately, privacy concerns are not always fully understood in SG metering, and more work is therefore required to deal with the threat of electricity theft. Several survey papers have discussed privacy and security in SGs. They listed the overall cybersecurity challenges, including trust models, connectivity, security management, consumer privacy, software vulnerabilities, and human factors, and proposed possible solutions to these challenges [13], [14]. Deng and Shukla surveyed vulnerabilities and countermeasures, especially for the transmission subsystem within the SG [15]. They focused on the weaknesses of phasor measurement unit (PMU) and wide area measurement system (WAMS) technology. Wang and Lu examined security challenges in the SG, covering home area networks (HANs), advanced metering infrastructures (AMIs), and the distribution and transmission subsystems [16]. They presented the security requirements and assessed network threats with case studies. Komninos et al. presented a survey of SG and smart home security [17]. The authors considered the communication between the SG and smart home environments and categorized their security risks. Mohassel et al. presented a survey of advanced metering infrastructure (AMI) [18]. They reviewed the main concepts of AMI and briefly discussed its physical and cyber security challenges, including privacy.
Some of the literature has studied methods of electricity-theft detection (ETD) that use smart meter consumption data to identify fraudulent consumers. Monitoring consumers' load profiles for signs of electricity theft in conventional power systems has long attracted the attention of researchers. Angelos et al. used five attributes (maximum consumption, mean consumption, the sum of inspection remarks, standard deviation, and the mean consumption of the neighborhood) to build a typical power consumption profile for every consumer. K-means-based fuzzy clustering was then performed to group consumers with similar profiles. Customers at large distances from the cluster centers were considered potential fraudsters. Grouping the consumers and relying on long-period measurements limited the accuracy of this ETD system and produced a long detection delay. More detailed metering information may deliver much better ETD with a much shorter delay [19].
Other earlier work has been directed towards deep learning and convolutional neural networks (CNNs). Abdel-Hamid et al. described the advances that CNNs have brought to a variety of pattern recognition domains, from image processing to voice recognition. The most advantageous feature of CNNs is that they reduce the number of parameters in artificial neural networks (ANNs). This achievement has encouraged both developers and researchers to build larger models to solve complex tasks, which was not feasible with classic ANNs [20]. Mallat extended previously introduced tools to produce a mathematical framework for analyzing the properties of general CNN architectures. At a high level, the extension was achieved by replacing the requirement of contractions and invariants to translations with contractions along adaptive groups of local symmetries [21].
Furthermore, several research papers on the use of CNNs for ETD are reviewed here. Krizhevsky et al. explored the use of CNNs for the task of theft detection. Motivated by the numerical modeling approach, they noted that the periodicity of sequential data is of considerable significance for the classifier. An adequate representation of this periodicity can therefore help to improve the accuracy of electricity theft detection. Concretely, they proposed adapting the multiscale DenseNet, which can automatically capture the short-term and long-term periodic features of the sequential data [22].
Bhat et al. investigated three deep learning methods for electricity-theft detection: CNNs, long short-term memory (LSTM) recurrent neural networks (RNNs), and stacked autoencoders. However, the performance of the detectors was examined using synthetic data, which did not permit a reliable assessment of the detectors' performance compared with shallow architectures [23]. Other methods show poor electricity-theft detection accuracy because they are based on one-dimensional (1-D) electricity consumption data and fail to capture the periodicity of electricity consumption [24]. Therefore, this paper focuses on proposing an efficient technique for electricity-theft detection (ETD) that addresses the concerns mentioned above. Specifically, we propose a convolutional neural network (CNN) combined with a recently proposed nature-inspired metaheuristic optimization algorithm, the blue monkey (BM) algorithm [25], to identify electricity thieves. The CNN part consists of several convolutional layers, a pooling layer, and a fully connected layer. The CNN component can capture the periodicity of electricity consumption data. This design combines the benefits of the CNN component and the BM algorithm to enable efficient ETD. To the best of our knowledge, this is the first study to propose such a deep model (combining a CNN with the BM algorithm) and apply it to electricity theft in smart networks. In addition, we have conducted extensive experiments on a large realistic electricity consumption dataset.

THE PROPOSED ELECTRICITY THEFT DETECTION SYSTEM
This section focuses on the design aspects of the proposed electricity theft detection system. A statistical examination of the electricity consumption data of both energy thieves and normal consumers shows that the consumption data of energy thieves is typically less periodic, or non-periodic, compared with that of normal consumers. This observation makes it possible to identify irregular electricity usage through the periodicity of the electricity consumption.
Nevertheless, examining the periodicity of electricity consumption data is challenging for several reasons: 1) the data is a 1-D time series of enormous size, 2) the data is frequently erroneous and noisy, 3) several traditional data analysis methods, e.g., ANNs and support vector machines (SVMs), cannot be applied directly to electricity consumption data because of their computational complexity and limited generalization ability. To address these challenges, the CNN approach has been adopted in this work.
A realistic electricity consumption dataset released by the State Grid Corporation of China is used to train the models. This work aims to identify electricity theft from users' power consumption patterns using CNN-based deep learning and BM techniques. The classifier model is trained in a supervised manner on a dataset consisting of daily power consumption data of both normal and fraudulent users. First, the data is prepared by a data-preprocessing algorithm to train the model. The preprocessing step also involves synthetic data generation for better performance. Next, the hyperparameters of the proposed model are tuned, and finally the optimized model is evaluated on the test data. The overall system is depicted in Figure 1.

Electricity consumption data
The research is performed on a series of real consumer electricity usage data made available by the State Grid Corporation of China (SGCC). This dataset consists of 42,372 rows and 1,035 columns. The first column contains the customers' IDs, and the second column contains a prediction label called "Flag", while the day columns run from the third column up to column 1,035. The data types in the dataset are characters, numbers, and missing or erroneous values recorded as NaN (not a number). The numbers and the missing or erroneous values represent the amount of electricity consumed (the electricity signals) by each consumer over the recording period.
In addition, the "Flag" column takes the values zero and one, which denote the type of consumer: zero marks a normal consumer of electricity, of which there are 38,757, while one marks a thief, of which there are 3,615. In total, the 42,372 rows represent consumers' electricity usage data over 1,035 days (from Jan. 1, 2014, to Oct. 31, 2016).
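The counts and dates quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that assumes only the figures stated in this subsection; the variable names are chosen here for illustration.

```python
from datetime import date

# Figures quoted in the text for the SGCC dataset.
total_consumers = 42_372
normal = 38_757
thieves = 3_615

# The two classes should account for every row.
assert normal + thieves == total_consumers

# Share of each class in the full dataset.
theft_share = thieves / total_consumers
normal_share = normal / total_consumers

# Number of days from Jan. 1, 2014 to Oct. 31, 2016, counting both end dates.
days = (date(2016, 10, 31) - date(2014, 1, 1)).days + 1

print(f"theft share:  {theft_share:.4f}")
print(f"normal share: {normal_share:.4f}")
print(f"days covered: {days}")
```

Under these figures, thieves make up roughly 8.5% of the consumers, so the classes are strongly imbalanced. A classifier that labels every consumer as normal would score about 0.915 accuracy on the full dataset, which is a useful reference point when reading accuracy values.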

Modifying the dataset
The given electricity consumption dataset passes through several modification stages that reduce it for use in building ETD templates with various algorithms. These stages are: 1) generating a new dataset by replacing all null and NaN values in the original dataset with zero, because the neural network accepts only numbers and these values are undefined, 2) splitting the new dataset into two parts, one used for training (80%) and the other for testing (20%), 3) reducing the new dataset by dropping the location and flag columns, which reduces complexity and time because these two attributes are not used in the proposed system.
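The three stages can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the original implementation: the row layout `[customer_id, flag, day_1, ..., day_n]` follows the dataset description, the customer ID column is taken to be the "location" column, the flag is set aside as the training target when it is dropped from the inputs, and the shuffle before the split is an assumption since the text does not say how the 80/20 split is drawn. The function names and toy rows are hypothetical.

```python
import math
import random

def to_number(value):
    """Return value as a float, mapping missing or non-numeric entries to 0.0."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return 0.0  # None, "n/a", and other non-numeric entries
    return 0.0 if math.isnan(number) else number

def prepare(rows, train_fraction=0.8, seed=0):
    """Clean rows of [customer_id, flag, day_1, ..., day_n] and split them."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # assumption: shuffle before splitting
    # Stage 1 and stage 3: zero-fill the readings and drop the ID and flag columns.
    features = [[to_number(v) for v in row[2:]] for row in rows]
    # The flag is removed from the inputs but kept as the target label.
    labels = [int(row[1]) for row in rows]
    # Stage 2: 80% of the rows for training, the rest for testing.
    cut = int(len(rows) * train_fraction)
    return (features[:cut], labels[:cut]), (features[cut:], labels[cut:])

# Hypothetical toy rows, not values from the SGCC dataset.
toy_rows = [
    ["A", 0, 1.5, float("nan"), 2.0],
    ["B", 1, None, 0.3, "n/a"],
    ["C", 0, 4.0, 4.1, 3.9],
    ["D", 0, 2.2, 2.0, None],
    ["E", 1, 0.0, float("nan"), 0.1],
]
(train_x, train_y), (test_x, test_y) = prepare(toy_rows)
print(len(train_x), len(test_x))
```

Replacing missing readings with zero makes them indistinguishable from genuine zero consumption, which is a consequence of the zero-filling choice described above rather than of this sketch.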

BUILDING THE ELECTRICITY-THEFT DETECTION MODEL
Our ETD system goes through several steps. In the first step, the dataset passes through several modification operations that reduce it, as discussed above. Then, the sequential model (SM) is built, as described in the next subsection. The third step is to build the prediction model (the ETD model), which is done in two operations: the first uses the developed SM, and the second uses the BM algorithm. The input is the SM together with the reduced dataset, and the output is the ETD model. This algorithm consists of a set of fully connected layers, convolution layers, and softmax layers that train and test on the dataset (electricity consumption data).

The sequential model
The SM is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. The input to this SM algorithm is the dataset and the output is the reduced dataset. The first step in this algorithm is to define the input shape so that it is compatible with the SM. After that, the SM is used in two cases. The first case is predicting electricity signals using the original fully connected layers of the SM: the electrical signals are sent to the SM, which returns their class. The second case uses an array to build and train the model on a given dataset.
In addition, there are three 2-D convolution layers with 3×3 kernels. These three convolution layers are linked to activation and dropout layers, which are in turn linked to the fully connected part (dense, activation, and dropout layers). Finally, the metrics resulting from the above operations are used to generate the reduced dataset. This approach uses three fully connected (dense) layers (layer 1, layer 2, and output), where layer 1 has 128 nodes and layer 2 has 64 nodes, while the output represents the customer's type, either thief or normal customer, depending on the dataset. In addition, four dropout layers are applied, whose rates were selected after trying many possible values. The best values were found to be 0.25 for the first two dropout layers and 0.5 for the last two.
The input is the SM together with the reduced dataset, and the output is the ETD model with its accuracy and loss. The dataset is divided into two parts: 80% is used for training and 20% for testing. This algorithm is used to find the best configuration of the neural network in terms of the number of layers and parameters, beginning with two layers and ending with four layers. The largest layer had 128 nodes and the smallest had 16 nodes. The best architecture was obtained with two layers, where the first layer contains 128 nodes and the second layer contains 64 nodes.
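To give a sense of scale for the layer sizes explored here, the number of trainable parameters in a stack of fully connected layers can be counted directly. The sketch below is only an illustration under stated assumptions: it counts the dense layers alone (the convolution layers are ignored), it assumes the dense stack receives 1,035 input values, and it assumes a two-unit output for the two customer types. None of these details is confirmed by the text, and the function name is hypothetical.

```python
def dense_parameters(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the width of the input followed by each layer's width.
    Each layer contributes n_in * n_out weights plus n_out biases.
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

n_features = 1035  # assumption: one input value per column of the template
n_classes = 2      # assumption: one output unit per customer type

# Selected configuration: 128 nodes, then 64 nodes, then the output layer.
selected = [n_features, 128, 64, n_classes]
# Smallest two-layer configuration in the explored range (16 nodes per layer).
smallest = [n_features, 16, 16, n_classes]

print("selected:", dense_parameters(selected))
print("smallest:", dense_parameters(smallest))
```

Under these assumptions, almost all parameters sit in the first dense layer, whose size is proportional to the number of input features. This is why reducing the number of features has a direct effect on model size and training time.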

The blue monkey algorithm
The BM algorithm is implemented as a function that enhances the ETD template and returns the best solution found, as shown in Figure 2. The input to this function is the ETD template. The BM algorithm mimics the behavior of blue monkeys. It maintains a set of solutions for parents and children, each initialized with random values. The input to this algorithm is a set of solutions, each of which represents a template for reducing the dataset and building the ETD model. Ten solutions are used, each with a length of 1,035 values generated randomly as zeros and ones. The BM algorithm thus yields a template of 1,035 binary values (0 or 1). This template is used in two ways. The first is to reduce the dataset with a modify function, whose inputs are the template from the BM algorithm and the original dataset (when building the model). The output is a new dataset that is smaller than the original one, and a model is then built whose input shape equals the size of the new dataset. The second is when a new electricity signal must be classified: the values of the signal are reduced according to the same template so that the signal can be classified using this model. The main goal of building the BM template is to reduce the number of features in the given dataset.
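The way a binary template reduces a signal can be sketched as follows. This sketch covers only the random initialization of the ten templates and the masking step; it does not implement the BM search itself (the parent and child updates are omitted). The function names `random_templates` and `apply_template` and the toy signal are hypothetical, and the convention that a value of 1 keeps a feature while 0 drops it is an assumption.

```python
import random

def random_templates(n_solutions, n_features, seed=0):
    """Create n_solutions random binary templates of length n_features."""
    rng = random.Random(seed)
    return [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(n_solutions)
    ]

def apply_template(template, signal):
    """Keep signal[i] where template[i] == 1 (assumed convention), drop the rest."""
    if len(template) != len(signal):
        raise ValueError("template and signal must have the same length")
    return [value for keep, value in zip(template, signal) if keep == 1]

# Ten candidate templates of 1,035 binary values, as described in the text.
templates = random_templates(n_solutions=10, n_features=1035)

# Hypothetical signal standing in for one consumer's readings.
signal = [float(day) for day in range(1035)]

# The same template is applied when building the model and when classifying
# a new signal, so both see the same reduced set of features.
reduced = apply_template(templates[0], signal)
print(len(signal), "->", len(reduced))
```

Because the model's input shape equals the length of the reduced signal, the template chosen during model building has to be stored and reused unchanged at classification time.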

RESULTS AND DISCUSSION
Electricity signal classification is a core problem with a large variety of practical applications. In this section, the proposed system is tested, and the results are discussed to assess its effectiveness. Various experiments were carried out using the electricity consumption dataset. These experiments include testing the configuration of the electricity signal classifier, applying the BM model with the best configuration (the selected two layers), measuring accuracy and loss using the deep CNN and BM model, and finally comparing the loss and accuracy of the CNN and BM model with those obtained by the CNN model alone.

Configuration of classifier part experiments
The configuration of the fully connected layers was tested in terms of the number of layers and nodes, beginning with two layers and ending with four layers. The maximum size of each layer is 128 nodes and the minimum is 16 nodes. Each row of Table 1 represents a complete model configuration and the results obtained from that model. The columns of the table represent, respectively, the model number (the sequence of the model in the experiment), the number of fully connected layers in the model, the number of nodes in each layer (16 to 128 nodes), the best accuracy of the architecture, the worst accuracy, the average accuracy, the average training loss (the difference between the true and predicted labels of the electrical signals during training, which should be minimized as far as possible), and the time consumed for training. The selected architecture is the one with two layers, as it offers the best compromise between average accuracy and consumed time.

Two layers experiments
This subsection explains the results for two layers in detail. In Table 2, the rows represent ten training rounds of the model. The columns comprise, respectively, the number of nodes in each layer (128 and 64 nodes for the first and second layers), the number of layers (2), the sequence number of each training round, and the loss, accuracy, and time of each model. The table also contains the averages of loss, accuracy, and time over the ten training rounds. Finally, the accuracy and time values are close to one another across rounds, which indicates that the network is stable.

Applying BM with best configuration of two layers in the CNN model
This subsection describes the accuracy and loss obtained with the CNN and BM model, using ten iterations and ten solutions. The model consists of two layers of 128 and 64 nodes, as shown in Table 3. The loss is 0.265327107, the accuracy is 0.916932165, and the time is 10 seconds. The aim of combining the BM algorithm with the CNN is to reduce the features, which helps to reduce the time and complexity of execution; the reduction in time is the most important benefit. In addition, if the accuracy improves, the feature reduction can also help to eliminate contradictions between the features.

CONCLUSION
The most important conclusions of this paper are that supervised learning techniques perform better than other techniques because labeled data allows models to be trained to high performance. In addition, pre-trained models are powerful in handling electricity consumption data because they are trained using big datasets and powerful computers. When the data is extracted from the dataset using a normal CNN, the accuracy can be low compared with handling the electricity consumption data using the SM. In this work, the dataset was reduced before building the models in order to improve the performance of model building and of classifying new electricity signals. Using an optimization algorithm (the BM algorithm) reduces the extracted features and thus speeds up the designed system. The outcomes of these experiments indicate that our deep model (based on CNN and BM) can outperform several other current methods.