Data mining and analysis for predicting electrical energy consumption

ABSTRACT


INTRODUCTION
Energy efficiency is gaining importance among the priorities of the power sector. It has been called the "invisible fuel", on the grounds that the best choice is not to waste energy. In regions such as Europe, which depend heavily on external energy supplies, optimization at every stage of the energy chain is necessary from both an ecological and an economic point of view [1]. The laws and obligations set by the energy efficiency directive point in that direction. A target of the European Union is to reduce energy consumption by 20%, reaching by 2022 a consumption of no more than 1,474 Mtoe of primary energy or 1,078 Mtoe of final energy, with an objective adapted to the characteristics of each country [2]. In addition, small and medium-sized plants based primarily on renewable sources (PV and wind) or on advanced technologies such as fuel cells [3] generate energy close to where it is consumed. This prioritizes demand: resources are used more efficiently because energy can be stored and used when and where it is needed thanks to a bidirectional network, and data mining can be used to shave consumption peaks and to engage the customer, who takes on an active role [4], as shown in Figure 1.
The present work pays special attention to smart meters and to the advanced metering infrastructure, since they generate the consumption data used for data mining and analysis [5]. This infrastructure is still under development, but many regulators and governments are focusing on smart meter deployment in particular. This is the case in the European Union [6], within its energy strategy for 2022. Electricity markets are heavily regulated in some countries, and this rigidity hinders the introduction of new technologies and quick changes to the conventional market scheme. This has led to the installation of private smart meters, mainly in industrial, commercial and office buildings, where they enable energy management with future economic benefits [7]. In the residential sector, however, the energy savings do not compensate for the investment in private sub-metering equipment, so households rely on the "official" smart meter rollout to start managing their energy consumption [8]. Operators perform the ongoing management functions of the network. Typical operations include routing and switching, fault management, real-time network calculation, operational statistics and reporting, and dispatcher training. Operators help the energy system run smoothly. Many of these tasks, such as maintenance and construction, meter reading or security management, are the responsibility of a regulated utility, and some of them can be outsourced to other domains [9].
ISSN: 2302-9285 Bulletin of Electr Eng & Inf, Vol. 12, No. 2, April 2023: 997-1006
Figure 1. The scheme for estimation of buildings electrical energy consumption [4]
Cuerda et al. [10] suggest that further investigation of household features and householder characteristics is required to determine the possible causes and origin of the load patterns, and whether any correlation exists between household features and load patterns. This would make it possible to provide more accurate and detailed energy-saving recommendations. For example, the load profile differs depending on whether the kitchen is electric or gas, and on whether the hot water boiler is electric or gas: if they are electric, the peaks are likely to be visible. The base consumption is the lowest consumption that a house has at all times, observed especially at night and during periods with no activity at home, as seen for example in [11].
In addition, time-of-use tariffs now make it possible to set a different price for each hour. To shave the peaks, utilities set higher prices during peak time, usually the evening. Again, a behavioral change is needed to reduce the peaks: the simultaneity coefficient of the devices in use should be reduced. For instance, the washing machine can be run at night, outside the peak hours, which have been classified with known machine learning (ML) algorithms in a previous study [12], as shown in Figure 2.
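The effect of shifting a load out of the peak period can be illustrated with a small calculation. The sketch below is only an illustration: the two-rate tariff, the peak window, the prices and the consumption values are all hypothetical and are not taken from the study or from any real utility.

```python
# Hypothetical two-rate time-of-use tariff (all values are illustrative only).
PEAK_HOURS = range(18, 22)   # assumed evening peak: 18:00 to 21:59
PEAK_PRICE = 0.30            # assumed price per kWh during peak hours
OFF_PEAK_PRICE = 0.10        # assumed price per kWh at any other time


def price_at(hour: int) -> float:
    """Return the assumed price per kWh for an hour of the day (0-23)."""
    return PEAK_PRICE if hour in PEAK_HOURS else OFF_PEAK_PRICE


def daily_cost(hourly_kwh: list[float]) -> float:
    """Cost of one day given 24 hourly consumption values in kWh."""
    if len(hourly_kwh) != 24:
        raise ValueError("expected 24 hourly values")
    return sum(kwh * price_at(hour) for hour, kwh in enumerate(hourly_kwh))


def shift_load(hourly_kwh: list[float], from_hour: int, to_hour: int,
               kwh: float) -> list[float]:
    """Return a copy of the profile with `kwh` moved between two hours."""
    shifted = list(hourly_kwh)
    moved = min(kwh, shifted[from_hour])  # cannot move more than was consumed
    shifted[from_hour] -= moved
    shifted[to_hour] += moved
    return shifted


# A flat 0.2 kWh base load plus a 1.0 kWh washing cycle at 19:00.
profile = [0.2] * 24
profile[19] += 1.0
night_profile = shift_load(profile, from_hour=19, to_hour=2, kwh=1.0)

print(f"evening wash: {daily_cost(profile):.2f}")
print(f"night wash:   {daily_cost(night_profile):.2f}")
```

The total energy used is the same in both profiles; only the hour at which the washing cycle runs changes, which lowers both the bill and the evening peak under these assumed prices.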
Bulletin of Electr Eng & Inf, ISSN: 2302-9285. Data mining and analysis for predicting electrical energy consumption (Inteasar Yaseen Khudhair)
Cogeneration plants (electricity and heat production), district heating and cooling, and the integration of renewable power supported by data mining and data analysis are the principal areas of attention for energy efficiency in energy supply. As a result of suppliers' obligations and the use of white certificates [13], energy savings have been achieved. The modernization of the energy infrastructure contributes to the deployment of smart grids, lowering network failures (by nearly 30%) across power generation, transmission and distribution. From the perspective of the present work, the main focus is the energy efficiency directive, which concerns efficiency in energy use. Aiming for a massive smart meter rollout (72% in electricity by 2022) and giving consumers access to real-time and historical information on their consumption and billing will provide end consumers with the knowledge they need to take an active role in deciding how and when it is best to use energy. In addition to the energy efficiency directive, whose application can be supported with random forest, decision tree, support vector machine (SVM), logistic regression and Naïve Bayes algorithms, several other directives address the thermal and electrical demand of buildings, since buildings represent about 40% of the world's total energy demand. These directives are:
− Energy performance of buildings directive: requires energy certificates when selling or renting buildings and the renovation of building components towards a low energy demand; all new buildings should be nearly zero-energy buildings.
− Energy labelling directive: intended to give users more information for choosing energy-efficient products (air conditioners, televisions, washing machines, and lights).
− Ecodesign directive: aimed at product manufacturers, requiring their products to meet minimum energy efficiency standards.
Figure 2. The usage and deployment of different models in science [12]
In contrast, data mining and data analysis not only deliver the service but also involve other features that enhance the whole process. The energy efficiency directive distinguishes between energy efficiency in energy supply and energy efficiency in energy use. The measures adopted can be summarized as follows:
− Energy suppliers and retailers must reduce their energy consumption by 1.5% annually, using data mining and data analysis.
− Annual energy-efficient renovation of at least 3% of the buildings owned or occupied.
− Incentives for building renovation, i.e. adding insulation, double-glazed windows and high-efficiency boilers, to improve energy performance with data mining.
− Mandatory energy performance certificates when renting or selling buildings.
− A set of algorithms or standards for a range of methods, such as decision tree, random forest, logistic regression and Naïve Bayes.
− Regular energy audits for large companies, plus incentives for small and medium-sized enterprises to conduct data mining-based energy audits.
− To control energy use properly, consumers' rights to full information and to real-time and historical access to energy consumption and billing data must be protected.
− By 2022, a total of 200 million smart meters will be installed (72% of the total).

METHOD
It is valuable to provide a solution that gives each individual consumer a visual depiction of their consumption and fulfills all the desired functions. Such a solution facilitates comparison among consumers and gives easy access to the main characteristics of an individual's electricity consumption, which, for instance, enables a quick audit of that consumption and the detection of possible irregularities or problems. These functions include:
− Differentiate the weekday and weekend load profiles, and compare them with the load profile obtained from all days without that distinction.
− Differentiate the load profile for each day of the week (Monday-Sunday), to establish whether the weekday load profiles are similar (Friday may differ from Monday-Thursday).
− Represent consumption in absolute values, as a percentage of consumption per hour, and accumulated over a period.
− Study whether consumption differs among months and seasons.
− Find the characteristic load profile for each month of the year.
− Specify exact dates and plot the consumption over that period.
− Use different kinds of graphs (bars, points, lines, and boxplots).
− Use color to facilitate the visualization of information.
− Facilitate comparisons between users by means of interactive graphs.
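The first items in the list above (weekday and weekend load profiles, and consumption as a percentage per hour) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the input format of `(datetime, kWh)` pairs, the function names and the synthetic readings are hypothetical and do not come from the study's dataset.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_profile(readings, day_filter=None):
    """Average consumption (kWh) for each hour of the day.

    readings: iterable of (datetime, kWh) pairs, one per hourly interval.
    day_filter: optional predicate on the datetime; readings for which it
    returns False are ignored.
    Returns a list of 24 averages; hours without data are None.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, kwh in readings:
        if day_filter is None or day_filter(timestamp):
            totals[timestamp.hour] += kwh
            counts[timestamp.hour] += 1
    return [totals[h] / counts[h] if counts[h] else None for h in range(24)]


def is_weekend(timestamp):
    # datetime.weekday(): Monday is 0 and Sunday is 6.
    return timestamp.weekday() >= 5


def is_weekday(timestamp):
    return not is_weekend(timestamp)


def as_percentages(profile):
    """Express each hour as a percentage of the profile's daily total."""
    total = sum(v for v in profile if v is not None)
    if total == 0:
        return list(profile)
    return [None if v is None else 100.0 * v / total for v in profile]


# Synthetic example data (not from the study): two weeks of hourly readings
# with an evening peak on weekdays and a midday peak on weekends.
start = datetime(2023, 1, 2)  # a Monday
readings = []
for i in range(14 * 24):
    ts = start + timedelta(hours=i)
    peak_hour = 13 if is_weekend(ts) else 20
    readings.append((ts, 1.5 if ts.hour == peak_hour else 0.3))

all_days = hourly_profile(readings)
weekdays = hourly_profile(readings, is_weekday)
weekends = hourly_profile(readings, is_weekend)
print("weekday peak hour:", weekdays.index(max(weekdays)))
print("weekend peak hour:", weekends.index(max(weekends)))
```

The same `day_filter` argument could select a single day of the week, a month or an exact date range, which covers several of the other functions listed.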
A variety of methods can be employed to group electrical load patterns, each with its own approach to achieving the same result. For the current analysis the methods considered are: i) decision tree and random forest, ii) Naïve Bayes classification, and iii) SVM. Figure 3 explains the algorithm with variable responsibility. The electricity consumption data is the main data used in the later analysis; it is numerical, as it records the measurements of electrical consumption for every household participating in the project [14]-[24]. The states of the data during the analysis performed before classification are shown in Figure 4. When analyzing the data, there are three things to keep in mind:
− In most circumstances, the dataset must be reshaped to produce the desired graphical output.
− The treatment of days is a challenge when traversing the data, as time series information requires additional and varied approaches, particularly if a division between weekdays and weekends or a monthly distinction is required.
− The analysis must focus on the specifics of electricity consumption data, determining the best approach to reach the information or conclusions sought.

Clustering analysis
In clustering, the similarity characteristics of the data are not known in advance. The dataset is divided into clusters or groups: objects within the same group have properties similar to one another and differ from the instances of other groups. It is an unsupervised learning task, because no training dataset is available to provide prior knowledge. The objective is to label or group the observations according to their similarity.
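As an illustration of grouping observations by similarity without labels, the sketch below clusters a few load profiles with plain k-means. The text does not name a specific clustering algorithm, so k-means is an assumption chosen for its simplicity, and the four-value profiles are synthetic.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles):
    """Element-wise mean of a non-empty list of equal-length profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def kmeans(profiles, k, iterations=50, seed=0):
    """Group load profiles into k clusters with plain k-means.

    Returns (labels, centroids), where labels[i] is the cluster index
    assigned to profiles[i].
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [None] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_profile(members)
    return labels, centroids


# Synthetic 4-value profiles (night, morning, afternoon, evening) in kWh.
profiles = [
    [0.2, 0.3, 0.3, 1.5],  # evening-peak households
    [0.2, 0.4, 0.3, 1.4],
    [0.3, 0.3, 0.2, 1.6],
    [0.2, 1.5, 0.4, 0.3],  # morning-peak households
    [0.3, 1.4, 0.3, 0.3],
    [0.2, 1.6, 0.3, 0.2],
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

No class labels are supplied to `kmeans`; the two groups emerge only from the distances between profiles, which is what distinguishes clustering from the supervised classifiers discussed next.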

Data mining techniques
This section presents the data mining techniques in several parts: support vector machines, random forest, decision tree, Naïve Bayes and logistic regression. Predicting electricity usage is our primary goal, and it can be broken down into two parts: classification models based only on energy consumption data, which can accurately detect whether a residence is occupied, and prediction models based on the occupancy detection data, which can accurately predict whether a residence will be occupied, for use in a smart system.

Support vector machines
The SVM is an ML and data mining algorithm that was used to find the most reliable indicators of how much energy a person uses. Classification methods such as boosted trees and generalized additive models were also employed to help answer this question. Using forward, backward, and best subset selection, we identified the subset of predictors most strongly correlated with consumption. The tree-based method stratifies the predictor space into regions by recursive binary splitting. The boosted tree technique was chosen because it is well known to be one of the strongest tree-based models. SVMs also cope well with high-dimensional data. In this work, three types of SVM models were used: linear kernel, radial (or Gaussian) kernel and polynomial kernel, as shown in Figure 6.
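The three kernels named above differ only in how they measure the similarity of two feature vectors. The sketch below writes them out in their standard textbook form; the parameter names (`gamma`, `coef0`, `degree`), their default values and the sample vectors are assumptions for illustration, since the paper does not report its kernel settings.

```python
import math


def linear_kernel(x, z):
    """K(x, z) = <x, z>"""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=3, coef0=1.0, gamma=1.0):
    """K(x, z) = (gamma * <x, z> + coef0) ** degree"""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2), the radial (Gaussian) kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, kernel):
    """Pairwise kernel values; an SVM is trained on this matrix."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Hypothetical two-feature samples (e.g. morning kWh, evening kWh).
samples = [[0.3, 1.5], [0.4, 1.4], [1.6, 0.2]]
for name, kernel in [("linear", linear_kernel),
                     ("polynomial", polynomial_kernel),
                     ("rbf", rbf_kernel)]:
    print(name)
    for row in gram_matrix(samples, kernel):
        print("  ", [round(value, 3) for value in row])
```

With the radial kernel, the two similar samples get a value close to 1 and the dissimilar one a value close to 0, which is why this kernel can separate classes that no straight line separates in the original feature space.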

Random forest
In the random forest, the mean response of each predictor is calculated. For each response, the distances between each answer and the mean of the corresponding predictor are added up to give a total distance. Individuals with a high distance value are those who regularly deviated from the mean across the survey. Calculating the mode of each response made it simple to find how frequently respondents repeated the same answer. A response was considered a high-energy consumption response if its mode accounted for more than 90% of all the questions asked. Many responses met this criterion. On closer examination, it became clear that all of these respondents had used an identical answer throughout. The random forest model incorporates these responses. The phases of ensemble random forest approaches for solving classification problems are shown in Figure 7.

Decision tree
Based on a boosting approach, a decision tree method is used to discover the most significant predictors of energy consumption by analyzing the energy consumption component. To accomplish this, we used the decision tree method mentioned earlier in this paper. Trees are built using knowledge from the preceding ones, so that the model improves over time. The number of trees, the shrinkage factor, and the number of splits in each tree are among the tuning parameters, as shown in Figure 8.
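The idea that each tree learns from the errors of the preceding ones can be sketched with generic gradient boosting for squared error. This is not the authors' implementation: it is a minimal illustration in which every tree has exactly one split (a stump), `n_trees` and `shrinkage` stand for the tuning parameters named above, and the feature and response values are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best one-split regression tree on a single numeric feature.

    Returns (threshold, left_value, right_value) minimising squared error,
    where the left leaf covers x <= threshold.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value leaves no right leaf
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted_stumps(xs, ys, n_trees=50, shrinkage=0.1):
    """Gradient boosting for squared error with one-split trees."""
    base = sum(ys) / len(ys)  # start from the mean response
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # Each new tree is fitted to what the previous trees still get wrong.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + shrinkage * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return {"base": base, "shrinkage": shrinkage, "stumps": stumps}


def predict(model, x):
    total = sum(predict_stump(s, x) for s in model["stumps"])
    return model["base"] + model["shrinkage"] * total


def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical data: number of appliances versus daily consumption in kWh.
appliances = [1, 2, 3, 4, 5, 6, 7, 8]
daily_kwh = [2.0, 2.1, 1.9, 2.0, 6.0, 6.1, 5.9, 6.0]
for n in (1, 10, 50):
    model = fit_boosted_stumps(appliances, daily_kwh, n_trees=n)
    print(n, "trees, training MSE:", mean_squared_error(model, appliances, daily_kwh))
```

A small shrinkage factor makes each tree contribute only a fraction of its correction, so more trees are needed, which is the usual trade-off between these two parameters.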

Naïve Bayes
The output variables of subset selection, the factors with the greatest relative influence on classification, and a combination of variables from both the generalized additive and Naïve Bayes models can all be compared using Bayes. The precision of each top model's predictions is compared in order to make a direct comparison. With the use of a variety of splines, second-degree polynomials and linear predictor variables, it was possible to better understand the relationship between individual predictors and the response. Predictors found to have nonlinear relationships with the response variable were given additional polynomial and spline terms, as shown in Figure 9.

Logistic regression
When preprocessing the data for logistic regression, we observed a number of missing values distributed evenly throughout the dataset. Removing all observations with any missing data discarded about 30% of the dataset, because the missing data was not localized to a few variables or predictors of energy consumption. Since the dataset is relatively small, we decided that a better approach was to replace missing non-categorical data points with the average of the corresponding column, with the main purpose of keeping as much data as possible. The data has a small variance, so adding averages does not heavily affect the overall statistics.
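The two options described above, dropping incomplete observations or filling numeric gaps with the column average, can be compared in a few lines. The sketch assumes that missing values are represented as `None` and that rows are dictionaries; the column names and values are hypothetical.

```python
def complete_cases(rows):
    """Keep only the observations with no missing value in any column."""
    return [row for row in rows if all(v is not None for v in row.values())]


def impute_column_means(rows, numeric_columns):
    """Replace missing values (None) in numeric columns by the column mean.

    rows: list of dicts, one per observation.
    numeric_columns: names of the non-categorical columns to impute.
    Returns new row dicts; the input rows are left unchanged.
    """
    means = {}
    for column in numeric_columns:
        observed = [row[column] for row in rows if row[column] is not None]
        if not observed:
            raise ValueError(f"column {column!r} has no observed values")
        means[column] = sum(observed) / len(observed)
    return [
        {key: means[key] if key in means and value is None else value
         for key, value in row.items()}
        for row in rows
    ]


# Hypothetical observations; "heating" is categorical and is not imputed.
rows = [
    {"daily_kwh": 8.0, "occupants": 2, "heating": "gas"},
    {"daily_kwh": None, "occupants": 4, "heating": "electric"},
    {"daily_kwh": 12.0, "occupants": None, "heating": None},
]
imputed = impute_column_means(rows, ["daily_kwh", "occupants"])
print("rows kept by deletion:  ", len(complete_cases(rows)))
print("rows kept by imputation:", len(imputed))
print(imputed[1])
```

Mean imputation keeps every observation but leaves each column mean unchanged and slightly reduces its variance, which is acceptable only when, as stated above, the variance is already small.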

RESULTS AND DISCUSSION
In this work, the geometrical and statistical center of each class is computed first, and the distance between two classes is taken as the distance between their centroids, for the different types of classification algorithms. Figure 10 shows the overall classification results of the data mining approaches used on the statistical dataset. The outputs used to carry out the classification analysis vary with the specific aim of the segmentation and classification of the load profiles' energy consumption, here using decision tree classification, and it is up to the analyst to decide which input data units are most convenient. The results are reported in absolute values and include the number of records, precision, recall, F-measure, accuracy and time. For the dimensionless data, the time for generation and execution is 0.43 seconds, with a recorded precision of 0.968. Figure 11 shows the consumption prediction accuracy results of the data mining approaches.
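The reported measures follow from the counts of correct and incorrect predictions. The sketch below uses the standard definitions for a binary problem; the labels in the example are made up for illustration and are unrelated to the figures reported in this paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F-measure and accuracy for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / len(pairs)
    return {
        "records": len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Made-up labels: 1 = high consumption, 0 = low consumption.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Precision penalizes false alarms and recall penalizes missed cases, so reporting both, together with the F-measure that combines them, says more than accuracy alone when the classes are unbalanced.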
The outputs used to carry out the classification analysis likewise vary with the specific aim of the segmentation and classification of the load profiles' energy consumption when SVM classification is used, and it is again up to the analyst to decide which input data units are most convenient. The results are reported in absolute values and include the number of records, precision, recall, F-measure, accuracy and time. For the dimensionless data, the time for generation and execution is 0.99 seconds, with a recorded precision of 0.906. Turning to the discussion, after analyzing these features it was possible to detect the group of households with large energy savings potential, mainly due to their high number of electrical appliances. The characteristics found most relevant for defining a consumer with high savings potential are: type of occupants' employment, number of adults and children, kind of space heating, kind of domestic hot water heating, overall number of home appliances, and dwelling construction year. This data is not available in the current study scenario, as the features' data is significantly more constrained in terms of samples and attributes. Moreover, while this work describes data related to the household and householders, this data is not complete and must be further processed to eliminate inaccurate and redundant information. Refined data can be evaluated using histograms, which depict the results and serve as the basis for drawing conclusions. A comparison of the accuracy achieved by all data mining techniques is shown in Table 1, and a comparison of the proposed technique with the existing literature is shown in Table 2.
[18]      Machine learning    41.71
[19]      Big data analysis   13.56
Proposed  Data mining         44.73

CONCLUSION
The purpose of this work is to study the feasibility of developing a data mining and data analysis-based system for predicting electrical consumption on a general-purpose processor. The study is an example of a pioneering initiative to attract users and foster efficient electricity consumption among them, aiming to provide energy knowledge, understanding and guidance so that users can reduce their consumption using data mining and data analysis. To carry out this type of study on a large scale, access to the data and to the classification algorithms is vital. As a result, the combination of smart meter deployment and data analytics is expected to play a vital role in the power sector. Smart meters generate large quantities of raw data that need to be managed and that, once analyzed, can be transformed into valuable information that benefits both the utility and the customer, with the aim of enhancing customer engagement and the value of the service. This also creates new business opportunities, mainly related to data mining and data analysis, in response to market needs.

Figure 5. Clustering (left) and classification (right) grouping in data mining [25]

Figure 7. Phases of ensemble random forest approaches to solve classification problems [15]

Figure 9. Naïve Bayes process (left) and SVM (right) for classification building [17]

Figure 10. The overall classification results-based data mining approaches used on statistical dataset
Figure 11. Consumption prediction accuracy results based on data mining approaches

Table 1. The comparison of accuracy achieved for all data mining techniques

Table 2. Comparison of proposed technique with existing literature