A comparative study of Gaussian mixture algorithm and K-means algorithm for efficient energy clustering in MWSN

ABSTRACT


INTRODUCTION
Modern real-world applications in which the sensor nodes are mobile rely heavily on mobile wireless sensor networks (MWSNs). An MWSN is a collection of sensor nodes distributed throughout a particular location. The sensors can process data, recognize data, and communicate wirelessly. Every sensor node is typically powered by a built-in battery of limited capacity. Since the sensor nodes in MWSNs can be installed in any environment and can adapt to quick topology changes, they are far more flexible than static wireless sensor networks (WSNs). Mobile sensor nodes sense many physical phenomena, such as light, temperature, humidity, pressure, and mobility. Hence, the major uses of MWSNs include environmental monitoring, mining, meteorology, seismic monitoring, acoustic detection, monitoring of processes in the healthcare industry, protection of infrastructure, context-aware computing, undersea navigation, smart spaces, inventory tracking, and tactical military surveillance [1]. Furthermore, MWSNs provide helpful solutions for certain real-world problems. Some of the early applications of MWSNs are in the health sector, home management, policing, monitoring, environmental applications, military applications, and routing protocols [2]-[6].
In fact, since sensor nodes may be placed in dangerous or inaccessible environments, charging or replacing the battery may be inconvenient or impossible [7]. Hence, energy consumption is one of the key distinctions between a WSN and a conventional wireless network. To extend the network's lifespan, it is important to establish methods for dependable data relay from the sensor nodes to the sink and for energy-efficient route creation at the network layer. Prolonging the lifespan of the network has a significant impact on how well sensor network applications operate. One of the applied approaches is the energy dependent architecture (EDA). This technique reduces the overall power consumption of MWSNs according to the type of power distribution that occurs through the network.

ISSN: 2302-9285, Bulletin of Electr Eng & Inf, Vol. 12, No. 6, December 2023: 3727-3735
Recent research studies have shown how organizing sensor nodes in MWSNs can result in energy saving. A sensor node typically operates in one of three states: work, send, or receive. These states have a significant impact on the amount of power used. Ongoing research has shown that event-driven operation of sensor nodes can be used to manage the sensors' power reserve [8]. One such practice is to let sensor nodes alternate between wake and sleep periods. Several practical algorithms are employed to regulate and control sensor power use [9], [10]. In addition, transmission methods have also been identified as a crucial concern for saving energy in WSNs [11]. Due to the limited power available at the sensor nodes, data gathered under the target conditions is relayed directly to the base station (BS) [12], [13]. The BS (also known as the sink) is the node responsible for obtaining data from the sensor nodes. The BS examines the short proximity of the supplied information, which is used in the route. Additionally, the BS can use this information locally or transfer it to other networks set up in different areas [14]-[17].
Clustering of sensor nodes is one of the most effective strategies for preserving the energy of the sensor nodes [18]. In clustering, the network is divided into numerous groups known as clusters. Each cluster has a cluster head (CH). CHs gather local information from the nodes of the cluster, aggregate it, and communicate the data to a distant BS directly or via other CHs. The elimination of duplicate data, improved network scalability, and preservation of transmission capacity are other advantages of clustering [19]. The selection of CHs is the crucial step in the clustering process, since it has significant implications for energy conservation in the member sensor nodes and is crucial to the longevity of the network. Additionally, it affects the energy efficiency of the data routing procedure, which is the main goal [19]. Therefore, CHs should be selected with care. The selection of CHs can be considered an NP-hard optimization problem [20]. CH selection in WSNs has been investigated in numerous research studies, as shown in Table 1.

Table 1. Summary of related works
| Ref. | Clustering approach | Enhancements | Simulator |
|---|---|---|---|
| [21] | Two phases clustering model | Improves the total network lifetime of WSNs using the energy efficient clustering algorithm (EECA). | C programming language |
| [22] | An enhanced genetic algorithm and data fusion technique | An improved CH node is used in the proposed energy-efficient routing protocol to choose a method that can assess the remaining energy and directions of each participating node. | MATLAB |
| [23] | CH selection based on particle swarm optimization (PSO) | The lifetime of the network is increased when PSO is used for the best CH selection. | MATLAB |
| [24] | CH selection based on a firefly optimization | Increases the network longevity by utilizing a firefly-based optimization algorithm for CH selection. | MATLAB |
| [25] | K-means clustering-based routing protocol | The suggested K-means clustering-based routing protocol takes into account the ideal fixed packet size to minimize the energy consumption of individual nodes and increase the network lifetime. | MATLAB |
| [26] | Dynamic clustering and distance aware routing protocol | The algorithm focuses on the role of the super CH in saving the power of CHs when nodes are too far from the BS. | MATLAB |

After reviewing the above related works and recent articles on efficient clustering and the many types of WSN clustering techniques, this paper implements two key clustering algorithms for power saving in MWSNs: the K-means algorithm and the Gaussian mixture models (GMM) algorithm. Their results are then compared. A brief description of the K-means and GMM algorithms is given in section 2. The complexity analysis of the proposed clustering algorithms is presented in section 3. The evaluation method is presented in section 4. Section 5 includes the results and discussion. Finally, the conclusion is stated in section 6.
The proposed method uses a clustering algorithm to combat the dispersion and power loss within MWSNs that result from battery consumption and the challenge of supplying continuous energy, especially for mobile sensors. Clustering methods are unsupervised techniques; thus, the input points are not labeled. As a result, the solution depends on what the algorithm learns from the data during the training process. Two primary categories of clustering algorithms are offered in the literature, hard and soft clustering [27]-[31], represented here by: i) the K-means algorithm and ii) the GMM algorithm. Both algorithms are explained in the following subsections.

Gaussian mixture models algorithm
A Gaussian mixture is a function made up of several Gaussians, each identified by $k \in \{1, \dots, K\}$, where $K$ is the number of clusters in the dataset. Each Gaussian $k$ in the mixture has the following parameters:
- A mean $\mu$ that defines its center.
- A covariance $\Sigma$ that defines its width. In the multivariate case, this is equivalent to the dimensions of an ellipsoid.
- A mixing probability $\pi$ that defines how large or small the Gaussian function is.
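For reference, the three parameters listed above combine in the standard textbook form of a mixture density. The block below is that generic form, written under the assumption that $\pi_k$, $\mu_k$, and $\Sigma_k$ denote the mixing probability, mean, and covariance of component $k$ and that $d$ is the data dimension; it is not necessarily identical to the numbered equations of this paper.

```latex
\begin{align}
p(x) &= \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0, \\
\mathcal{N}(x \mid \mu_k, \Sigma_k) &=
\frac{1}{\sqrt{(2\pi)^{d} \, |\Sigma_k|}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right)
\end{align}
```

In one dimension, $\Sigma_k$ reduces to the variance $\sigma_k^2$ and the ellipsoid reduces to an interval around $\mu_k$.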
In this study, the GMM algorithm is chosen as the proposed technique. This clustering technique is characterized by its high efficiency in forming clusters around the moving nodes in the wireless sensor network. Hence, Gaussian clustering is a suitable candidate for the proposed model because of its efficiency and speed in forming cluster groups among the moving wireless sensors in the communication network. The GMM is described by (1) [32]-[34], where $x$ denotes the random variable, $\sigma^2$ the variance, and $\mu$ the mean value, together with (2) and (3), which introduce, for a mixture of several distributions, the mixing coefficient of the $k$-th distribution. The parameters are estimated by maximizing the log-likelihood (MLL), calculated as in (4), which involves the weight of each probability distribution. The random variable is determined by (5), and the update equations are given by (6) and (7). The probability of finding a certain cluster set of nodes is as presented in (1), based on the random node variable and on the variance and mean of the overall WSN nodes [31]-[35]. The flowchart of the Gaussian mixture clustering algorithm is shown in Figure 1.
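The fitting procedure described above (maximizing the log-likelihood, then updating the parameters) is commonly carried out with the expectation-maximization (EM) algorithm. The following is a minimal sketch of EM for a one-dimensional mixture, not the implementation used in this study. The function names, the quantile-based initialization, and the variance floor are all illustrative assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, iterations=100):
    """Fit a k-component 1-D Gaussian mixture with standard EM updates."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: evenly spaced quantiles serve as starting means.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, 1e-6)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            if total == 0.0:
                resp.append([1.0 / k] * k)  # point is numerically far from every component
            else:
                resp.append([p / total for p in joint])

        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # assumed floor to avoid collapse

    return weights, means, variances
```

For two well-separated groups of readings, for example values around 0 and values around 10, calling `fit_gmm_1d(readings, 2)` should return one component centered near each group with roughly equal weights.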

K-means algorithm
K-means is one of the most common unsupervised learning algorithms. It starts from an unlabeled dataset and splits it into a number of clusters. The K in K-means refers to the number of clusters to be created, and this number must be defined in advance. For instance, if K=2, two clusters are generated; if K=4, four clusters are generated, and so on. The unlabeled dataset is divided into K different clusters using an iterative process, such that each data point belongs to exactly one cluster whose members share similar properties. This provides a practical technique for grouping unlabeled data quickly and accurately. The method is centroid-based: the main objective of K-means is to minimize the distances between the points and the centers of their clusters. For $n$ input data points $x_1, x_2, x_3, \dots, x_n$ and K clusters, the steps of the K-means algorithm are:
a. Choose K points from the dataset, either at random or the first K, to serve as the starting centroids.
b. Calculate the Euclidean distance between each data point and every cluster centroid.
c. Using the distances calculated in step b, assign every data point to the closest cluster centroid.
d. Compute the mean of the points in each cluster to determine the new centroid.
e. Repeat steps b through d as many times as necessary, until the centroids stay the same.
Suppose $(p_1, p_2)$ and $(q_1, q_2)$ are two points in the space. The Euclidean distance between them is (8):

$$d = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2} \quad (8)$$

If every cluster centroid is denoted by $c_i$, then every data point $x$ is assigned to a cluster according to (9):

$$\arg\min_{c_i} \operatorname{dist}(c_i, x)^2 \quad (9)$$

The new centroid is then found from the clustered group of points (10):

$$c_i = \frac{1}{|S_i|} \sum_{x \in S_i} x \quad (10)$$

where $S_i$ is the set of all points assigned to the $i$-th cluster. The flowchart of K-means is shown in Figure 2.
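Steps a through e above can be sketched in a few lines of Python. This is an illustrative version only, not the implementation evaluated in this study. It assumes points are given as tuples of coordinates and takes the first K points as the starting centroids, which is one of the two options in step a. The function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, max_iterations=100):
    """Cluster `points` into k groups; returns (centroids, labels)."""
    # Step a: use the first k points as the starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)

    for _ in range(max_iterations):
        # Steps b and c: assign every point to its closest centroid.
        labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]

        # Step d: the new centroid is the mean of the points in each cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid

        # Step e: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, labels
```

Because the starting centroids are taken from the data order, different orderings can lead to different final clusters; random restarts are a common remedy.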
K-means is a hard clustering approach, which means that it associates each point with one and only one cluster. A drawback of this approach is that no probability or uncertainty measure indicates the degree to which a data point is related to a given cluster. Soft clustering addresses this limitation, and this is exactly what Gaussian mixture models (GMMs) aim to achieve.
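The difference between hard and soft clustering can be made concrete for a single one-dimensional reading. The sketch below is illustrative only: the function names are hypothetical, and the soft assignment assumes the mixture weights, means, and variances have already been estimated.

```python
import math


def hard_assignment(x, centroids):
    """Hard clustering: index of the single nearest centroid."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))


def soft_assignment(x, weights, means, variances):
    """Soft clustering: membership probability of x for every component."""
    joint = [
        w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(joint)  # assumed non-zero, i.e. x is not extremely far from all components
    return [p / total for p in joint]
```

A point that lies between two clusters receives a single label from `hard_assignment`, whereas `soft_assignment` reports how its membership is shared between the two components.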

COMPLEXITY ANALYSIS OF THE PROPOSED CLUSTERING ALGORITHMS
Computational complexity, or latency in decision-making, is one of the key challenges in computer science, particularly in time-critical applications. The computational complexity of the K-means algorithm arises mainly from two steps: the assignment step and the update step. In the assignment step, each data point is assigned to the nearest cluster center based on a distance metric (often the Euclidean distance). In the update step, the cluster centers are recalculated as the mean of all data points assigned to each cluster. These steps are performed iteratively until convergence, which typically happens when the cluster centers no longer change significantly. The time complexity of each iteration is $O(nkd)$, where $n$, $k$, and $d$ are the numbers of points, centers, and dimensions, respectively [36], which makes K-means computationally efficient, especially when the number of iterations is relatively small. On the other hand, the GMM is a probabilistic model used for density estimation and clustering. In a GMM, each cluster is represented as a Gaussian distribution, and the model parameters include the means, covariances, and mixture weights of these Gaussians. The computational complexity of the GMM algorithm is $O(nkd^3)$ [37], for $n$ data points, $k$ Gaussian components, and $d$ dimensions. This complexity arises primarily from the need to estimate the covariance matrix of each Gaussian component and its inverse; hence, K-means requires less computing power per iteration than a GMM. In summary, understanding the computational complexities of clustering algorithms such as K-means and GMM is essential for selecting the appropriate algorithm for a given application. K-means is often preferred when computational efficiency is a priority, while GMM offers greater flexibility at the cost of increased computational requirements.
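To make the gap between the two per-iteration costs concrete, the following sketch evaluates the leading terms $nkd$ and $nkd^3$ for a given problem size. Constant factors are ignored, the function names are hypothetical, and the example sizes in the usage note are assumptions rather than the simulation parameters of this study.

```python
def kmeans_cost_per_iteration(n, k, d):
    """Leading-order operation count for one K-means iteration, O(nkd)."""
    return n * k * d


def gmm_cost_per_iteration(n, k, d):
    """Leading-order operation count for one GMM iteration, O(nkd^3)."""
    return n * k * d ** 3


def cost_ratio(n, k, d):
    """How many times more costly a GMM iteration is; equals d squared."""
    return gmm_cost_per_iteration(n, k, d) / kmeans_cost_per_iteration(n, k, d)
```

For a two-dimensional deployment with, say, 100 nodes and 5 clusters, the estimates are 1,000 and 4,000 operations, a ratio of $d^2 = 4$. The ratio depends only on the dimension, so the overhead of GMM stays modest for planar node coordinates.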

EVALUATION METHOD
In this section, an evaluation method is proposed to measure the performance of both algorithms (K-means and GMM) in saving energy in the MWSN. Figure 3 illustrates a flowchart of the evaluation method. For a fair evaluation, the same graph parameters are used as input for both algorithms. The evaluation metrics, energy saving and complexity, are calculated after the clustering process, as depicted in Figure 3, and are evaluated as follows:
a. Energy saving: the energy consumption is measured before and after applying each algorithm to determine the amount of energy saved.
b. Complexity analysis of the proposed clustering algorithms: computational complexity, or latency in decision-making, is one of the key challenges in computer science, particularly in time-critical applications. Complexity analysis involves comparing the proposed clustering algorithms with existing or alternative methods. This comparison can be based on metrics such as execution time.
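The two metrics above can be computed with small helpers such as the following. This is a sketch under stated assumptions, not the evaluation code of this study: energy is assumed to be available as a single number before and after clustering, execution time is assumed to be measured as wall-clock time, and both function names are hypothetical.

```python
import time


def energy_saving_percent(energy_before, energy_after):
    """Percentage of energy saved relative to the consumption before clustering."""
    if energy_before <= 0:
        raise ValueError("energy_before must be positive")
    return 100.0 * (energy_before - energy_after) / energy_before


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Running each clustering routine through `timed` on the same input gives a like-for-like comparison of execution time, and `energy_saving_percent` turns the measured consumption figures into the saving ratio.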

RESULTS AND DISCUSSION
The proposed evaluation technique is conducted for both algorithms (K-means and GMM) on an MWSN with the parameter settings shown in Table 2. The MWSN is deployed using an undirected graph topology. Figure 4 shows the minimum energy in the distribution of nodes in a rectangular layout, in which the shapes formed between nodes are similar to rectangles. The layout can be rectangular or circular, as in Figure 5. Figure 6 shows how the nodes' positions change during the clustering process when searching for the closest cluster (in this image, cluster 5 and the nodes around it), where the x-axis and y-axis represent the dimensions of the MWSN. The results obtained from the implementation are compared in Figure 7. The first obvious distinction between K-means and GMM is the form of the decision boundary. With a covariance matrix, GMM can create elliptical boundaries, rather than the circular boundaries of K-means. Hence, GMM is somewhat more flexible than K-means.
Figure 7 shows that, with K-means, the power consumption remains at 100% for some time and then decreases to 40%, while with GMM the power consumption decreases directly from 90% to 30% after a period. It is worth noting that, in comparison to other clustering algorithms, GMMs are typically less scale-sensitive. Thus, the variables might not need to be rescaled before they are used for clustering. Figure 8 shows that the average power-saving ratio of GMM relative to K-means is 60.52%, which means that GMM saves more power than K-means by a ratio of 60.52%; this is attributed to its more accurate clustering.

CONCLUSION
In this paper, the power saving of wireless sensor networks based on clustering techniques is addressed. For efficient energy saving, the GMM algorithm can be considered for MWSNs, as it showed a significant improvement in power saving. The results show that the power saving rate is up to 92% at 4,500 rounds compared to the K-means algorithm. This means that using the GMM algorithm decreases the amount of power consumed over time. On the other hand, the computational overhead of GMMs is higher than that of K-means. Hence, there is a trade-off between the power saving and the corresponding complexity of each algorithm.

Figure 3. The proposed evaluation model

Table 2. Parameters for simulation