Assessing mangrove deforestation using pixel-based image: a machine learning approach

Received Aug 3, 2021 Revised Oct 3, 2021 Accepted Nov 1, 2021
Mangroves are among the most productive forest ecosystems globally and are unique in linking terrestrial and marine environments. This study aims to clarify and understand the adoption of artificial intelligence (AI) in the remote sensing of mangrove forests. Machine learning algorithms, namely random forest (RF), support vector machine (SVM), decision tree (DT), and object-based nearest neighbor (NN), were used to automatically classify mangrove forests using orthophotography, applying an object-based approach to examine three features: tree cover loss, aboveground carbon dioxide (CO2) emissions, and aboveground biomass loss. SVM with a radial basis function kernel was used to classify the remainder of the images, resulting in an overall accuracy of 96.83%, with precision and recall of 93.33% and 96%, respectively. RF performed better than the other algorithms where no orthophotography was available.


INTRODUCTION
Mangroves offer numerous benefits such as carbon sequestration, soil erosion prevention, and rich biodiversity. Monitoring deforestation and managing the sustainability of mangrove forests are therefore of crucial concern, and satellite remote sensing is used to map and assess changes over time [1]. However, traditional remote sensing approaches lack accuracy because of the coarse spatial resolution of satellite images, which results in mangrove species being spectrally confused with other vegetation in landward areas [2]. A previous study [3] reported 75% to 90% accuracy in detecting mangrove species, although it only considered primary forest areas overlapping mangrove tree cover. Light detection and ranging (LiDAR) and optical remote sensing combined with the SVM algorithm have already been adopted and tested to map mangroves and create a mangrove inventory [4]. However, to the best of the authors' knowledge, the applicability of LiDAR and optical images together with RF has not been tested for mapping mangrove forests. Furthermore, no comparison between random forest (RF) and support vector machine (SVM) in mangrove mapping has yet been performed [5]. Changes in mangrove forests have previously been studied using freely available time series of satellite imagery [6]. Such remotely sensed time-series data have proven effective for monitoring changes in mangrove ecosystems, using both aerial photographs (APs) and optical and synthetic-aperture radar (SAR) data [7]. Consequently, numerous recent studies have applied time-series data to map mangrove changes, with APs and visual interpretation being the most commonly applied techniques for detecting changes in mangrove forests [8], [9]. This study, similar to Turubanova et al. [10], defines primary forests as tropical
Figure 1. Seven provinces included in the study and their time-series plots of tree cover loss, aboveground CO2 loss, and aboveground biomass loss (data source: [17]; map source: GADM)

Dataset selection
The dataset (2001-2020) used in this study was obtained from a collaborative project between the GLAD laboratory at the University of Maryland, Google, USGS, and NASA [19]. Tree cover loss is mapped at a spatial resolution of approximately 30 × 30 m. The data were derived from multispectral satellite imagery acquired by the Landsat 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), and Landsat 8 Operational Land Imager (OLI) sensors. More than one million satellite images, including around 600,000 Landsat 7 images for 2000-2012 and over 400,000 Landsat 5, 7, and 8 images for the period 2011-2019, were processed and analyzed to update the dataset. The data were produced by applying a supervised learning method to the satellite images for land-area monitoring, and, following Song et al. [20], per-pixel tree cover loss was used to identify change. In this dataset, tree cover is defined as all vegetation taller than 5 m, which may take the form of natural forests or plantations across a range of canopy densities. Tree cover loss is defined as a stand-replacement disturbance, or the complete removal of tree cover canopy, at the Landsat pixel scale. Tree cover loss may result from human activities, including forest harvesting or deforestation (the conversion of natural forest to other land uses), as well as from natural causes such as storm damage or disease. Therefore, tree cover loss is not equivalent to deforestation.

Normalization
Normalize data
Normalization is a preprocessing step that rescales the values of an existing array and can aid the prediction process. Because the input variables can vary over very different ranges, normalization is required to bring them onto a comparable scale [21]. The standard deviation remains meaningful for normalized data: translation does not change it, and linear scaling changes it only by the scale factor. The classification stage entails determining the correlation values for each pair of graphs and applying several rule-based techniques to categorize the characteristics of mangrove forests into various categories [22]. The proposed normalization method is given by the following equation.
In this equation, a particular data component is scaled using the number of digits in the component and its first digit, giving a scaled value between 0 and 1.
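The normalization equation is not reproduced in this text. Its description (a data component scaled according to its number of digits to a value between 0 and 1) is consistent with decimal scaling, so the sketch below assumes that technique: every value is divided by 10^j, where j is the number of digits in the integer part of the largest absolute value. This is an illustrative assumption, not necessarily the authors' exact formula; in particular, the role of the first digit is not modeled, and `decimal_scale` is a hypothetical name.

```python
def decimal_scale(values):
    """Decimal scaling (assumed technique): divide every value by 10**j, where j is
    the smallest non-negative integer that brings the largest absolute value below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    # For max_abs >= 1, j ends up equal to the number of digits in its integer part.
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


scaled, digits = decimal_scale([345, 12, 78])
print(digits)  # 3
print(scaled)  # [0.345, 0.012, 0.078]
```

Using one exponent for the whole column, rather than one per value, keeps the scaled values comparable with each other.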

Mangrove forest normalization
The Landsat images were normalized to reduce spectral variation between Landsat scenes. This approach uses the average values of mangrove forest pixels to apply a linear correction to each spectral band [7]. Over the 20 years of monitoring, variation in mean sea level was negatively correlated with normalized soil pore-water salinity, such that years of high salinity coincided with low mean sea level [23]. This indicates lower levels of tidal inundation in several regions of the Gulf of Thailand, which impedes mangrove growth. The normalization is defined as follows.
where the terms denote, respectively, the normalized median for a given band, the median value of the dense mangrove forest sample for that band, and the reference dense mangrove forest value for that band, computed for well-known, stable locations identified in the Landsat images across the study area.
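As a minimal sketch of a per-band linear correction toward stable reference sites, the code below assumes a gain-only correction: every pixel in a band is multiplied by the reference dense-mangrove value divided by the scene's dense-mangrove median, so that the normalized median matches the reference. The gain form, the function names, and the sample numbers are assumptions for illustration, not the study's exact equation.

```python
from statistics import median


def normalize_band(pixels, dense_mangrove_samples, reference_value):
    """Rescale one band so that its dense-mangrove median equals the reference value."""
    scene_median = median(dense_mangrove_samples)
    if scene_median == 0:
        raise ValueError("dense-mangrove median must be non-zero")
    gain = reference_value / scene_median  # assumed gain-only linear correction
    return [p * gain for p in pixels]


def normalize_scene(bands, dense_samples, references):
    """Apply normalize_band to every band; all inputs are dicts keyed by band name."""
    return {
        name: normalize_band(pixels, dense_samples[name], references[name])
        for name, pixels in bands.items()
    }


scene = {"nir": [80, 160, 240]}
dense = {"nir": [100, 200, 300]}  # dense-mangrove sample pixels in this scene
reference = {"nir": 250}          # value from stable dense-mangrove reference sites
print(normalize_scene(scene, dense, reference))  # {'nir': [100.0, 200.0, 300.0]}
```

A gain-plus-offset correction (fitted from several stable sites) would be the natural extension if a single ratio proves too coarse.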

Machine learning algorithms
The machine learning (ML) discipline is closely connected to the database discipline [6]. Machine learning therefore also involves questions about which computational architectures and algorithms can most effectively be used to ingest, index, merge, retrieve, and store these data, how different learning subtasks can be arranged within a larger system, and whether the computations are tractable. A model is thus considered an approximation of the process that the machine is required to mimic. In such a situation, some inputs may produce errors, but the model mostly provides correct answers. Hence, besides memory usage and speed, another measure of the performance of a machine learning algorithm is the accuracy of its results. In this study, four machine learning algorithms are chosen: SVM, RF, DT, and the object-based NN algorithm.

Support vector machine
Machine learning is defined as the science of getting computers to learn from human thoughts and actions in order to improve their performance in analyzing data and information [24]. One of the machine learning algorithms available in Google Earth Engine is the SVM. The SVM, a supervised non-parametric statistical learning method, can be used for both classification and regression, making it appropriate for separating mangrove from non-mangrove forests based on pixel reflectance [25]. The SVM finds the decision boundary between classes that maximizes the margin. An example is the research of Heumann [1], which classifies mangrove cover by applying the SVM algorithm to the results of segmentation. During training, the training data are represented in a vector space, and the training patterns nearest to the decision boundary are called support vectors. Only one previous study has applied the SVM [26] to the analysis of mangroves, as part of a combined methodology that maps mangroves using spectral and image texture data, so the performance of SVM for the multispectral classification of mangroves remains untested. The decision function can be written as follows.
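$h_{w,b}(x) = g\left(w^{\top}x + b\right)$, where $g(z) = 1$ if $z \ge 0$ and $g(z) = -1$ otherwise.
This is a linear classifier for a binary classification problem with labels $y$ and features $x$, in which $y \in \{-1, 1\}$ denotes the class labels and the parameters are $w$, the vector normal to the separating hyperplane, and $b$, the bias.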
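To make the linear decision rule concrete, here is a minimal, dependency-free sketch. It evaluates g(wᵀx + b), with g returning +1 for non-negative scores and -1 otherwise, and trains a linear soft-margin SVM with Pegasos-style stochastic subgradient descent on the hinge loss. This training method is swapped in for brevity; it is not the radial basis function SVM reported in the abstract, nor Google Earth Engine's implementation. The toy features, labels, and hyperparameters (`lam`, `epochs`) are hypothetical.

```python
import random


def decision(w, b, x):
    """g(w^T x + b): +1 if the score is non-negative, otherwise -1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else -1


def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent on the regularized hinge loss.

    Labels must be in {-1, 1}. The bias is handled by appending a constant
    feature of 1, so it is regularized together with w (a simplification).
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # last entry plays the role of b
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularizer gradient
            if margin < 1.0:                          # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w[:-1], w[-1]


# Hypothetical standardized two-band features: +1 = mangrove, -1 = non-mangrove
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([decision(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A kernel SVM replaces the dot product with a kernel such as the radial basis function, which this linear sketch deliberately omits.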

Random forest
The segmented training samples from the dataset were exported as comma-separated values (CSV) with all seven classes, input into the random forest classification, and processed in Weka 3.9 [27]. Preprocessing was performed on the training and testing sets (the remaining non-ground objects) using CSV, a file format that Weka can read. The classification model was trained using RF with 500 iterations and then used to classify the validation tests as well as the remaining test data. The random forest predictor is defined as:
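$\hat{f}^{B}_{\mathrm{rf}}(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x)$, where $T_b(x)$ is the output of the $b$-th tree and $B$ is the number of trees. Because the trees are identically distributed, the expectation of their average is the same as the expectation of any one of them. An average of $B$ independent random variables, each with variance $\sigma^2$, has variance $\frac{1}{B}\sigma^2$; if the variables have positive pairwise correlation $\rho$, the variance of the average is $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$.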
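The averaging argument can be checked numerically. The sketch below evaluates ρσ² + (1 − ρ)σ²/B and compares it with a Monte Carlo simulation in which each "tree" output is a shared Gaussian component plus an independent one, which gives every pair of trees correlation ρ. Using B = 500 mirrors the 500 iterations (trees) mentioned for Weka; the function names and correlation values are illustrative assumptions.

```python
import random


def variance_of_average(sigma2, rho, n_trees):
    """Variance of the mean of n_trees identically distributed variables with
    variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


def simulate_variance(sigma2, rho, n_trees, trials=20000, seed=1):
    """Monte Carlo check: each tree = sqrt(rho)*shared + sqrt(1-rho)*independent."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    averages = []
    for _ in range(trials):
        shared = rng.gauss(0.0, sd)  # common to all trees, induces correlation rho
        trees = [rho ** 0.5 * shared + (1.0 - rho) ** 0.5 * rng.gauss(0.0, sd)
                 for _ in range(n_trees)]
        averages.append(sum(trees) / n_trees)
    m = sum(averages) / trials
    return sum((a - m) ** 2 for a in averages) / trials


print(variance_of_average(1.0, 0.0, 500))  # independent trees: 1/500 = 0.002
print(variance_of_average(1.0, 0.5, 500))  # correlated trees: about 0.501
print(simulate_variance(1.0, 0.5, 10))     # close to 0.5 + 0.5/10 = 0.55
```

The first term does not shrink as trees are added, which is why random forests also decorrelate trees (random feature subsets) rather than relying on more trees alone.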

Decision tree
Machine learning techniques outperform traditional approaches, and remote sensing applications are increasingly used in wetland research to monitor changes in mangrove forests from time-series data. Recent research has demonstrated the effective use of machine learning in remote sensing, where it is one of the most popular techniques in mangrove studies. Remote sensing supports various machine learning approaches and offers several advantages, such as high accuracy, cost-effectiveness, time efficiency in land-cover classification, and long-term data access [28]. Therefore, the purpose of this research is to derive classification criteria directly from the training data, without human intervention. In addition, unlike other statistical techniques for classification and change detection [29], the decision tree does not rely on multi-temporal Landsat data ranges such as the maximum forest cover in the seven provinces. The split at each node is expressed in terms of attribute values and weights, and the attributes used in the path of the tree can be ignored.
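The text does not state how the decision tree chooses its splits. As a hedged illustration, the sketch below uses Gini impurity, the criterion used by CART-style trees, to find the best threshold on a single attribute; the attribute values, class names, and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Best split 'value <= threshold' on one attribute, by weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_thr, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score


ndvi_x100 = [10, 20, 30, 70, 80, 90]  # hypothetical attribute values
classes = ["other"] * 3 + ["mangrove"] * 3
print(best_threshold(ndvi_x100, classes))  # (50.0, 0.0)
```

A full tree applies this search to every attribute at every node and recurses on the two resulting subsets until the leaves are pure or a depth limit is reached.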

Object-based NN classification
In object-based classification, segmentation is the process of finding clusters of pixels with similar properties; it can generate objects of varying number and size based on spectral homogeneity and compactness thresholds. Segmentation can be organized as a hierarchy of levels, ranging from a few large objects to numerous small objects, in which each object belongs to a larger object at the higher segmentation level and contains smaller objects at the lower level. The spectral attributes of each object include not only statistical values such as the maximum, minimum, and standard deviation of each band but also the median value of each band over the pixels in the object. All these variables can be used during classification to support the discrimination of objects and their correct assignment to land-use/land-cover classes [30]. The classes are organized in a tree-like hierarchy, so that classes at lower structural levels inherit the features of their higher-level super-classes, in a similar way to the object hierarchy. In this study, the image was segmented using a cluster-merging method, which starts with pixel-sized objects and iteratively merges small objects into larger image objects until the homogeneity threshold is exceeded. The homogeneity threshold is determined by user-defined parameters such as shape, scale, color, compactness, smoothness, and image layer weights, with scale being the most essential parameter because it directly controls the size of the image objects. The object-based NN classification method is used in this study to distinguish three mangrove types and the surrounding land-use types, with the NN algorithm achieving an overall accuracy of more than 93.22%. The criteria or attributes mentioned above were used to label the objects and for the subsequent object-based nearest neighbor (NN) classification.
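A minimal sketch of the nearest neighbor step, assuming each image object is summarized by a small feature vector (for example, band means) and assigned the class of its closest training sample. Scaling each feature by its standard deviation across samples is an assumption here, as are the feature values and class names; object-based software may define the distance differently.

```python
import math
from statistics import pstdev


def feature_scales(samples):
    """Per-feature population standard deviation over the training samples."""
    columns = zip(*(features for features, _ in samples))
    return [pstdev(col) or 1.0 for col in columns]  # fall back to 1.0 if constant


def classify_object(features, samples, scales):
    """Label an image object with the class of its nearest training sample."""
    best_label, best_dist = None, math.inf
    for sample_features, label in samples:
        dist = math.sqrt(sum(((a - b) / s) ** 2
                             for a, b, s in zip(features, sample_features, scales)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical object features: (mean NIR, mean red) of each segment
samples = [((0.45, 0.05), "mangrove"), ((0.40, 0.06), "mangrove"),
           ((0.20, 0.15), "aquaculture"), ((0.05, 0.04), "water")]
scales = feature_scales(samples)
print(classify_object((0.42, 0.055), samples, scales))  # mangrove
print(classify_object((0.06, 0.05), samples, scales))   # water
```

Scaling prevents a feature with a large numeric range from dominating the distance.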
This supervised classification method allows all objects in the whole image to be classified based on the selected samples and statistics [31].
$h_{\mathrm{color}} = \sum_{c=1}^{q} w_c \left[ n_{m}\,\sigma_c^{m} - \left( n_{1}\,\sigma_c^{1} + n_{2}\,\sigma_c^{2} \right) \right]$
where $q$ is the number of bands and $w_c$ is the weight of band $c$; $n_m$, $n_1$, and $n_2$ are the numbers of pixels within the merged object, initial object 1, and initial object 2, respectively; and $\sigma_c^{m}$, $\sigma_c^{1}$, and $\sigma_c^{2}$ are the variances of the merged object, initial object 1, and initial object 2, respectively, in band $c$. These terms are derived from the local tone heterogeneity, weighted by the size of the image objects, and summed over the image bands. After an image has been segmented, it is classified at the segment level using object-based classification.
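The merge criterion can be sketched in a few lines. Each object is a list of pixels and each pixel a tuple of band values. The text describes the spread terms as variances, while the multiresolution segmentation formulation is commonly written with standard deviations, so the spread function is a parameter that defaults to the standard deviation. The `threshold` in `should_merge` is a hypothetical stand-in for the user-defined, scale-dependent limit, and all names and values are illustrative.

```python
from statistics import pstdev


def band_values(obj, band):
    """Values of one band for all pixels of an object (pixel = tuple of band values)."""
    return [pixel[band] for pixel in obj]


def color_heterogeneity_increase(obj1, obj2, weights, spread=pstdev):
    """sum_c w_c * (n_m * s_m - (n_1 * s_1 + n_2 * s_2)) for merging obj1 and obj2.

    spread defaults to the per-band standard deviation; pass
    statistics.pvariance to use variances instead.
    """
    merged = obj1 + obj2
    total = 0.0
    for band, weight in enumerate(weights):
        total += weight * (
            len(merged) * spread(band_values(merged, band))
            - (len(obj1) * spread(band_values(obj1, band))
               + len(obj2) * spread(band_values(obj2, band)))
        )
    return total


def should_merge(obj1, obj2, weights, threshold):
    """Merge only if the heterogeneity increase stays below a (hypothetical) threshold."""
    return color_heterogeneity_increase(obj1, obj2, weights) < threshold


obj_a = [(10, 50), (10, 50)]  # two-band pixels of object A
obj_b = [(10, 50), (10, 50)]  # spectrally identical object B
obj_c = [(30, 50), (30, 50)]  # object C, brighter in band 0
print(color_heterogeneity_increase(obj_a, obj_b, weights=[1.0, 1.0]))  # 0.0
print(color_heterogeneity_increase(obj_a, obj_c, weights=[1.0, 1.0]))  # 40.0
```

Merging identical objects costs nothing, while merging spectrally different objects raises heterogeneity in proportion to the merged object's size.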

RESULTS AND DISCUSSION
Both the USGS dataset and time-series Landsat sensor data were applied in this study. Since 2001, the main driver of change in the mangrove forests has been settlement of the mudflats. Mapping was used to quantify these areas, where tree loss may have occurred at any point during the time series, by considering three features: tree cover loss, aboveground CO2 emissions, and aboveground biomass loss for the years 2001-2020. A growing rule was then applied to classify the remaining tree loss locations that were not detected in the tree cover, and the threshold of the growing rule was raised. However, objects were only included in the iterative classification of tree cover loss if they shared a border with objects already classified as trees. The growing rule was applied to finalize the tree cover classification, and the results were then used to determine changes in mangrove cover with respect to the three features.
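The growing rule described above can be sketched as simple region growing. In this hypothetical example, grid cells stand in for image objects and sharing an edge (4-neighbourhood) stands in for sharing a border; the function name and data are illustrative assumptions, not the study's implementation.

```python
def grow_region(seeds, candidates):
    """Add candidate cells that share an edge with already-classified cells,
    repeating until no more cells can be added."""
    classified = set(seeds)
    pending = set(candidates) - classified
    changed = True
    while changed:
        changed = False
        for cell in sorted(pending):  # iterate over a copy so discarding is safe
            r, c = cell
            neighbours = {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)}
            if neighbours & classified:
                classified.add(cell)
                pending.discard(cell)
                changed = True
    return classified


seeds = {(0, 0)}                         # cells already classified as trees
candidates = {(0, 1), (0, 2), (5, 5)}    # possible tree-loss cells
print(sorted(grow_region(seeds, candidates)))  # [(0, 0), (0, 1), (0, 2)]
```

The isolated candidate at (5, 5) is never added because it never touches the growing region, which is the effect of the border condition in the text.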

Training data selection
Satellite imagery and training data were used in this research to map land use in the study region. First, the classification key was formulated using a high-resolution imagery map (3-5 km spatial resolution) of the seven provinces of the Gulf of Thailand, following a stratified random sampling strategy. Training and validation data for 2001-2019 were collected from the University of Maryland, Google, USGS, and NASA dataset [32], and a predictive model was used to obtain the data for 2020. A large training set of 17 classes, based on 17 provinces in the Gulf of Thailand, was validated using 10-fold cross-validation, achieving 99.6% accuracy. The number of provinces was then reduced to seven to reduce the training data while making the study more challenging, achieving an accuracy of 96.86%. The training data consisted of three features, namely tree cover loss, aboveground CO2 emissions, and aboveground biomass loss, over the 20 years from 2001 to 2020. For classification and validation, the data were divided into training and testing sets of 80% and 20%, respectively [33]. Finally, four machine learning methods (SVM, RF, DT, and NN) were trained on the dataset, and the trained models were then tested. Figure 2 illustrates the architecture of the classification process for the mangrove forest models proposed in this study.
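The 80/20 split and the 10-fold cross-validation can be sketched with the standard library alone. Using 700 samples matches the sample count reported for the seven provinces; the seed, the function names, and the purely random (unstratified) split are assumptions.

```python
import random


def train_test_split(n_samples, test_fraction=0.2, seed=42):
    """Randomly split sample indices into training and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = round(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(n_samples, k=10, seed=42):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


train, test = train_test_split(700)
print(len(train), len(test))  # 560 140
print([len(test_fold) for _, test_fold in k_fold_indices(700)])  # ten folds of 70
```

Each sample appears in exactly one test fold, so every sample is used once for validation and nine times for training.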

Evaluation and measurement
For training and testing, 700 samples covering the three features were taken across the seven classes (provinces); 80% were randomly selected for classifier modelling and 20% for independent accuracy checks. Because of difficulties in reaching each province, and complications in accessing and covering all the information during the COVID-19 pandemic, this research focused on deforestation along the coast of the Gulf of Thailand. Changes in mangrove forest extent were observed across all provinces under study, due to both natural and anthropogenic drivers of change. During the period 2001-2020, no region had an intact mangrove extent, with anthropogenic disturbance or removal occurring at all sites. The conversion of mangroves to aquaculture and agriculture (especially notable in the Gulf of Thailand provinces) was the most prevalent source of anthropogenic change, which was almost exclusively restricted to Thailand. While this shows the spatial distribution of forest loss, it does not convey the magnitude of the loss. Several areas in southern Thailand exhibited localized loss. Land conversion for large-scale agriculture such as oil palm, shrimp farming, and illegal logging are among the primary drivers of deforestation in Thailand, and road and infrastructure development acted as an indirect driver by opening up new areas for timber harvesting. The transformation of mangroves into commercial food and resource production has been widely observed. Natural mangrove loss and expansion processes were regularly observed and widely distributed, occurring in all areas.
This study has identified locations of severe change, in both mangrove gain and loss, and increased future monitoring is recommended. In the evaluation measures, TP (true positives) is the number of positive samples correctly classified, TN (true negatives) the number of negative samples correctly classified, FP (false positives) the number of negative samples incorrectly classified as positive, and FN (false negatives) the number of positive samples incorrectly classified as negative. In evaluating the three features under study, namely tree cover loss, aboveground CO2 emissions, and aboveground biomass loss for 2001-2020, seven of the 17 provinces in the Gulf of Thailand exhibit positive values.
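For reference, a sketch of the standard definitions of accuracy, precision, recall, and F-measure in terms of TP, TN, FP, and FN. For a multi-class problem such as the seven provinces, these would be computed one class at a time (one versus the rest); the counts below are hypothetical.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical counts for one class, e.g. "mangrove" versus everything else
print(binary_metrics(tp=8, tn=10, fp=2, fn=0))
```

The F-measure is the harmonic mean of precision and recall, so it stays low unless both are high.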
We evaluate the performance of the classifiers using several measures, such as recall, precision, F-measure, accuracy, area under the ROC curve, and the gamma measure. The mean absolute error of a particular variable $i$ is calculated by (11):
$\mathrm{MAE}_i = \frac{1}{n}\sum_{j=1}^{n} \left| P_{ij} - T_j \right| \quad (11)$
where $P_{ij}$ is the value predicted by variable $i$ for sample $j$ (out of $n$ sample instances) and $T_j$ is the target value for sample $j$. For a perfect fit, $P_{ij} = T_j$ and $\mathrm{MAE}_i = 0$, so the index ranges from 0 to ∞, with 0 corresponding to the ideal model [34]. Other error measures are defined relative to a simple predictor, namely the average of the actual values. For the mangrove forest problem, the relative error is based on the total absolute error instead of the total squared error. The relative absolute error therefore takes the total absolute error and normalizes it by dividing by the total absolute error of the simple predictor, as in (12):
$\mathrm{RAE}_i = \frac{\sum_{j=1}^{n} \left| P_{ij} - T_j \right|}{\sum_{j=1}^{n} \left| T_j - \bar{T} \right|} \quad (12)$
where $\bar{T}$ is the mean of the target values over the $n$ sample instances:
$\bar{T} = \frac{1}{n}\sum_{j=1}^{n} T_j \quad (13)$
The root mean squared error is the square root of the mean squared error, and the root relative squared error is calculated as in (14):
$\mathrm{RMSE}_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n} \left( P_{ij} - T_j \right)^2}$
$\mathrm{RRSE}_i = \sqrt{\frac{\sum_{j=1}^{n} \left( P_{ij} - T_j \right)^2}{\sum_{j=1}^{n} \left( T_j - \bar{T} \right)^2}} \quad (14)$
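The four error measures can be computed directly from paired predictions and targets. This sketch follows the definitions of MAE, RAE, RMSE, and RRSE, with the simple predictor taken as the mean target value; RAE and RRSE are undefined when all targets are equal. The function name and example values are hypothetical.

```python
import math


def error_measures(predicted, target):
    """MAE, RAE, RMSE and RRSE of predictions against target values."""
    n = len(target)
    t_bar = sum(target) / n                                   # mean target value
    abs_err = sum(abs(p - t) for p, t in zip(predicted, target))
    sq_err = sum((p - t) ** 2 for p, t in zip(predicted, target))
    abs_dev = sum(abs(t - t_bar) for t in target)             # simple predictor = mean
    sq_dev = sum((t - t_bar) ** 2 for t in target)
    return {
        "MAE": abs_err / n,
        "RAE": abs_err / abs_dev,
        "RMSE": math.sqrt(sq_err / n),
        "RRSE": math.sqrt(sq_err / sq_dev),
    }


print(error_measures(predicted=[1, 2, 3, 5], target=[1, 2, 3, 4]))
```

A relative error below 1 means the model does better than always predicting the mean of the targets.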
Ten-fold cross-validation was applied to test the feasibility of the different models. In cross-validation, a portion of the data is held out, and the model is trained on the remaining data for the three features and seven provinces. This process is repeated for different portions of the data, starting with 100 iterations and a base learner. The portions are determined by the value of k; 10-fold cross-validation, as used here, means that the data are divided into 10 parts [32]. The kappa statistic was used to assess the agreement between the predicted and reference classes. The kappa coefficient is a statistical measure of inter-rater reliability or agreement, used to evaluate subjective assessments and to determine agreement between two raters. It was calculated using the formula defined in [35]:
$\kappa = \frac{p_o - p_e}{1 - p_e}$
where $p_o$ is the probability of correct classification and $p_e$ is the probability of agreement due to chance. The cross-validation kappa statistic of machine learning methods such as FR, HMM, and RBM is greater than 0.9, which indicates excellent agreement, as shown in Table 1. Error rate (ER): the predicted error rate estimates the likelihood that an error occurs during the completion of a task and is used to reduce the probability of errors occurring in the future [35]. Matthews correlation coefficient (MCC): used in machine learning as a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure that can be used even when the classes are of very different sizes [36].
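A sketch of the kappa statistic, the error rate, and the Matthews correlation coefficient. Kappa is computed from a square confusion matrix, taking chance agreement from the row and column totals; the error rate is assumed here to be the share of misclassified samples (one minus overall accuracy), which is one common reading of the description above. The confusion matrix and function names are hypothetical.

```python
import math


def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows = reference, cols = predicted)."""
    n = sum(sum(row) for row in matrix)
    p_o = sum(matrix[i][i] for i in range(len(matrix))) / n           # observed agreement
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2  # chance agreement
    return (p_o - p_e) / (1 - p_e)


def error_rate(matrix):
    """Assumed definition: share of misclassified samples (1 - overall accuracy)."""
    n = sum(sum(row) for row in matrix)
    return 1 - sum(matrix[i][i] for i in range(len(matrix))) / n


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a two-class problem."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


cm = [[45, 5],
      [5, 45]]  # hypothetical two-class confusion matrix
print(cohen_kappa(cm), error_rate(cm), mcc(tp=45, tn=45, fp=5, fn=5))
```

Here 90% raw accuracy becomes a kappa of about 0.8, because half of the agreement would be expected by chance with two balanced classes.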
After comparing the performance of the four machine learning algorithms (SVM, RF, DT, and NN) on the basis of the accuracy obtained, the results for seven of the 17 provinces in the Gulf of Thailand showed better results for the mangrove class as well as for the other classes. The RF classifier achieved higher precision, recall, and F-score, as well as a higher overall accuracy of 97.96%. This may be because the SVM has difficulty training a good model when there are many training samples, as in 10-fold cross-validation. In mangrove mapping, many training samples are required to differentiate classes, specifically mangroves from other trees. In contrast, RF, DT, and NN can handle large amounts of training data because of the way they are constructed. This study applied a confusion matrix to the four machine learning models: SVM, RF, DT, and object-based NN classification. Tables 2 (a)-(d) show the confusion matrices of the SVM, RF, DT, and NN algorithms for CO2, with an accuracy of 93.17%. Carbon emissions reflect the carbon dioxide emitted to the atmosphere as a result of aboveground live forest biomass loss. All biomass loss is treated as "committed" emissions to the atmosphere upon clearing, even though emissions may lag for some causes of tree death. Emissions are "gross" rather than "net" estimates: owing to a current lack of accurate data, information on the fate of the land after clearance, and its carbon value, is not incorporated. Emissions associated with other carbon pools, such as belowground biomass, deadwood, litter, and soil carbon, are excluded. Loss of biomass, like loss of tree cover, may arise for numerous reasons, including deforestation, fire, and logging in the course of forestry operations. Table 2 (b) shows the confusion matrix for the random forest (RF) method, with an accuracy of 96.80%.
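The step from biomass loss to committed gross emissions can be shown as simple arithmetic: carbon mass is dry biomass times a carbon fraction, and CO2 mass is carbon mass times 44/12, the ratio of the molar masses of CO2 and carbon. The carbon fraction of 0.5 is an assumed, commonly used approximation rather than a value stated in this study, and the function name is hypothetical.

```python
C_TO_CO2 = 44.0 / 12.0  # ratio of the molar masses of CO2 and carbon


def gross_co2_from_biomass_loss(biomass_loss_mg, carbon_fraction=0.5):
    """Gross CO2 (Mg) committed by losing aboveground biomass (Mg dry matter).

    carbon_fraction is an assumed parameter; published datasets may use other
    values. Belowground biomass, deadwood, litter and soil carbon are not
    included, and no regrowth is subtracted (gross, not net).
    """
    return biomass_loss_mg * carbon_fraction * C_TO_CO2


print(gross_co2_from_biomass_loss(120.0))  # about 220 Mg CO2
```

Because every term is multiplicative, any change to the assumed carbon fraction scales the emission estimate proportionally.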
As shown in Figure 3, the performance of the four machine learning methods for CO2 emissions was assessed using precision, recall, and F-score, categorized by province across the seven provinces (Table 2). Figure 4 shows the performance of the four machine learning algorithms for aboveground biomass, with SVM, RF, DT, and NN achieving 95.84%, 98.96%, 96.92%, and 94.88% accuracy, respectively. The comparative error values (K, MAE, RMSE, RAE, and RSE) of the four methods for aboveground biomass are provided in Table 4. Tree cover, defined as all vegetation taller than 5 m, can take the form of natural forests or plantations across a range of canopy densities. Loss is the removal or death of tree cover and can result from a number of causes, such as mechanical harvesting, disease, fire, and storm damage. Tree cover loss is therefore not equivalent to deforestation. Loss pixels are shaded according to the density of loss at the approximately 30 × 30 m scale: darker pixels indicate areas with a higher density of tree cover loss, and lighter pixels indicate a lower density. At full resolution, there is no variation in pixel shading.
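Two quantities behind the loss maps can be sketched with plain Python: the area represented by a count of roughly 30 × 30 m loss pixels, and the density of loss within coarser display blocks, which is what the shading conveys. The block size, the mask, and the function names are illustrative assumptions.

```python
PIXEL_SIDE_M = 30.0  # nominal pixel size; the text gives roughly 30 x 30 m


def loss_area_ha(n_loss_pixels, pixel_side_m=PIXEL_SIDE_M):
    """Area of tree cover loss in hectares (1 ha = 10,000 m^2)."""
    return n_loss_pixels * pixel_side_m ** 2 / 10_000.0


def block_loss_density(mask, block):
    """Fraction of loss pixels (1s) in each block x block window of a 0/1 mask.

    Larger values would be drawn darker; with block=1 every value is 0 or 1,
    so at full resolution there is no gradation in shading.
    """
    rows, cols = len(mask), len(mask[0])
    density = []
    for r0 in range(0, rows, block):
        row = []
        for c0 in range(0, cols, block):
            cells = [mask[r][c]
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            row.append(sum(cells) / len(cells))
        density.append(row)
    return density


mask = [[1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 1, 1]]
print(block_loss_density(mask, 2))        # [[0.75, 0.0], [0.0, 0.5]]
print(loss_area_ha(sum(map(sum, mask))))  # 5 pixels -> 0.45 ha
```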
In this study, four statistical measures are used to determine tree cover loss in the seven provinces, and the parameters were tuned to achieve high accuracy, with SVM, RF, DT, and NN reaching 94.27%, 97.34%, 96.20%, and 95.55%, respectively. Tables 5 (a)-(d) show the tree cover loss confusion matrices for SVM, RF, DT, and NN, respectively. Figure 5 presents the performance of the four machine learning methods for tree cover loss. The comparative error values (K, MAE, RMSE, RAE, and RSE) of the four methods for tree cover loss are provided in Table 6.

CONCLUSION
This study offers an initial assessment of changes in mangrove forest extent over the period 2001-2020 using SVM, RF, DT, and NN. Only seven provinces are presented, providing time-series data on the influence of terrain change and climate scenarios, with the accuracy obtained by each algorithm varying by province. This study proposes an alternative approach that uses three features, tree cover loss, aboveground CO2 emissions, and aboveground biomass loss from 2001-2020, each with different values. This approach can be tested at other locations to check its suitability for diverse data. Based on the data and classifier performance, SVM, RF, DT, and NN performed well in mangrove classification, achieving 95.84%, 98.96%, 96.92%, and 94.88% accuracy, respectively, although RF performed better than the other three algorithms in areas with no orthophotography. RF also performed best overall, achieving higher precision, recall, F-score, and overall accuracy across the three features. Future work should include all seventeen provinces and additional mangrove features in the Gulf of Thailand, as well as further machine learning approaches, such as neural network models and deep learning, for higher precision and efficiency.