Plant leaf identification system using convolutional neural network

Received Mar 3, 2020; Revised May 10, 2021; Accepted Oct 15, 2021
This paper proposes a leaf identification system using a convolutional neural network (CNN). The proposed system can identify five types of local Malaysian leaves: acacia, papaya, cherry, mango and rambutan. The network is trained for image classification on a database of leaf images captured by mobile phone. ResNet-50 is the architecture used for image classification and for training the network for leaf identification. Recognising photographed leaves requires several steps, starting with image pre-processing, followed by feature extraction, plant identification, matching and testing, and finally extraction of the results in MATLAB. The testing sets consist of three types of images: white background, noise-added, and random background images. Finally, an interface for the leaf identification system was developed as the end software product using MATLAB App Designer. The accuracy achieved on each training set for the five leaf classes is above 98%, so the recognition process was successfully implemented.


INTRODUCTION
Over the last few decades, machine vision [1]- [3] and digital image processing and analysis [4]- [6] have been widely discussed and have thrived. These fields are an important element of artificial intelligence [7], [8] and strengthen the link between humans and machine technology. Such techniques are expected to be used in agriculture and natural habitats [9]. Technology is now advanced enough to make learning new things easier. With the popularity of smartphones, people can easily reach for their phone to discover more about the surrounding nature. The planet hosts a massive number of plant species; current estimates of floral species range from 220,000 to 420,000 [10]. Identifying plants is an essential and challenging task. Leaf shape description is the key problem in leaf identification [11], [12], and to date several shape characteristics have been derived to describe the form of a leaf.
Plant recognition is significant for plant species management in agriculture; we therefore believe that botanists can use this system to identify medicinal leaves [13]. Most plant leaves have specific characteristics, each of which can be used for plant identification. Plant recognition and identification [14], [15] is among the most active topics in image processing, especially in the context of leaf image retrieval. Nonetheless, leaves are natural objects with complex morphology, so a reliable way to distinguish between types of leaves is needed [16]. The main target of this project is to propose a leaf recognition system based on specific characteristics of leaf images, using a deep learning technique in MATLAB, in which the plant is classified based on the parts of its leaves.
Healthy-living trends show that many people, adults and children alike, take part in activities such as walking, jogging and hiking. However, only a few people can recognise trees just by looking at their leaves, while children are eager to know more about plants. With a leaf identification algorithm, people can hopefully recognise trees quickly by identifying the leaves around them. This educational use will hopefully bring more people closer to plants and nature and help them gain new knowledge. Moreover, plant identification is a critical concern for botanists, agricultural researchers and environmentalists. Human experts can identify plants manually, but this is a time-consuming and low-efficiency operation [17]. This project presents an approach for plant recognition using leaf images. The combination of MATLAB with the leaf recognition system [18] has several advantages because the method is efficient and straightforward. From this point of view, improving the classification rate has received considerable attention.
Convolutional neural network (CNN)
Plant classification is a technique used to classify or group plants into ranks and groups with similar properties. Each group of plants is then divided into sub-groups to identify the components the plants have in common. It is a vital process that helps researchers study the behaviour, properties and similarity of each plant and group of plants [19]. Classification of plants is considered a challenging task even for expert biologists. Computer vision is therefore valuable for simplifying this activity, both to support biologists' research and to serve as an educational tool for non-experts in the field. Studies on automatic plant classification have been carried out for many years.
Previously, plant identification relied entirely on human experience and knowledge to identify plants for daily intake, medicine, nourishment and many industries, and to reduce the risk of using the wrong plant when extracting medicinal products, avoiding fatal errors that may harm patients. The advent of digital devices and the possibilities of computer vision inspired botanists and computer scientists to create computerised or semi-automatic systems for plant classification or recognition based on different features. Many different techniques can be used to classify plants by their leaves. Research on leaf classification offers more opportunities for discovery than research on flowers. A leaf identification process based on different biometric qualities of the leaf takes much longer and is expensive. These biometric qualities can be categorised as the colour, venation, tissue and shape of the leaf. For example, in colour-based leaf classification, the similarity in colour composition between two images is affected by sunlight and season [20].
A CNN is a class of deep learning neural network that shows vast potential in image identification. CNNs are most commonly used for the analysis of visual images and often work behind the scenes in image classification. A CNN is structured in three main kinds of layers: input, output and hidden layers. The hidden layers usually consist of convolutional, ReLU, pooling and fully connected layers:
a. Convolutional layers apply a convolution operation to their input and pass the result to the next layer.
b. Pooling layers combine the outputs of a cluster of neurons in one layer into a single neuron in the next layer.
c. Fully connected layers link every neuron in one layer to every neuron in the next layer.
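As a rough illustration of how the hidden layers listed above act on data, the following minimal sketch chains a convolution, a ReLU, a max pooling step and a fully connected layer on a tiny grayscale image. It is written in plain Python rather than MATLAB, and the image, kernel and weights are hypothetical values chosen only to make the arithmetic easy to follow; a real CNN learns its kernels and weights during training.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2-D cross-correlation of a grayscale image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: each size x size cluster becomes one value."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


def fully_connected(vector, weights, biases):
    """Every output neuron is a weighted sum of every input value."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, biases)]


# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical 2x2 kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1], [-1, 1]]

pooled = max_pool(relu(conv2d(image, kernel)))
flattened = [v for row in pooled for v in row]

# Hypothetical weights for two output classes.
weights = [[0.5, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 1.0]]
scores = fully_connected(flattened, weights, [0.0, 0.0])
```

The pooled map keeps the edge response in its left column only, and the first output neuron, whose weights look at that column, receives the larger score.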
CNN is a deep learning approach that is used extensively to solve complex problems and can surpass the limits of traditional machine learning approaches. A CNN eliminates the need for manual feature extraction: the features are not designed by hand but are extracted automatically from the images. Feature detection is learned through tens to hundreds of hidden layers, which makes CNN models highly accurate for machine learning tasks. Each layer increases the complexity of the features learned. There are three learning methods in machine learning: supervised, unsupervised and semi-supervised learning [21].
Starting with an input image, a CNN applies many different filters to produce feature maps; the ReLU function is then applied to increase non-linearity, and a pooling layer is applied to each feature map. The pooled images are flattened into one long vector and fed into a fully connected artificial neural network, which processes the features. The final fully connected layer provides the "voting" for the classes. The network trains through forward propagation and backpropagation over many epochs, repeating until it has well-defined trained weights and feature detectors. A neural network [22], [23] can be used to extract patterns and detect trends that are too complex to be noticed by humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyse. This expert can then be used to make projections for new situations of interest and to answer "what if" questions. The advantages of choosing CNN for this project are:
a. Adaptive learning: the ability to learn how to perform tasks based on the data given for training or initial experience.
b. Self-organization: a neural network creates its own representation of the information it receives during learning.
c. Coding: partial failure of a network results in a corresponding degradation of performance.
ResNet architecture
ResNet-50 [24] is a CNN that is 50 layers deep, trained on more than a million images from the ImageNet database, and able to classify images into 1000 object categories, such as animals, pens and computers. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224. The idea behind ResNet is simple: feed forward the output of two successive convolutional layers and also bypass the input to the next layers. Here the bypass crosses two layers and is applied at large scale.
Figure 1 shows a ResNet residual block, which passes the input through convolutional layers and also bypasses the input to the next layers. Bypassing two layers is a key insight, since bypassing a single layer does not give much improvement. The two bypassed layers can be thought of as a small classifier, or a network-in-network. This was also the first time that a network of over 100, and even 1000, layers was trained.

Figure 1. ResNet residual block
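The bypass in a residual block can be sketched in a few lines. The example below is a simplified illustration, not the ResNet-50 implementation: it uses two small dense layers as a stand-in for the two convolutional layers, and the function names and weights are hypothetical. It shows the defining property that the block outputs F(x) + x, so a block whose layers contribute nothing simply passes its input through.

```python
def relu(vector):
    return [max(0.0, v) for v in vector]


def linear(vector, weights):
    """Dense layer used here as a stand-in for a convolutional layer."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two stacked layers."""
    hidden = relu(linear(x, w1))
    f_x = linear(hidden, w2)
    # The skip connection adds the unchanged input to the output of F.
    return relu([f + xi for f, xi in zip(f_x, x)])


x = [1.0, 2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]

# With all-zero weights F(x) = 0, so the block reduces to the identity.
identity_output = residual_block(x, zero_weights, zero_weights)
```

Because the identity is always available through the bypass, stacking more blocks should not make the representation worse, which is one common explanation for why very deep residual networks remain trainable.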
This bottleneck design reduces the number of features at each layer by first using a 1x1 convolution with a smaller output (usually 1/4 of the input), then a 3x3 layer, and then another 1x1 convolution to a larger number of features. As with Inception modules, this keeps the computation low while providing a rich combination of features. ResNet also uses a pooling layer plus softmax as the final classifier.
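The saving from the 1x1, 3x3, 1x1 arrangement can be checked with simple arithmetic. The sketch below counts convolution weights only (biases and normalisation parameters are ignored) and assumes a hypothetical block with 256 input and output channels, reduced to 1/4 of that inside the block; the comparison with two full 3x3 convolutions is an illustrative baseline, not a figure from this study.

```python
def conv_weights(kernel, channels_in, channels_out):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * channels_in * channels_out


channels = 256            # hypothetical block width
reduced = channels // 4   # 1x1 convolution shrinks the depth to 1/4

bottleneck = (
    conv_weights(1, channels, reduced)    # 1x1 reduce
    + conv_weights(3, reduced, reduced)   # 3x3 on the reduced depth
    + conv_weights(1, reduced, channels)  # 1x1 expand
)

# Baseline for comparison: two 3x3 convolutions at full depth.
plain = 2 * conv_weights(3, channels, channels)

ratio = plain / bottleneck
```

Under these assumptions the bottleneck uses 69,632 weights against 1,179,648 for the plain pair, roughly a seventeen-fold reduction.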
GoogleNet architecture
Researchers at Google introduced the Inception network, which took first place in the 2014 ImageNet competition for the detection and classification challenges. Figure 2 shows the inception module used in the GoogleNet architecture. The model is built from a basic unit called an "inception cell", in which a series of convolutions is carried out at different scales and the results are then aggregated. To save computation, 1x1 convolutions are used to reduce the input channel depth. For each cell, a set of 1x1, 3x3 and 5x5 filters is learned to extract features at different scales from the input. Max pooling is also used, with "same" padding to preserve the dimensions so that the outputs can be concatenated properly.
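To make the computational saving from the 1x1 convolutions concrete, the following sketch counts multiply-accumulate operations for a 5x5 branch with and without a 1x1 reduction. The feature-map size and channel counts are illustrative assumptions, not values reported in this study, and the helper name is hypothetical.

```python
def conv_macs(height, width, kernel, channels_in, channels_out):
    """Multiply-accumulates for a stride-1 convolution with 'same' padding."""
    return height * width * kernel * kernel * channels_in * channels_out


# Assumed input to the inception cell: 28 x 28 spatial size, 192 channels.
H, W, C_IN = 28, 28, 192

# 5x5 branch applied directly to the full input depth.
direct = conv_macs(H, W, 5, C_IN, 32)

# Same branch with a 1x1 convolution first reducing the depth to 16.
with_reduction = conv_macs(H, W, 1, C_IN, 16) + conv_macs(H, W, 5, 16, 32)

# Because every branch keeps the 28 x 28 size, the branch outputs can be
# concatenated along the channel axis; the depths simply add up.
branch_depths = {"1x1": 64, "3x3": 128, "5x5": 32, "pool": 32}
output_depth = sum(branch_depths.values())
```

With these numbers the reduced branch needs about a tenth of the operations of the direct one, while the concatenated output has 256 channels.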
GoogleNet uses a stem without inception modules as its initial layers, and an average pooling plus softmax classifier similar to network-in-network (NiN). The architecture is 22 layers deep. It reduces the number of parameters from 60 million (AlexNet) to 4 million. The classifier also requires far fewer operations than those of AlexNet and VGG, which helps to create a very efficient network. Such networks show encouraging and continuously improving results on automatic plant species identification, as on other object classification problems. After comparing the two, ResNet-50 gives better results and less overfitting, so ResNet is the architecture chosen for this project. We assume this is because ResNet-50 is deeper but still less complex. It also produces a lower-dimensional feature vector, likely due to its more robust average pooling with a pool size of 7x7, which saves the effort of reducing the dimension afterwards.

RESEARCH METHOD
A leaf identification system is used for plant recognition to identify the species of a plant; it is expected to ease botanists' work and can serve as an educational platform for children learning to recognise plants. Figure 3 shows the basic flow of an automatic, digital leaf identification system. Input images can come from a camera or from the internet and are sent to the database for image classification. The deep learning model learns to cluster and classify the images by processing them through the CNN. Once training is complete, the system can recognise leaf images that differ from the images in the database. Finally, a graphical user interface (GUI) was created to assist the leaf identification process.

Images for data collection
All acquired images were stored in a database for data collection. Each image in the dataset has a white background and no leafstalk. The images were then augmented, for example by rotation or flipping, and filters from a photo editor were also applied to the raw images. This increased the number of images in the data collection in order to improve the accuracy of the leaf identification system. The resolution of each image is 1024 × 960. The total number of leaf images for the five trees is 375: 75 images each of mango, acacia (as in Figure 4), papaya, rambutan and cherry. The training and test sets were generated randomly, with 70% of the images used for training and 30% for testing.
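The augmentation and the random 70/30 split described above can be sketched as follows. This is an illustration in Python rather than the MATLAB workflow used in the project; the file names are hypothetical, and splitting each class separately (a stratified split) is an assumption, since the text only states that the split is random.

```python
import random


def flip_horizontal(image):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate an image stored as a list of pixel rows by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def split_per_class(paths_by_class, train_fraction=0.7, seed=0):
    """Randomly split each class into training and test items."""
    rng = random.Random(seed)
    train, test = [], []
    for label, paths in paths_by_class.items():
        shuffled = list(paths)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train += [(path, label) for path in shuffled[:cut]]
        test += [(path, label) for path in shuffled[cut:]]
    return train, test


classes = ["acacia", "papaya", "cherry", "mango", "rambutan"]
# Hypothetical file names: 75 images per class, 375 in total.
database = {c: [f"{c}_{i:02d}.jpg" for i in range(75)] for c in classes}

train_set, test_set = split_per_class(database)
```

Since 70% of 75 is not a whole number, this sketch rounds down and places 52 images of each class in the training set and 23 in the test set, giving 260 training and 115 test images.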

Convolutional neural network method
This method considers the colour and shape features of the leaf. Leaves of different plants are often similar in colour and shape; therefore, a single feature alone may not produce the expected results. The image search and retrieval method mainly focuses on generating the colour feature vector by calculating average means. For each colour plane, the row means and column means are calculated, and then the average of all row means and the average of all column means are calculated. The features of all three planes are combined to form a feature vector. Once the feature vector has been generated for an image, it is stored in a feature database. The neural network class encapsulates all the layers of the network. It has methods to train the network using mini-batch gradient descent, compute the result for an input, perform cross-validation of the network, reset the weights of the network, and write the network to a file [26], [27].
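The colour feature vector described above can be sketched as follows, assuming each colour plane is a rectangular list of pixel rows. The function names and the tiny 2x2 planes are hypothetical and serve only to show the row-mean and column-mean calculation.

```python
def plane_features(plane):
    """Return (average of row means, average of column means) for one plane."""
    row_means = [sum(row) / len(row) for row in plane]
    col_means = [sum(col) / len(col) for col in zip(*plane)]
    return (sum(row_means) / len(row_means),
            sum(col_means) / len(col_means))


def colour_feature_vector(red, green, blue):
    """Concatenate the features of the three colour planes."""
    features = []
    for plane in (red, green, blue):
        features.extend(plane_features(plane))
    return features


# Hypothetical 2x2 colour planes.
red = [[10, 20], [30, 40]]
green = [[100, 100], [100, 100]]
blue = [[0, 0], [0, 8]]

vector = colour_feature_vector(red, green, blue)
```

For a rectangular plane both averages equal the plane's overall mean, so in this sketch each plane contributes a pair of equal values; the individual row and column means would need to be kept if finer spatial detail were required.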
The main building block of a CNN is the convolutional layer. Three hyperparameters control the size of the convolutional layer's output volume: the depth, the stride and the zero-padding. The spatial size of the output, O, which determines how many neurons "fit" in a given volume, is given by (1), where W is the input volume size, K is the kernel field size of the convolutional layer neurons, S is the stride and P is the amount of zero-padding.
$$O = \frac{W - K + 2P}{S} + 1 \tag{1}$$
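As a quick numerical check of the output-size relation, the sketch below evaluates the standard formula O = (W - K + 2P)/S + 1 for two cases. Rounding down when the stride does not divide evenly is an assumption that matches common framework behaviour, and the 7x7, stride 2, padding 3 case is used as an illustration of a first convolutional layer on a 224-by-224 input.

```python
def conv_output_size(w, k, s, p):
    """Spatial output size of a convolution: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1


# 224-wide input, 7x7 kernel, stride 2, padding 3.
first_layer = conv_output_size(224, 7, 2, 3)

# 3x3 kernel, stride 1, padding 1 keeps the spatial size unchanged.
same_size = conv_output_size(56, 3, 1, 1)
```

The first case gives an output width of 112 and the second leaves the width at 56.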

Training and test sets
Statistical classification in general is the task of identifying the class or category to which a new observation belongs, based on prior information such as a training data set. In this research, it refers specifically to the procedure used to assign a plant species to an image based on its feature set. It is an instance of the more general supervised learning problem in statistics and machine learning.
The collected database, which totals 375 images from 5 classes, was passed to the neural network for classification. The network was trained and tested on leaf images from the database. The dataset included only pictures of leaves on a white background. All networks were trained using mini-batch gradient descent with a learning rate of 0.01. Each network was trained until the validation accuracy reached its maximum. The training time therefore differs between training and test sets.
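The training procedure can be illustrated with a much smaller model. The sketch below applies mini-batch gradient descent with a learning rate of 0.01 to a one-feature logistic classifier and stops once the validation accuracy no longer improves. The toy data, the batch size, the patience value and the stopping rule are all assumptions made for illustration; the project itself trains ResNet-50 in MATLAB.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def accuracy(w, b, data):
    """Fraction of samples whose predicted class matches the label."""
    hits = sum(1 for x, y in data if (1 if w * x + b > 0 else 0) == y)
    return hits / len(data)


def train(train_set, val_set, lr=0.01, batch_size=4,
          max_epochs=100, patience=2):
    """Mini-batch gradient descent with early stopping on validation accuracy."""
    w, b = 0.0, 0.0
    best, stale, epochs_run = -1.0, 0, 0
    for epoch in range(max_epochs):
        for start in range(0, len(train_set), batch_size):
            batch = train_set[start:start + batch_size]
            # Gradient of the cross-entropy loss averaged over the batch.
            errors = [sigmoid(w * x + b) - y for x, y in batch]
            grad_w = sum(e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(errors) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
        epochs_run = epoch + 1
        val_acc = accuracy(w, b, val_set)
        if val_acc > best:
            best, stale = val_acc, 0
        else:
            stale += 1
            if stale >= patience:
                break  # validation accuracy has stopped improving
    return w, b, best, epochs_run


# Toy data: the label is 1 when the feature is positive.
TRAIN = [(-3, 0), (3, 1), (-1, 0), (1, 1), (-2, 0), (2, 1), (-4, 0), (4, 1)]
VAL = [(-2.5, 0), (2.5, 1), (-0.5, 0), (0.5, 1)]

w, b, best_val_acc, epochs_run = train(TRAIN, VAL)
```

Because the stopping point depends on when the validation accuracy peaks, different data sets finish after different numbers of epochs, which is consistent with the varying training times noted above.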
Furthermore, the number of epochs also differs between leaf classes. The model was trained for a varying number of iterations on the training sets of 375 images, using an NVIDIA GTX 1650 4GB GPU. The process is summarised as a block diagram in Figure 5.

Classification
For classification, the CNN classifies the leaves in the database and recognises new leaf images given as input. ResNet-50 is a CNN that was trained on more than a million images from the ImageNet database. As a result, the identification system was able to recognise the images in the database with more than 90% accuracy. During classification, accuracy was checked to measure how well the model performs. Accuracy is a metric used to evaluate classification models; its formal definition is stated in (2). For binary classification, accuracy can also be expressed in terms of positives and negatives using (3), where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.
$$\text{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} \tag{2}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$
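Both ways of computing accuracy can be written in a few lines, taking accuracy as the number of correct predictions divided by the total number of predictions, or as (TP + TN) / (TP + TN + FP + FN) in the binary case. The labels and counts below are made-up values for illustration only and are not results from this study.

```python
def accuracy(predicted, actual):
    """Correct predictions divided by the total number of predictions."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def binary_accuracy(tp, tn, fp, fn):
    """Accuracy from the four cells of a binary confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical predictions for four test images.
predicted = ["mango", "papaya", "cherry", "mango"]
actual = ["mango", "papaya", "acacia", "mango"]
multi_class = accuracy(predicted, actual)

# Hypothetical confusion-matrix counts.
binary = binary_accuracy(tp=40, tn=50, fp=6, fn=4)
```

Three of the four hypothetical predictions are correct, giving 0.75, and the hypothetical confusion matrix gives 90 correct out of 100, or 0.9.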

Graphical user interfaces
The graphical user interface (GUI) is the final presentation of the leaf identification system. Once the system could recognise the images, the interface was built to guide the user to insert the desired leaf image for recognition; the interface then runs the program to identify the leaf, which completes the output of this project. The interface was built with MATLAB's interface creator, App Designer. By combining the neural network code with App Designer, the interface was completed as designed.

RESULTS AND DISCUSSION
The results and analysis for this project cover data collection and the CNN algorithm implemented in MATLAB to classify the leaf types. Several steps were needed to achieve the desired output. The main goal was to ensure that the system successfully identifies the leaves from input images.

Training results
Leaf identification in this project required several steps. The output was obtained from the MATLAB command window. Using deep learning with a CNN, the system classified the leaves in the data collection with ResNet-50, which performs the feature extraction. ResNet-50 has 50 layers: 49 convolutional layers and one fully connected (FC) layer on top. Apart from the first convolutional layer, the remaining 48 form 16 "residual" blocks in 4 stages. The blocks within each stage have a similar architecture, i.e. the same input and output shape, as shown in Figure 6. When simulated in MATLAB, each layer of the feature extraction contributes to identifying the leaves and classifying them accordingly.
The ResNet-50 architecture was used for training. Training stops when the validation accuracy reaches its maximum, to ensure a reliable accuracy reading for the trained network. The limit was set to 100 epochs, with a constant base learning rate of 0.01. Figure 7 shows the training images from the acacia database. During training, several quantities were recorded and tabulated for each leaf class: validation accuracy, elapsed time, epoch, iteration, iterations per epoch and learning rate. Figure 8 shows that all accuracies are above 98%, because the training images were suitable for the deep learning process. The best validation accuracy among the five leaf classes was for papaya leaves, at 99.70%, while the lowest was for rambutan leaves, at 98.60%. This suggests that papaya leaves are easier for the network to learn because their distinctive shape differs from most other leaves. The quickest elapsed time was for mango leaves, at 8 seconds, while the longest was for cherry leaves, at 23 seconds.
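The layer count quoted in this section can be verified with a short calculation. The distribution of the 16 residual blocks over the 4 stages as 3, 4, 6 and 3 follows the standard ResNet-50 design and is not stated in the text above, so it is marked as an assumption in the code.

```python
# Assumed distribution of the 16 residual blocks over the 4 stages
# (standard ResNet-50 layout).
BLOCKS_PER_STAGE = (3, 4, 6, 3)
# Each bottleneck block has three convolutions: 1x1, 3x3, 1x1.
# Shortcut projections are not counted.
CONVS_PER_BLOCK = 3

residual_blocks = sum(BLOCKS_PER_STAGE)
block_convs = residual_blocks * CONVS_PER_BLOCK
conv_layers = 1 + block_convs   # plus the first convolutional layer
total_layers = conv_layers + 1  # plus the fully connected layer on top
```

This gives 16 blocks, 48 convolutions inside the blocks, 49 convolutional layers overall and 50 layers in total, matching the description.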

Testing results
Three types of input image were used for testing: images with a white background and added noise, and images with two different random backgrounds. Noise can be added using MATLAB code. Figures 9 and 10 show samples of the input images used for testing and the GUI of the leaf identification system, respectively. Figure 10 shows that all five leaves were successfully recognised by the system.
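The project analysis later identifies the added noise as speckle. A minimal sketch of one common multiplicative speckle model, J = I + n·I with zero-mean uniformly distributed n, is given below in Python. The model, the variance of 0.05 and the function name are assumptions for illustration; the exact MATLAB routine and settings used in the project are not stated.

```python
import math
import random


def add_speckle(image, variance=0.05, seed=0):
    """Apply multiplicative noise J = I + n * I to pixels in the range [0, 1].

    n is drawn uniformly from [-a, a]; a uniform variable on that interval
    has variance a**2 / 3, so a = sqrt(3 * variance).
    """
    rng = random.Random(seed)
    half_width = math.sqrt(3.0 * variance)
    noisy = []
    for row in image:
        noisy_row = []
        for pixel in row:
            value = pixel + rng.uniform(-half_width, half_width) * pixel
            noisy_row.append(min(1.0, max(0.0, value)))  # clip to [0, 1]
        noisy.append(noisy_row)
    return noisy


# Hypothetical 2x2 grayscale patch.
patch = [[0.0, 0.5], [1.0, 0.25]]
noisy_patch = add_speckle(patch)
```

Because the noise is multiplied by the pixel value, black pixels stay black while brighter pixels are perturbed more strongly.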

Comparison of training sets on different neural network architecture
For comparison, a second neural network architecture, GoogleNet, was also used, so that ResNet-50 and GoogleNet could be compared. The database used for these training sets was the same as the one used for leaf classification in this system. The comparison was made for research purposes, to obtain more output for the project and for analysis of the results.
The system uses the ResNet-50 architecture, which achieves high accuracy; GoogleNet also achieves high accuracy. The two architectures were therefore compared on the main criteria of a neural network: validation accuracy, loss, elapsed time, epoch, iteration, frequency and learning rate. Based on the results, the two architectures do not differ much: accuracy remained high, above 98%, and the elapsed time was acceptable, below 20 seconds. Both architectures are suitable for image classification.

Project analysis
Based on the project results, the system was able to identify the leaves from the database created in the first phase of the project. The leaf classes, acacia, papaya [28], cherry, mango [29] and rambutan, were successfully identified when tested with different images and environments, with an accuracy of more than 98%. This accuracy is high when judged against related work on leaf identification using CNN. When tested with a random environment, the system sometimes failed to identify the correct leaf because background colours similar to the leaf can interfere with identification. Nevertheless, when tested with a white background, the system detected the leaves without error, even when (speckle) noise had been added to the images.

CONCLUSION
Recognition of leaves has been explored in various scientific articles and studies, and it can make a significant contribution to plant classification research. This work was proposed in order to introduce automatic identification or classification of leaves using CNN. A leaf identification method addresses two essential points: the fundamental characteristics of the leaf, and the recognition or classification of the leaves. The neural network identifies leaf classes based on their colour concentration, without carrying out quantitative or computational studies. The use of a neural network for leaf identification and plant classification was carried out successfully. Several sets of experiments were conducted in this work, including training and testing of the CNN. The results reported here show that using CNN to classify plants based on images of their leaves is a promising idea, and demonstrate that neural networks can be used in the classification process for leaf recognition tasks and machine vision applications. Different CNN architectures give different validation accuracy. For classification, however, they can still identify images without high loss, which is acceptable and valid for the identification process. Many neural network architectures besides ResNet and GoogleNet also exist.