Cucumber disease recognition using machine learning and transfer learning

ABSTRACT


INTRODUCTION
The agricultural sector plays an important role in driving economic growth in Bangladesh. It contributes 19.6% of the national GDP and provides 63% of the country's employment [1]. One factor that limits agricultural production is plant disease, which restricts plant growth and causes major economic losses. It is therefore important to recognize the visual signs of plant disease at an early stage, both to diagnose the disease correctly and to prevent it from spreading to healthy plants.
Cucumber (Cucumis sativus), a member of the gourd family Cucurbitaceae, is a widely cultivated crop and one of Bangladesh's most popular vegetables. It is the fourth most extensively grown food crop in the world [2]. A fresh, healthy cucumber provides vitamins, iron, calcium, niacin, thiamine, fibers, phosphorus, and minerals, along with a cool and refreshing taste [3]. Diseases in cucumber, however, affect plant growth and may reduce quality and productivity. The major causes of these diseases are pathogenic viruses, fungi, and bacteria. Most of the diseases are contagious and destructive: they spread to healthy plants and damage the crop in both quality and quantity. Early detection is therefore essential to prevent the spread of disease. Recognizing cucumber diseases by traditional methods is slow, expensive, and laborious on large farms. Moreover, farmers in some developing countries, including rural areas of Bangladesh, may need to travel long distances to consult experts, and experts may not be able to reach the affected area at the right time.
Bulletin of Electr Eng & Inf, ISSN: 2302-9285. Cucumber disease recognition using machine learning and transfer learning (Md. Jueal Mia)
Agricultural research, together with advances in computer vision and pattern recognition, has overcome many natural limitations. Computer vision has proven to play a significant role in agriculture. The field emerged in the early 1970s with the aim of mimicking human sight and endowing robots with intelligent behavior [4]. In agriculture, computer vision is used to monitor plant health, including the detection of pests, diseases, and weeds, by machine rather than by naked-eye observation.
This paper applies computer vision with traditional machine learning (ML) and CNN-based transfer learning, and compares the two techniques to determine which recognizes cucumber diseases with the best accuracy. The diseases considered are downy mildew, powdery mildew, mosaic virus, belly rot, scab, and pythium fruit rot (cottony leak).
Several researchers have worked on crop disease detection and recognition with the help of advanced technology. Approaches presented for this problem include support vector machine (SVM), artificial neural network (ANN), convolutional neural network (CNN), transfer learning (TL), sparse representation classification (SRC), global-local singular value decomposition (GL-SVD), global pooling dilated CNN (GPDCNN), and hyperspectral imaging (HSI). The methods that have been developed to recognize cucumber diseases are discussed below.
Zhou et al. [5] introduced an image preprocessing technique with SVM classification; their system recognized cucumber downy mildew with an accuracy of 90.00%. Three types of cucumber disease were considered in [6], where image processing steps such as noise removal, lesion segmentation, smoothing, and graying were applied. Shape, color, and texture features were then extracted and classified using a minimum-distance method, giving an average correct recognition rate of more than 96.00%. Cucumber downy mildew was studied in [7] using hyperspectral imaging, where enhancement, binarization, corrosion, and expansion treatments (morphological erosion and dilation) were applied to the fused image to make the spot characteristics clearer; the algorithm reached an overall detection accuracy of 90.00%. Pawar et al. [8] developed a system based on image processing that extracts nine texture features using both GLCM and first-order statistical moments, followed by an artificial neural network for classification. They focused on three classes, namely powdery mildew, downy mildew, and healthy plants, and achieved a classification accuracy of 80.45%. Youwen et al. [9] introduced a method for recognizing cucumber leaf diseases using computer image processing and SVM. Their experiments showed that SVM outperforms neural networks for recognizing diseased cucumber leaf images, and that combining shape and texture features gives better recognition accuracy than the shape feature alone. Khan et al. [10] described an improved saliency method with deep feature selection for five types of cucumber disease. The method consists of five steps, in which deep features are extracted using VGG19, 2000 features are selected, and classification is performed with SVM. It achieved an average recognition rate of 98.08% in 10.52 seconds.
Zhang et al. [11] demonstrated the recognition of seven types of disease on cucumber leaves. They used the K-means clustering algorithm to segment the diseased region, extracted combined shape and color features from the lesion information, and classified them using sparse representation (SR). Their method reached an overall accuracy of 85.7%. Zhang et al. [12] proposed a procedure for identifying and classifying cucumber diseases that combines superpixel clustering, the EM algorithm, and SVM classification, achieving an accuracy above 90.00%. Ma et al. [13] constructed a dataset of 1184 images covering four cucumber diseases and applied a deep convolutional neural network (DCNN). They showed that the DCNN performs considerably better than conventional classifiers (random forest (RF) and SVM), with a recognition accuracy of 93.4%. Zhang and Wang [14] used global-local singular values to extract features and build the key-point vector, and then applied SVM to classify cucumber diseases. The main drawback of their method is that extracting the singular values of a few sub-blocks requires more computational effort. Zhang et al. [15] proposed a solution for identifying six common cucumber diseases using a global pooling dilated CNN, with a recognition accuracy of over 94.00%. Zhang et al. [16] applied transfer learning with EfficientNet, a current state-of-the-art method. They developed a classification model for four types of cucumber disease that achieved an accuracy of 97.00%, and found EfficientNet-B4 to be the most effective variant for their study.

RESEARCH METHOD
This section is divided into four sub-sections: data description and augmentation, system overview, conventional ML approach, and CNN-based transfer learning. The details are given below.

Data description and augmentation
Cucumber is affected by major diseases caused by viruses, bacteria, fungi, and nematodes, as well as by some non-infectious diseases [17]. The diseases used in this work for identification and classification are downy mildew, powdery mildew, mosaic virus, belly rot, scab, and pythium fruit rot (cottony leak). Sample images of diseased cucumbers are shown in Figure 1. The main reason for using data augmentation in this study is to train the ML and transfer learning models on more data. Data augmentation expands an insufficient training set by adding slightly modified copies of existing data. Commonly used techniques are flipping, rotation, cropping, shifting, scaling, translation, noise injection, and color jittering. In our work, we applied rotation, shifting, shearing, zooming, and flipping to the original dataset. For rotation, each image was rotated by a random angle between -40 and 40 degrees. The images were shifted along the X-axis and Y-axis with a range of 20%, set through the width and height shift range arguments. The images were also sheared with a range of 20% and zoomed in or out with a range of 20%. Flipping was applied horizontally. We collected 525 cucumber images at field level, and after augmentation the dataset contains 4200 images. We partitioned this dataset into training, validation, and test sets. In total, 20% of the data was kept for testing. Of the remaining 80%, 20% of the data was used for validation and the rest for training.
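The augmentation ranges and the dataset partition described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the dictionary keys and function names are hypothetical, the zoom is interpreted as a factor between 0.8 and 1.2, and the split assumes that both the test and validation fractions are taken from the whole dataset (a 60/20/20 split).

```python
import random

# Ranges taken from the text; the key names are hypothetical.
AUGMENTATION_RANGES = {
    "rotation_deg": 40.0,  # random angle in [-40, 40] degrees
    "width_shift": 0.2,    # horizontal shift, as a fraction of image width
    "height_shift": 0.2,   # vertical shift, as a fraction of image height
    "shear": 0.2,          # shear amount (20% as stated in the text)
    "zoom": 0.2,           # assumed to mean a zoom factor in [0.8, 1.2]
}


def sample_augmentation(rng):
    """Draw one random set of transform parameters within the stated ranges."""
    r = AUGMENTATION_RANGES
    return {
        "rotation_deg": rng.uniform(-r["rotation_deg"], r["rotation_deg"]),
        "width_shift": rng.uniform(-r["width_shift"], r["width_shift"]),
        "height_shift": rng.uniform(-r["height_shift"], r["height_shift"]),
        "shear": rng.uniform(-r["shear"], r["shear"]),
        "zoom": rng.uniform(1.0 - r["zoom"], 1.0 + r["zoom"]),
        "horizontal_flip": rng.random() < 0.5,
    }


def split_counts(total, test_frac=0.2, val_frac=0.2):
    """Return (train, validation, test) sizes; fractions refer to the whole dataset."""
    n_test = round(total * test_frac)
    n_val = round(total * val_frac)
    return total - n_test - n_val, n_val, n_test


rng = random.Random(0)          # fixed seed so the sketch is repeatable
params = sample_augmentation(rng)
n_train, n_val, n_test = split_counts(4200)
print(params)
print(n_train, n_val, n_test)
```

Note that 4200 images from 525 originals corresponds to eight images per original on average.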

System overview
The proposed system for recognizing cucumber diseases is shown in Figure 2. The user sends an image of the diseased cucumber to the expert system. The system processes the input image by applying different image processing algorithms. After analyzing the image using the training dataset, the system sends the name of the disease to the user's device.

Conventional machine learning approach
We used the conventional ML approach for cucumber disease recognition. In agriculture, ML has led to many applications and tools that help farmers improve productivity by feeding machine learning systems with collected data. A review paper showed that ML has been applied in multiple agricultural areas such as crop management, yield prediction, livestock management, disease detection, water management, and soil management [18]. It is therefore mostly used to increase crop productivity and quality. Our ML-based method for recognizing cucumber disease is presented in Figure 3.

Figure 3. Flow diagram for cucumber disease recognition by ML approach
Image acquisition, the process of retrieving an image from a source, is the first stage of any vision system. If the image is not obtained satisfactorily, the expected outcome may not be achieved. After the image has been acquired, it can be processed by various methods to perform a specific vision-related task. In this work, we collected images of six cucumber diseases, together with images of healthy cucumbers.
It can be difficult to obtain good results by relying solely on raw images. Processing applied to collected raw data to prepare it for a subsequent processing operation is referred to as data preprocessing [19]. In the second step, image preprocessing is therefore performed to improve the intelligibility of the image. In this research, image preprocessing mainly includes image resizing, image filtering, contrast enhancement, and color space conversion.
Image segmentation divides an image into several segments and simplifies its representation into something that is easier to interpret. The K-means clustering technique is used to segment the images in this study. K-means is simple to implement and was shown to give better results by Mia [20] and Habib [21]. The first step is to convert the processed RGB images into L*a*b* color space, where 'L*' is the luminosity layer and 'a*' and 'b*' are the chromaticity layers. This conversion is necessary because RGB color space is highly device-dependent, whereas L*a*b* is device-independent. To start, the RGB image pixels are converted to CIE XYZ tri-stimulus color values as described in [22]. The conversion of RGB to XYZ is shown in (1). The 'a*' and 'b*' layers contain all of the color information, and K-means clustering labels each pixel and segments the image by color in the 'a*b*' space. The difference between two colors is measured by Euclidean distance. If a and b are two pixel points with n components each, the Euclidean distance d(a, b) between them is given by (6):
d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}    (6)
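The color space conversion described above can be illustrated with a short Python sketch. It is an assumption-laden example rather than the conversion from [22]: it uses the widely published sRGB matrix with a D65 white point, and the function names are hypothetical. The last helper is the Euclidean distance used to compare two colors.

```python
import math

# Assumption: sRGB primaries with a D65 white point; [22] may use other constants.
D65_WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_linear(c):
    """Undo the sRGB gamma for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_xyz(r, g, b):
    """Convert 8-bit sRGB values to CIE XYZ (white has Y = 1)."""
    rl, gl, bl = (srgb_to_linear(v / 255.0) for v in (r, g, b))
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    return x, y, z


def _f(t):
    """Piecewise cube-root function from the CIE L*a*b* definition."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def xyz_to_lab(x, y, z):
    """Convert CIE XYZ to (L*, a*, b*) relative to the D65 white point."""
    xn, yn, zn = D65_WHITE
    fx, fy, fz = _f(x / xn), _f(y / yn), _f(z / zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def rgb_to_lab(r, g, b):
    return xyz_to_lab(*rgb_to_xyz(r, g, b))


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


L, a, b = rgb_to_lab(0, 128, 0)   # a mid green: a* comes out negative
print(L, a, b)
```

Only the a* and b* components would be passed on to the clustering step, since the text segments by color in the a*b* plane.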
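Feature extraction is an important step after segmentation because it provides key information about the visual representation of a disease, which is needed to identify it. In our work, we focus only on texture and statistical features. Texture features such as contrast, energy, homogeneity, correlation, and entropy were extracted using the GLCM, and statistical features such as mean, skewness, standard deviation, variance, and kurtosis were also extracted from the images. These features are defined by (7)-(16), written here in their standard forms, where P(i, j) is the normalized GLCM entry, \mu_i, \mu_j, \sigma_i, \sigma_j are the means and standard deviations of the GLCM rows and columns, z_i is the i-th intensity level, p(z_i) is its histogram probability, L is the number of intensity levels, m is the mean, and \sigma is the standard deviation.
Contrast: \sum_{i,j} (i - j)^2 P(i, j)    (7)
Correlation: \sum_{i,j} \frac{(i - \mu_i)(j - \mu_j) P(i, j)}{\sigma_i \sigma_j}    (8)
Energy: \sum_{i,j} P(i, j)^2    (9)
Homogeneity: \sum_{i,j} \frac{P(i, j)}{1 + |i - j|}    (10)
Mean: m = \sum_{i=0}^{L-1} z_i \, p(z_i)    (11)
Standard deviation: \sigma = \sqrt{\sum_{i=0}^{L-1} (z_i - m)^2 p(z_i)}    (12)
Entropy: -\sum_{i=0}^{L-1} p(z_i) \log_2 p(z_i)    (13)
Variance: \sigma^2 = \sum_{i=0}^{L-1} (z_i - m)^2 p(z_i)    (14)
Kurtosis: \frac{1}{\sigma^4} \sum_{i=0}^{L-1} (z_i - m)^4 p(z_i)    (15)
Skewness: \frac{1}{\sigma^3} \sum_{i=0}^{L-1} (z_i - m)^3 p(z_i)    (16)
Image classification is a technique for categorizing images into predefined classes based on image features. In this study, classification is performed after extracting the 10 features, and various classifiers are compared. The proposed system has 7 predefined classes: Belly Rot, Cottony Leak, Mosaic, Scab, Powdery Mildew, Downy Mildew, and Healthy.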
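The texture and statistical features named above can be computed as in the following Python sketch. It is illustrative only: the function names are hypothetical, the GLCM is built for a single horizontal offset and made symmetric (the offset and symmetry used in the study are not stated), and entropy is computed from the intensity histogram, which is one common definition.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, correlation, energy, and homogeneity of a normalized GLCM."""
    idx = [(i, j) for i in range(len(p)) for j in range(len(p))]
    mu_i = sum(i * p[i][j] for i, j in idx)
    mu_j = sum(j * p[i][j] for i, j in idx)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p[i][j] for i, j in idx))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p[i][j] for i, j in idx))
    return {
        "contrast": sum((i - j) ** 2 * p[i][j] for i, j in idx),
        # Undefined (division by zero) for a constant image.
        "correlation": sum((i - mu_i) * (j - mu_j) * p[i][j] for i, j in idx)
        / (sd_i * sd_j),
        "energy": sum(p[i][j] ** 2 for i, j in idx),
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in idx),
    }


def statistical_features(pixels, levels):
    """Histogram-based mean, standard deviation, variance, entropy, kurtosis, skewness."""
    hist = [0] * levels
    for z in pixels:
        hist[z] += 1
    prob = [h / len(pixels) for h in hist]
    mean = sum(z * pz for z, pz in enumerate(prob))
    var = sum((z - mean) ** 2 * pz for z, pz in enumerate(prob))
    sd = math.sqrt(var)
    return {
        "mean": mean,
        "std": sd,
        "variance": var,
        "entropy": -sum(pz * math.log2(pz) for pz in prob if pz > 0),
        "kurtosis": sum((z - mean) ** 4 * pz for z, pz in enumerate(prob)) / sd ** 4,
        "skewness": sum((z - mean) ** 3 * pz for z, pz in enumerate(prob)) / sd ** 3,
    }


# Toy 4-level image (hypothetical data, not taken from the dataset).
toy = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
print(texture_features(glcm(toy, levels=4)))
print(statistical_features([v for row in toy for v in row], levels=4))
```

Concatenating the two dictionaries gives a 10-value feature vector of the kind the classifiers receive.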
RF is a predictive model for classification tasks that is simple, flexible, and well suited to multiclass problems such as ours. It collects the prediction of each tree in the forest and outputs the class that receives the majority of votes. In our study, RF performs better than the other ML approaches, achieving the highest accuracy among them, 89.93%.
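The majority-vote step of RF can be shown with a very small Python sketch. The "trees" here are hypothetical hand-written rules standing in for trained decision trees, and the feature names and thresholds are invented for illustration only.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by most trees (the first-seen label wins ties)."""
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_predict(trees, features):
    """Ask every tree for a prediction and combine the answers by majority vote."""
    return majority_vote([tree(features) for tree in trees])


# Hypothetical stand-ins for trained trees: each maps a feature dict to a class.
trees = [
    lambda f: "Scab" if f["contrast"] > 0.5 else "Healthy",
    lambda f: "Scab" if f["entropy"] > 1.0 else "Mosaic",
    lambda f: "Healthy" if f["mean"] > 0.8 else "Scab",
]

sample = {"contrast": 0.7, "entropy": 0.4, "mean": 0.3}
print(forest_predict(trees, sample))
```

For the sample above, two of the three rules vote for the same class, so that class is returned.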

CNN based transfer learning
We also used transfer learning for cucumber disease recognition. The flow diagram of CNN-based transfer learning for recognizing cucumber diseases is shown in Figure 4. The main objective of this paper is to improve the accuracy of disease identification. Although the ML approach achieved good accuracy, we also studied transfer learning for plant disease identification in order to compare the traditional ML approaches with other existing approaches. In transfer learning, a model that has gained knowledge by solving a problem in one domain is applied to a different but related domain, which is useful when training data is limited [23]. In recent years, transfer learning has proven effective in classification problems and has emerged as a novel learning method for plant disease identification. Many pre-trained models are available; we used and compared three of them: InceptionV3, MobileNetV2, and VGG16. We initialized each architecture with weights pre-trained on the ImageNet dataset of 1000 classes. The details of the MobileNet architecture for the ImageNet dataset are explained by Rahman [24]. Because we work with seven classes, we discarded the 1000-neuron output layer and added an output layer of 7 neurons for training. We made all the layers trainable when performing transfer learning with the MobileNet architecture. To reduce overfitting, we used a dropout layer with a rate of 0.5.
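The head replacement described above might look roughly as follows with the Keras API. This is a sketch under several assumptions that the text does not state: the input size, the global average pooling layer, the position of the dropout layer, the optimizer, and the loss are all illustrative choices, and the function name is hypothetical. Calling the function requires TensorFlow and, with `weights="imagenet"`, a download of the pre-trained weights.

```python
NUM_CLASSES = 7              # six diseases plus the healthy class
DROPOUT_RATE = 0.5           # dropout rate stated in the text
INPUT_SHAPE = (224, 224, 3)  # assumption: the input size is not given in the text


def build_transfer_model(weights="imagenet"):
    """MobileNetV2 without its 1000-way head, plus a new 7-way softmax head."""
    import tensorflow as tf  # imported lazily so the constants load without TensorFlow

    base = tf.keras.applications.MobileNetV2(
        input_shape=INPUT_SHAPE, include_top=False, weights=weights
    )
    base.trainable = True  # all layers are fine-tuned, as described for MobileNet

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),  # assumption: pooling before the head
        tf.keras.layers.Dropout(DROPOUT_RATE),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",                 # assumption: the optimizer is not stated
        loss="categorical_crossentropy",  # assumes one-hot encoded labels
        metrics=["accuracy"],
    )
    return model
```

Swapping `MobileNetV2` for `InceptionV3` or `VGG16` from the same `tf.keras.applications` module would give the other two models compared in this work, with an input size suited to each architecture.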

RESULTS AND DISCUSSION
In this experimental study, six diseases of cucumber and one healthy class are considered.They are downy mildew, powdery mildew, mosaic virus, belly rot, scab, pythium fruit rot (Cottony leak), and healthy plant.Here, we have used two different approaches for cucumber disease recognition.The first is a traditional ML approach, and the second is a CNN-based transfer learning approach, which yielded the best results in terms of cucumber disease recognition.
The first step is image acquisition. In the ML approach, after the images are collected, preprocessing is performed through image resizing, filtering, and contrast enhancement. The RGB image is then converted into L*a*b* color space. To extract features, the disease-affected segment is chosen from the three clustered images produced by K-means clustering with k=3. Two feature sets, 5 texture features and 5 statistical features, are extracted from the segmented images. Various classifiers (RF, IBk, and KStar) are then applied and compared to understand each model's performance. The stepwise effect of the changes from image acquisition to image segmentation is depicted in detail in Figure 5. The extracted feature values for pairs of diseased cucumber images are illustrated in Figure 6.
Although the ML approach gave good results, we also studied transfer learning to obtain better performance. For transfer learning, we first augmented the data and then divided it into training, validation, and test sets in a specific ratio, e.g., 60% for training, 20% for validation, and 20% for testing. After data augmentation, the three pre-trained models were applied.
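The K-means segmentation step with k=3 can be sketched in plain Python as follows. The (a*, b*) values and the initial centroids are hypothetical, and the initialization strategy is an assumption since the text does not describe one; the sketch also does not cover how the disease-affected cluster is selected afterwards.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: returns (labels, centroids) after convergence."""
    centroids = [tuple(float(v) for v in c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Hypothetical (a*, b*) pixel values: green tissue, brownish lesion, neutral background.
pixels_ab = [(-42, 41), (-38, 39), (-40, 43),
             (14, 31), (16, 29), (15, 33),
             (1, -1), (-1, 2), (0, 0)]
# Assumption: one starting centroid is picked by hand from each visible group.
labels, centers = kmeans(pixels_ab, [pixels_ab[0], pixels_ab[3], pixels_ab[6]])
print(labels)
print(centers)
```

Each label can be mapped back to its pixel position to produce the three clustered images mentioned above.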
We analyzed the performance of the traditional ML approach and the CNN-based transfer learning approach using the performance metrics of a classification model [25]. We plotted the confusion matrix generated by each model. The multiclass confusion matrix M is an n × n square matrix with n rows and n columns, for a total of n^2 entries [26]. Because we work with seven classes, each model produces a 7 × 7 confusion matrix. Performance evaluation metrics for the multiclass confusion matrix are given in [27]. With TP, TN, FP, and FN denoting the true positives, true negatives, false positives, and false negatives of a class, the following formulas are used to calculate the performance evaluation metrics in percentage:
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%
\text{Precision} = \frac{TP}{TP + FP} \times 100\%
\text{Specificity} = \frac{TN}{TN + FP} \times 100\%
\text{Sensitivity} = \frac{TP}{TP + FN} \times 100\%
\text{FNR} = \frac{FN}{FN + TP} \times 100\%
\text{FPR} = \frac{FP}{FP + TN} \times 100\%
Table 1 shows the confusion matrix generated by each of the applied models. Here, 'A' denotes Belly Rot, 'B' denotes Pythium Fruit Rot, 'C' denotes Mosaic Virus, 'D' denotes Scab, 'E' denotes Powdery Mildew, 'F' denotes Downy Mildew, and 'G' denotes Healthy.
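The class-wise metrics can be derived from a multiclass confusion matrix as in the Python sketch below. The function names are hypothetical, the matrix orientation (rows as actual classes, columns as predictions) is an assumption, and the 3 × 3 matrix is toy data rather than an excerpt from Table 1.

```python
def one_vs_rest_counts(matrix, k):
    """TP, FP, FN, TN for class k; rows are actual classes, columns are predictions."""
    total = sum(map(sum, matrix))
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(row[k] for row in matrix) - tp
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def class_metrics(tp, fp, fn, tn):
    """Per-class evaluation metrics in percent."""
    def pct(num, den):
        return 100.0 * num / den

    return {
        "accuracy": pct(tp + tn, tp + tn + fp + fn),
        "precision": pct(tp, tp + fp),
        "specificity": pct(tn, tn + fp),
        "sensitivity": pct(tp, tp + fn),
        "fnr": pct(fn, fn + tp),
        "fpr": pct(fp, fp + tn),
    }


# Toy 3-class confusion matrix (hypothetical data).
toy = [[8, 1, 1],
       [2, 7, 1],
       [0, 2, 8]]
print(class_metrics(*one_vs_rest_counts(toy, 0)))

# Hypothetical counts chosen to be consistent with the percentages reported for
# the Downy Mildew class of VGG16; they are not read from Table 1.
print({k: round(v, 2) for k, v in class_metrics(120, 2, 0, 718).items()})
```

A 7 × 7 matrix is handled the same way by calling `one_vs_rest_counts` once for each of the seven classes.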
We calculated several evaluation metrics from the confusion matrices. Table 2 shows that MobileNetV2 outperformed the other approaches, with a classification accuracy of 93.23%. Table 3 presents the class-wise metrics of RF. RF achieved its maximum accuracy, 92.14%, on the Downy Mildew class, where the precision, specificity, sensitivity, FNR, and FPR are 71.43%, 95.00%, 75.00%, 25.00%, and 5.00%, respectively, which are significant compared to the other classes.
Table 4 presents the class-wise evaluation metrics of the IBk classifier. IBk achieved its maximum accuracy, 91.43%, on the Downy Mildew class, where the precision, specificity, sensitivity, FNR, and FPR are 72.22%, 95.83%, 65.00%, 35.00%, and 4.17%, respectively, which are significant compared to the other classes.
Table 5 presents the class-wise evaluation metrics of the KStar classifier. KStar achieved its maximum accuracy, 90.71%, on the Downy Mildew class, where the precision, specificity, sensitivity, FNR, and FPR are 71.88%, 96.25%, 57.50%, 42.50%, and 3.75%, respectively, which are significant compared to the other classes.
Table 6 presents the class-wise evaluation metrics of InceptionV3. InceptionV3 achieved its maximum accuracy, 95.71%, on the Healthy class, where the precision, specificity, sensitivity, FNR, and FPR are 97.73%, 99.72%, 71.67%, 28.33%, and 0.28%, respectively, which are significant compared to the other classes.
Table 7 presents the class-wise evaluation metrics of MobileNetV2. MobileNetV2 achieved its maximum accuracy, 99.17%, on the Downy Mildew class, where the precision, specificity, sensitivity, FNR, and FPR are 94.49%, 99.03%, 100.00%, 0.00%, and 0.97%, respectively, which are significant compared to the other classes.
Table 8 presents the class-wise evaluation metrics of VGG16. VGG16 achieved its maximum accuracy, 99.76%, on the Downy Mildew class, where the precision, specificity, sensitivity, FNR, and FPR are 98.36%, 99.72%, 100.00%, 0.00%, and 0.28%, respectively, which are significant compared to the other classes.
Image processing methods for plant disease identification play an important role in agriculture, which is why the topic has become an important research area. Researchers have proposed many techniques for disease recognition. Each of these techniques and algorithms has its own limitations and failure cases, but some are appropriate for use in this field because of their strengths. The comparative performance of the works on cucumber disease recognition is presented in Table 9.
However, the performance of any method depends on the amount of data, hardware dependency, and computational cost. Although most of the techniques achieved good accuracy, in some papers [6], [7], [9] the amount of data is insufficient, which could affect the model's training and its capacity to recognize the diseases correctly. Some studies [5]-[9], [14], [16] worked on very few cucumber diseases. While some approaches, such as deep CNN, hyperspectral imaging, and CNN, have good accuracy, they also have high computational expense, hardware dependency, and high cost. In comparison to other works, we can say that our method yields a better result. However, there is still room for improvement. Future work could therefore enlarge the dataset and cover a much broader variety of cucumber diseases.

CONCLUSION
Modern techniques such as automatic disease recognition should be available to farmers so that they can produce healthy and profitable cucumber crops. Our proposed solution will help farmers grow more crops by making diseases easy to detect and identify, which will support sustainable economic growth through increased crop quality and quantity. Here, we compared traditional ML and transfer learning approaches. After the cucumber images were captured, preprocessing was done by resizing, filtering, and contrast enhancement. We chose k-means clustering to segment the images, and 10 features were extracted from the segmented images. In traditional ML, RF achieved a high accuracy of 89.93%. To understand its performance relative to other approaches, we also investigated transfer learning. We found that the MobileNetV2 transfer learning model achieves the highest accuracy across the two approaches, 93.23%.

Figure 2. Proposed expert system for cucumber disease recognition

− Image resizing is needed for further operations on the images.
− Image filtering (noise removal, smoothing of images) is done by using various filtering techniques.
− Contrast enhancement is a technique to improve image quality with clear visibility of image features. The technique we use for contrast enhancement is histogram equalization.
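Histogram equalization, the contrast enhancement technique named above, can be sketched in plain Python as follows. The function name is hypothetical, and applying it to a single grayscale channel is an assumption, since the text does not say which channel is equalized.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    pixels = [v for row in image for v in row]
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1

    # Cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)  # CDF value at the darkest present level
    n = len(pixels)
    if n == cdf_min:                          # constant image: nothing to stretch
        return [row[:] for row in image]

    # Map each level so that the output intensities spread over the full range.
    lut = [round(max(c - cdf_min, 0) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [[lut[v] for v in row] for row in image]


# Toy low-contrast 2 x 2 image (hypothetical data).
print(equalize_histogram([[52, 55], [61, 59]]))
```

In the toy image the four close intensity values are spread across the whole 0 to 255 range, which is the effect that makes image features more clearly visible.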

Figure 4. Flow diagram of CNN based transfer learning to recognize the cucumber diseases

Figure 6. Extracted feature values for pairs of diseased cucumber images, where two images are correctly classified and the other two are incorrectly classified: (a) actual disease class, (b) acquired image, (c) image after segmentation, (d) extracted feature vector, (e) recognized disease

Table 2. Performance evaluation metrics using six models

Table 3. Class-wise evaluation metrics using Random Forest

Table 4. Class-wise evaluation metrics using IBk

Table 9. Comparative analysis of our work and existing work for cucumber disease recognition