Low resource deep learning to detect waste intensity in the river flow

Received Oct 28, 2020; Revised Apr 3, 2021; Accepted Jul 8, 2021
Waste has become a significant problem in Indonesia, especially in the capital city of Jakarta, because of the many disasters it causes. One cause of flooding is the blockage of river flow by waste, so monitoring litter is essential for determining the waste intensity in a river. This research aims to produce an application that can detect, track, and count river waste using the YOLO v3 algorithm, in order to simplify the monitoring of river waste and to count waste from video. The research uses 340 images taken directly from photos and videos captured by the researchers. Waste is detected frame by frame after the video is split into individual frames. The experimental results show that YOLO v3 can be used to detect and count waste recorded on video. The result of this research is an application that detects waste with a confidence of 98.74% on the test video.


INTRODUCTION
In big cities, garbage is an inevitable consequence of residential areas, but it also often causes flooding. Indonesia is the second-largest waste producer after China [1]. According to the 2015 study by Jambeck et al., "Plastic waste inputs from land into the ocean", Indonesia contributed 3.22 million metric tons (MMT) of plastic waste [1]. President Jokowi stated that the floods at the beginning of 2020 were caused by damage to the ecosystem and by the many people who still litter [2]. The Ministry of Environment and Forestry (KLHK) also reported that 175,000 tons of waste are produced nationally per day [3]. Waste is clearly a big problem in Indonesia, especially in the capital. Urban communities need to stop throwing garbage into rivers so that the flow is not obstructed and does not eventually cause flooding, while the government needs to monitor the discharge of waste in the river flow.
Several studies on waste detection have been conducted by other researchers. Zhihong et al. [4], in the July 2018 article "multi-task detection system for garbage sorting base on high-order fusion of convolutional feature hierarchical representation", focus on detecting waste in images with complex backgrounds. They succeeded in detecting even small objects in a complex scene before the detections were passed to a manipulator, a KUKA robotic hand, to perform the sorting [4].
Another study, "Riverine Plastic Litter Monitoring Using Unmanned Aerial Vehicles (UAVs)" (2019), monitored plastic waste using UAVs to obtain spatial data on rivers and garbage in the Klang River, Malaysia. The study succeeded in detecting debris during high tide on 30 April and 1 May. Waste was concentrated in the middle of the river, and on both days the highest plastic density was observed about 48 m from the south riverbank [5].
A. Chung et al. [6] also carried out waste detection using a micro-UAV in the 2018 article "cloud computed machine learning-based real-time litter detection using micro-uav surveillance". They compared the waste detection accuracy of a convolutional neural network (CNN), support vector machine (SVM), single shot multibox detector (SSD), region-based fully convolutional network (R-FCN), you only look once (YOLO), a custom ensemble, and a bagging ensemble on data captured by UAV. The study found that the bagging ensemble method can perform real-time detection using UAVs with the highest accuracy among the compared methods [6].
Salimi, Dewantara, and Wibowo also present trash detection with a smart trash bin robot that uses a support vector machine (SVM) to classify trash, in the article "visual-based trash detection and classification system for smart trash bin robot". They first use the Haar-Cascade technique to detect the presence of trash around the robot. They also use the gray-level co-occurrence matrix (GLCM) to obtain texture characteristics from the statistics of gray intensity values in the image, and the histogram of oriented gradients (HOG) to count occurrences of gradient orientations in localized portions of the image. The SVM classifies the features into organic waste, non-organic waste, and non-waste. Offline testing of the classification system with 5-fold cross-validation obtained 82.7% accuracy, and online testing of the detection and classification system obtained 63.5% accuracy [7].
Trash detection was also studied by Mikami et al. in "DeepCounter: Using Deep Learning to Count Garbage Bags". The researchers count the amount of collected garbage (both flammable and nonflammable) using a camera placed at the rear of a garbage truck, so that the distribution of garbage from each city can be seen. The single shot multibox detector (SSD) algorithm is used to detect the objects, with a dataset of 1600 annotated images. A desktop NVIDIA GTX 1080 and an NVIDIA Jetson were used to compare the original SSD with Tiny SSD, and Tiny SSD gave better results than the original SSD on this hardware [8].
Marine litter detection was also studied by Fulton et al. in the article "robotic detection of marine litter using deep visual detection models". Alongside that work, research on marine robotics for navigation and localization, such as "AEKF-SLAM: A new algorithm for robotic underwater navigation" [9] and work on navigation technologies for autonomous underwater vehicles [10], bears on the feasibility of robotic litter detection. Their dataset was sourced from the J-EDI (JAMSTEC E-library of Deep-sea Images) dataset of marine debris, which provides type-specific debris data as short videos containing images captured in real-world environments. The training data was drawn from videos recorded between 2000 and 2017 that were labeled as containing debris. Four network architectures were used in their experiment: YOLO v2, Tiny-YOLO, Faster R-CNN with Inception v2, and single shot multibox detector (SSD). Faster R-CNN had the highest AP for plastic at 83.3, followed by YOLO v2 (fine-tuned) at 82.3, Tiny-YOLO at 70.3, and SSD at 69.8. However, Tiny-YOLO was the fastest of the four architectures, at 205 frames per second on a GTX 1080 [11].
This problem has motivated researchers to apply technology to detect the amount of waste in river flows so that they can at least predict when a river should be cleaned. Experiments were carried out in watersheds where garbage dumping was still visible. In observation, the waste passing through the river was dominated by plastic and styrofoam of various sizes. Therefore, this study aims to create a system for detecting and counting garbage in the river flow using the you only look once (YOLO) v3 method. YOLO v3 was chosen because the training data contained small pieces of trash and because, in several experiments conducted by the researchers, YOLO v3 was able to detect objects at large, medium, and small scales.
In this study, the researchers detected and counted waste passing through the river using the YOLO v3 method. The training dataset was collected directly by photographing the watershed and comprised 340 photos. The application was tested on video recordings of trash in the river flow taken at several locations. The experiments show that the resulting application can detect trash ranging from small items such as food wrappers to large items, and that it can count how much waste appears in a video recording. The application works most effectively on footage of rivers with minimal sunlight reflection throughout the video.

RESEARCH METHOD
Experimental data were taken from river basins in Jakarta where trash was often seen passing through the river. The researchers first made observations to find data collection locations, noting how smoothly the river flowed and how much waste was visible in the flow. The factors considered were the smoothness of the flow, the discharge, the variation of waste, and the clarity of the river. The clearer the river, the more difficult object detection became, because reflections of other objects in the water made it hard for the algorithm to detect waste. The watershed selected for training data collection has a reasonably smooth flow, variation in the size and type of garbage, and clear water (it can reflect other objects). A total of 370 photos of debris and seven video recordings of river flow at several points, used as training and testing data in this study, were taken with a smartphone. The photos were then resized to reduce their resolution and file size, and all data were uploaded to Google Drive. Experiments were carried out on Google Colab, because its high-end GPU processes images faster than the researchers' local GPU and CPU [12]. A GPU is very important for graphics-related workloads [13]. The researchers preferred a GPU to a CPU because a GPU can run 20 to 40 times faster [14], [15].
The researchers carried out a literature review to determine the methods and algorithms that can be used to detect garbage. Unlike YOLO v2, which often struggled with small object detection [16], YOLO v3 was found in the literature study and in experiments to be the most suitable for waste detection because it can detect objects that vary from small to large [7]. YOLO v3 is also a good choice for this experiment because most of the data contain small objects [17]. After data acquisition was completed, the researchers annotated all images in the dataset. Annotation labels the objects and records the coordinates of the items in each picture. The result is object information in the form of the center point, coordinates, and class. The image dataset was then divided into two groups: 90% training data and 10% validation data. The next step is to create a model using YOLO v3. The YOLO v3 training architecture is based on Darknet-53. Darknet is an open-source neural network framework written in C and CUDA [18]. Darknet-53 is much more powerful than Darknet-19 but still more efficient than ResNet-101 [19]. The architecture of Darknet-53 can be seen in Figure 1. CUDA is used because it has a reliable architecture and its language is very similar to C, so it is easy to understand [20]. Features from the last three residual blocks are fed into the detector, which produces three sets of results, one per scale: large, medium, and small. The process of the detector can be seen in Figure 2. After the model is generated, the next step is to create a waste counting system.
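The annotation output and the 90/10 split described above can be illustrated with a minimal sketch. It assumes the Darknet label convention of one line per object (class index followed by the box center, width, and height, all normalized by the image size). The function names, the fixed random seed, and the example file names are hypothetical and are not taken from the study.

```python
import random


def to_yolo_annotation(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space corner box into a Darknet-style label line.

    The line holds the class index, then the box center (x, y) and the box
    width and height, each normalized to the range 0..1 by the image size.
    """
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def split_dataset(image_names, train_fraction=0.9, seed=0):
    """Shuffle image names reproducibly and split them into train/validation."""
    names = sorted(image_names)
    random.Random(seed).shuffle(names)  # seed is an illustrative assumption
    n_train = round(len(names) * train_fraction)
    return names[:n_train], names[n_train:]


# Example: 340 hypothetical image names split 90% / 10%.
images = [f"river_{i:03d}.jpg" for i in range(340)]
train, val = split_dataset(images)
print(len(train), len(val))  # 306 34
print(to_yolo_annotation(0, 100, 200, 300, 400, 1000, 800))
```

With 340 images, a 90/10 split yields 306 training images and 34 validation images.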
Waste counting is done by creating a counter line: objects that are detected as garbage and cross the counter line are counted as garbage. In this detection and counting experiment, the researchers adjusted several YOLO v3 parameters to suit the research context. Changes were made to the batch, subdivisions, max_batches, and steps parameters, and to filters and classes in the [convolutional] and [yolo] sections. In the detection and counting stage, object detection is carried out on each frame of the video. Objects that are detected in the frame and have a confidence value > 0.5 are stored in a new variable to retrieve the object offset information. The object offset contains the object coordinates, objectness, and object class: (x, y, w, h, objectness (0/1), class). The system runs non-max suppression for all objects with more than one bounding box, keeping the bounding box with the highest confidence value. From the offsets obtained, a bounding box is drawn around each item and the object's center is determined. The center point of the detected object is used as the reference for counting waste: if the center point intersects the counter line, the object is counted as waste. The system flow in this study can be seen in Figure 3 in the appendix.
Figure 2. Multi-scale detector (with some changes) [21]
This waste detection and counting continues at every frame, from the beginning to the end of the video. The final result is a new video showing the amount of waste, the counter line, and the trash objects surrounded by bounding boxes with their class labels and confidence values. The application is tested using video as input. The waste detection and counting application was written in Python 3.7. The processed videos are saved as new videos in .avi format.
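The confidence filtering, non-max suppression, and center-point steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the application's actual code: boxes are taken to be (x_min, y_min, x_max, y_max) corner tuples, the 0.5 confidence threshold follows the text, and the IoU threshold of 0.4 and all function names are hypothetical choices.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_suppress(detections, conf_threshold=0.5, iou_threshold=0.4):
    """Keep confident detections, then apply greedy non-max suppression.

    detections: list of (box, confidence) pairs for one frame.
    iou_threshold is an assumed value; the study does not state one.
    """
    candidates = sorted(
        (d for d in detections if d[1] > conf_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, conf in candidates:
        # A box survives only if it does not overlap a stronger kept box.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, conf))
    return kept


def centroid(box):
    """Center point of a box, used as the reference point for counting."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


frame_detections = [
    ((10, 10, 50, 50), 0.9),
    ((12, 12, 52, 52), 0.7),      # overlaps the first box, lower confidence
    ((100, 100, 140, 140), 0.6),
    ((200, 200, 240, 240), 0.3),  # below the confidence threshold
]
for box, conf in filter_and_suppress(frame_detections):
    print(box, conf, centroid(box))
```

In this example the second box is suppressed by the first and the fourth is dropped by the confidence filter, leaving two detections whose centers can be passed to the counting step.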

RESULTS AND DISCUSSION
Modeling in this study uses the YOLO v3 algorithm with the Darknet-53 architecture. Training is carried out on YOLO v3 using 106 layers with convolution, shortcut, and feature concatenation operations. The architecture used in YOLO v3 can be seen in Figure 4. This study also uses pre-trained weights for the Darknet-53 architecture as the basis for transfer learning, so that the researchers did not have to train the entire network from scratch [22], [23]. Training was run for 2000 iterations with a 9:1 ratio of training to validation data. The training and validation curves can be seen in Figure 5. The accuracy of YOLO v3 is reported as average precision (AP) and mean average precision (mAP). The final AP obtained in this training is 64.43%, while the best AP obtained is 66.07%. The mAP obtained was 0.654692, or 65.46%. The loss at the end of the training period is 0.9593.
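For readers unfamiliar with the AP and mAP metrics reported above, the following is a minimal sketch of one common definition: AP as the area under the interpolated precision-recall curve, and mAP as the mean of AP over classes. The study does not state which AP variant its training tool computed, so this is an assumption. The function names and example inputs are hypothetical.

```python
def average_precision(scored_matches, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    scored_matches: list of (confidence, is_true_positive), one per detection.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    ranked = sorted(scored_matches, key=lambda m: m[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing from right to left (interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas over each increase in recall.
    ap, prev_recall = 0.0, 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Three detections ranked by confidence: hit, miss, hit; two annotated objects.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
print(round(ap, 4))  # 0.8333
```

A detection counts as a true positive when it matches an annotated object (typically by an IoU threshold). That matching step is outside this sketch.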
Waste detection and counting are carried out on test videos: a video recording is fed into the application, which detects the waste that appears in it. Detected trash is shown with a box around it together with its confidence value. The greater this value, the more confident the application is that the object is trash. For waste counting, the researchers drew a guideline on the video. This guideline is used as a reference: if the centroid of an object detected as waste intersects the guideline, the application counts it as garbage. Counting with the guideline uses the SORT method of Bewley et al., described in "Simple Online and Realtime Tracking" [24]. Initializing the guideline triggers the counter of the SORT method. In the experiment using data such as that in Figure 6, the application was able to detect 26 objects as rubbish, including dry leaves that fell on the river surface. This detection is done frame by frame. According to the results, the objects that can be detected as waste are those with little or no sunlight reflection around them. Most objects cannot be detected as waste while passing through reflected sunlight, even though the training data included objects annotated under reflected sunlight. This result is consistent with the article "Estimation of Sunlight Direction Using 3D Object Models", which discusses the relationship between objects and sunlight [25]. The program may also fail to keep track of detected waste, possibly because of the lack of variance in the training images. This poses a problem whenever waste passes through the guideline while it is not detected. Hence, collecting training images of objects from many perspectives and picking the right area for the guideline are necessary for effective counting.
In this experiment, the guideline was set manually at several candidate positions (based on reflection), and the one the authors judged optimal was chosen. The guideline used in Figure 6 was chosen because it has less sunlight reflection and lies in the area most often traversed by waste.
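The guideline-based counting described in this section can be sketched as follows. The sketch assumes that a tracker such as SORT supplies a stable ID and a centroid for each object in every frame, and it treats the guideline as an infinite line through two points. The class name, method names, and coordinates are hypothetical and do not come from the application.

```python
def side_of_line(point, line_start, line_end):
    """Cross product sign: positive on one side, negative on the other, 0 on the line."""
    (px, py), (x1, y1), (x2, y2) = point, line_start, line_end
    return (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)


class LineCrossingCounter:
    """Counts each tracked object once when its centroid reaches or crosses the guideline."""

    def __init__(self, line_start, line_end):
        self.line_start = line_start
        self.line_end = line_end
        self.last_side = {}       # track_id -> last non-zero side value
        self.counted_ids = set()  # track IDs that were already counted

    def update(self, tracked_centroids):
        """Process one frame given {track_id: (x, y)} and return the running count."""
        for track_id, point in tracked_centroids.items():
            side = side_of_line(point, self.line_start, self.line_end)
            previous = self.last_side.get(track_id)
            # Crossed if the centroid lies on the line or switched sides.
            crossed = side == 0 or (previous is not None and side * previous < 0)
            if crossed:
                self.counted_ids.add(track_id)
            if side != 0:
                self.last_side[track_id] = side
        return len(self.counted_ids)


# A horizontal guideline at y = 100 and two tracked objects drifting downstream.
counter = LineCrossingCounter((0, 100), (200, 100))
frames = [
    {1: (50, 80), 2: (120, 90)},
    {1: (50, 95), 2: (120, 105)},   # object 2 crosses
    {1: (50, 110), 2: (120, 120)},  # object 1 crosses
]
for frame in frames:
    total = counter.update(frame)
print(total)  # 2
```

Because counting depends on the tracker keeping the same ID across frames, an object that goes undetected while passing the guideline is missed, which matches the failure case noted above.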

CONCLUSION
From the experiments conducted in this study, it can be concluded that a convolutional neural network in the YOLO v3 architecture can be used to detect waste in river flows. The resulting application can also count the detected waste in a video recording. The final AP obtained in training is 64.43%, the best AP obtained during training is 66.07%, and the mean AP (mAP) is 0.654692, or 65.46%. The highest waste detection accuracy obtained was 98.74%, with a training loss of 0.9593. These results indicate that the researchers succeeded in detecting and counting waste in river flows using the CNN algorithm in the YOLO v3 architecture with low computational resources. Although this study met its research objectives, the resulting model did not yet correctly detect objects affected by sunlight reflection. The researchers suggest changing parameters or using other learning algorithms to address this deficiency. In future work, the system could be developed for real-time use with CCTV, and other detection algorithms such as SSD or YOLO v4 could be tried.