Growth predictions of lettuce in hydroponic farm using autoregressive integrated moving average model

ABSTRACT


INTRODUCTION
Agriculture is the second largest supporting sector of Indonesia's economy and is closely tied to food demand in the country. The purchasing power of Indonesian households strongly influences food security. COVID-19 has severely affected the country's economy, so controlling food prices is one strategy for overcoming this problem [1]. One alternative for households is to grow part of their own food, particularly vegetables. Hydroponics is a farming technique often used in modern society because, where land is limited, it offers several advantages: it does not require a large area, plants grow faster, and yields are of higher quality. Nevertheless, several parameters play a vital role in harvest success with hydroponic techniques, such as total dissolved solids (TDS), pH, air humidity and temperature, solution temperature, and light intensity [2]. If one or more of these parameters is unsuitable for hydroponic lettuce, the plants can grow abnormally or even die [3].

IoT solutions in many industrial environments can enable innovative, productive, and precise systems that increase the efficiency of intelligent operations. Researchers combine IoT and computer vision approaches with machine learning (ML) algorithms and data mining to evaluate system performance. In smart farming, and especially in hydroponics, computer vision is applied to monitoring plant growth, estimating crop yield, and measuring nutrient content. Computer vision models built with ML algorithms learn from historical data to perform complex tasks such as regression, diagnosis, planning, and recognition. Data and algorithms are therefore fundamental to the performance of ML models: high-quality data and larger datasets are essential for model accuracy.
For example, many researchers use computer vision in agriculture to recognize crops, benefiting from its non-destructive and convenient nature [4]. These techniques still leave room for improvement, especially in real-time detection speed and in observing dead seedlings. The most common computer vision features are image recognition, texture and color extraction, shape identification, and other image characteristics, which allow targets to be distinguished by their different characteristics.
This paper proposes an IoT hydroponic plant monitoring system that uses several sensors and a camera as input to measure and predict plant growth parameters. The data are also used to forecast hydroponic plant growth for the next few days with a time series forecasting algorithm, the autoregressive integrated moving average (ARIMA). The performance of each model is evaluated with the mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE) [5], while model complexity is assessed with the Akaike information criterion (AIC) and Bayesian information criterion (BIC) [6]. Finally, the best-performing model is used to predict plant growth, providing hydroponic farmers with information so that they can take preventive measures and produce higher quality crops.

LITERATURE REVIEW
Over the past few years, ML-assisted agriculture has developed as part of IoT applications for precision agriculture. This interest is driven by the practice's benefits in increasing agricultural productivity, sustainability, and profitability while improving food security [7]. As a result, ML algorithms have gained attention in many fields. The best-known algorithms include linear models, support vector machines, decision trees, random forests, neural networks, and clustering. In computer vision, data sources such as RGB color, visible light, thermal infrared images, near-infrared, 3D, and spectroscopy are used to measure and analyze features such as texture, shape, spectrum, and color.
A method combining the regional centers of cross-border leaves with improved watershed segmentation of overlapping leaf images was developed to find the leaf area of every seedling in a plug tray [8]. This vision system measures the leaf area in each cell to distinguish good seedlings based on their area proportion. Top-view images of the seedlings were used to calculate each seedling's leaf area in the plug tray, and a detection process was developed. In another study, a crop segmentation method (AP-HI) was used to automatically detect two critical growth stages of maize seedlings. It achieved a high performance of 96.68% and was robust to external environmental conditions [9]. However, over-segmentation and sensitivity to false edges were limitations of this method.
Furthermore, an over-segmentation block projection technique based on feature position was used to locate the crowns of maize seedlings in images from a camera whose image plane is parallel to the ground, from which the center of each seedling's roots was obtained. Although these approaches are efficient and computationally simple, they are unsuitable for diverse environmental conditions and lack versatility. Moreover, they have low robustness and are sensitive to noise [10].
Combining time series forecasting algorithms with computer vision has become one solution to problems in the agricultural sector, especially forecasting growth and the variables that affect it. For example, Srivani et al. [11] used the long short-term memory (LSTM) algorithm to predict root zone temperature (RZT) in an indoor hydroponic system. The main contribution of that study is an analysis of which hyperparameter combination performs best, measured by the smallest average RMSE. It aims to design a model that predicts environmental changes so that actuators can be adapted and controlled automatically. Computer vision can also reveal how plants respond to given environmental conditions. Story and Kacira [12] used cameras and computer vision to continuously monitor color, morphological, textural, and spectral (crop indices and temperature) features of a crop, allowing plant growth and health status to be tracked and controlled environment conditions to be improved. Our research focuses on hydroponic lettuce seedlings and proposes forecasting their growth for the next few days with a time series forecasting algorithm, ARIMA. The dataset is constructed from leaf area extracted through computer vision, so that the prediction model can better capture the growth of hydroponic lettuce.

METHOD
This study uses two main concepts for data collection: machine-to-machine (M2M) communication and the IoT. M2M allows machines to communicate with each other without human intervention. The IoT supports this concept, since each machine must be connected to the internet to communicate or exchange data [13]. Both concepts are implemented on a Raspberry Pi 3B+ and a Wemos D1 R32 microcontroller connected to the sensors that collect all the data needed in this study.

Device setup
The Raspberry Pi 3B+ acts as a web server running the ThingsBoard IoT platform, with a PostgreSQL database to store all sensor data [14]. A small Raspberry Pi camera module is also attached and automatically photographs plant growth every four hours. The sensors used in this experiment measure pH, TDS, temperature, and humidity and are connected directly to the microcontroller, as illustrated in Figure 1(a).
The device is connected to the local network and sends data to the PostgreSQL database every 2 minutes. An access point in the system lets the Raspberry Pi communicate with the Wemos D1 R32 microcontroller, which is used for data acquisition, as presented in Figure 1.
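The paper does not state how the microcontroller transmits readings to ThingsBoard. As a minimal sketch, assuming ThingsBoard's device HTTP API (POST a JSON object to `/api/v1/<access token>/telemetry`), a telemetry upload could look like this. The host address, access token, and telemetry key names are hypothetical placeholders, and the request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: the paper gives neither the server address nor the token.
THINGSBOARD_HOST = "http://192.168.1.10:8080"
ACCESS_TOKEN = "DEVICE_ACCESS_TOKEN"


def build_telemetry_request(readings: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to ThingsBoard's device telemetry endpoint."""
    url = f"{THINGSBOARD_HOST}/api/v1/{ACCESS_TOKEN}/telemetry"
    body = json.dumps(readings).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One sensor snapshot; key names and values are illustrative, not measured data.
req = build_telemetry_request({"ph": 6.2, "tds": 850, "temperature": 29.5, "humidity": 71.0})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req, timeout=5) would actually send it to the server.
```

On the device itself this logic would run in the microcontroller firmware; MQTT publishing to the `v1/devices/me/telemetry` topic is the other common ThingsBoard transport.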

Data collection
Data were collected in an uncontrolled greenhouse for eight days, from 8 to 15 June 2022. Sample images were taken every 3 hours. However, images taken at night were not used in the analysis because the greenhouse is lit only by sunlight, with no additional lighting at night. Sensor data for pH, TDS, temperature, and humidity were recorded every 5 minutes.
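The night-time exclusion can be illustrated with a small filter. This sketch assumes captures at fixed 3-hour offsets from midnight and a hypothetical daylight window of 08:00 to 17:00; the paper only says that night images were not used. Under these assumptions each day keeps three daytime captures.

```python
from datetime import datetime, timedelta

# Hypothetical daylight window; the paper does not state exact cut-off times.
DAY_START_HOUR = 8
DAY_END_HOUR = 17


def daytime_only(timestamps):
    """Keep capture times that fall inside the assumed daylight window."""
    return [t for t in timestamps if DAY_START_HOUR <= t.hour < DAY_END_HOUR]


# Eight days (8 to 15 June 2022) of captures every 3 hours, starting at midnight.
start = datetime(2022, 6, 8, 0, 0)
captures = [start + timedelta(hours=3 * i) for i in range(8 * 8)]
kept = daytime_only(captures)

print(len(captures), len(kept))          # 64 24
print(sorted({t.hour for t in kept}))    # [9, 12, 15]
```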

Image collection and image processing
Photos, as seen in Figure 1(b), are collected with an additional camera module connected to the Raspberry Pi to document plant growth every hour. The photos then undergo image processing to extract leaf area, for which this study uses a program called ImageJ. ImageJ (Fiji) is a Java-based image processing program developed by the National Institutes of Health and the Laboratory for Optical and Computational Instrumentation. It can read various image file formats, including PNG, GIF, JPEG, DICOM, FITS, and RAW, and can display, edit, calibrate, measure, and analyze image data. ImageJ is used in fields such as biology, earth science, fluid dynamics, astronomy, computer vision, and signal processing [15]. Here, ImageJ measures the area of lettuce leaves in the photos captured by the Raspberry Pi using two techniques: light thresholding and pixel counting. Light thresholding converts the image to 8-bit grayscale and maximizes its contrast to distinguish the lettuce leaves from the background. Pixel counting is calibrated with an object whose size is known from manual measurement (for example, with a ruler); ImageJ then calculates the number of pixels per centimeter and uses it as the reference for measuring lettuce leaf area.
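The threshold-and-count procedure can be sketched with NumPy. This is not ImageJ's implementation; it assumes the image is already 8-bit grayscale, that the threshold value is chosen by hand, and that leaves are brighter than the background (set `leaf_is_bright=False` otherwise). The image and calibration numbers are synthetic.

```python
import numpy as np


def calibrate(reference_length_px: float, reference_length_cm: float) -> float:
    """Pixels per centimetre from a reference object of known length (e.g. a ruler)."""
    return reference_length_px / reference_length_cm


def leaf_area_cm2(gray: np.ndarray, threshold: int, pixels_per_cm: float,
                  leaf_is_bright: bool = True) -> float:
    """Estimate leaf area in cm^2 from an 8-bit grayscale image.

    Light threshold: pixels on the leaf side of `threshold` form a binary mask.
    Pixel counting: the mask's pixel count is divided by (pixels per cm)^2.
    """
    mask = gray > threshold if leaf_is_bright else gray < threshold
    leaf_pixels = int(np.count_nonzero(mask))
    return leaf_pixels / pixels_per_cm ** 2


# Synthetic 100 x 100 image: dark background with one bright 30 x 40 "leaf".
image = np.zeros((100, 100), dtype=np.uint8)
image[10:40, 20:60] = 200

ppcm = calibrate(reference_length_px=50, reference_length_cm=5)  # 10 px per cm
print(leaf_area_cm2(image, threshold=128, pixels_per_cm=ppcm))   # 12.0
```

Dividing by the square of the calibration factor converts a pixel count (an area) into cm², since one centimetre of length spans `pixels_per_cm` pixels in each direction.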
ImageJ performs a series of image processing steps such as editing, calibrating, measuring, and analyzing image data. Two methods convert the image data into leaf area: pixel counting and light thresholding. Pixel counting measures an object relative to the known length of another object, from which ImageJ finds the scale in pixels per centimeter. Light thresholding adjusts the image's color (RGB) values to separate the leaf object from its background, so that the region of interest (ROI) can be selected accurately. Figure 2 shows an example of ImageJ output. Ten specimens appear in the camera frame, and sample 3 in Figure 3 is noticeably larger than the others. Its growth is therefore monitored and used as the reference in the dataset. The line chart of lettuce leaf area growth in Figure 4 shows that, among the four best samples, sample 3 had the largest leaf area on day eight, whereas the other samples did not grow significantly. Therefore, data from sample 3 are used for the time series forecasting models in the rest of this paper. Uneven distribution of the nutrient solution can cause such differences in growth, because the seedlings do not all absorb nutrients optimally [16].

Time series forecasting model
The models are built with a Python statistics package. The process begins by testing the dataset's stationarity and determining the model's order and lags from autocorrelation function (ACF) and partial autocorrelation function (PACF) plots [17]. First, the time series dataset (sensor and image data) is assembled. Its stationarity is then tested with the augmented Dickey-Fuller (ADF) statistical test [18].
The reference parameter for determining whether a dataset is stationary is the ADF p-value. If the p-value is below 0.05, the null hypothesis of a unit root is rejected and the dataset is considered stationary; if it is greater than 0.05, the dataset is not stationary [19]. A non-stationary dataset can be differenced by subtracting each value from the previous one, y(t) - y(t-1) [20]. The goal is to make the mean, variance, and standard deviation constant over time.
The ARIMA model is a time series forecasting algorithm that models a single variable (univariate) without considering other variables. ARIMA combines autoregressive (AR) and moving average (MA) components [21]: the AR part predicts from historical values, while the MA part uses previous forecast errors. The integrated (I) part differences the series d times to make it stationary. The autoregressive integrated moving average with exogenous variables (ARIMAX) model extends ARIMA with additional input variables (multivariate) [22], considering external or exogenous variables to predict future data.
The seasonal autoregressive integrated moving average (SARIMA) model forecasts datasets that have seasonal patterns [23], that is, patterns that repeat over a particular period, for example daily, weekly, or yearly. The ARIMA model's parameters are (p, d, q), whereas SARIMA uses (p, d, q)(P, D, Q)m. The additional (P, D, Q) terms set the AR, differencing, and MA orders of the seasonal pattern, and m is the number of observations per season. SARIMAX extends SARIMA with exogenous variables (multivariate), which it considers when predicting future data [24].

Auto correlation function and partial auto correlation function
ACF and PACF plots are used to determine both the seasonal and non-seasonal orders of the ARIMA, ARIMAX, SARIMA, and SARIMAX models. The ACF plot, which measures both the direct and indirect effects on the dataset at a given lag, is used to determine the MA order [25]. The PACF plot, which measures only the direct effect at a given lag, is used to determine the AR order.

Determining the model
The first step in building a time series forecasting model is testing the dataset's stationarity with the ADF test, whose p-value indicates whether the dataset is stationary. The dataset is stationary if the p-value is below 0.05; otherwise, it should be differenced by subtracting each value from the previous one to remove the existing trend and stabilize the variance. The lettuce dataset was analyzed to illustrate the process: after differencing, the p-value decreased to 0.0013. Because the data were differenced twice, the order d equals 2. The next step is to determine p and q. In the PACF plot, lag 1 is well above the significance line, so the AR order p is 1. Similarly, only one lag falls outside the significance limit in the ACF plot, so the optimal MA order q is 1.
In this paper, we varied p and q from 1 to 3 while fixing the differencing order d at 2, and then compared the performance of each resulting model. In addition to reading the order from the ACF and PACF plots, the model's order can be specified with a Python library called 'pmdarima' [26], [27].
The library's 'auto_arima' function automatically fits models with various order combinations (p, d, q) [28] and returns the best-scoring model together with its order; by default it selects the model with the lowest Akaike information criterion. The auto_arima function selected the order (p, d, q) = (2, 2, 1) as the best combination.

Evaluation criteria
Several criteria are evaluated to determine the model with the smallest error. The criteria used for the ARIMA models in this study are RMSE, MAE, MAPE, AIC, the bias-corrected Akaike information criterion (AICc), and BIC. The formulas for each criterion are given in Table 1, where Ө is the maximum likelihood, n is the number of observations, k is the number of model parameters, Yt is the observed value at time t, and Ŷt is the estimated value.
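The criteria listed above can be computed with a few plain functions. This is a minimal sketch using the standard textbook definitions, assumed to match Table 1: the information criteria take the maximized log-likelihood ln(Ө) as input, and MAPE is returned as a fraction (multiply by 100 for a percentage), consistent with the MAPE of 0.04 reported later.

```python
import math


def rmse(y, y_hat):
    """Root mean squared error: sqrt(mean((Y_t - Yhat_t)^2))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error: mean(|Y_t - Yhat_t|)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mape(y, y_hat):
    """Mean absolute percentage error as a fraction: mean(|(Y_t - Yhat_t) / Y_t|)."""
    return sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Bias-corrected AIC: AIC + 2k(k + 1) / (n - k - 1)."""
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


y_obs = [10.0, 20.0, 30.0]
y_pred = [11.0, 18.0, 30.0]
print(round(rmse(y_obs, y_pred), 3), mae(y_obs, y_pred), round(mape(y_obs, y_pred), 3))  # 1.291 1.0 0.067
print(aic(-10.0, 3), aicc(-10.0, 3, 20), round(bic(-10.0, 3, 20), 3))                     # 26.0 27.5 28.987
```

RMSE penalizes large errors more heavily than MAE, which is why the two can rank models differently; AICc adds a penalty that matters most when n is small relative to k, as with a few dozen observations.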

RESULTS AND DISCUSSION
Table 2 compares statistics of the evaluation criteria across several models, including the mean, standard deviation, minimum, and maximum values. The standard deviations for all parameters are relatively low, meaning that the values cluster around the mean; therefore, the compared ARIMA models differ only slightly. Figure 5(a) shows that the largest error across all performance comparison parameters comes from ARIMA (1, 2, 1), the initial guess based on the ACF and PACF plots in the previous section. Iterating over different ARIMA parameters shows that ARIMA (2, 2, 1) yields the smallest error, and auto_arima also recommends these parameters. In Figure 5(b), ARIMA (2, 2, 1) consistently has the lowest value of the three compared parameters. Hence, it is the most suitable model for the hydroponic lettuce dataset and is used to forecast lettuce growth in the hydroponic system. As shown in Figure 6(a), ARIMA (2, 2, 1) fits the lettuce growth data well, with the predictions following the trend of the measured data. It is therefore used to forecast leaf area for the next three days from the eight days of data (three samples per day). Figure 6(b) presents the actual and forecasted area in cm² with 95% confidence limits.
The forecasted values indicate that the hydroponic lettuce will continue to grow. These are predictions, however, and hydroponic growth is dynamic and depends on the surrounding environment. Climatic conditions, nutrient solutions, and pest attacks should therefore be monitored, and appropriate adjustments and operational decisions are needed to maintain the growth trend and prevent crop failure.

CONCLUSION
This paper identifies the best ARIMA parameters and forecasts hydroponic lettuce growth from eight days of data. Time series plots and the ADF test were used to assess data stationarity, and ACF and PACF plots to select candidate orders. ARIMA models with different AR (p) and MA (q) orders were compared. The best model is ARIMA (2, 2, 1) because it has the lowest value for all compared performance criteria.
The ARIMA (2, 2, 1) time series forecasting model gives the smallest RMSE, MAE, and MAPE in predicting hydroponic plant growth, at 0.97, 0.94, and 0.04, respectively. Real-time monitoring with sensors and cameras can significantly ease farmers' work, and the system can also predict plant growth over the next few days. It can therefore serve as a reference for early decisions to produce high-quality crops.

Figure 2. The light threshold to separate the lettuce leaf object from the background

Figure 3. Sampling on the dataset

Figure 4. Time series of hydroponic lettuce leaf area growth
Bulletin of Electr Eng & Inf ISSN: 2302-9285  Growth predictions of lettuce in hydroponic farm using autoregressive … (Muhammad Zacky Asy'ari) 3567

Figure 5. Simulation results: (a) accuracy measures and (b) complexity of the ARIMA models

Figure 6. Results of the ARIMA model: (a) actual vs. predicted plot and (b) three-day-ahead forecast plot

Table 1. Performance criteria calculation