Automatic time series forecasting using nonlinear autoregressive neural network model with exogenous input

This study aims to determine an automatic forecasting method of univariate time series, using the nonlinear autoregressive neural network model with exogenous input (NARX). In this automatic setting, users only need to supply the input of time series. Then, an automatic forecasting algorithm sets up the appropriate features, estimate the parameters in the model, and calculate forecasts, without the users’ intervention. The algorithm method used include preprocessing, tests for trends, and the application of first differences. The time series were tested for seasonality, and seasonal differences were obtained from a successful analysis. These series were also linearly scaled to [−1,+1]. The autoregressive lags and hidden neurons were further selected through the stepwise and optimization algorithms, respectively. The 20 NARX models were fitted with different random starting weights, and the forecasts were combined using the ensemble operator, in order to obtain the final product. This proposed method was applied to real data, and its performance was compared with several available automatic models in the literature. The forecasting accuracy was also measured by mean squared error (MSE) and mean absolute percent error (MAPE), and the results showed that the proposed method outperformed the other automatic models.

INTRODUCTION Time series data forecasting is often referred to as a periodical forecasting, which involves the study of existing historical patterns and estimation of future values. This concept is classified into linear and nonlinear methods, with the popular univariate types being exponential smoothing and autoregressive integrated moving average (ARIMA) models. These methods have been reported to be successful in forecasting linear time-series data, and very poor based on designing nonlinear and complex parameters [1]. Meanwhile, the nonlinear forecasting generally provides irregular function specification requirements. Furthermore, an artificial neural network (ANN) is being introduced as a universal approach for this method, as several studies have confirmed its excellent performance in long-term forecasting, based on monthly, and quarterly time series of nonlinear data [2]- [5].
Two ANN architectural models are reported to have been widely applied to several time series forecasts, including the time-lagged feed-forward and dynamically-driven recurrent network methods. Both are found to use a time-lagged and feed-forward network architecture, with the second model also using a feedback approach [6], [7]. Based on several applications, obtaining ANN models with accurate time series forecasting performance requires intervention to set up several optimal setting parameters, including the number of lag inputs and those associated with neurons in the hidden layer. However, this situation is not practical for some applications, such as obtaining (near) real-time forecast. Based on other cases, it is possible that no one understands how to use the ANN model to forecast, as one possible solution to this issue is the automatic forecasting [8]- [10]. This only requires the supply of input, as an automatic forecasting algorithm automatically sets up the appropriate data, estimates the parameters in the model, and calculates forecasts without intervention. Based on multilayer perceptron parallel architecture without feedback (NAR model), the automatic ANN time series forecasting was first discussed in [8], and the result was further extended in [11]- [13], by implementing ensemble/combination operators. The studies in [8], [11]- [13] used the NAR model with sigmoid and linear activation functions at the hidden and output layers, with the backpropagation learning algorithm applied to update the parameters. Specifically, the non-automatic version of the forecasting algorithm based on the NAR model was observed in [1]. Forecasting time series with NAR is also found to be possible, by using multiple univariate models. The univariate aspect applies the past data from those predicted as the input, as external factors with possible effects are not allowed in the model. However, these are accommodated in the multiple univariate models, in order to ensure improved accuracy. Furthermore, the nonlinear autoregressive neural network model with exogenous input (NARX) is used as multiple univariate methods, as it is also considered a variant of NAR that utilizes external/exogenous inputs in the learning process. It generally has a more accurate forecasting capability compared to NAR, when the utilized external inputs have a strong relationship with the predicted data [14]- [16].
According to this research, an extension of the previous results in [8], [11]- [13] is being proposed on two folds. Firstly, an automatic forecasting algorithm is considered for a more general class of model, i.e., the NARX with parallel architecture without feedback. Secondly, the ensemble operators (mean, median, and mode are usable) with both logistic and tangent hyperbolic functions are also considered to activate the hidden layer of the NARX model. The descriptions and implementation details of each ensemble operator is clearly observed in [11]- [13], [17]- [20]. Also, several learning algorithms are being considered to update the parameters, namely the backprop (backpropagation), rprop+ and rprop-(resilient backpropagation with and without weight backtracking), as well as the grprop sag, and grprop slr (globally resilient backpropagation without weight backtracking and smallest absolute gradient or learning rate). The descriptions and implementation details for each learning algorithm are further shown in [21]- [25]. Based on the empirical study, the proposed automatic method is applied to forecast two real data, i.e., the Indonesian inflation and exchange rates between the Rupiah and US Dollar. Furthermore, the performance of this proposed model is compared with several available automatic methods in the literature, namely exponential smoothing (see discussion in [26]- [29]), ARIMA (see discussion in [30]- [33]), and NAR parallel architecture without feedback. The forecasting accuracy is also being measured by mean squared error (MSE) and mean absolute percent error (MAPE). This research is organized such that section 1 and 2 introduces the background of the study, as well as provide some necessary concepts while introducing the automatic NARX modeling for time series forecasting, respectively. Also, section 3 and 4 discusses the empirical results and conclusions of the study, respectively.

RESEARCH METHOD 2.1. NARX Model
The NARX model with exogenous input was reported to be very essential to the discrete-time nonlinear systems, and defined using the following mathematical relationship [34], where u(t) ∈ and y(t) ∈ indicates the input and output of the model at time t, n u ≥ 1 and n y ≥ 1 (n y ≥ n u ) represents the input and output-memory orders, w is the weights matrix, and f is the nonlinear function expected to be estimated through multilayer perceptron [35]. The NARX network was basically trained under one of two models [36]. The first model was the series-parallel architecture (or parallel architecture without feedback), where formation of the regressors was only obtained through the use of the output actual values.
❒ ISSN: 2302-9285 The second model was the parallel architecture (or parallel architecture with feedback), where the output was the feedback for the feed-forward neural input, which was part of the standard architecture, Based on being previously stated, the NARX network include the regressors of the system (inputs and outputs), with the time-series used as the output channels without measured input. Therefore, the forecasting ability of the model is limited in its application for time-series data without input regressors, due to the elimination of the tapped-delay line over the signal. This further leads to the reduction of NARX to plain time-delay neural network architecture [37]- [39], as shown in, According to [20], a simple strategy was proposed in line with the embedding theorem of Takens. This served as a solution to the problem, by providing the opportunity for the full exploitation of the actual NARX network computational abilities, towards predicting nonlinear time-series. Furthermore, the input signal regressors (u(t)) were initially defined through the delay-embedding coordinates, as shown in, where d E = n u and τ are the embedding dimension and delay, respectively. Secondly, the output signal regressors (y(t)) are presented as shown in the following relationships, due to the possibility of training the NARX network in two different architectures, where the output regressor (y(t)) for the parallel architecture without and with feedback in (6) have previous n y actual and estimated time values, respectively. These outputs were values of y(t + 1), which were previously estimated for a network that had been effectively trained. They were also required to follow the prognostic relationships applied, by using the NARX network. This is represented is being as [15], [16], Therefore, the NARX networks trained in line with (8) and (9) were represented as NARX-SP and NARX-P, respectively. However, this research focused on the model that had parallel architecture and without feedback (NARX-SP).

Automatic NARX modeling
Based on the NARX model, this section describes the automatic forecasting procedure, with the assumptions that y(t) and u(t) are the main series and external/exogenous variables to be predicted, respectively. Furthermore, several key steps contained in the algorithm were explained is being as, − Step 1: Preprocessing of the series (y(t)) started with trend, seasonality check, and seasonal difference application. The Cox-Stuart test was used to determine the trend in a time series data, based on a 12-period centred moving average. This test was carried out in order to smoothen effects, due to irregularities. Furthermore, the de-trended time series data were used to calculate and identify seasonal indices, which were analyzed to determine their significant deviations, through the use of a Friedman test. These data were also linearly scaled between −1 and +1, in order to facilitate the NARX training. − Step 2: The following strategy was used to obtain the autoregressive lags for the y(t) and u(t) series.
When the frequency of y(t) is equal to m, all products from 1 to m were considered possible lag numbers. For example, 1-4 and 1-12 were considered as quarterly and monthly data, respectively. Furthermore, the order for u(t) was assumed equal to y(t), as the significant lags were selected using the stepwise algorithm. The model also included seasonal dummy variables, due to the identification of periodical patterns. Creating additional features in form of dummy variables was also one of the methods used to capture deterministic seasonal components. The conventional approach used to model these periodical patterns were s − 1 binary dummy variables with t, which denoted the seasonal length. Meanwhile, long input vectors used additional s − 1 for high frequency data (s > 12). Two inputs including x s,1 and x s,2 were also used to encode seasonality with variables created, by using Sin(t) and Cos(t) for an explicit representation of the point within an identified seasonal length (s). Furthermore, x s,1 and x s,2 were used to determine the sine-cosine-pairs for each s, as well as the input vector for long and multiple seasonalities. − Step 3: There was also procedural division into training and testing parts, where 80% and 20% of the data processes were used respectively. The mean squared error (MSE) of training data was checked, when the value of neurons was set at 1 towards the maximum lag observed (as the total input in step 2 plus 2). Furthermore, the optimal number of neurons in the hidden layer was defined to be the value providing the minimum MSE. The number of neurons in this layer was also experimentally identified for each time series. Also, the maximum lag (as the total input in step 2 plus 2) neurons in the hidden layer were evaluated for each time series, as the values that minimized the MSE validation were selected. − Step 4: The NARX model obtained in step 3 were fitted 20 times using different random starting weights, and the forecasts obtained were combined using the ensemble operator approach (mean, median, and mode are usable), in order to produce the final product. Based on avoiding local minima and providing an adequate error distribution using sufficient results, each NARX candidate was initialized 20 times with random starting weights in the interval of [−1, +1]. − Step 5: The recursive or iterative strategy was used for multi-step ahead forecasts.
Further details that also showed the applicability of the above algorithm for the automatic NAR model, were provided in [8], [11]- [13]. Meanwhile, the non-automatic version of the forecasting algorithm based on the model was observed in [1].

RESULT AND DISCUSSION
This research was conducted using two real cases, with the first being the inflation rate data in Indonesia, with the external/exogenous variable being the interest of the Indonesian Central Bank. The second case was the exchange rate data for the Indonesian rupiah against the US dollar, with the external variable being the composite stock price index.

Inflation rate data
The Indonesian monthly inflation rate data from January 2007 to February 2018 that contained 134 observations, were used, with the initial 129 and final 5 applied for training, and testing, respectively. Based on simple terms, inflation is understood as a persistent, and continuous rise across a broad spectrum of prices. The investigation on forecasting inflation in a specific country had received significant attention from several macroeconomics experts. Based on most central banks, one of the monetary policy objectives was inflation. Monetary policy also needs to consider future inflation, due to the occurrence of typical time lags. Furthermore, the current inflation levels that were the result of past policies, should provide only incomplete information. Therefore, the forecasts that linked future inflation to current developments were found to bridge this gap. This study attempts to develop an inflation forecasting model for Indonesia, which serves as an input for policy setting in Bank Indonesia (BI). Based on evaluating the accuracy of the models, two forecast error measurements were used, namely the mean squared error (MSE) and mean absolute percent error (MAPE). The MSE and MAPE are further defined is being as, where A t and F t are actual and forecast values at data time t, e t is the error at data time t, and N is the number of data. Thirty models were considered based on the combination of ensemble operators, activation functions, and algorithm types, in order to calculate weight networks applied in the automatic algorithms. The ensemble operators were mean, median, and mode, while the two activation functions compared were logistic and hyperbolic-tangent. Furthermore, the five utilized algorithm types included backpropagation (backprop), ❒ ISSN: 2302-9285 resilient backpropagation with weight backtracking (rprop+), resilient backpropagation without weight backtracking (rprop-), globally resilient backpropagation without weight backtracking and smallest absolute gradient (grprop sag), and globally resilient backpropagation without weight backtracking and smallest learning rate and (grprop slr). The summary of the forecasting performance was shown in Table 1, and the automatic NARX model that combined the median ensemble operator, hyperbolic-tangent activation function, and rprop+ was observed to have produced the smallest MSE and MAPE values. There was also a performance comparison of the proposed methods with the other forecasting models, including automatic exponential smoothing (see discussion in [26]- [29]), automatic ARIMA (see discussion in [30]- [33]), and automatic NAR parallel architecture without feedback (see discussion in [8], [11]- [13]), respectively. Furthermore, the plot of in-sample fitting and out-sample forecasts for inflation rate data were shown in Figure 1, by using considered automatic algorithms. It was also observed that all the considered methods relatively performed accurately for modeling the data, as differences were hardly detected. However, the numerical summary presented in Table 2 showed that the proposed automatic NARX method outperformed other available methods, both in the training and testing data.  Figure 1. Plot of real data, in-sample fitting, and out-sample forecast using several automatic algorithms

Exchange rate data
There were 198 monthly observations for the exchange rate data from January 2003 to June 2019, with the initial 188 and final 10 applied for training and testing, respectively. Based on finance, an exchange rate is the value at which one currency is transformed for another. It is also regarded as the value of one country's currency with another. Also, the exchange rate describes the price of one currency in terms of another. This price is found to be essential for the government or company, especially when the business extends over different countries or firms. Furthermore, the exchange rate forecasting is an essential input for the decision-making management of exposure or hedging strategies.
This structure was similar as observed in the first cases. The summary of the forecasting performance was shown in Table 1, as the automatic NARX model with mean ensemble operator, hyperbolic-tangent activation function, and rprop+ produced the smallest MSE and MAPE values. This method was further compared with other automatic models, and the numerical performance was summarized in Table 3. Furthermore, the plots of in-sample fitting and out-sample forecasts that used all considered automatic algorithms were omitted. However, they showed that all methods were reasonably performed. Based on the results in Table 3, the automatic NARX method was superior to other models considered in the study.

CONCLUSION
This study already proposed an automatic forecasting method of univariate time series, by using the nonlinear autoregressive neural network model with exogenous input (NARX). The automatic algorithm only allowed the supply of input data, as a forecasting algorithm automatically sets up the appropriate data, estimated the parameters in the model, and calculated forecasts without intervention. Furthermore, the empirical studies conducted showed that the automatic NARX models outperformed the other available methods in the literature,

❒
ISSN: 2302-9285 by using two monthly series data. However, further research needs to be conducted based on checking and improving the effectiveness of the method, in order to forecast different types of time series data.