Prediction of passenger train using fuzzy time series and percentage change methods

Received Jan 21, 2021; Revised Aug 19, 2021; Accepted Oct 15, 2021

Predicting passenger volume has long been an active topic in railway operations. Accurate forecasts of railway passenger volume are the foundation on which railway transportation companies optimize transit efficiency and revenue. This research forecasts the number of train passengers by combining a fuzzy time series approach based on a rate-of-change algorithm with Holt's double exponential smoothing method. In contrast to prior investigations, this research focuses primarily on forecasting the next time period. The fuzzy time series serves as the forecasting basis, the rate of change is used to build the universe of discourse, and Holt's double exponential smoothing method is used to forecast the following period. The number of railway passengers predicted for January 2020 is 38199, with an average forecasting error rate of 0.89 percent and a mean square error of 131325. The forecast can also help rail firms identify future passenger needs, which can inform decisions on whether to add train cars or run new trains, as well as how to distribute tickets.


INTRODUCTION
Rail transit is a viable option for meeting public transportation needs, and the demand for a more efficient transportation system is growing. Transportation services are a fast-growing industry in a developing country like Indonesia. The quality of transportation services depends on how railway business resources are planned and managed, both to serve the community better and to deal with rising transportation costs. Predicting passenger volume is very important in the field of rail transportation [1]. Accurate and timely projection of rail passenger volume is the key to increasing the operating efficiency and economic income of rail transport companies [1]. Accurate transportation volume predictions are critical for formulating strategies for future rail transportation growth, investment, and facility efficiency [2], as well as for local economic development, resource allocation, and cost reduction [3]. They also form the basis for rail transport companies to determine whether to operate new trains [4], how to allocate tickets [5], and how to set ticket prices [6]. Large-scale prediction of train passenger volume covers passengers not only in one area but in all regions.
Demand for public transportation services can be managed sensibly by offering effective ground transportation services, so accurate forecasting is critical for every railway organization.
Let $F(t) = A_j$ and $F(t-1) = A_i$. The relationship between $F(t)$ and $F(t-1)$ (referred to as a fuzzy logical relationship, FLR) can be denoted by $A_i \rightarrow A_j$, where $A_i$ is called the left-hand side (LHS) and $A_j$ the right-hand side (RHS) of the FLR.
Definition 3. Given two FLRs with the same fuzzy set on the LHS, $A_i \rightarrow A_{j1}$ and $A_i \rightarrow A_{j2}$, both FLRs can be combined into a fuzzy logical relationship group (FLRG) $A_i \rightarrow A_{j1}, A_{j2}$.
In FTS theory, the discretization process reduces the complexity of the universe of discourse. This approach is typically used as a first step in preparing the universe of discourse for numerical evaluation by tying events from different time periods together. Differences in time series data have been employed as the universe of discourse in several forecasting systems [46], and they can improve forecasting accuracy. However, the rates at which time series data grow or decline cannot be estimated from differences alone. As a result, the universe of discourse in our method is defined as the percentage of change (PoC) from time $t$ to time $t+1$.
The event discretization function is defined so that its value at time index $t$ relates to the occurrence of the event at a later time: $PoC(t+1) = \frac{X(t+1) - X(t)}{X(t)} \times 100\%$, where $X(t+1)$ is the value at time index $t+1$ and $X(t)$ is the actual value at time index $t$. PoC is thus the percentage change in value from time $t$ to time $t+1$. Example: the PoC of period 2012/2 is calculated as $(9515 - 10223)/10223 \times 100\%$, which equals -6.93 percent, as shown in Table 1. The PoC for each following year/month is calculated in the same way.
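The PoC calculation can be sketched in a few lines of Python. The function name `percentage_change` is hypothetical, and the two values are the ones used in the 2012/2 example above.

```python
def percentage_change(series):
    """Return the PoC (in percent) between consecutive observations."""
    return [(curr - prev) / prev * 100.0
            for prev, curr in zip(series, series[1:])]


# The two consecutive observations from the worked example in the text.
poc = percentage_change([10223, 9515])
print(round(poc[0], 2))  # -6.93
```

A series of N observations yields N - 1 PoC values, since the first observation has no predecessor.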

Procedure for dividing frequency density
We changed the approach for dividing the frequency density [8], [9], [43], [44] in this section to:
− Calculate the number of PoCs that fall in each interval.
− Rank the intervals by frequency.
− Divide the highest-ranked interval into a number of equal sub-intervals given by the largest ranking minus one.
− Repeat in the same manner for the next-ranked interval, with one fewer sub-interval each time.
Table 2 shows sample intervals along with the number of PoCs in each. In Table 2, the interval {-15, -10} has the highest PoC frequency. It is subdivided into three parts: {-15, -13.33}, {-13.33, -11.67}, and {-11.67, -10}. The interval {-10, -5} has the next highest frequency and is separated into two parts: {-10, -7.5} and {-7.5, -5}. The intervals {-5, 0} and {0, 5} are left unaltered.
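The procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_by_frequency`, the tie-breaking rule, and the sample PoC values are assumptions, and the values are hypothetical rather than taken from Table 2. With n intervals, the interval of rank k is assumed to be split into max(n - k, 1) equal parts.

```python
def split_by_frequency(edges, values):
    """Subdivide equal-width intervals so denser intervals get more parts.

    edges  : sorted interval boundaries, e.g. [-15, -10, -5, 0, 5]
    values : PoC observations in percent
    """
    n = len(edges) - 1
    # Count observations per half-open interval [lo, hi); the last one is closed.
    counts = [0] * n
    for v in values:
        for i in range(n):
            lo, hi = edges[i], edges[i + 1]
            if lo <= v < hi or (i == n - 1 and v == hi):
                counts[i] += 1
                break
    # Rank 1 = highest frequency; ties keep left-to-right order (assumption).
    order = sorted(range(n), key=lambda i: -counts[i])
    parts = [1] * n
    for rank, i in enumerate(order, start=1):
        parts[i] = max(n - rank, 1)  # n-1, n-2, ..., never fewer than 1
    # Build the sub-intervals from left to right.
    sub = []
    for i in range(n):
        lo, hi = edges[i], edges[i + 1]
        width = (hi - lo) / parts[i]
        for k in range(parts[i]):
            sub.append((lo + k * width, lo + (k + 1) * width))
    return counts, sub


# Hypothetical PoC values chosen so that {-15, -10} is the densest interval.
sample = [-12, -11, -13, -14, -6, -7, -9, -1, -2, 3]
counts, sub = split_by_frequency([-15, -10, -5, 0, 5], sample)
print(counts)
for lo, hi in sub:
    print(round(lo, 2), round(hi, 2))
```

With these sample values the densest interval is split into three parts, the next into two, and the remaining two are left whole, giving seven sub-intervals in total.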

Define fuzzy set based on triangular membership function
Fuzzy sets $A_j$, $j = 1, 2, 3, \ldots, n$, are defined on the resulting intervals using the triangular membership function. To calculate the predicted value of the percentage change, first find the mean value of each interval obtained. Then, using (2), estimate the percentage change with the triangular membership function, where $a_{j-1}$, $a_j$, and $a_{j+1}$ are the means of the fuzzy intervals $A_{j-1}$, $A_j$, and $A_{j+1}$, respectively, and $t_j$ is the predicted percentage change in the number of train passengers from month to month.
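One form of triangular-membership defuzzification that is common in percentage-change FTS work is a weighted harmonic-style average of the interval means, with weight 1 on the matched interval and 0.5 on each neighbour. The sketch below assumes that form; whether it matches (2) exactly is an assumption, as are the function name `predict_poc` and the sample means.

```python
def predict_poc(means, j):
    """Predicted percentage change t_j for the fuzzy set at 0-based index j.

    Assumed form: weighted harmonic-style average with weight 1 on the
    matched interval mean and 0.5 on each existing neighbour.
    Every mean involved must be nonzero.
    """
    n = len(means)
    weights = {j: 1.0}
    if j > 0:
        weights[j - 1] = 0.5
    if j < n - 1:
        weights[j + 1] = 0.5
    total_weight = sum(weights.values())  # 1.5 at either end, 2.0 inside
    denom = sum(w / means[i] for i, w in weights.items())
    return total_weight / denom


# Hypothetical interval means, for illustration only.
means = [1.0, 2.0, 4.0]
print([round(predict_poc(means, j), 4) for j in range(len(means))])
```

Because the formula divides by the interval means, a mean of exactly zero, or neighbouring means of opposite sign, would need special handling that this sketch omits.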

Determine the prediction for the next time period t + 1
The next-period forecast is obtained by combining the method with Holt's double exponential smoothing (DES Holt) approach. Exponential smoothing (ES) describes a class of forecasting algorithms [48], and in corporate forecasting ES is the most widely used family of forecasting models [49]. Double exponential smoothing (DES) extends ES to time series with a trend [50]. DES is a popular technique in business and economics for predicting the trend of time series data using simple linear equations [47]. The prediction for the next time period $t+1$ is calculated as shown in (3)-(7), where:
$X_t$ = actual data at time $t$
$S'_t$ = single smoothing value
$T_t$ = smoothing trend
$\alpha$, $\beta$ = smoothing parameters between 0 and 1
$F_{t+m}$ = forecast value
$m$ = future period
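A minimal sketch of Holt's double exponential smoothing in its textbook form is given below. The initialization (level set to the first observation, trend set to the first difference) and the function name `holt_des` are assumptions, and the recursion may differ in detail from (3)-(7).

```python
def holt_des(data, alpha, beta, m=1):
    """Forecast m periods ahead with Holt's double exponential smoothing."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    level = data[0]            # assumed initial smoothing value
    trend = data[1] - data[0]  # assumed initial trend
    for x in data[1:]:
        prev_level = level
        # Smooth the level, then update the trend from the change in level.
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + m * trend


# A perfectly linear toy series; the smoothing parameters match those in the case study.
print(holt_des([1, 2, 3, 4, 5], alpha=0.38, beta=0.01))
```

On a perfectly linear series the level and trend track the line, so the one-step forecast continues it (a value close to 6 here) whatever the smoothing parameters are.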

Steps in the algorithm
Historical data on the number of train passengers from January 2006 to December 2019, obtained from the Central Statistics Agency (BPS), are shown in Figure 1. The prediction of the number of train passengers using the FTS and percentage change methods is carried out in 7 steps.
Step 2: determining the universe of discourse by:
a. Calculating the percentage change of the actual data on the number of train passengers using (8).
b. Determining the lower limit (LL) and upper limit (UL) from the results of the percentage change; the obtained values are $LL = -21.4255$ and $UL = 23.5273$. The universe of discourse $U$ can then be determined using (9).
The values of $D_1$ and $D_2$ are positive numbers that assist in defining the universe of discourse, so that it is defined as $U = [-23.00, 25.00]$.
c. Forming the interval classes by calculating the number of intervals using (10).
where $n$ is the number of percentage-change data points.
Step 3: based on the interval classes formed on the universe of discourse, the frequency of the percentage-change data falling in each interval is calculated and the intervals are ranked by frequency, as shown in Table 3.
Step 4: determining each fuzzy set based on the divided intervals and fuzzifying the historical data on the number of train passengers, where each fuzzy set represents the linguistic value of the month-to-month percentage change. The length of each interval is divided according to the frequency ranking, from the largest to the smallest frequency. Let $r$ denote the ranking of the largest frequency. The length of the interval is 6.00 and the ranking of the greatest frequency is 8, so the first-ranked interval is divided into $r - 1 = 8 - 1 = 7$ sub-intervals of equal length, namely $6.00/7 = 0.8571$. The second-ranked interval is divided into $r - 2 = 8 - 2 = 6$ sub-intervals of equal length, namely $6.00/6 = 1.00$. The third-ranked interval is divided into $r - 3 = 8 - 3 = 5$ sub-intervals of equal length, namely $6.00/5 = 1.20$, and so on until the last interval. The total number of intervals obtained is 29 interval classes. The mean value of each interval class is then determined, as shown in Table 4.
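The arithmetic in Steps 2 to 4 can be checked with a short sketch. It assumes that the universe [-23.00, 25.00] is cut into equal intervals of length 6.00 and that an interval of rank k is split into max(8 - k, 1) parts, so that the two lowest-ranked intervals stay whole. The second assumption is inferred from the reported total of 29 rather than stated in the text.

```python
lower, upper, length = -23.00, 25.00, 6.00

n_intervals = round((upper - lower) / length)  # 48 / 6 = 8 intervals
top_rank = n_intervals                         # ranking value of the densest interval

# Sub-interval count for ranks 1..8; max(..., 1) is an assumption that
# leaves the lowest-ranked intervals undivided.
parts = [max(top_rank - k, 1) for k in range(1, n_intervals + 1)]
widths = [round(length / p, 4) for p in parts]

print(parts)       # [7, 6, 5, 4, 3, 2, 1, 1]
print(sum(parts))  # 29
print(widths[:3])  # [0.8571, 1.0, 1.2]
```

Under these assumptions the sub-interval lengths for the three highest-ranked intervals and the total of 29 interval classes agree with the figures reported in Step 4.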
Step 5: defuzzifying the fuzzy data shown in Table 5 (in Appendix).
Step 6: determining the forecast value of the data, $F(t)$, from the forecast percentage change. Equation (12) is used to determine $F(t)$, where $X_{t-1}$ is the actual data at time $t-1$. The results of $F(t)$ are shown in Table 5.
For the $t+1$ forecast, the classic Holt double exponential smoothing (DES Holt) method is used with $\alpha = 0.38$ and $\beta = 0.01$. Using (3)-(7), the data smoothing value for December 2019 is 36766 and the trend smoothing value is 145, from which the forecast value for January 2020 is obtained.
Step 7: calculating the average forecasting error rate (AFER) and the mean square error (MSE) [44] between the actual data and the predicted results using (13) and (14); the results are shown in Table 5 and Figure 2.
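Steps 6 and 7 can be illustrated with the sketch below. It assumes commonly used forms for the three formulas: the forecast is the previous actual value scaled by the predicted percentage change, AFER is the mean of |actual - forecast| / actual expressed in percent, and MSE is the mean of the squared errors. These forms, the function names, and the sample numbers are assumptions and may not match (12)-(14) exactly.

```python
def forecast_from_poc(prev_actual, predicted_poc):
    """Turn a predicted percentage change into a forecast value (assumed form)."""
    return prev_actual * (1 + predicted_poc / 100.0)


def afer(actual, forecast):
    """Average forecasting error rate in percent (assumed form)."""
    errors = [abs(a - f) / a for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors) * 100.0


def mse(actual, forecast):
    """Mean square error (assumed form)."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical numbers for illustration only; they are not taken from Table 5.
actual = [100.0, 110.0, 121.0]
predicted_poc = [9.0, 11.0]  # predicted PoC for the second and third periods
forecast = [forecast_from_poc(prev, p) for prev, p in zip(actual, predicted_poc)]

print(forecast)
print(afer(actual[1:], forecast))
print(mse(actual[1:], forecast))
```

The first observation has no forecast, so the error measures are computed over the second period onward.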

CONCLUSION
Combining the FTS and percentage change (PC) techniques with DES Holt has proven useful for predicting the number of railway passengers in the next time period. This can be seen in the prediction result for January 2020, which is 38199, with AFER = 0.89 percent and MSE = 131325. Based on these forecast results, railway management can use the method as decision support to develop policies for the coming period in terms of planning, resources, setting departure schedules, deciding ticket prices, adding train carriages, and adding tickets. In future work, these methods will be adapted to the same case study by building web-based applications and/or employing additional methods that forecast data based on different intervals in order to improve accuracy.