[PYTHON] Time series analysis Part 3 Forecast

1. Overview

2. Data

As in the previous parts, we use TOPIX historical data and monthly data on foreign visitors to Japan.

3. Prediction of AR process

In what follows, the optimal forecast is the one that minimizes the mean squared error (MSE). Consider the AR(p) process

$$
y_t=c+\phi_1y_{t-1}+\phi_2y_{t-2}+\cdots+\phi_py_{t-p}+\epsilon_t,\quad \epsilon_t\sim W.N.(\sigma^2)
$$

Forecasts of an AR process are usually computed sequentially, using the following properties:

$$
\left\{\begin{array}{ll} E(y_\tau|\Omega_t)=y_\tau,&\tau\leq t \\ E(\epsilon_{t+k}|\Omega_t)=0,&k>0\end{array}\right.
$$

where $\Omega_t = \{y_t, y_{t-1}, \cdots, y_1\}$ is the set of observations available at time $t$.

The optimal one-step-ahead forecast is then

$$
\hat y_{t+1|t}=c+\phi_1y_t+\phi_2y_{t-1}+\cdots+\phi_py_{t-p+1}
$$

and its MSE is

$$
MSE(\hat y_{t+1|t})=E(\epsilon_{t+1}^2)=\sigma^2
$$

Next, the optimal two-step-ahead forecast replaces the unobserved $y_{t+1}$ with its forecast and the future shock $\epsilon_{t+2}$ with its conditional expectation of zero:

$$
\begin{split}\hat y_{t+2|t}&=c+\phi_1\hat y_{t+1|t}+\phi_2y_t+\cdots+\phi_py_{t-p+2}\\ &=(1+\phi_1)c+(\phi_1^2+\phi_2)y_t+(\phi_1\phi_2+\phi_3)y_{t-1}+\cdots+\phi_1\phi_py_{t-p+1}\end{split}
$$

The forecast error is $\epsilon_{t+2}+\phi_1\epsilon_{t+1}$, so the MSE is

$$
MSE(\hat y_{t+2|t})=E(\epsilon_{t+2}+\phi_1\epsilon_{t+1})^2=(1+\phi_1^2)\sigma^2
$$

In this way, the $h$-step-ahead forecast is obtained sequentially.
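The sequential forecast described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `ar_forecast` and the AR(2) coefficients and data are hypothetical, and the parameters are treated as known rather than estimated.

```python
def ar_forecast(history, c, phi, h):
    """Point forecasts y_hat[t+1|t], ..., y_hat[t+h|t] of an AR(p) process.

    history: observed values, oldest first (at least p of them)
    c: constant term; phi: [phi_1, ..., phi_p]; h: forecast horizon
    """
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p observations")
    # Work on a copy: each forecast is appended and reused as an input.
    values = list(history)
    for _ in range(h):
        # Most recent p values, newest first, so lags[0] pairs with phi_1.
        lags = values[-1:-p - 1:-1]
        values.append(c + sum(f * y for f, y in zip(phi, lags)))
    return values[len(history):]


# Hypothetical AR(2) example (values chosen for illustration only).
c, phi = 1.0, [0.5, 0.2]
y = [2.0, 3.0]  # y_{t-1} = 2.0, y_t = 3.0
one_step, two_step = ar_forecast(y, c, phi, h=2)

# Closed-form two-step forecast for p = 2:
# (1 + phi_1) c + (phi_1^2 + phi_2) y_t + phi_1 phi_2 y_{t-1}
closed_form = (1 + phi[0]) * c + (phi[0] ** 2 + phi[1]) * y[1] + phi[0] * phi[1] * y[0]
```

The recursive result `two_step` and the expanded expression `closed_form` agree, which is a quick check that the expansion of the two-step forecast is consistent with the recursion.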

The above covers point forecasts of the AR process. Interval forecasts are obtained as follows.

Consider a 95% interval forecast one step ahead. When $y \sim N(\mu, \sigma^2)$,

$$
P\left(-1.96\leq\frac{y-\mu}{\sigma}\leq1.96\right)=0.95
$$

and therefore

$$
P(\mu-1.96\sigma\leq y\leq\mu+1.96\sigma)=0.95
$$

When $\epsilon_t$ is normally distributed, the conditional distribution of $y_{t+1}$ given $\Omega_t$ is

$$
N(\hat y_{t+1|t},MSE(\hat{y}_{t+1|t}))
$$

so the 95% interval forecast for the next period is

$$
\bigl( \hat y_{t+1|t}-1.96\sqrt{MSE(\hat y_{t+1|t})},\; \hat y_{t+1|t}+1.96\sqrt{MSE(\hat{y}_{t+1|t})}\bigr)
$$
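As a small illustration of the interval formula, the following sketch computes 95% intervals from a point forecast and its MSE. The helper name `interval_forecast` and all numbers are hypothetical, and normally distributed shocks are assumed.

```python
import math


def interval_forecast(point, mse, z=1.96):
    """Return (lower, upper) of the interval point +/- z * sqrt(MSE)."""
    if mse < 0:
        raise ValueError("MSE must be non-negative")
    half_width = z * math.sqrt(mse)
    return point - half_width, point + half_width


# Hypothetical AR values, for illustration only.
sigma2, phi1 = 0.25, 0.5
one_step = interval_forecast(2.9, sigma2)                     # MSE = sigma^2
two_step = interval_forecast(3.05, (1 + phi1 ** 2) * sigma2)  # MSE = (1 + phi_1^2) sigma^2
```

Because the two-step MSE is larger than the one-step MSE whenever $\phi_1 \neq 0$, the two-step interval is wider.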

In general, the $h$-step-ahead MSE of $AR(p)$ is difficult to obtain analytically, so it is approximated by simulation.
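One way such a simulation could look is sketched below. It is a minimal Monte Carlo approximation under explicit assumptions: Gaussian shocks, known parameters (estimation error is ignored), and a hypothetical function name `simulate_ar_mse`. Future paths are simulated from the observed history, and the squared deviations from the point forecast are averaged.

```python
import random


def simulate_ar_mse(history, c, phi, sigma, h, n_sims=20000, seed=0):
    """Approximate the h-step-ahead forecast and its MSE for an AR(p) process.

    Assumes Gaussian white noise with standard deviation sigma and known
    parameters c, phi = [phi_1, ..., phi_p]. history is a list, oldest first.
    """
    rng = random.Random(seed)
    p = len(phi)
    if len(history) < p:
        raise ValueError("need at least p observations")
    start = list(history[-p:])  # only the last p values matter

    def step(values, shock):
        lags = values[-1:-p - 1:-1]  # newest first
        return c + sum(f * y for f, y in zip(phi, lags)) + shock

    # Point forecast: iterate the recursion with all future shocks set to 0.
    path = list(start)
    for _ in range(h):
        path.append(step(path, 0.0))
    point = path[-1]

    # Monte Carlo: draw future shocks and average the squared forecast errors.
    total = 0.0
    for _sim in range(n_sims):
        path = list(start)
        for _k in range(h):
            path.append(step(path, rng.gauss(0.0, sigma)))
        total += (path[-1] - point) ** 2
    return point, total / n_sims


# Hypothetical AR(1) with phi_1 = 0.5 and sigma = 1, two steps ahead.
point, mse = simulate_ar_mse([1.0], c=0.0, phi=[0.5], sigma=1.0, h=2)
```

For this AR(1) case the simulated value can be compared with the analytic two-step result $(1+\phi_1^2)\sigma^2 = 1.25$; the estimate should be close to it, up to Monte Carlo noise.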

4. Prediction of MA process

If infinitely many observations are available, an invertible MA process can be rewritten as $AR(\infty)$:

$$
y_t = \sum_{k=1}^{\infty}\eta_ky_{t-k}+\epsilon_t
$$

The optimal forecast of $MA(q)$ therefore has the following properties.

(1) The optimal forecast up to $q$ steps ahead depends on all observed values $\Omega_t$.
(2) Forecasts $q+1$ or more steps ahead are simply equal to the expected value of the process.
(3) The MSE of forecasts $q+1$ or more steps ahead is equal to the variance of the process.

Even when only a finite number of observations is available, forecasts more than $q$ steps ahead equal the expected value of the process, and their MSE equals the variance of the process. Forecasts up to $q$ steps ahead are generally computed by assuming $\epsilon = 0$ for all periods before the sample.
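The finite-sample procedure can be sketched as follows. The parameterization $y_t=\mu+\epsilon_t+\theta_1\epsilon_{t-1}+\cdots+\theta_q\epsilon_{t-q}$ is an assumption made here for illustration (the article does not write the MA(q) equation explicitly), as are the function name `ma_forecast` and the numbers. Shocks before the sample are set to zero, the in-sample shocks are recovered recursively, and only shocks dated $t$ or earlier enter the forecast.

```python
def ma_forecast(y, mu, theta, h):
    """Forecasts 1..h steps ahead for an MA(q) process with known parameters.

    Assumed model: y_t = mu + eps_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q},
    with all pre-sample shocks set to 0. theta = [theta_1, ..., theta_q].
    """
    q = len(theta)
    eps = [0.0] * q  # pre-sample shocks, assumed to be zero
    for obs in y:
        past = eps[-1:-q - 1:-1]  # eps_{t-1}, ..., eps_{t-q}
        eps.append(obs - mu - sum(th * e for th, e in zip(theta, past)))

    forecasts = []
    for k in range(1, h + 1):
        # Future shocks have expectation 0, so only theta_j with j >= k remain;
        # eps_{t+k-j} sits (j - k) positions before the last recovered shock.
        known = sum(theta[j - 1] * eps[-1 - (j - k)] for j in range(k, q + 1))
        forecasts.append(mu + known)
    return forecasts


# Hypothetical MA(1) example: mu = 2, theta_1 = 0.5.
forecasts = ma_forecast([3.0, 3.5], mu=2.0, theta=[0.5], h=3)
```

In this MA(1) example only the one-step forecast uses the data; from two steps ahead onward the forecast equals `mu`, in line with property (2).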

5. Prediction of ARMA process

The ARMA process prediction is a combination of the AR process and MA process predictions.

Below, we will try to predict the ARMA process using the data on the number of foreign visitors to Japan, which was also used in Part 2 (https://qiita.com/asys/items/622594cb482e01411632).


In Part 2 we found that $p = 4, q = 1$ works well, so we use that order here. Of the 138 observations in total, the first 100 are used to fit the model and the remaining 38 are forecast. Forecasts are easy to obtain with the predict function, as follows.

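import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# `v` is the DataFrame built in Part 2; its 'residual' column is the series being modeled.
resid = v['residual'].dropna().values

# Fit ARMA(4,1) on the first 100 observations only.
# Note: sm.tsa.ARMA was removed in statsmodels 0.13; on newer versions,
# statsmodels.tsa.arima.model.ARIMA with order=(4,0,1) is the replacement.
arma_model = sm.tsa.ARMA(resid[:100], order=(4,1))
result = arma_model.fit()

# In-sample predictions for the first 100 points, then out-of-sample forecasts.
pred = result.predict(start=0,end=138)
pred[:100] = np.nan  # keep only the out-of-sample part for plotting

plt.figure(figsize=(10,4))
plt.plot(resid, label='residual')
plt.plot(result.fittedvalues, label='ARMA(4,1)')
plt.plot(pred, label='ARMA(4,1) pred', linestyle='dashed', color='magenta')
plt.legend()
plt.grid()
plt.title('ARMA(4,1) prediction')
plt.show()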


The forecast combines the AR and MA forecasts and, consistent with intuition, its accuracy deteriorates as the forecast horizon grows. The accuracy for the first one or two steps, on the other hand, is not bad. How forecasts are used depends heavily on the purpose. In the stock market, for example, the monthly number of foreign visitors to Japan affects the subsequent price movements of inbound-tourism stocks, so a forecast made before the figure is published can be used to take a position. In that case only the one-step-ahead forecast is needed, and only its accuracy matters. The one-step-ahead accuracy can be checked as follows.

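import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm

# Rolling one-step-ahead forecasts: refit on the first i observations, then forecast observation i.
resid = v['residual'].dropna().values  # `v` is the DataFrame from Part 2
res_arr = []
for i in range(70,138):
    arma_model = sm.tsa.ARMA(resid[:i], order=(4,1))
    result = arma_model.fit()
    pred = result.predict(i)[0]  # forecast for index i, the first out-of-sample point
    res_arr.append([resid[i], pred])  # [observed, predicted]
res_arr = np.array(res_arr)

# Scatter plot of observed vs. predicted values with a fitted regression line.
sns.regplot(x=res_arr[:,0], y=res_arr[:,1])
plt.xlabel('observed')
plt.ylabel('predicted')
plt.show()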


As the plot shows, there is a positive correlation between the observed and predicted values, but the scatter is large, and at this level of accuracy one would hesitate to use the model in actual trading.
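To put numbers on "positive correlation but large scatter", one option is to compute the RMSE and the Pearson correlation of the observed and predicted pairs. The sketch below is only an illustration: the helper `forecast_accuracy` is hypothetical, and the sample values are dummy numbers, not results from the visitor data.

```python
import math


def forecast_accuracy(observed, predicted):
    """Return (RMSE, Pearson correlation) for paired observed/predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    corr = cov / math.sqrt(var_o * var_p)
    return rmse, corr


# Dummy numbers for illustration only; not taken from the article's data.
obs = [1.0, 2.0, 3.0, 4.0]
prd = [1.5, 1.5, 3.5, 3.5]
rmse, corr = forecast_accuracy(obs, prd)
```

With the array built in the loop above, the call would presumably be `forecast_accuracy(res_arr[:,0], res_arr[:,1])`.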
