[PYTHON] Time series analysis Part 2 AR / MA / ARMA

1. Overview

2. Data

Continuing from the previous part, we will use TOPIX historical data. (figure: TOPIX time series) In addition, I wanted data with clearly visible autocorrelation, so I also prepared monthly data on the number of foreign visitors to Japan, taken from the website of the JTB Research Institute linked below. https://www.tourism.jp/tourism-database/stats/inbound/ (figure: monthly number of foreign visitors to Japan) At first glance the data is strongly seasonal, and with the yen weakening under Abenomics the overall trend is upward.

3. Moving Average Process

Let's start with the first-order MA process.

MA(1):\quad y_t=\mu+\epsilon_t+\theta_1\epsilon_{t-1}, \quad \epsilon_t \sim W.N.(\sigma^2)

Let's pick some parameters and plot the process. For now, set $\sigma = 1.0$ and use numpy's random.randn.
import numpy as np

wn = np.random.randn(50)                 # white noise epsilon_t with sigma = 1
mu = 0.0
theta_1 = 1.0
ma1 = mu + wn[1:] + theta_1 * wn[:-1]    # y_t = mu + eps_t + theta_1 * eps_{t-1}

The data were generated like this. (figure: MA(1) sample paths for several values of $\theta_1$) Taking the green line ($\theta_1 = 0.0$) as a reference, the influence of the sign and magnitude of $\theta_1$ is easy to see. The blue line ($\theta_1 = 1.0$) and the red line ($\theta_1 = -1.0$) sit on opposite sides of the green line, and the orange line ($\theta_1 = 0.5$) moves between the blue and green lines.
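For reference, here is a minimal sketch of how such a comparison plot could be produced; the $\theta_1$ values and colors follow the description above, while the seed and everything else are assumptions.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
wn = np.random.randn(50)   # shared white noise so the paths are directly comparable
mu = 0.0
for theta_1, color in [(1.0, 'blue'), (0.5, 'orange'), (0.0, 'green'), (-1.0, 'red')]:
    y = mu + wn[1:] + theta_1 * wn[:-1]   # MA(1): y_t = mu + eps_t + theta_1 * eps_{t-1}
    plt.plot(y, color=color, label='theta_1 = {}'.format(theta_1))
plt.legend()
plt.title('MA(1)')
plt.show()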

The main statistics are as follows.

mean : \quad E(y_t)=\mu
variance : \quad \gamma_0=Var(y_t)=(1+\theta_1^2)\sigma^2
first-order autocovariance : \quad \gamma_1=Cov(y_t,y_{t-1})=\theta_1\sigma^2
first-order autocorrelation : \quad \rho_1=\frac{\gamma_1}{\gamma_0}=\frac{\theta_1}{1+\theta_1^2}

Check the correlogram. In the plot above the series had 50 points, but here it is generated with 1,000 points. (figure: correlograms of MA(1) for several values of $\theta_1$)
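A sketch of how this correlogram can be drawn with statsmodels' plot_acf; the 1,000-point series, seed, and lag count are assumptions.

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

np.random.seed(0)
wn = np.random.randn(1000)
y = 0.0 + wn[1:] + 1.0 * wn[:-1]        # MA(1) with mu = 0, theta_1 = 1
sm.graphics.tsa.plot_acf(y, lags=20)    # sample autocorrelation up to lag 20
plt.show()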

Up to this point the picture is intuitive: only the first-order autocorrelation is significant, and when $\theta_1 < 0$ the autocorrelation is negative.

Next, let's look at a generalized version of the MA process. MA(q):y_t=\mu+\epsilon_t+\theta_1\epsilon_{t-1}+\theta_2\epsilon_{t-2}+\cdots+\theta_q\epsilon_{t-q}, \quad \epsilon_t \sim W.N.(\sigma^2)

The main statistics are as follows.

\quad E(y_t)=\mu
\quad \gamma_0=Var(y_t)=(1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2)\sigma^2
\quad \gamma_k=\left\{\begin{array}{ll}(\theta_k+\theta_1\theta_{k+1}+\cdots+\theta_{q-k}\theta_q)\sigma^2, & 1\leqq k\leqq q\\0, & k\geqq q+1\end{array}\right.
\quad \rho_k=\left\{\begin{array}{ll}\frac{\theta_k+\theta_1\theta_{k+1}+\cdots+\theta_{q-k}\theta_q}{1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2}, & 1\leqq k\leqq q\\0, & k\geqq q+1\end{array}\right.
\quad The MA process is always stationary.
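As a quick check of the cutoff property, here is a minimal sketch for an MA(2) process with illustrative parameters $\theta_1=0.6$, $\theta_2=0.3$ (both assumptions); the sample autocorrelations should be close to the formulas above for $k \leqq 2$ and close to 0 for $k \geqq 3$.

import numpy as np
import statsmodels.api as sm

np.random.seed(0)
theta_1, theta_2 = 0.6, 0.3
eps = np.random.randn(100000)
y = eps[2:] + theta_1 * eps[1:-1] + theta_2 * eps[:-2]   # MA(2) with mu = 0

denom = 1 + theta_1**2 + theta_2**2
print('theoretical rho_1:', (theta_1 + theta_1 * theta_2) / denom)
print('theoretical rho_2:', theta_2 / denom)
print('sample ACF (lags 0-4):', sm.tsa.acf(y, nlags=4))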

[Supplement] What is stationarity?
\quad A process is weakly stationary if the following holds:
\qquad \forall t,k\in\mathbb{N},\quad E(y_t)=\mu \ \land\ Cov(y_t,y_{t-k})=E[(y_t-\mu)(y_{t-k}-\mu)]=\gamma_k
\quad In that case, the autocorrelation satisfies
\qquad Corr(y_t,y_{t-k})=\frac{\gamma_{k,t}}{\sqrt{\gamma_{0,t}\gamma_{0,t-k}}}=\frac{\gamma_k}{\gamma_0}=\rho_k
\quad Both autocovariance and autocorrelation depend only on the lag $k$.
\quad Also, $\gamma_k=\gamma_{-k}$ and $\rho_k=\rho_{-k}$ hold.

4. Autoregressive Process

Next is the AR process, starting with the first-order AR process.

AR(1):\quad y_t=c+\phi_1y_{t-1}+\epsilon_t,\quad \epsilon_t\sim W.N.(\sigma^2)

Let's plot it. Here, $y_0 = 0$, $c = 1$, and $\sigma^2 = 1$.

import numpy as np

AR1 = []
y = 0          # y_0 = 0
c = 1
phi_1 = 0.5
for i in range(200):
    AR1.append(y)
    y = c + phi_1*y + np.random.randn()   # y_t = c + phi_1 * y_{t-1} + eps_t

(figure: AR(1) sample paths for several values of $\phi_1$)

When $\phi_1=1.0$, it is visually clear that the process is not stationary. In fact, the process is stationary when $|\phi_1| < 1$. As for the value of $\phi_1$, the smaller it is, the lower the level around which the series fluctuates, which is consistent with the mean $c/(1-\phi_1)$ given below.

The main statistics are as follows (assuming $|\phi_1| < 1$ so that $y_t$ is stationary).

\quad E(y_t)=c+\phi_1E(y_{t-1})=\frac{c}{1-\phi_1}
\quad Var(y_t)=\phi_1^2Var(y_{t-1})+\sigma^2=\frac{\sigma^2}{1-\phi_1^2}
\quad \gamma_k=\phi_1\gamma_{k-1}
\quad \rho_k=\phi_1\rho_{k-1}\qquad\cdots (Yule-Walker equation)
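Here is a minimal numerical check of the mean and variance formulas, using the same $c = 1$, $\phi_1 = 0.5$, $\sigma^2 = 1$ as above; the series length of 100,000 and the seed are assumptions made only to reduce sampling noise.

import numpy as np

np.random.seed(0)
c, phi_1 = 1.0, 0.5
y, ys = 0.0, []
for _ in range(100000):
    y = c + phi_1 * y + np.random.randn()
    ys.append(y)
ys = np.array(ys)

print('sample mean    :', ys.mean(), '  theoretical:', c / (1 - phi_1))       # 2.0
print('sample variance:', ys.var(),  '  theoretical:', 1 / (1 - phi_1**2))    # 1.333...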

Next is the generalized AR process. AR(p):y_t=c+\phi_1y_{t-1}+\phi_2y_{t-2}+\cdots+\phi_py_{t-p}+\epsilon_t, \quad \epsilon_t\sim W.N.(\sigma^2)

The main statistics are as follows.

\quad \mu=E(y_t)=\frac{c}{1-\phi_1-\phi_2-\cdots-\phi_p}
\quad \gamma_0=Var(y_t)=\frac{\sigma^2}{1-\phi_1\rho_1-\phi_2\rho_2-\cdots-\phi_p\rho_p}
\quad \gamma_k=\phi_1\gamma_{k-1}+\phi_2\gamma_{k-2}+\cdots+\phi_p\gamma_{k-p},\quad k\geqq1
\quad \rho_k=\phi_1\rho_{k-1}+\phi_2\rho_{k-2}+\cdots+\phi_p\rho_{k-p},\quad k\geqq1\quad\cdots (Yule-Walker equation)
\quad The autocorrelation of the AR process decays exponentially.
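The Yule-Walker equations can also be used in the opposite direction, to estimate AR coefficients from the sample autocovariances. A hedged sketch using statsmodels' yule_walker on an AR(1) series like the one generated above (series length, seed, and method='mle' are assumptions):

import numpy as np
from statsmodels.regression.linear_model import yule_walker

np.random.seed(0)
c, phi_1 = 1.0, 0.5
y, ys = 0.0, []
for _ in range(10000):
    y = c + phi_1 * y + np.random.randn()
    ys.append(y)

rho, sigma = yule_walker(np.array(ys), order=1, method='mle')
print('estimated phi_1:', rho[0], '  estimated sigma:', sigma)   # should be near 0.5 and 1.0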

The correlogram is as follows. (figure: correlogram of the AR process)

5. Autoregressive Moving Average Process

The general form of the ARMA process, which combines the AR and MA processes, is given below.

ARMA(p,q):\quad y_t=c+\phi_1y_{t-1}+\cdots+\phi_py_{t-p}+\epsilon_t+\theta_1\epsilon_{t-1}+\cdots+\theta_q\epsilon_{t-q},\quad \epsilon_t\sim W.N.(\sigma^2)

$ARMA(p,q)$ has the following properties.

\quad \mu=E(y_t)=\frac{c}{1-\phi_1-\phi_2-\cdots-\phi_p}
\quad \gamma_k=\phi_1\gamma_{k-1}+\phi_2\gamma_{k-2}+\cdots+\phi_p\gamma_{k-p},\quad k\geqq q+1
\quad \rho_k=\phi_1\rho_{k-1}+\phi_2\rho_{k-2}+\cdots+\phi_p\rho_{k-p},\quad k\geqq q+1
\quad The autocorrelation of the ARMA process decays exponentially.

Let's actually plot $ARMA(3,3)$. The parameters are as follows.

c=1.5,\ \phi_1=-0.5,\ \phi_2=0.5,\ \phi_3=0.8,\ \theta_1=0.8,\ \theta_2=0.5,\ \theta_3=-0.5,\ \sigma=1

import numpy as np

arma = [0, 0, 0]   # y_0 = y_1 = y_2 = 0
wn = [0, 0, 0]     # corresponding initial noise terms
c = 1.5
phi_1 = -0.5
phi_2 = 0.5
phi_3 = 0.8
theta_1 = 0.8
theta_2 = 0.5
theta_3 = -0.5
for i in range(1000):
    eps = np.random.randn()
    y = c + phi_1*arma[-1] + phi_2*arma[-2] + phi_3*arma[-3] + eps + theta_1*wn[-1] + theta_2*wn[-2] + theta_3*wn[-3]
    wn.append(eps)
    arma.append(y)

(figure: simulated $ARMA(3,3)$ series)

Stationarity (this condition is the same as the stationarity condition of the AR process)

AR characteristic equation : 1-\phi_1z-\cdots-\phi_pz^p=0

The process is stationary when the absolute values of all solutions of this AR characteristic equation are greater than 1.

Invertibility

The MA process is invertible when it can be rewritten as an AR(∞) process.

MA characteristic equation : 1+\theta_1z+\theta_2z^2+\cdots+\theta_qz^q=0

The MA process is invertible when the absolute values of all solutions of this MA characteristic equation are greater than 1. Invertible MA processes are preferable for model selection.
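These two conditions can be checked numerically with numpy.roots. A minimal sketch for the $ARMA(3,3)$ parameters used above (repeated here so the snippet is self-contained):

import numpy as np

phi_1, phi_2, phi_3 = -0.5, 0.5, 0.8
theta_1, theta_2, theta_3 = 0.8, 0.5, -0.5

# AR characteristic equation: 1 - phi_1*z - phi_2*z^2 - phi_3*z^3 = 0
# (np.roots expects coefficients from the highest power down to the constant)
ar_roots = np.roots([-phi_3, -phi_2, -phi_1, 1])
# MA characteristic equation: 1 + theta_1*z + theta_2*z^2 + theta_3*z^3 = 0
ma_roots = np.roots([theta_3, theta_2, theta_1, 1])

print('AR root moduli:', np.abs(ar_roots))   # stationary if every modulus is greater than 1
print('MA root moduli:', np.abs(ma_roots))   # invertible if every modulus is greater than 1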

6. Estimating the ARMA model

From here, let's estimate the parameters of the $ARMA(3,3)$ series actually created in section 5. The statsmodels library is used because it is tedious to compute by hand. See the URL below for details. https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARMA.html If you pass the sample data and the model order order=(p, q) to ARMA, the parameters are estimated by maximum likelihood.

import statsmodels.api as sm

arma_model = sm.tsa.ARMA(arma, order=(3,3))
result = arma_model.fit()
print(result.summary())
                              ARMA Model Results                              
==============================================================================
Dep. Variable:                      y   No. Observations:                 1003
Model:                     ARMA(3, 3)   Log Likelihood               -1533.061
Method:                       css-mle   S.D. of innovations              1.113
Date:                Sun, 08 Dec 2019   AIC                           3082.123
Time:                        14:51:34   BIC                           3121.409
Sample:                             0   HQIC                          3097.052
                                                                              
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          6.8732      0.410     16.773      0.000       6.070       7.676
ar.L1.y       -0.4986      0.024    -21.149      0.000      -0.545      -0.452
ar.L2.y        0.5301      0.019     27.733      0.000       0.493       0.568
ar.L3.y        0.8256      0.020     41.115      0.000       0.786       0.865
ma.L1.y        0.7096      0.036     19.967      0.000       0.640       0.779
ma.L2.y        0.3955      0.039     10.176      0.000       0.319       0.472
ma.L3.y       -0.4078      0.033    -12.267      0.000      -0.473      -0.343
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.0450           -0.0000j            1.0450           -0.0000
AR.2           -0.8436           -0.6689j            1.0766           -0.3933
AR.3           -0.8436           +0.6689j            1.0766            0.3933
MA.1           -0.6338           -0.8332j            1.0469           -0.3535
MA.2           -0.6338           +0.8332j            1.0469            0.3535
MA.3            2.2373           -0.0000j            2.2373           -0.0000
-----------------------------------------------------------------------------

We are able to estimate values reasonably close to the original parameters. A few notes: "S.D. of innovations" is the standard deviation of the white noise, and since "const" reports the process mean $\mu = c/(1-\phi_1-\phi_2-\phi_3)$, recovering the value of $c$ from const requires the following calculation.

print(result.params[0] * (1-result.arparams.sum()))
0.9824998883509097


7. ARMA model selection

To select $ARMA(p,q)$, first conservatively narrow down the candidate values of $p$ and $q$ from the autocorrelation and partial autocorrelation, and then pick the optimal model using an information criterion.

First, the k-th order partial autocorrelation is defined as the correlation between $y_t$ and $y_{t-k}$ after removing the influence of $y_{t-1},\cdots,y_{t-k+1}$.

\left\{\begin{array}{lll}MA(q):&y_t=\mu+\epsilon_t+\theta_1\epsilon_{t-1}+\theta_2\epsilon_{t-2}+\cdots+\theta_q\epsilon_{t-q}\\AR(p):&y_t=c+\phi_1y_{t-1}+\phi_2y_{t-2}+\cdots+\phi_py_{t-p}+\epsilon_t\\ARMA(p,q):&y_t=c+\phi_1y_{t-1}+\cdots+\phi_py_{t-p}+\epsilon_t+\theta_1\epsilon_{t-1}+\cdots+\theta_q\epsilon_{t-q}\end{array}\right.

Considering the above definition, the properties shown in the table below are clear.

model        Autocorrelation       Partial autocorrelation
AR(p)        decays                0 after lag p
MA(q)        0 after lag q         decays
ARMA(p,q)    decays                decays

This makes it possible to narrow down the model to some extent by looking at the structure of sample autocorrelation and sample partial autocorrelation.
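For example, a minimal sketch of computing the sample autocorrelation and partial autocorrelation with statsmodels (the AR(1) example series, seed, and lag count are assumptions):

import numpy as np
import statsmodels.api as sm

np.random.seed(0)
y, ys = 0.0, []
for _ in range(1000):
    y = 1.0 + 0.5 * y + np.random.randn()   # AR(1) example series
    ys.append(y)

print('sample ACF :', sm.tsa.acf(ys, nlags=5))    # decays gradually for an AR process
print('sample PACF:', sm.tsa.pacf(ys, nlags=5))   # roughly 0 beyond lag p = 1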

Next is the information criterion (IC).

\quad IC=-2\mathcal{L}(\hat{\Theta})+p(T)k
\quad where $T$ : number of samples, $k$ : number of parameters

There are two commonly used information criteria:

\left\{\begin{array}{ll}\text{Akaike's Information Criterion (AIC)}:&AIC=-2\mathcal{L}(\hat{\Theta})+2k\\\text{Schwarz Information Criterion (SIC)}:&SIC=-2\mathcal{L}(\hat{\Theta})+\log(T)\,k\end{array}\right.

SIC tends to choose a smaller model, but with a sufficiently large sample the maximum likelihood estimators of the redundant parameters that AIC admits may converge to 0, so it cannot be said in general which criterion is better.
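As a sanity check, the AIC and BIC shown in the section 6 summary can be reproduced from these formulas; with log likelihood $\mathcal{L}(\hat{\Theta}) = -1533.061$, $T = 1003$ observations, and $k = 8$ estimated parameters (constant, 3 AR, 3 MA, and the innovation variance), a small sketch:

import numpy as np

llf, T, k = -1533.061, 1003, 8
print('AIC:', -2 * llf + 2 * k)            # approx. 3082.1, matching the summary
print('BIC:', -2 * llf + np.log(T) * k)    # approx. 3121.4, matching the summary (SIC)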

8. Modeling the actual data

We prepared TOPIX data and data on the number of foreign visitors to Japan, but neither series is stationary in its raw form. Therefore, for the 1996-2007 portion of the foreign-visitor data, I created a detrended series by subtracting a moving average.

visit.head(3)
month visits
0 1996-01-01 276086
1 1996-02-01 283667
2 1996-03-01 310702
visit['trend'] = visit['visits'].rolling(window=13).mean().shift(-6)
visit['residual'] = visit['visits'] - visit['trend']
v = visit[visit['month']<'2008-01-01']
plt.plot(v['month'].values, v['visits'].values, label='original')
plt.plot(v['month'].values, v['trend'].values, label='trend')
plt.plot(v['month'].values, v['residual'].values, label='residual')
plt.legend()
plt.grid()
plt.title('Foreign Visitors')
plt.show()

(figure: original series, trend, and residual of the foreign-visitor data) As the trend, I took a moving average covering the 6 months before and after each point (13 months including the current month); this is what rolling(window=13).mean() combined with shift(-6) accomplishes.
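As a side note, the same centered moving average can be written more directly with pandas' center option; a small sketch assuming the same visit DataFrame (the column name trend_centered is hypothetical):

# equivalent to rolling(window=13).mean().shift(-6): a 13-month window centered on each month
visit['trend_centered'] = visit['visits'].rolling(window=13, center=True).mean()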

First, let's look at the autocorrelation and partial autocorrelation. This is very easy with the library.

fig = plt.figure(figsize=(8,5))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(v['residual'].dropna().values, lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(v['residual'].dropna().values, lags=40, ax=ax2)
plt.tight_layout()
plt.show()

(figure: ACF and PACF of the detrended visitor series)

There is strong seasonality that cannot be missed: the correlation at every 12-month lag is very strong. You can use statsmodels' arma_order_select_ic to estimate the optimal order of the ARMA model. For details see this link (https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.arma_order_select_ic.html).

sm.tsa.arma_order_select_ic(v['residual'].dropna().values, max_ar=4, max_ma=2, ic='aic')

Here, for $ARMA(p,q)$, the AIC criterion is used with the maximum value of $p$ set to 4 and the maximum value of $q$ set to 2.

p \ q            0            1            2
0      3368.221521  3363.291271  3350.327397
1      3365.779849  3348.257023  3331.293926
2      3365.724955  3349.083663  3328.831252
3      3361.660853  3347.156390  3329.447773
4      3332.100417  3290.984260  3292.800604

None of them differ that much, but the result is that $p=4$, $q=1$ is best (the minimum AIC in the table).

arma_model = sm.tsa.ARMA(v['residual'].dropna().values, order=(4,1))
result = arma_model.fit()
plt.figure(figsize=(10,4))
plt.plot(v['residual'].dropna().values, label='residual')
plt.plot(result.fittedvalues, label='ARMA(4,1)')
plt.legend()
plt.grid()
plt.title('ARMA(4,1)')
plt.show()

(figure: detrended series and ARMA(4,1) fitted values)

That is the result. Personally, I felt it does quite well for such a relatively simple model.
