[PYTHON] Time series analysis Part 2 AR / MA / ARMA

1. Overview

2. Data

Continuing from the previous part, we will use TOPIX historical data. (figure: TOPIX time series) In addition, I wanted data with clearly visible autocorrelation, so I also prepared monthly data on the number of foreign visitors to Japan, taken from the website of the JTB Research Institute linked below. https://www.tourism.jp/tourism-database/stats/inbound/ (figure: monthly number of foreign visitors to Japan) At first glance the data is strongly seasonal, and with the yen weakening under Abenomics the overall trend is upward.

3. Moving Average Process

Let's start with the first-order MA process.

MA(1):\quad y_t=\mu+\epsilon_t+\theta_1\epsilon_{t-1}, \quad \epsilon_t \sim W.N.(\sigma^2)

Let's pick some parameters and plot the process. For now, set $\sigma = 1.0$ and use numpy's random.randn.
import numpy as np

wn = np.random.randn(50)                 # white noise epsilon_t with sigma = 1
mu = 0.0
theta_1 = 1.0
ma1 = mu + wn[1:] + theta_1 * wn[:-1]    # y_t = mu + eps_t + theta_1 * eps_{t-1}

The data were generated like this. (figure: MA(1) sample paths for several values of $\theta_1$) Taking the green line ($\theta_1 = 0.0$) as a reference, the influence of the sign and magnitude of $\theta_1$ is easy to see. The blue line ($\theta_1 = 1.0$) and the red line ($\theta_1 = -1.0$) sit on opposite sides of the green line, and the orange line ($\theta_1 = 0.5$) moves between the blue and green lines.
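For reference, here is a minimal sketch of how such a comparison plot could be produced; the $\theta_1$ values and colors follow the description above, while the seed and everything else are assumptions.

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
wn = np.random.randn(50)   # shared white noise so the paths are directly comparable
mu = 0.0
for theta_1, color in [(1.0, 'blue'), (0.5, 'orange'), (0.0, 'green'), (-1.0, 'red')]:
    y = mu + wn[1:] + theta_1 * wn[:-1]   # MA(1): y_t = mu + eps_t + theta_1 * eps_{t-1}
    plt.plot(y, color=color, label='theta_1 = {}'.format(theta_1))
plt.legend()
plt.title('MA(1)')
plt.show()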

The main statistics are as follows.

mean : \quad E(y_t)=\mu
variance : \quad \gamma_0=Var(y_t)=(1+\theta_1^2)\sigma^2
first-order autocovariance : \quad \gamma_1=Cov(y_t,y_{t-1})=\theta_1\sigma^2
first-order autocorrelation : \quad \rho_1=\frac{\gamma_1}{\gamma_0}=\frac{\theta_1}{1+\theta_1^2}

Check the correlogram. In the plot above the series had 50 points, but here it is generated with 1,000 points. (figure: correlograms of MA(1) for several values of $\theta_1$)
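A sketch of how this correlogram can be drawn with statsmodels' plot_acf; the 1,000-point series, seed, and lag count are assumptions.

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

np.random.seed(0)
wn = np.random.randn(1000)
y = 0.0 + wn[1:] + 1.0 * wn[:-1]        # MA(1) with mu = 0, theta_1 = 1
sm.graphics.tsa.plot_acf(y, lags=20)    # sample autocorrelation up to lag 20
plt.show()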

Up to this point the picture is intuitive: only the first-order autocorrelation is significant, and when $\theta_1 < 0$ the autocorrelation is negative.

Next, let's look at a generalized version of the MA process. MA(q):y_t=\mu+\epsilon_t+\theta_1\epsilon_{t-1}+\theta_2\epsilon_{t-2}+\cdots+\theta_q\epsilon_{t-q}, \quad \epsilon_t \sim W.N.(\sigma^2)

The main statistics are as follows.

\quad E(y_t)=\mu
\quad \gamma_0=Var(y_t)=(1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2)\sigma^2
\quad \gamma_k=\left\{\begin{array}{ll}(\theta_k+\theta_1\theta_{k+1}+\cdots+\theta_{q-k}\theta_q)\sigma^2, & 1\leqq k\leqq q\\0, & k\geqq q+1\end{array}\right.
\quad \rho_k=\left\{\begin{array}{ll}\frac{\theta_k+\theta_1\theta_{k+1}+\cdots+\theta_{q-k}\theta_q}{1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2}, & 1\leqq k\leqq q\\0, & k\geqq q+1\end{array}\right.
\quad The MA process is always stationary.
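As a quick check of the cutoff property, here is a minimal sketch for an MA(2) process with illustrative parameters $\theta_1=0.6$, $\theta_2=0.3$ (both assumptions); the sample autocorrelations should be close to the formulas above for $k \leqq 2$ and close to 0 for $k \geqq 3$.

import numpy as np
import statsmodels.api as sm

np.random.seed(0)
theta_1, theta_2 = 0.6, 0.3
eps = np.random.randn(100000)
y = eps[2:] + theta_1 * eps[1:-1] + theta_2 * eps[:-2]   # MA(2) with mu = 0

denom = 1 + theta_1**2 + theta_2**2
print('theoretical rho_1:', (theta_1 + theta_1 * theta_2) / denom)
print('theoretical rho_2:', theta_2 / denom)
print('sample ACF (lags 0-4):', sm.tsa.acf(y, nlags=4))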

[Supplement] What is stationarity?
\quad A process is weakly stationary if the following holds:
\qquad \forall t,k\in\mathbb{N},\quad E(y_t)=\mu \ \land\ Cov(y_t,y_{t-k})=E[(y_t-\mu)(y_{t-k}-\mu)]=\gamma_k
\quad In that case, the autocorrelation satisfies
\qquad Corr(y_t,y_{t-k})=\frac{\gamma_{k,t}}{\sqrt{\gamma_{0,t}\gamma_{0,t-k}}}=\frac{\gamma_k}{\gamma_0}=\rho_k
\quad Both autocovariance and autocorrelation depend only on the lag $k$.
\quad Also, $\gamma_k=\gamma_{-k}$ and $\rho_k=\rho_{-k}$ hold.

4. Autoregressive Process

Next is the AR process, starting with the first-order AR process.

AR(1):\quad y_t=c+\phi_1y_{t-1}+\epsilon_t,\quad \epsilon_t\sim W.N.(\sigma^2)

Let's plot it. Here, $y_0 = 0$, $c = 1$, and $\sigma^2 = 1$.

import numpy as np

AR1 = []
y = 0          # y_0 = 0
c = 1
phi_1 = 0.5
for i in range(200):
    AR1.append(y)
    y = c + phi_1*y + np.random.randn()   # y_t = c + phi_1 * y_{t-1} + eps_t

(figure: AR(1) sample paths for several values of $\phi_1$)

When $\phi_1=1.0$, it is visually clear that the process is not stationary. In fact, the process is stationary when $|\phi_1| < 1$. As for the value of $\phi_1$, the smaller it is, the lower the level around which the series fluctuates, which is consistent with the mean $c/(1-\phi_1)$ given below.

The main statistics are as follows (assuming $|\phi_1| < 1$ so that $y_t$ is stationary).

\quad E(y_t)=c+\phi_1E(y_{t-1})=\frac{c}{1-\phi_1}
\quad Var(y_t)=\phi_1^2Var(y_{t-1})+\sigma^2=\frac{\sigma^2}{1-\phi_1^2}
\quad \gamma_k=\phi_1\gamma_{k-1}
\quad \rho_k=\phi_1\rho_{k-1}\qquad\cdots (Yule-Walker equation)
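Here is a minimal numerical check of the mean and variance formulas, using the same $c = 1$, $\phi_1 = 0.5$, $\sigma^2 = 1$ as above; the series length of 100,000 and the seed are assumptions made only to reduce sampling noise.

import numpy as np

np.random.seed(0)
c, phi_1 = 1.0, 0.5
y, ys = 0.0, []
for _ in range(100000):
    y = c + phi_1 * y + np.random.randn()
    ys.append(y)
ys = np.array(ys)

print('sample mean    :', ys.mean(), '  theoretical:', c / (1 - phi_1))       # 2.0
print('sample variance:', ys.var(),  '  theoretical:', 1 / (1 - phi_1**2))    # 1.333...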

Next is the generalized AR process. AR(p):y_t=c+\phi_1y_{t-1}+\phi_2y_{t-2}+\cdots+\phi_py_{t-p}+\epsilon_t, \quad \epsilon_t\sim W.N.(\sigma^2)

The main statistics are as follows.

\quad \mu=E(y_t)=\frac{c}{1-\phi_1-\phi_2-\cdots-\phi_p}
\quad \gamma_0=Var(y_t)=\frac{\sigma^2}{1-\phi_1\rho_1-\phi_2\rho_2-\cdots-\phi_p\rho_p}
\quad \gamma_k=\phi_1\gamma_{k-1}+\phi_2\gamma_{k-2}+\cdots+\phi_p\gamma_{k-p},\quad k\geqq1
\quad \rho_k=\phi_1\rho_{k-1}+\phi_2\rho_{k-2}+\cdots+\phi_p\rho_{k-p},\quad k\geqq1\quad\cdots (Yule-Walker equation)
\quad The autocorrelation of the AR process decays exponentially.
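The Yule-Walker equations can also be used in the opposite direction, to estimate AR coefficients from the sample autocovariances. A hedged sketch using statsmodels' yule_walker on an AR(1) series like the one generated above (series length, seed, and method='mle' are assumptions):

import numpy as np
from statsmodels.regression.linear_model import yule_walker

np.random.seed(0)
c, phi_1 = 1.0, 0.5
y, ys = 0.0, []
for _ in range(10000):
    y = c + phi_1 * y + np.random.randn()
    ys.append(y)

rho, sigma = yule_walker(np.array(ys), order=1, method='mle')
print('estimated phi_1:', rho[0], '  estimated sigma:', sigma)   # should be near 0.5 and 1.0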

The correlogram is as follows. (figure: correlogram of the AR process)

5. Autoregressive Moving Average Process

The general form of the ARMA process, which combines the AR and MA processes, is given below.

ARMA(p,q):\quad y_t=c+\phi_1y_{t-1}+\cdots+\phi_py_{t-p}+\epsilon_t+\theta_1\epsilon_{t-1}+\cdots+\theta_q\epsilon_{t-q},\quad \epsilon_t\sim W.N.(\sigma^2)

$ARMA(p,q)$ has the following properties.

\quad \mu=E(y_t)=\frac{c}{1-\phi_1-\phi_2-\cdots-\phi_p}
\quad \gamma_k=\phi_1\gamma_{k-1}+\phi_2\gamma_{k-2}+\cdots+\phi_p\gamma_{k-p},\quad k\geqq q+1
\quad \rho_k=\phi_1\rho_{k-1}+\phi_2\rho_{k-2}+\cdots+\phi_p\rho_{k-p},\quad k\geqq q+1
\quad The autocorrelation of the ARMA process decays exponentially.

Let's actually plot $ARMA(3,3)$. The parameters are as follows.

c=1.5,\ \phi_1=-0.5,\ \phi_2=0.5,\ \phi_3=0.8,\ \theta_1=0.8,\ \theta_2=0.5,\ \theta_3=-0.5,\ \sigma=1

import numpy as np

arma = [0, 0, 0]   # y_0 = y_1 = y_2 = 0
wn = [0, 0, 0]     # corresponding initial noise terms
c = 1.5
phi_1 = -0.5
phi_2 = 0.5
phi_3 = 0.8
theta_1 = 0.8
theta_2 = 0.5
theta_3 = -0.5
for i in range(1000):
    eps = np.random.randn()
    y = c + phi_1*arma[-1] + phi_2*arma[-2] + phi_3*arma[-3] + eps + theta_1*wn[-1] + theta_2*wn[-2] + theta_3*wn[-3]
    wn.append(eps)
    arma.append(y)

(figure: simulated $ARMA(3,3)$ series)

Stationarity (this condition is the same as the stationarity condition of the AR process)

AR characteristic equation : 1-\phi_1z-\cdots-\phi_pz^p=0

The process is stationary when the absolute values of all solutions of this AR characteristic equation are greater than 1.

Invertibility

The MA process is invertible when it can be rewritten as an AR(∞) process.

MA characteristic equation : 1+\theta_1z+\theta_2z^2+\cdots+\theta_qz^q=0

The MA process is invertible when the absolute values of all solutions of this MA characteristic equation are greater than 1. Invertible MA processes are preferable for model selection.
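These two conditions can be checked numerically with numpy.roots. A minimal sketch for the $ARMA(3,3)$ parameters used above (repeated here so the snippet is self-contained):

import numpy as np

phi_1, phi_2, phi_3 = -0.5, 0.5, 0.8
theta_1, theta_2, theta_3 = 0.8, 0.5, -0.5

# AR characteristic equation: 1 - phi_1*z - phi_2*z^2 - phi_3*z^3 = 0
# (np.roots expects coefficients from the highest power down to the constant)
ar_roots = np.roots([-phi_3, -phi_2, -phi_1, 1])
# MA characteristic equation: 1 + theta_1*z + theta_2*z^2 + theta_3*z^3 = 0
ma_roots = np.roots([theta_3, theta_2, theta_1, 1])

print('AR root moduli:', np.abs(ar_roots))   # stationary if every modulus is greater than 1
print('MA root moduli:', np.abs(ma_roots))   # invertible if every modulus is greater than 1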

6. Estimating the ARMA model

From here, let's estimate the parameters of the $ARMA(3,3)$ series actually created in section 5. The statsmodels library is used because it is tedious to compute by hand. See the URL below for details. https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARMA.html If you pass the sample data and the model order order=(p, q) to ARMA, the parameters are estimated by maximum likelihood.

import statsmodels.api as sm

arma_model = sm.tsa.ARMA(arma, order=(3,3))
result = arma_model.fit()
print(result.summary())
                              ARMA Model Results                              
==============================================================================
Dep. Variable:                      y   No. Observations:                 1003
Model:                     ARMA(3, 3)   Log Likelihood               -1533.061
Method:                       css-mle   S.D. of innovations              1.113
Date:                Sun, 08 Dec 2019   AIC                           3082.123
Time:                        14:51:34   BIC                           3121.409
Sample:                             0   HQIC                          3097.052
                                                                              
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          6.8732      0.410     16.773      0.000       6.070       7.676
ar.L1.y       -0.4986      0.024    -21.149      0.000      -0.545      -0.452
ar.L2.y        0.5301      0.019     27.733      0.000       0.493       0.568
ar.L3.y        0.8256      0.020     41.115      0.000       0.786       0.865
ma.L1.y        0.7096      0.036     19.967      0.000       0.640       0.779
ma.L2.y        0.3955      0.039     10.176      0.000       0.319       0.472
ma.L3.y       -0.4078      0.033    -12.267      0.000      -0.473      -0.343
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.0450           -0.0000j            1.0450           -0.0000
AR.2           -0.8436           -0.6689j            1.0766           -0.3933
AR.3           -0.8436           +0.6689j            1.0766            0.3933
MA.1           -0.6338           -0.8332j            1.0469           -0.3535
MA.2           -0.6338           +0.8332j            1.0469            0.3535
MA.3            2.2373           -0.0000j            2.2373           -0.0000
-----------------------------------------------------------------------------

We are able to estimate values reasonably close to the original parameters. A few notes: "S.D. of innovations" is the standard deviation of the white noise, and since "const" reports the process mean $\mu = c/(1-\phi_1-\phi_2-\phi_3)$, recovering the value of $c$ from const requires the following calculation.

print(result.params[0] * (1-result.arparams.sum()))
0.9824998883509097


7. ARMA model selection

To select $ARMA(p,q)$, first conservatively narrow down the candidate values of $p$ and $q$ from the autocorrelation and partial autocorrelation, and then pick the optimal model using an information criterion.

First, the k-th order partial autocorrelation is defined as the correlation between $y_t$ and $y_{t-k}$ after removing the influence of $y_{t-1},\cdots,y_{t-k+1}$.

\left\{\begin{array}{lll}MA(q):&y_t=\mu+\epsilon_t+\theta_1\epsilon_{t-1}+\theta_2\epsilon_{t-2}+\cdots+\theta_q\epsilon_{t-q}\\AR(p):&y_t=c+\phi_1y_{t-1}+\phi_2y_{t-2}+\cdots+\phi_py_{t-p}+\epsilon_t\\ARMA(p,q):&y_t=c+\phi_1y_{t-1}+\cdots+\phi_py_{t-p}+\epsilon_t+\theta_1\epsilon_{t-1}+\cdots+\theta_q\epsilon_{t-q}\end{array}\right.

Considering the above definition, the properties shown in the table below are clear.

model        Autocorrelation       Partial autocorrelation
AR(p)        decays                0 after lag p
MA(q)        0 after lag q         decays
ARMA(p,q)    decays                decays

This makes it possible to narrow down the model to some extent by looking at the structure of sample autocorrelation and sample partial autocorrelation.
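For example, a minimal sketch of computing the sample autocorrelation and partial autocorrelation with statsmodels (the AR(1) example series, seed, and lag count are assumptions):

import numpy as np
import statsmodels.api as sm

np.random.seed(0)
y, ys = 0.0, []
for _ in range(1000):
    y = 1.0 + 0.5 * y + np.random.randn()   # AR(1) example series
    ys.append(y)

print('sample ACF :', sm.tsa.acf(ys, nlags=5))    # decays gradually for an AR process
print('sample PACF:', sm.tsa.pacf(ys, nlags=5))   # roughly 0 beyond lag p = 1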

Next is the information criterion (IC).

\quad IC=-2\mathcal{L}(\hat{\Theta})+p(T)k
\quad where $T$ : number of samples, $k$ : number of parameters

There are two commonly used information criteria:

\left\{\begin{array}{ll}\text{Akaike's Information Criterion (AIC)}:&AIC=-2\mathcal{L}(\hat{\Theta})+2k\\\text{Schwarz Information Criterion (SIC)}:&SIC=-2\mathcal{L}(\hat{\Theta})+\log(T)\,k\end{array}\right.

SIC tends to choose a smaller model, but with a sufficiently large sample the maximum likelihood estimators of the redundant parameters that AIC admits may converge to 0, so it cannot be said in general which criterion is better.
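As a sanity check, the AIC and BIC shown in the section 6 summary can be reproduced from these formulas; with log likelihood $\mathcal{L}(\hat{\Theta}) = -1533.061$, $T = 1003$ observations, and $k = 8$ estimated parameters (constant, 3 AR, 3 MA, and the innovation variance), a small sketch:

import numpy as np

llf, T, k = -1533.061, 1003, 8
print('AIC:', -2 * llf + 2 * k)            # approx. 3082.1, matching the summary
print('BIC:', -2 * llf + np.log(T) * k)    # approx. 3121.4, matching the summary (SIC)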

8. Modeling the actual data

We prepared TOPIX data and data on the number of foreign visitors to Japan, but neither series is stationary in its raw form. Therefore, for the 1996-2007 portion of the foreign-visitor data, I created a detrended series by subtracting a moving average.

visit.head(3)
month visits
0 1996-01-01 276086
1 1996-02-01 283667
2 1996-03-01 310702
visit['trend'] = visit['visits'].rolling(window=13).mean().shift(-6)
visit['residual'] = visit['visits'] - visit['trend']
v = visit[visit['month']<'2008-01-01']
plt.plot(v['month'].values, v['visits'].values, label='original')
plt.plot(v['month'].values, v['trend'].values, label='trend')
plt.plot(v['month'].values, v['residual'].values, label='residual')
plt.legend()
plt.grid()
plt.title('Foreign Visitors')
plt.show()

(figure: original series, trend, and residual of the foreign-visitor data) As the trend, I took a moving average covering the 6 months before and after each point (13 months including the current month); this is what rolling(window=13).mean() combined with shift(-6) accomplishes.
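As a side note, the same centered moving average can be written more directly with pandas' center option; a small sketch assuming the same visit DataFrame (the column name trend_centered is hypothetical):

# equivalent to rolling(window=13).mean().shift(-6): a 13-month window centered on each month
visit['trend_centered'] = visit['visits'].rolling(window=13, center=True).mean()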

First, let's look at the autocorrelation and partial autocorrelation. This is very easy with the library.

fig = plt.figure(figsize=(8,5))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(v['residual'].dropna().values, lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(v['residual'].dropna().values, lags=40, ax=ax2)
plt.tight_layout()
plt.show()

(figure: ACF and PACF of the detrended visitor series)

There is strong seasonality that cannot be missed: the correlation at every 12-month lag is very strong. You can use statsmodels' arma_order_select_ic to estimate the optimal order of the ARMA model. For details see this link (https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.arma_order_select_ic.html).

sm.tsa.arma_order_select_ic(v['residual'].dropna().values, max_ar=4, max_ma=2, ic='aic')

Here, for $ARMA(p,q)$, the AIC criterion is used with the maximum value of $p$ set to 4 and the maximum value of $q$ set to 2.

p \ q            0            1            2
0      3368.221521  3363.291271  3350.327397
1      3365.779849  3348.257023  3331.293926
2      3365.724955  3349.083663  3328.831252
3      3361.660853  3347.156390  3329.447773
4      3332.100417  3290.984260  3292.800604

None of them differ that much, but the result is that $p=4$, $q=1$ is best (the minimum AIC in the table).

arma_model = sm.tsa.ARMA(v['residual'].dropna().values, order=(4,1))
result = arma_model.fit()
plt.figure(figsize=(10,4))
plt.plot(v['residual'].dropna().values, label='residual')
plt.plot(result.fittedvalues, label='ARMA(4,1)')
plt.legend()
plt.grid()
plt.title('ARMA(4,1)')
plt.show()

(figure: detrended series and ARMA(4,1) fitted values)

That is the result. Personally, I felt it does quite well for such a relatively simple model.
