This article is a summary of VAR based on Gujarati's Basic Econometrics (BE), drawing on BE's Example 17.13 and Section 22.9. Most of the translation comes from Section 22.9; since it may be hard to follow without a copy of BE, I have tried to reproduce Gujarati's exposition as faithfully as possible. Gujarati's econometrics books are used as textbooks at universities and graduate schools in Europe and the United States. BE is one of the most reliable textbooks, with a clear account of what econometrics can and cannot do.
In addition, we will hold a Free Online Study Session (Linear Regression) on June 16, 2020. We hope you will join us.
Regression analysis mostly deals with single-equation models: one dependent variable and one or more explanatory variables. Such models focus on predicting the (mean) value of Y. If there is a cause-and-effect relationship, it runs from X to Y.

In many situations, however, such a one-way relationship is not meaningful. Y may be determined by X while X is simultaneously determined by Y; sometimes X and Y affect each other at the same time. In that case the distinction between dependent and explanatory variables becomes questionable. In a simultaneous-equation model, the variables are determined jointly as a set. Such a model contains one or more equations; the mutually dependent variables are called endogenous variables and are random variables, while variables that are not truly stochastic are called exogenous or predetermined variables. BE treats this material in detail in chapters 18 to 20: the simultaneous-equation model (18), the identification problem (19), and methods for estimating simultaneous equations (20).

Consider the quantity (Q) and price (P) of a commodity. The price of a commodity and the quantity traded are determined by the intersection of its supply and demand curves. We therefore represent these curves linearly, adding noise, to model the interaction.
Demand function: $Q_t^d = \alpha_0 + \alpha_1 P_t + u_{1t}$
Supply function: $Q_t^s = \beta_0 + \beta_1 P_t + u_{2t}$
Equilibrium condition: $Q_t^d = Q_t^s$
where $t$ denotes time and $\alpha$, $\beta$ are the parameters.
Both the demand function ($Q_t^d$) and the supply function ($Q_t^s$) contain the price $P_t$, so price and quantity are determined jointly: $P$ and $Q$ are both endogenous variables.
In simultaneous-equation (structural) models, some variables are treated as endogenous and some as exogenous or predetermined (exogenous variables plus lagged endogenous variables). Before estimating such a model, we must check whether each equation in the system is (exactly or over-) identified. Identification is often achieved by assuming that some of the predetermined variables appear only in certain equations.
This decision is often subjective and was severely criticized by Christopher Sims. According to Sims, if there is true simultaneity among a set of variables, they should all be treated on an equal footing; there should be no a priori distinction between endogenous and exogenous variables. It was on this idea that Sims built the VAR model.
BE's equations (17.14.1) and (17.14.2) describe the current value of GDP in terms of past values of the money supply and past values of GDP, and the current value of the money supply in terms of past values of the money supply and past values of GDP. There are no exogenous variables in this system.
Now let us examine the nature of the causal relationship between Canada's money supply and interest rate. The money supply equation contains past values of the money supply and the interest rate, and the interest rate equation contains past values of the interest rate and the money supply. Both are examples of vector autoregressive models. The term autoregressive refers to the use of past, or lagged, values of the dependent variable on the right-hand side; the term vector refers to the fact that we are dealing with a vector of two (or more) variables.
Using six lagged values of each variable as regressors, we cannot reject (as we will see later) the hypothesis that there is bilateral causality between the money supply (M1) and the interest rate (the 90-day corporate rate, R): M1 affects R and R affects M1. Such a situation is well suited to VAR.
To illustrate how a VAR is estimated, suppose each equation contains k lags of M (measured by M1) and of R. In this case, each of the following equations can be estimated by OLS:
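Written out (the equation numbers follow BE; the coefficient symbols are reconstructed from Section 22.9's setup):

$$M1_t = \alpha + \sum_{j=1}^{k} \beta_j M1_{t-j} + \sum_{j=1}^{k} \gamma_j R_{t-j} + u_{1t} \quad (22.9.1)$$

$$R_t = \alpha' + \sum_{j=1}^{k} \theta_j M1_{t-j} + \sum_{j=1}^{k} \lambda_j R_{t-j} + u_{2t} \quad (22.9.2)$$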
where the u's are stochastic error terms, called impulses, innovations, or shocks in VAR terminology.
Before estimating (22.9.1) and (22.9.2), we must decide on the maximum lag length k. This is determined empirically. The data are 40 observations from 1979.I to 1988.IV. Including many lagged values in each equation consumes degrees of freedom and raises the possibility of multicollinearity, while too few lags leads to specification errors. One way to deal with this problem is to use an information criterion such as Akaike's or Schwarz's and choose the model with the lowest value. Some trial and error is inevitable.
The following data are copied from Table 17.5 of BE.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# quarterly dates, 1979.I-1988.IV (40 observations)
date=pd.date_range(start='1979/1/31',end='1988/12/31',freq='Q')
M1=[22175,22841,23461,23427,23811,23612.33,24543,25638.66,25316,25501.33,25382.33,24753,
25094.33,25253.66,24936.66,25553,26755.33,27412,28403.33,28402.33,28715.66,28996.33,
28479.33,28669,29018.66,29398.66,30203.66,31059.33,30745.33,30477.66,31563.66,32800.66,
33958.33,35795.66,35878.66,36336,36480.33,37108.66,38423,38480.66]
R=[11.13333,11.16667,11.8,14.18333,14.38333,12.98333,10.71667,14.53333,17.13333,18.56667,
21.01666,16.61665,15.35,16.04999,14.31667,10.88333,9.61667,9.31667,9.33333,9.55,10.08333,
11.45,12.45,10.76667,10.51667,9.66667,9.03333,9.01667,11.03333,8.73333,8.46667,8.4,7.25,
8.30,9.30,8.7,8.61667,9.13333,10.05,10.83333]
M1=(np.array(M1)).reshape(40,1)
R=(np.array(R)).reshape(40,1)
# M1 in the first column, R in the second (a list, not a set, keeps the column order fixed)
ts=np.concatenate([M1,R],axis=1)
tsd=pd.DataFrame(ts,index=date,columns=['M1','R'])
# reversed column order, used later for the reverse Granger causality test
ts_r=np.concatenate([R,M1],axis=1)
tsd_r=pd.DataFrame(ts_r,index=date,columns=['R','M1'])
tsd.M1.plot()
tsd.R.plot()
First, we use four lags (k = 4) of each variable and estimate the parameters of the two equations with statsmodels. The sample runs from 1979.I to 1988.IV, but only 1979.I to 1987.IV is used for estimation; the last four observations are held out to check the fitted VAR's forecast accuracy.
Here we assume that both M1 and R are stationary. Since both equations have the same maximum lag length, OLS can be applied to each regression. Individual estimated coefficients may be statistically insignificant, probably because of multicollinearity among the lags of the same variables, but overall the model is significant by the F test.
model = VAR(tsd.iloc[:-4])
results = model.fit(4)
results.summary()
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Wed, 06, May, 2020
Time: 22:50:28
--------------------------------------------------------------------
No. of Equations: 2.00000 BIC: 14.3927
Nobs: 32.0000 HQIC: 13.8416
Log likelihood: -289.904 FPE: 805670.
AIC: 13.5683 Det(Omega_mle): 490783.
--------------------------------------------------------------------
Results for equation M1
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 2413.827162 1622.647108 1.488 0.137
L1.M1 1.076737 0.201737 5.337 0.000
L1.R -275.029144 57.217394 -4.807 0.000
L2.M1 0.173434 0.314438 0.552 0.581
L2.R 227.174784 95.394759 2.381 0.017
L3.M1 -0.366467 0.346875 -1.056 0.291
L3.R 8.511935 96.917587 0.088 0.930
L4.M1 0.077603 0.207888 0.373 0.709
L4.R -50.199299 64.755384 -0.775 0.438
========================================================================
Results for equation R
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 4.919010 5.424158 0.907 0.364
L1.M1 0.001282 0.000674 1.901 0.057
L1.R 1.139310 0.191265 5.957 0.000
L2.M1 -0.002140 0.001051 -2.036 0.042
L2.R -0.309053 0.318884 -0.969 0.332
L3.M1 0.002176 0.001160 1.877 0.061
L3.R 0.052361 0.323974 0.162 0.872
L4.M1 -0.001479 0.000695 -2.129 0.033
L4.R 0.001076 0.216463 0.005 0.996
========================================================================
Correlation matrix of residuals
M1 R
M1 1.000000 -0.004625
R -0.004625 1.000000
Although the AIC and BIC values differ somewhat, the results are almost the same as in BE. First, look at the M1 regression: lag 1 of M1 and lags 1 and 2 of R are statistically significant (5% level). In the interest rate regression, lags 2 and 4 of M1 (lag 1 is marginal at p = 0.057) and the first lag of R are significant (5% level).
For comparison, the VAR results based on two lags of each endogenous variable are shown below.
results = model.fit(2)
results.summary()
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Wed, 06, May, 2020
Time: 22:50:29
--------------------------------------------------------------------
No. of Equations: 2.00000 BIC: 13.7547
Nobs: 34.0000 HQIC: 13.4589
Log likelihood: -312.686 FPE: 603249.
AIC: 13.3058 Det(Omega_mle): 458485.
--------------------------------------------------------------------
Results for equation M1
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 1451.976201 1185.593527 1.225 0.221
L1.M1 1.037538 0.160483 6.465 0.000
L1.R -234.884748 45.522360 -5.160 0.000
L2.M1 -0.044661 0.155908 -0.286 0.775
L2.R 160.155833 48.528324 3.300 0.001
========================================================================
Results for equation R
========================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------
const 5.796432 4.338943 1.336 0.182
L1.M1 0.001091 0.000587 1.858 0.063
L1.R 1.069081 0.166599 6.417 0.000
L2.M1 -0.001255 0.000571 -2.199 0.028
L2.R -0.223364 0.177600 -1.258 0.209
========================================================================
Correlation matrix of residuals
M1 R
M1 1.000000 -0.054488
R -0.054488 1.000000
Similarly, although the AIC and BIC values differ somewhat, the results are almost the same as in BE. In the money supply regression, the first lag of the money supply and both lags of the interest rate are statistically significant. In the interest rate regression, the second lag of the money supply and the first lag of the interest rate are significant.
Which model should we choose: four lags or two? The Akaike and Schwarz criteria for the four-lag model are 13.5683 and 14.3927, and the corresponding values for the two-lag model are 13.3058 and 13.7547. The lower the Akaike and Schwarz statistics, the better the model, so the more parsimonious model seems preferable. We therefore select the model with two lags of each endogenous variable.
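statsmodels can also automate this comparison; a minimal sketch, assuming the model object defined above:

# compare information criteria for lag lengths 1..6 and report the minima
order = model.select_order(maxlags=6)
print(order.summary())        # AIC/BIC/FPE/HQIC for each lag length
print(order.selected_orders)  # lag length chosen by each criterion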
We select the model with two lags and use it to forecast the values of M1 and R. The data run from 1979.I to 1988.IV, but the 1988 values were not used to estimate the VAR model. Let us now forecast the value of M1 for 1988.I, the first quarter of 1988. The forecast can be obtained as follows.
# one-step forecast of M1 for 1988.I, built by hand from the estimated coefficients:
# constant + L1/L2 coefficients on M1 times M1(1987.IV), M1(1987.III)
#          + L1/L2 coefficients on R  times  R(1987.IV),  R(1987.III)
mm=results.coefs_exog[0]+results.coefs[0,0,0]*tsd.iloc[-5,0]+results.coefs[1,0,0]*tsd.iloc[-6,0]+\
results.coefs[0,0,1]*tsd.iloc[-5,1]+results.coefs[1,0,1]*tsd.iloc[-6,1]
# forecast, actual 1988.I value, forecast error, relative error
mm,M1[-4],mm-M1[-4],(mm-M1[-4])/M1[-4]
# (array([36995.50488527]),array([36480.33]),array([515.17488527]),array([0.01412199]))
Here the coefficients are taken from the fitted results object (results.coefs and results.coefs_exog), matching those in the summary report.
Using the appropriate values of M and R, the estimated money supply for the first quarter of 1988 is 36995 (millions of Canadian dollars). The actual value of M in 1988.I was 36480.33 (millions of Canadian dollars), so the model overestimated the actual value by about 515 million, roughly 1.4% of the actual M. Of course, these estimates will change with the number of lags in the VAR model.
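The same one-step forecast can be obtained directly from statsmodels; a minimal sketch, using the forecast method of the fitted results (it takes the last k observations of the estimation sample and the number of steps ahead):

# feed the last two observations of the estimation sample (1987.III, 1987.IV)
# and ask for one step ahead, i.e. 1988.I
results.forecast(tsd.iloc[:-4].values[-2:], steps=1)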
If Y is explained by X, and Y changes when past values of X change, X is said to Granger-cause Y. Let us use grangercausalitytests from statsmodels to check for causal relationships. Its arguments are an array with two endogenous variables (columns) and a maximum lag order k. It tests whether the second column Granger-causes the first column; the null hypothesis is that the time series in the second column, x2, does not Granger-cause the series in the first column, x1. Granger causality means that, with past values of x1 included as regressors, past values of x2 have a statistically significant effect on the current value of x1. If the p-value is below the desired significance level, we reject the null hypothesis that x2 does not Granger-cause x1.
from statsmodels.tsa.stattools import grangercausalitytests
grangercausalitytests(tsd, 8)
Granger Causality
number of lags (no zero) 1
ssr based F test: F=15.1025 , p=0.0004 , df_denom=36, df_num=1
ssr based chi2 test: chi2=16.3610 , p=0.0001 , df=1
likelihood ratio test: chi2=13.6622 , p=0.0002 , df=1
parameter F test: F=15.1025 , p=0.0004 , df_denom=36, df_num=1
Granger Causality
number of lags (no zero) 2
ssr based F test: F=12.9265 , p=0.0001 , df_denom=33, df_num=2
ssr based chi2 test: chi2=29.7702 , p=0.0000 , df=2
likelihood ratio test: chi2=21.9844 , p=0.0000 , df=2
parameter F test: F=12.9265 , p=0.0001 , df_denom=33, df_num=2
Granger Causality
number of lags (no zero) 3
ssr based F test: F=7.7294 , p=0.0006 , df_denom=30, df_num=3
ssr based chi2 test: chi2=28.5987 , p=0.0000 , df=3
likelihood ratio test: chi2=21.1876 , p=0.0001 , df=3
parameter F test: F=7.7294 , p=0.0006 , df_denom=30, df_num=3
Granger Causality
number of lags (no zero) 4
ssr based F test: F=5.5933 , p=0.0021 , df_denom=27, df_num=4
ssr based chi2 test: chi2=29.8309 , p=0.0000 , df=4
likelihood ratio test: chi2=21.7285 , p=0.0002 , df=4
parameter F test: F=5.5933 , p=0.0021 , df_denom=27, df_num=4
Granger Causality
number of lags (no zero) 5
ssr based F test: F=4.1186 , p=0.0077 , df_denom=24, df_num=5
ssr based chi2 test: chi2=30.0318 , p=0.0000 , df=5
likelihood ratio test: chi2=21.6835 , p=0.0006 , df=5
parameter F test: F=4.1186 , p=0.0077 , df_denom=24, df_num=5
Granger Causality
number of lags (no zero) 6
ssr based F test: F=3.5163 , p=0.0144 , df_denom=21, df_num=6
ssr based chi2 test: chi2=34.1585 , p=0.0000 , df=6
likelihood ratio test: chi2=23.6462 , p=0.0006 , df=6
parameter F test: F=3.5163 , p=0.0144 , df_denom=21, df_num=6
Granger Causality
number of lags (no zero) 7
ssr based F test: F=2.0586 , p=0.1029 , df_denom=18, df_num=7
ssr based chi2 test: chi2=26.4190 , p=0.0004 , df=7
likelihood ratio test: chi2=19.4075 , p=0.0070 , df=7
parameter F test: F=2.0586 , p=0.1029 , df_denom=18, df_num=7
Granger Causality
number of lags (no zero) 8
ssr based F test: F=1.4037 , p=0.2719 , df_denom=15, df_num=8
ssr based chi2 test: chi2=23.9564 , p=0.0023 , df=8
likelihood ratio test: chi2=17.8828 , p=0.0221 , df=8
parameter F test: F=1.4037 , p=0.2719 , df_denom=15, df_num=8
grangercausalitytests performs four tests: params_ftest and ssr_ftest use the F distribution, while ssr_chi2test and lrtest use the chi-square distribution. By the F tests, R Granger-causes M1 at lag lengths 1 through 6; at lags 7 and 8 the F tests no longer reject the null of no causality (although the chi-square versions still do).
Next, let's look at the reverse relationship.
grangercausalitytests(tsd_r, 8)
Granger Causality
number of lags (no zero) 1
ssr based F test: F=0.2688 , p=0.6073 , df_denom=36, df_num=1
ssr based chi2 test: chi2=0.2912 , p=0.5894 , df=1
likelihood ratio test: chi2=0.2902 , p=0.5901 , df=1
parameter F test: F=0.2688 , p=0.6073 , df_denom=36, df_num=1
Granger Causality
number of lags (no zero) 2
ssr based F test: F=3.2234 , p=0.0526 , df_denom=33, df_num=2
ssr based chi2 test: chi2=7.4237 , p=0.0244 , df=2
likelihood ratio test: chi2=6.7810 , p=0.0337 , df=2
parameter F test: F=3.2234 , p=0.0526 , df_denom=33, df_num=2
Granger Causality
number of lags (no zero) 3
ssr based F test: F=2.7255 , p=0.0616 , df_denom=30, df_num=3
ssr based chi2 test: chi2=10.0844 , p=0.0179 , df=3
likelihood ratio test: chi2=8.9179 , p=0.0304 , df=3
parameter F test: F=2.7255 , p=0.0616 , df_denom=30, df_num=3
Granger Causality
number of lags (no zero) 4
ssr based F test: F=2.4510 , p=0.0702 , df_denom=27, df_num=4
ssr based chi2 test: chi2=13.0719 , p=0.0109 , df=4
likelihood ratio test: chi2=11.1516 , p=0.0249 , df=4
parameter F test: F=2.4510 , p=0.0702 , df_denom=27, df_num=4
Granger Causality
number of lags (no zero) 5
ssr based F test: F=1.8858 , p=0.1343 , df_denom=24, df_num=5
ssr based chi2 test: chi2=13.7504 , p=0.0173 , df=5
likelihood ratio test: chi2=11.5978 , p=0.0407 , df=5
parameter F test: F=1.8858 , p=0.1343 , df_denom=24, df_num=5
Granger Causality
number of lags (no zero) 6
ssr based F test: F=2.7136 , p=0.0413 , df_denom=21, df_num=6
ssr based chi2 test: chi2=26.3608 , p=0.0002 , df=6
likelihood ratio test: chi2=19.5153 , p=0.0034 , df=6
parameter F test: F=2.7136 , p=0.0413 , df_denom=21, df_num=6
Granger Causality
number of lags (no zero) 7
ssr based F test: F=2.8214 , p=0.0360 , df_denom=18, df_num=7
ssr based chi2 test: chi2=36.2076 , p=0.0000 , df=7
likelihood ratio test: chi2=24.4399 , p=0.0010 , df=7
parameter F test: F=2.8214 , p=0.0360 , df_denom=18, df_num=7
Granger Causality
number of lags (no zero) 8
ssr based F test: F=1.6285 , p=0.1979 , df_denom=15, df_num=8
ssr based chi2 test: chi2=27.7934 , p=0.0005 , df=8
likelihood ratio test: chi2=20.0051 , p=0.0103 , df=8
parameter F test: F=1.6285 , p=0.1979 , df_denom=15, df_num=8
Here, in the reverse direction (does M1 Granger-cause R?), the F test rejects the null hypothesis only at lags 6 and 7 (5% level).
The results vary with the lag length. One implication of the Granger representation theorem is this: if two variables $X_t$ and $Y_t$ are cointegrated and each is individually I(1), that is, individually non-stationary, then either $X_t$ Granger-causes $Y_t$ or $Y_t$ Granger-causes $X_t$.
In this example, if M1 and R are each I(1) and cointegrated, then either M1 Granger-causes R or R Granger-causes M1. This means we should first check whether the two variables are individually I(1), and then whether they are cointegrated. If they are not, the whole question of causality becomes suspect. Looking at M1 and R in practice, it is not clear whether the two variables are cointegrated, which is why the Granger causality results also vary.
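As a rough check of the cointegration question, statsmodels provides an Engle-Granger cointegration test; a sketch, assuming the M1/R DataFrame tsd defined above:

from statsmodels.tsa.stattools import coint
# H0: no cointegration between M1 and R
t_stat, p_value, crit_values = coint(tsd.M1, tsd.R)
print(t_stat, p_value)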
Proponents of VAR emphasize the advantages of this method:
(1) The method is simple: you do not have to decide which variables are endogenous and which are exogenous, because all the variables in a VAR are endogenous.
(2) Estimation is simple: the usual OLS method can be applied to each equation separately.
(3) The predictions obtained by this method are often superior to those obtained from more complex simultaneous equation models.
However, critics of VAR modeling point out the following issues:
Unlike simultaneous-equation models, a VAR model is a-theoretic, because it uses little prior (theoretical) information. In a simultaneous-equation model, the inclusion or exclusion of certain variables plays an important role in identifying the model.
Because of its emphasis on forecasting, the VAR model is less suited to policy analysis.
The biggest practical challenge in VAR modeling is choosing the appropriate lag length. Suppose you have a three-variable VAR model and decide to include eight lags of each variable in each equation. Each equation then has 24 lag coefficients plus a constant term, 25 parameters in total. Unless the sample size is large, estimating that many parameters consumes many degrees of freedom, with all the problems that entails.
Strictly speaking, in an m-variable VAR model, all m variables should be (jointly) stationary. If they are not, the data must be transformed appropriately (for example, by first-differencing). As Harvey points out, the results from transformed data can be unsatisfactory. He further states: "The usual approach adopted by VAR supporters is thus to work in levels, even if some of these time series are non-stationary. In this case, the effect of unit roots on the distribution of estimators is important." To make matters worse, if the model contains a mixture of I(0) and I(1) variables, that is, stationary and non-stationary variables, transforming the data is not easy.
Because the individual coefficients of an estimated VAR model are often hard to interpret, practitioners of this technique often estimate the so-called impulse response function (IRF). The IRF traces out the response of the dependent variables in the VAR system to shocks in the error terms, such as u1 and u2 in (22.9.1) and (22.9.2). Suppose u1 in the M1 equation increases by one standard deviation. Such a shock changes M1 now and in future periods. But since M1 appears in the R regression, the change in u1 also affects R; similarly, a one-standard-deviation shock to u2 in the R equation affects M1. The IRF traces out the effects of such shocks several periods into the future. Although researchers have questioned the usefulness of IRF analysis, it is a centerpiece of VAR analysis.
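In statsmodels, the IRF can be computed directly from the fitted results object; a sketch:

# impulse responses up to 10 quarters ahead
irf = results.irf(10)
irf.plot()  # response of each variable to a shock in each equation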
This completes the simple translation of BE. What follows is written with reference to "Vector autoregression".
A VAR(p) model has the form

$$y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + u_t$$

where $y_{t-i}$ is the i-th lag of $y$, $c$ is a constant vector of dimension k, each $A_i$ is a time-invariant k×k matrix, and $u_t$ is a k-dimensional vector of error terms.
All the variables must be integrated of the same order.
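To make the definition concrete, here is a minimal sketch that simulates a stationary bivariate VAR(1); the constant vector and coefficient matrix are made-up illustrative values:

import numpy as np

# y_t = c + A1 @ y_{t-1} + u_t  (k = 2, p = 1)
rng = np.random.default_rng(0)
c = np.array([0.1, 0.2])
A1 = np.array([[0.5, 0.1],
               [0.2, 0.4]])  # eigenvalues 0.6 and 0.3, inside the unit circle -> stationary
y = np.zeros((200, 2))
for t in range(1, 200):
    y[t] = c + A1 @ y[t-1] + rng.normal(scale=0.1, size=2)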
Let us get longer-run data from FRED and analyze it from a long-term perspective. For the Canadian money supply we use MANMM101CAM189S, and for the interest rate IR3TCP01CAM156N.
start="1979/1"
end="2020/12"
M1_0 = web.DataReader("MANMM101CAM189S", 'fred',start,end)/1000000
R1_0 = web.DataReader("IR3TCP01CAM156N", 'fred',start,end)#IR3TIB01CAM156N
M1=M1_0.resample('Q').last()
R1=R1_0.resample('Q').last()
M1.plot()
R.plot()
Examine the stationarity.
from statsmodels.tsa.stattools import adfuller
tsd=pd.concat([M1,R1],axis=1)
tsd.columns=['M1','R']
index=['ADF Test Statistic','P-Value','# Lags Used','# Observations Used']
# ADF test on the level of M1, with no deterministic terms (regression='nc')
adfTest = adfuller(tsd.M1, autolag='AIC',regression='nc')
dfResults = pd.Series(adfTest[0:4], index)
print('Augmented Dickey-Fuller Test Results:')
print(dfResults)
Augmented Dickey-Fuller Test Results:
ADF Test Statistic -1.117517
P-Value 0.981654
# Lags Used 5.000000
# Observations Used 159.000000
dtype: float64
Not surprisingly, the unit-root null cannot be rejected: M1 behaves like a random walk. The result is the same with regression set to 'c', 'ct', or 'ctt'.
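The claim about the deterministic-term options can be verified with a simple loop; a sketch:

# compare ADF p-values for M1 across deterministic specifications
for reg in ['nc', 'c', 'ct', 'ctt']:
    stat, pval = adfuller(tsd.M1, autolag='AIC', regression=reg)[:2]
    print(reg, round(pval, 4))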
adfTest = adfuller((tsd.R), autolag='AIC',regression='nc')
dfResults = pd.Series(adfTest[0:4], index)
print('Augmented Dickey-Fuller Test Results:')
print(dfResults)
Augmented Dickey-Fuller Test Results:
ADF Test Statistic -4.082977
P-Value 0.006679
# Lags Used 3.000000
# Observations Used 161.000000
dtype: float64
R, by contrast, is a stationary process in levels. Again, the result is the same with regression set to 'c', 'ct', or 'ctt'.
So I took the logarithm of M1.
adfTest = adfuller((np.log(tsd.M1)), autolag='AIC',regression='ct')
dfResults = pd.Series(adfTest[0:4], index)
print('Augmented Dickey-Fuller Test Results:')
print(dfResults)
Augmented Dickey-Fuller Test Results:
ADF Test Statistic -3.838973
P-Value 0.014689
# Lags Used 14.000000
# Observations Used 150.000000
dtype: float64
The logarithm of M1 appears to be trend-stationary (the ADF test with regression='ct' rejects the unit root, p = 0.015).
Let's remove the trend.
# remove a linear time trend from log(M1) by connecting the endpoints of the series
gap=np.linspace(np.log(M1.iloc[0]), np.log(M1.iloc[-1]), len(M1))
lnM1=np.log(M1)
lnM1.plot()
alnM1=lnM1.copy()
alnM1['a']=gap                   # the linear trend
alnM1=alnM1.iloc[:,0]-alnM1.a    # detrended log M1
alnM1.plot()
adfTest = adfuller(alnM1, autolag='AIC',regression='nc')
dfResults = pd.Series(adfTest[0:4], index)
print('Augmented Dickey-Fuller Test Results:')
print(dfResults)
Augmented Dickey-Fuller Test Results:
ADF Test Statistic -1.901991
P-Value 0.054542
# Lags Used 14.000000
# Observations Used 150.000000
dtype: float64
With the trend removed, lnM1 is borderline stationary: the ADF test rejects the unit root at the 10% level (p = 0.055), though not quite at 5%.
First, let us repeat the BE-style analysis on this data.
tsd0=pd.concat([alnM1,R1],axis=1)
tsd0.columns=['alnM1','R']
tsd=pd.concat([lnM1,R1],axis=1)
tsd.columns=['lnM1','R']
model = VAR(tsd.iloc[:36])
results = model.fit(4)
results.summary()
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Thu, 07, May, 2020
Time: 11:57:17
--------------------------------------------------------------------
No. of Equations: 2.00000 BIC: -5.33880
Nobs: 32.0000 HQIC: -5.88999
Log likelihood: 25.8004 FPE: 0.00217196
AIC: -6.16328 Det(Omega_mle): 0.00132308
--------------------------------------------------------------------
Results for equation lnM1
==========================================================================
coefficient std. error t-stat prob
--------------------------------------------------------------------------
const 0.358173 0.225376 1.589 0.112
L1.lnM1 1.286462 0.194312 6.621 0.000
L1.R -0.005751 0.001961 -2.933 0.003
L2.lnM1 0.025075 0.298562 0.084 0.933
L2.R 0.001647 0.002730 0.604 0.546
L3.lnM1 -0.278622 0.295859 -0.942 0.346
L3.R 0.006311 0.002814 2.243 0.025
L4.lnM1 -0.062508 0.195688 -0.319 0.749
L4.R -0.004164 0.002222 -1.875 0.061
==========================================================================
Results for equation R
==========================================================================
coefficient std. error t-stat prob
--------------------------------------------------------------------------
const 38.199790 21.797843 1.752 0.080
L1.lnM1 -15.488358 18.793423 -0.824 0.410
L1.R 0.875018 0.189630 4.614 0.000
L2.lnM1 7.660621 28.876316 0.265 0.791
L2.R -0.345128 0.263996 -1.307 0.191
L3.lnM1 35.719033 28.614886 1.248 0.212
L3.R 0.310248 0.272203 1.140 0.254
L4.lnM1 -31.044707 18.926570 -1.640 0.101
L4.R -0.162658 0.214871 -0.757 0.449
==========================================================================
Correlation matrix of residuals
lnM1 R
lnM1 1.000000 -0.135924
R -0.135924 1.000000
Next, use the data with the trend removed.
model = VAR(tsd0.iloc[:36])
results = model.fit(4)
results.summary()
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Thu, 07, May, 2020
Time: 10:50:42
--------------------------------------------------------------------
No. of Equations: 2.00000 BIC: -5.31179
Nobs: 32.0000 HQIC: -5.86298
Log likelihood: 25.3682 FPE: 0.00223143
AIC: -6.13627 Det(Omega_mle): 0.00135930
--------------------------------------------------------------------
Results for equation alnM1
===========================================================================
coefficient std. error t-stat prob
---------------------------------------------------------------------------
const 0.031290 0.024819 1.261 0.207
L1.alnM1 1.237658 0.189124 6.544 0.000
L1.R -0.005209 0.001840 -2.831 0.005
L2.alnM1 0.035479 0.288928 0.123 0.902
L2.R 0.001341 0.002650 0.506 0.613
L3.alnM1 -0.267898 0.285970 -0.937 0.349
L3.R 0.006273 0.002722 2.304 0.021
L4.alnM1 -0.092060 0.190650 -0.483 0.629
L4.R -0.004456 0.002161 -2.062 0.039
===========================================================================
Results for equation R
===========================================================================
coefficient std. error t-stat prob
---------------------------------------------------------------------------
const 2.626115 2.588966 1.014 0.310
L1.alnM1 -18.059084 19.728553 -0.915 0.360
L1.R 0.945671 0.191924 4.927 0.000
L2.alnM1 7.182544 30.139598 0.238 0.812
L2.R -0.342745 0.276454 -1.240 0.215
L3.alnM1 37.385646 29.831061 1.253 0.210
L3.R 0.319531 0.283972 1.125 0.260
L4.alnM1 -30.462525 19.887663 -1.532 0.126
L4.R -0.141785 0.225455 -0.629 0.529
===========================================================================
Correlation matrix of residuals
alnM1 R
alnM1 1.000000 -0.099908
R -0.099908 1.000000
The results show almost the same characteristics, and the AIC and BIC of the level and detrended specifications are nearly identical (in fact marginally worse for the detrended series by both criteria).
Let's use recent data.
model = VAR(tsd0.iloc[-40:])
results = model.fit(4)
results.summary()
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Thu, 07, May, 2020
Time: 11:06:09
--------------------------------------------------------------------
No. of Equations: 2.00000 BIC: -12.2865
Nobs: 36.0000 HQIC: -12.8019
Log likelihood: 151.245 FPE: 2.13589e-06
AIC: -13.0783 Det(Omega_mle): 1.36697e-06
--------------------------------------------------------------------
Results for equation alnM1
===========================================================================
coefficient std. error t-stat prob
---------------------------------------------------------------------------
const 0.019669 0.012024 1.636 0.102
L1.alnM1 0.706460 0.175974 4.015 0.000
L1.R -0.015862 0.008523 -1.861 0.063
L2.alnM1 -0.046162 0.185186 -0.249 0.803
L2.R -0.020842 0.011837 -1.761 0.078
L3.alnM1 0.568076 0.186205 3.051 0.002
L3.R 0.035471 0.011813 3.003 0.003
L4.alnM1 -0.461882 0.175777 -2.628 0.009
L4.R -0.007579 0.009849 -0.769 0.442
===========================================================================
Results for equation R
===========================================================================
coefficient std. error t-stat prob
---------------------------------------------------------------------------
const -0.053724 0.308494 -0.174 0.862
L1.alnM1 1.054672 4.515026 0.234 0.815
L1.R 0.875299 0.218682 4.003 0.000
L2.alnM1 -5.332917 4.751384 -1.122 0.262
L2.R 0.257259 0.303711 0.847 0.397
L3.alnM1 3.412184 4.777534 0.714 0.475
L3.R -0.263699 0.303088 -0.870 0.384
L4.alnM1 4.872672 4.509976 1.080 0.280
L4.R 0.032439 0.252706 0.128 0.898
===========================================================================
Correlation matrix of residuals
alnM1 R
alnM1 1.000000 -0.168029
R -0.168029 1.000000
For comparison, here is the result for the same recent sample before trend removal, fitted with two lags:
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Thu, 07, May, 2020
Time: 11:07:32
--------------------------------------------------------------------
No. of Equations: 2.00000 BIC: -12.3214
Nobs: 38.0000 HQIC: -12.5991
Log likelihood: 144.456 FPE: 2.90430e-06
AIC: -12.7524 Det(Omega_mle): 2.26815e-06
--------------------------------------------------------------------
Results for equation lnM1
==========================================================================
coefficient std. error t-stat prob
--------------------------------------------------------------------------
const 0.079744 0.100134 0.796 0.426
L1.lnM1 0.784308 0.174023 4.507 0.000
L1.R -0.016979 0.009977 -1.702 0.089
L2.lnM1 0.211960 0.174036 1.218 0.223
L2.R 0.012038 0.009846 1.223 0.221
==========================================================================
Results for equation R
==========================================================================
coefficient std. error t-stat prob
--------------------------------------------------------------------------
const -1.450824 2.077328 -0.698 0.485
L1.lnM1 0.736725 3.610181 0.204 0.838
L1.R 0.884364 0.206971 4.273 0.000
L2.lnM1 -0.617456 3.610443 -0.171 0.864
L2.R -0.027052 0.204257 -0.132 0.895
==========================================================================
Correlation matrix of residuals
lnM1 R
lnM1 1.000000 -0.260828
R -0.260828 1.000000