[PYTHON] Time series analysis # 6 Spurious regression and cointegration

1. Overview

2. Spurious regression

Definition

When two mutually unrelated unit root processes $x_t$ and $y_t$ are fitted with the regression $y_t = \alpha + \beta x_t + \epsilon_t$, the estimate of $\beta$ often looks statistically significant even though no relationship exists. This phenomenon is called spurious regression.
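A quick Monte Carlo check makes the definition concrete. This is a minimal sketch, assuming independent Gaussian random walks and the classical OLS t-test at the nominal 5% level (|t| > 1.96); the helper names `slope_tstat` and `spurious_rejection_rate` are hypothetical. If the test behaved normally, it would reject $\beta = 0$ in about 5% of runs. With random walks it rejects far more often, and the rate tends to rise with the sample length $T$, so more data does not cure the problem.

```python
import numpy as np

def slope_tstat(y, x):
    """Classical OLS t-statistic of the slope in y = a + b * x + e."""
    X = np.column_stack((np.ones_like(x), x))
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # residual variance estimate
    return coef[1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

def spurious_rejection_rate(n, n_sims=500, seed=0):
    """Share of runs in which |t| > 1.96 for two independent random walks of length n."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        x = np.cumsum(rng.standard_normal(n))  # unit root process
        y = np.cumsum(rng.standard_normal(n))  # independent unit root process
        hits += int(abs(slope_tstat(y, x)) > 1.96)
    return hits / n_sims

rates = {n: spurious_rejection_rate(n) for n in (50, 200, 800)}
for n, r in rates.items():
    print(f"T = {n:4d}: nominal 5% test rejects beta = 0 in {r:.0%} of runs")
```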

Verification

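import numpy as np

# Data generation: two independent random walks (unit root processes)
sigma_x, sigma_y = 1, 2
T = 10000
xt = np.cumsum(np.random.randn(T) * sigma_x).reshape(-1, 1)  # x_t = x_{t-1} + u_t, u_t ~ N(0, sigma_x^2)
yt = np.cumsum(np.random.randn(T) * sigma_y).reshape(-1, 1)  # y_t = y_{t-1} + v_t, v_t ~ N(0, sigma_y^2), independent of u_t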
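Before regressing, one can check that series built this way really behave like unit root processes. This is a minimal sketch, assuming the simplest Dickey-Fuller regression $\Delta z_t = a + \rho z_{t-1} + u_t$ (constant, no lagged differences) and the standard asymptotic 5% critical value of about -2.86 for that case. The helper `dickey_fuller_t` and the seeded stand-in series `walk` are hypothetical. In practice, `statsmodels.tsa.stattools.adfuller` is the more complete tool because it also adds lagged differences.

```python
import numpy as np

DF_CRIT_5PCT = -2.86  # asymptotic 5% Dickey-Fuller critical value (constant, no trend)

def dickey_fuller_t(z):
    """t-statistic of rho in dz_t = a + rho * z_{t-1} + u_t (no lagged differences)."""
    z = np.asarray(z, dtype=float).ravel()
    dz, z_lag = np.diff(z), z[:-1]
    X = np.column_stack((np.ones_like(z_lag), z_lag))
    coef, _, _, _ = np.linalg.lstsq(X, dz, rcond=None)
    resid = dz - X @ coef
    sigma2 = resid @ resid / (len(dz) - X.shape[1])
    se_rho = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_rho

# Hypothetical stand-in for xt: same construction (sigma_x = 1) but with a fixed seed
rng = np.random.default_rng(1)
walk = np.cumsum(rng.standard_normal(10000)).reshape(-1, 1)

t_level = dickey_fuller_t(walk)                  # unit root: usually NOT below -2.86
t_diff = dickey_fuller_t(np.diff(walk, axis=0))  # first difference: far below -2.86
print(f"levels t = {t_level:.2f}, differences t = {t_diff:.2f}, 5% critical value = {DF_CRIT_5PCT}")
```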


from sklearn.linear_model import LinearRegression

# Levels regression y_t = alpha + beta * x_t + e_t
reg = LinearRegression().fit(xt, yt)
print('R-squared : ', reg.score(xt, yt))
print('coef : ', reg.coef_, 'intercept', reg.intercept_)

R-squared :  0.4794854506874714
coef :  [[-0.62353254]] intercept [-24.27600549]

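import statsmodels.api as sm

# Same levels regression with statsmodels, which also reports standard errors and p-values.
# add_constant prepends the 'const' column, matching the coefficient order in the summary.
reg = sm.OLS(yt, sm.add_constant(xt)).fit()
reg.summary()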
Dep. Variable:     y                   R-squared:           0.479
Model:             OLS                 Adj. R-squared:      0.479
Method:            Least Squares       F-statistic:         9210.
Date:              Tue, 07 Jan 2020    Prob (F-statistic):  0.00
Time:              22:36:57            Log-Likelihood:      -51058.
No. Observations:  10000               AIC:                 1.021e+05
Df Residuals:      9998                BIC:                 1.021e+05
Df Model:          1
Covariance Type:   nonrobust

          coef    std err          t    P>|t|     [0.025    0.975]
const  -24.2760     0.930    -26.113    0.000    -26.098   -22.454
x1      -0.6235     0.006    -95.968    0.000     -0.636    -0.611
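The full statsmodels summary also reports a Durbin-Watson statistic, which the excerpt above does not show. A rule of thumb often attributed to Granger and Newbold is that an R-squared larger than the Durbin-Watson statistic of the residuals hints at spurious regression. This is a minimal sketch, assuming freshly simulated random walks with the same `sigma_x = 1`, `sigma_y = 2` and a fixed seed. The helpers `durbin_watson` and `levels_regression_diagnostics` are hypothetical NumPy versions; statsmodels provides the former as `statsmodels.stats.stattools.durbin_watson`. Residuals of a spurious levels regression are themselves close to a random walk, so their Durbin-Watson statistic sits near 0 instead of near 2.

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); about 2 for white noise, near 0 for a random walk."""
    resid = np.asarray(resid, dtype=float).ravel()
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def levels_regression_diagnostics(T=10000, seed=2):
    """R-squared and Durbin-Watson of y_t = a + b * x_t + e_t for two independent random walks."""
    rng = np.random.default_rng(seed)
    x = np.cumsum(rng.standard_normal(T))        # sigma_x = 1
    y = np.cumsum(2.0 * rng.standard_normal(T))  # sigma_y = 2, independent of x
    X = np.column_stack((np.ones(T), x))
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return r2, durbin_watson(resid)

r2, dw = levels_regression_diagnostics()
print(f"R^2 = {r2:.3f}, Durbin-Watson = {dw:.4f}, R^2 > DW: {r2 > dw}")
```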

How to avoid

Include lagged variables in the model (here, the lagged dependent variable $y_{t-1}$)

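# Regress y_t on a constant, x_t and the lagged dependent variable y_{t-1}
x_t, y_t, y_t_1 = xt[1:], yt[1:], yt[:-1]
X = np.column_stack((x_t, y_t_1))  # x1 = x_t, x2 = y_{t-1}
reg = sm.OLS(y_t, sm.add_constant(X)).fit()
reg.summary()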
Dep. Variable:     y                   R-squared:           0.999
Model:             OLS                 Adj. R-squared:      0.999
Method:            Least Squares       F-statistic:         3.712e+06
Date:              Thu, 09 Jan 2020    Prob (F-statistic):  0.00
Time:              22:12:59            Log-Likelihood:      -21261.
No. Observations:  9999                AIC:                 4.253e+04
Df Residuals:      9996                BIC:                 4.255e+04
Df Model:          2
Covariance Type:   nonrobust

          coef    std err          t    P>|t|     [0.025    0.975]
const   -0.0815     0.049     -1.668    0.095     -0.177     0.014
x1      -0.0004     0.000     -0.876    0.381     -0.001     0.000
x2       0.9989     0.001   1964.916    0.000      0.998     1.000

Take the first difference of each unit root process, which makes it stationary, and regress the differences

# First differences dx_t = x_t - x_{t-1}, dy_t = y_t - y_{t-1} (stationary white noise here)
x_t, y_t = np.diff(xt, axis=0), np.diff(yt, axis=0)
reg = sm.OLS(y_t, sm.add_constant(x_t)).fit()
reg.summary()
Dep. Variable:     y                   R-squared:           0.000
Model:             OLS                 Adj. R-squared:      0.000
Method:            Least Squares       F-statistic:         3.297
Date:              Thu, 09 Jan 2020    Prob (F-statistic):  0.0694
Time:              22:33:26            Log-Likelihood:      -21262.
No. Observations:  9999                AIC:                 4.253e+04
Df Residuals:      9997                BIC:                 4.254e+04
Df Model:          1
Covariance Type:   nonrobust

          coef    std err          t    P>|t|     [0.025    0.975]
const   -0.0138     0.020     -0.681    0.496     -0.054
x1      -0.0374     0.021     -1.816    0.069     -0.078
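In this single run of the differenced regression, the slope is no longer significant at the 5% level (p = 0.069). Repeating the experiment many times shows how reliable each approach is. This is a minimal sketch, assuming independent Gaussian random walks with `sigma_x = 1`, `sigma_y = 2`, $T = 500$ and the classical t-test. The helpers `slope_tstat` and `rejection_rates` are hypothetical. For independent random walks the first differences are independent white noise, so the differenced t-test should reject close to its nominal 5% rate. The levels regression rejects far more often.

```python
import numpy as np

def slope_tstat(y, x):
    """Classical OLS t-statistic of the slope in y = a + b * x + e."""
    X = np.column_stack((np.ones_like(x), x))
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return coef[1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

def rejection_rates(T=500, n_sims=1000, seed=3):
    """Rejection rates of H0: beta = 0 (|t| > 1.96) in levels vs first differences."""
    rng = np.random.default_rng(seed)
    in_levels = in_diffs = 0
    for _ in range(n_sims):
        x = np.cumsum(rng.standard_normal(T))        # sigma_x = 1
        y = np.cumsum(2.0 * rng.standard_normal(T))  # sigma_y = 2, independent of x
        in_levels += int(abs(slope_tstat(y, x)) > 1.96)
        in_diffs += int(abs(slope_tstat(np.diff(y), np.diff(x))) > 1.96)
    return in_levels / n_sims, in_diffs / n_sims

level_rate, diff_rate = rejection_rates()
print(f"levels: {level_rate:.1%} rejections, differences: {diff_rate:.1%} rejections")
```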

3. Cointegration

Definition
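Two unit root processes $x_t$ and $y_t$ are cointegrated when some linear combination $y_t - \beta x_t$ is stationary, typically because they share a common stochastic trend. This is a minimal sketch of the Engle-Granger two-step check, assuming simulated data with a hypothetical true $\beta = 2$. The first step regresses $y_t$ on $x_t$ in levels. The second step runs a Dickey-Fuller regression on the residuals. The residuals come from an estimated regression, so the ordinary Dickey-Fuller critical value does not apply. About -3.34 is the commonly quoted asymptotic 5% value for two variables with a constant (MacKinnon's tables). The helpers `ols` and `df_tstat` are hypothetical. `statsmodels.tsa.stattools.coint` implements the full test.

```python
import numpy as np

EG_CRIT_5PCT = -3.34  # approx. asymptotic 5% Engle-Granger critical value (2 variables, constant)

def ols(y, X):
    """Least-squares coefficients and residuals."""
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

def df_tstat(z):
    """Dickey-Fuller t-statistic of rho in dz_t = a + rho * z_{t-1} + u_t."""
    dz, z_lag = np.diff(z), z[:-1]
    X = np.column_stack((np.ones_like(z_lag), z_lag))
    coef, resid = ols(dz, X)
    sigma2 = resid @ resid / (len(dz) - X.shape[1])
    return coef[1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

rng = np.random.default_rng(4)
n_obs = 2000
trend = np.cumsum(rng.standard_normal(n_obs))  # common stochastic trend (unit root)
x = trend + rng.standard_normal(n_obs)         # I(1)
y = 2.0 * trend + rng.standard_normal(n_obs)   # I(1); y - 2x is stationary (hypothetical beta = 2)

# Step 1: levels regression y_t = alpha + beta * x_t + e_t
coef, resid = ols(y, np.column_stack((np.ones(n_obs), x)))
# Step 2: unit root test on the residuals (stationary residuals => cointegration)
t_eg = df_tstat(resid)
print(f"beta_hat = {coef[1]:.3f}, residual DF t = {t_eg:.2f}, 5% critical value ~ {EG_CRIT_5PCT}")
```

Unlike the spurious case, rejecting a unit root in the residuals here supports a genuine long-run relationship, and $\hat\beta$ lands close to the true value.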

Implication

Granger Representation theorem
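The Granger representation theorem states that cointegrated series admit an error-correction representation: the lagged deviation from the long-run relationship feeds back into the current changes. This is a minimal sketch, assuming a hypothetical data-generating process $\Delta y_t = \gamma (y_{t-1} - \beta x_{t-1}) + u_t$ with $x_t$ a random walk and illustrative values $\gamma = -0.3$, $\beta = 2$. The first step estimates $\beta$ by the levels regression. The second step regresses $\Delta y_t$ on the lagged residual, and that regression recovers the adjustment coefficient $\gamma$.

```python
import numpy as np

rng = np.random.default_rng(5)
n_obs, beta, gamma = 5000, 2.0, -0.3    # hypothetical parameters
x = np.cumsum(rng.standard_normal(n_obs))  # x_t: random walk
y = np.zeros(n_obs)
for t in range(1, n_obs):
    # error correction: y moves back toward beta * x after a deviation
    y[t] = y[t - 1] + gamma * (y[t - 1] - beta * x[t - 1]) + rng.standard_normal()

# Step 1: long-run relationship from the levels regression
X = np.column_stack((np.ones(n_obs), x))
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
ect = y - X @ coef  # estimated equilibrium error

# Step 2: regress dy_t on a constant and the lagged equilibrium error
dy = np.diff(y)
Z = np.column_stack((np.ones(n_obs - 1), ect[:-1]))
ecm, _, _, _ = np.linalg.lstsq(Z, dy, rcond=None)
print(f"beta_hat = {coef[1]:.3f}, adjustment coefficient = {ecm[1]:.3f}")
```

A negative adjustment coefficient means that when $y_{t-1}$ lies above $\beta x_{t-1}$, $y$ tends to fall in the next period, pulling the pair back toward equilibrium.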
