[Python] Dickey-Fuller test with statsmodels

This is a revised and improved version of an article I posted a few days ago and then withdrew.

It is important to know whether the time series being analyzed follows a random walk, for the following two reasons.

  1. Is an observed trend a spurious trend, or a genuine one?
  2. When two time series show a correlation, is it a spurious correlation or a genuine one?

Many economic time series and stock price levels (the original series) follow a random walk. A time series that follows a random walk is a non-stationary time series whose mean and variance change over time. In a random walk, the mean changes stochastically over time, and the standard deviation grows in proportion to the square root of time (the variance in proportion to time). There are various other kinds of non-stationary time series as well. One of them is a trend stationary series, in which the mean rises or falls steadily over time. To determine whether a series is a random walk, we test whether it has a unit root. The Dickey-Fuller test is one method for doing so, but it is often pointed out that its power is low. So let's look into this a little.
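As a quick check of this property (a sketch I added, not part of the original analysis), we can simulate many random walks and compare the cross-sectional standard deviation at time t with $\sqrt{t}$:

import numpy as np

# simulate many random walks and check how the spread grows with time
np.random.seed(0)
paths = np.cumsum(np.random.normal(0, 1, size=(10000, 100)), axis=1)
for t in [10, 50, 100]:
    # standard deviation across paths at time t vs the theoretical sqrt(t)
    print(t, paths[:, t-1].std(), np.sqrt(t))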

Before moving on to the DF test

First, consider a first-order autoregressive model (AR(1)), $y_t=\rho y_{t-1}+u_t$. If $\rho$ equals 1, this AR(1) process is said to have a unit root, and $y_t$ is a random walk. When $\Delta y_t = y_t-y_{t-1}$ is a stationary process with constant mean and variance over time, $y_t$ is called integrated of order one. This unit root process is

$y_1=y_0+e_1$
$y_2=y_1+e_2$
$y_3=y_2+e_3$
...

Rewriting $t=2$ and $t=3$,

$y_2=(y_0+e_1)+e_2=y_0+e_1+e_2$
$y_3=(y_0+e_1+e_2)+e_3=y_0+e_1+e_2+e_3$

Generalizing,

$y_t=y_0+\sum_{i=1}^t e_i$

which is the sum of the initial value and the accumulated random shocks.
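As a sanity check (my addition), the recursive definition and the cumulative-sum form give the same path:

import numpy as np

# check that the recursion y_t = y_{t-1} + e_t equals y_0 + cumsum(e)
np.random.seed(0)
y0 = 1.0
e = np.random.normal(size=5)
y_rec = [y0 + e[0]]
for u in e[1:]:
    y_rec.append(y_rec[-1] + u)
y_sum = y0 + np.cumsum(e)
print(np.allclose(y_rec, y_sum))  # True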

Since the time series diverges when $\rho>1$, set $\rho<1$. The AR(1) model is then

$y_1=\rho y_0+e_1$
$y_2=\rho y_1+e_2$
$y_3=\rho y_2+e_3$

Rewriting $y_2$ and $y_3$,

$y_2=\rho(\rho y_0+e_1)+e_2=\rho^2 y_0+\rho e_1+e_2$
$y_3=\rho(\rho^2 y_0+\rho e_1+e_2)+e_3=\rho^3 y_0+\rho^2 e_1+\rho e_2+e_3=\rho^3 y_0+\sum_{i=1}^3\rho^{3-i}e_i$

Generalizing,

$y_t=\rho^t y_0+\sum_{i=1}^t\rho^{t-i}e_i$

Let's check up to this point using python.

import numpy as np
import matplotlib.pyplot as plt

# generate and plot an AR(1)/random walk path: y_t = rho*y_{t-1} + sigma*e_t
def generate(y0,rho,sigma,nsample):
    e = np.random.normal(size=nsample)  # random shocks
    y=[]
    y.append(rho*y0+e[0]*sigma)
    for i,u in enumerate(e[1:]):
        y.append(rho*y[i]+u*sigma)
    plt.plot(y)
generate(1,1,1,100)

[Figure: random walk, rho=1, sigma=1]

Next, let's set $\rho=0.9$.

generate(1,0.9,1,100)    

[Figure: AR(1), rho=0.9, sigma=1]

You can see that it converges toward a certain value. Next, let's set sigma = 0 for both cases.

generate(1,1,0,100)    

[Figure: random walk, rho=1, sigma=0 (stays at 1)]

generate(1,0.9,0,100)    

[Figure: AR(1), rho=0.9, sigma=0 (decays toward 0)]

With sigma = 0, the random walk stays at 1 while the AR(1) process converges to zero. This is inconvenient, so rewrite the AR(1) model as follows:

$y_t=c+\rho y_{t-1}+u_t$

Next, sum this series:

$\sum_{t=2}^n y_t=\sum_{t=2}^n c+\sum_{t=2}^n\rho y_{t-1}+\sum_{t=2}^n u_t$

Assuming the mean of $y_t$ is $\mu$ and n is large enough, the shock term $\sum_{t=2}^n u_t$ is approximately zero, so dividing both sides by $n-1$ gives $\mu = c + \rho\mu$. Therefore

$c=\mu(1-\rho)$

Let's use this to rewrite the function generate.

# AR(1) with constant c: y_t = c + rho*y_{t-1} + sigma*e_t
def generate(y0,rho,c,sigma,nsample):
    e = np.random.normal(size=nsample)
    y=[]
    y.append(rho*y0+e[0]*sigma+c)
    for i,u in enumerate(e[1:]):
        y.append(rho*y[i]+u*sigma+c)
    plt.plot(y)
generate(1,0.9,0.1,0,100)

[Figure: AR(1) with c=0.1, rho=0.9, sigma=0 (converges to c/(1-rho)=1)]

The mean of the AR(1) process is now constant over time. Let's set sigma to a small value and check.

generate(1,0.9,0.1,0.01,100)  

[Figure: AR(1) with c=0.1, rho=0.9, sigma=0.01]

You can see that the series keeps reverting toward its center. The AR(1) model is stationary (weakly stationary) if certain conditions are met. The constant c is sometimes called the drift rate, that is, the rate at which the mean changes, but here the mean does not change. However, if $c>\mu(1-\rho)$ or $c<\mu(1-\rho)$, the mean will change.
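As a rough empirical check (a sketch I added, with an assumed burn-in of the first 5,000 observations), the long-run mean of $y_t=c+\rho y_{t-1}+\sigma e_t$ should be close to $c/(1-\rho)$:

import numpy as np

# empirical check: long-run mean of y_t = c + rho*y_{t-1} + sigma*e_t vs c/(1-rho)
np.random.seed(0)
rho, c, sigma, n = 0.9, 0.1, 0.01, 100000
e = np.random.normal(size=n)
y = np.empty(n)
y[0] = 1.0
for t in range(1, n):
    y[t] = c + rho*y[t-1] + sigma*e[t]
print(y[5000:].mean(), c/(1-rho))  # both close to 1.0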

In probability theory, stochastic drift is the change in the mean of a stochastic process. The drift rate is a similar concept, but it does not change stochastically; it is a non-stochastic parameter.

In studies of long-run phenomena, the characteristics of a time series are often conceptualized by decomposing it into trend, periodic, and stochastic (stochastic drift) components. The periodic and stochastic drift components are identified through autocorrelation analysis and by differencing away the trend. Autocorrelation analysis helps identify the correct phase of the fitted model, and repeated differencing transforms the stochastic drift component into white noise, as the sketch below illustrates.
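As an illustration (a sketch I added using statsmodels' acf function), a random walk is strongly autocorrelated, while its first difference behaves like white noise:

import numpy as np
from statsmodels.tsa.stattools import acf

# a random walk is highly autocorrelated, but its first difference
# (the stochastic drift removed) behaves like white noise
np.random.seed(0)
y = np.cumsum(np.random.normal(0, 1, 1000))   # random walk
dy = np.diff(y)                               # first difference
print(acf(y, nlags=3))    # close to 1 at every lag
print(acf(dy, nlags=3))   # close to 0 beyond lag 0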

Trend stationary

The trend stationary process $y_t$ follows $y_t=f(t)+u_t$, where t is time, f is a deterministic function, and $u_t$ is a stationary random variable whose long-run mean is zero. In this case the stochastic term is stationary, so there is no stochastic drift. The deterministic component f(t) does not have a mean that stays constant in the long run, so the time series itself can drift. This deterministic drift can be estimated by regressing $y_t$ on t and can be removed from the data if the residuals are stationary. The simplest trend stationary process is $y_t=d\cdot t+u_t$, where d is a constant.
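A minimal detrending sketch (my addition, using np.polyfit for the regression on t):

import numpy as np

# remove the deterministic drift of a trend stationary series y_t = d*t + u_t
# by regressing y on t and keeping the residuals
np.random.seed(0)
n, d = 500, 0.05
t = np.arange(n)
y = d*t + np.random.normal(0, 1, n)
slope, intercept = np.polyfit(t, y, 1)   # fit y = a*t + b
resid = y - (slope*t + intercept)        # detrended series
print(slope, resid.mean(), resid.std())  # slope ~ d, residuals ~ stationary noise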

Difference stationary

The unit root (difference stationary) process follows $y_t=y_{t-1}+c+u_t$, where $u_t$ is a stationary random variable whose long-run mean is zero and c is a non-stochastic drift parameter. Even without the random shock $u_t$, the mean of y changes by c each period. This non-stationarity can be removed from the data by taking the first difference. The differenced variable $z_t=y_t-y_{t-1}=c+u_t$ has a constant long-run mean of c, so there is no drift in this mean. Next, consider c = 0, so that the unit root process is $y_t=y_{t-1}+u_t$. Although $u_t$ is a stationary process, the presence of the stochastic shock $u_t$ causes drift, specifically stochastic drift, in $y_t$. Once a non-zero value of u occurs, it is included in y for that period. One period later it appears in the lagged y and affects the next value of y, which in turn affects the value after that, and so on. Therefore, once the first shock affects y, its value is permanently incorporated into the mean of y, resulting in stochastic drift. This drift, too, can be eliminated by first differencing y to obtain the non-drifting z. Stochastic drift also occurs when $c\ne0$, that is, with $y_t=y_{t-1}+c+u_t$. I said above that c is a non-stochastic drift parameter, so why is the resulting drift not deterministic? In fact, the drift generated by $c+u_t$ is part of a stochastic process.
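A small sketch (my addition) confirming that first differencing a random walk with drift leaves a series whose mean is simply the drift c:

import numpy as np

# first differencing a random walk with drift y_t = y_{t-1} + c + u_t
# gives z_t = c + u_t, whose mean is the drift c and which no longer drifts
np.random.seed(0)
c, n = 0.1, 10000
u = np.random.normal(0, 1, n)
y = np.cumsum(c + u)          # unit root process with drift
z = np.diff(y)                # z_t = c + u_t
print(z.mean(), z.std())      # mean ~ c = 0.1, std ~ 1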

numpy.random.normal (from the NumPy reference)

numpy.random.normal(loc=0.0, scale=1.0, size=None)

loc : float or array_like of floats. Mean of the distribution.

scale : float or array_like of floats. Standard deviation of the distribution.

Now consider the meaning of loc. The mean of samples generated from this distribution approximately follows $N(\mu,\sigma^2/n)$. Let's see how this works. Deliberately set loc = 1 and scale = 1.0, and calculate the sample mean and standard deviation with n = 2. There are two ways to do this. One is to compute the mean and standard deviation of $z_t$ directly. The other is to subtract 1 from each observation $z_t$, compute the mean and standard deviation of that sample, and then add 1 back to the sample mean.

import numpy as np
import matplotlib.pyplot as plt

s = np.random.normal(1, 1, 2)
print(np.mean(s),np.std(s))
o=np.ones(len(s))
print(np.mean(s-o)+1,np.std(s-o))

# 0.9255104221128653 0.5256849930003175
# 0.9255104221128653 0.5256849930003175 #1 was subtracted from the sample and 1 was added to the mean.

The results are the same. You can see that np.random.normal(1,1,n) generates the same random numbers as 1 + np.random.normal(0,1,n). But this is only how it works inside the computer; with real data it should be impossible to tell which mechanism is creating the stochastic drift. So let's generate the data 1,000,000 times for n = 2 and n = 100 and see what the results look like.

Distribution of the sample mean of a difference stationary process

The differenced variable $z_t=y_t-y_{t-1}=c+u_t$ has a constant long-run mean of c. The sample mean approximately follows $N(\mu,\sigma^2/n)$ when n is large, so in a finite sample it is not a single fixed value. However, in this case there is no drift in the mean; the mean is constant in the long run.

m1=[]
trial=1000000
for i in range(trial):
    s = np.random.normal(1, 1, 2)
    m1.append(np.mean(s))
m2=[]
for i in range(trial):
    s = np.random.normal(1, 1, 100)
    m2.append(np.mean(s))
bins=np.linspace(np.min(s),np.max(s),1000)
plt.hist(m1,bins=bins,label='n=2')
plt.hist(m2,bins=bins,label='n=100',alpha=0.5)
plt.legend()
plt.show()

[Figure: histograms of the sample mean for n=2 and n=100]

Both the sample mean and the sample standard deviation have a distribution; they are not unique values. n needs to be much larger for them to be effectively unique. So while the characteristics of a time series are conceptualized by decomposing them into trend, periodic, and stochastic (stochastic drift) components, care is needed when dealing with stochastic drift and the drift rate.

Stochastic drift occurs in the unit root process. Even if there is a deterministic drift rate, the two are indistinguishable in a finite sample.

Distribution of y on the last day of the unit root process

In the unit root process, the standard deviation increases as the number of periods increases. With c = 0, the mean of y after n periods is zero, but its distribution widens as the standard deviation $\sqrt{n\cdot\sigma^2}$ grows.

m1=[]
trial=1000000
for i in range(trial):
    s = np.cumsum(np.random.normal(0, 1, 2))
    m1.append(s[-1])
m2=[]
for i in range(trial):
    s = np.cumsum(np.random.normal(0, 1, 100))
    m2.append(s[-1])
bins=np.linspace(np.min(m2),np.max(m2),1000)
plt.hist(m1,bins=bins,label='n=2')
plt.hist(m2,bins=bins,label='n=100',alpha=0.5)
plt.legend()
plt.show()

[Figure: histograms of the final value for n=2 and n=100]

The distribution differs from the previous one: the larger the sample size, the wider the distribution. Let's plot one series s as an example.

plt.plot(s)

[Figure: one simulated random walk path]

Here the level of the series changes step by step. Each change is due to a stochastic shock $u_t$, and what is plotted is the accumulation of those shocks, that is, a series of stochastic drift. This can form a large trend, corresponding to the areas at both ends of the distribution for n = 100 (exactly which range depends on how a trend is defined). Such a trend is a stochastic trend, which is clearly distinguished from the deterministic trend created by a trend stationary process.

First-order autoregressive process

The first-order autoregressive process can be rewritten as follows.

$x_1=\rho x_0+y_1+e_1$
$x_2=\rho x_1+y_2+e_2$
$x_3=\rho x_2+y_3+e_3$

$x_2=\rho(\rho x_0+y_1+e_1)+y_2+e_2=\rho^2 x_0+\rho(y_1+e_1)+y_2+e_2$
$x_3=\rho(\rho^2 x_0+\rho(y_1+e_1)+y_2+e_2)+y_3+e_3=\rho^3 x_0+\rho^2(y_1+e_1)+\rho(y_2+e_2)+y_3+e_3$

Generalizing,

$x_n=\rho^n x_0+\sum_{i=1}^n\rho^{n-i}(y_i+e_i)$

where $y_i=c+d\cdot i$.

When $\rho=1$, this becomes a random walk process.

#Initialization
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy.stats import norm
from statsmodels.tsa.stattools import adfuller
#np.random.seed(0)
# AR(1)Process generation
#x0 is the initial price
#nsample is the number of periods in one trial
# rho=1: random walk
## c = c1 - rho is the drift; when c1 = 1 there is no drift (pure random walk); d is a deterministic trend
# rho<1 to 1st order autoregressive model
##The effects of x0, c and d disappear over time
def ar1(rho,x0,c1,d,sigma,nsample):
    e = np.random.normal(size=nsample)#random variable
    rhoo=np.array([rho ** num for num in range(nsample)]) #effects of rho
    rhoo_0=np.array([rho**(num+1) for num in range(nsample)]) #effects of rho
    t=np.arange(nsample) #deterministic trend
    one=np.ones(nsample) #effect of x0
    x00=x0*one*rhoo_0 #effect of rho to x0
    c=c1-rho # constant of the autoregressive model
    c0=np.cumsum(c*one*rhoo) #effect of rho on c
    d0=(d*t*rhoo) # effect of rho on the deterministic trend
    y=np.cumsum(rhoo*e*sigma)+x00+c0+d0
    return y

Distribution of y on the last day of the AR (1) process

m1=[]
trial=1000000
nsample=250
x0=1
for i in range(trial):
    s = np.cumsum(np.random.normal(0, 1, nsample))
    m1.append(s[-1]+x0)
def generate(rho,x0,c1,sigma,nsample,trial):
    m=[]
    for i in range(trial):
        e = np.random.normal(0, 1, nsample)
        rhoo=np.array([rho ** num for num in range(nsample)]) #effects of rho
        rhoo_0=np.array([rho**(num+1) for num in range(nsample)]) #effects of rho
        one=np.ones(nsample) #effect of x0
        x00=x0*one*rhoo_0 #effect of rho to x0
        c=c1-rho # constant of the autoregressive model
        c0=np.cumsum(c*one*rhoo) #effect of rho to c
        s=np.cumsum(rhoo*e*sigma)+x00+c0
        m.append(s[-1])
    return m
c1=1
x0=1
rho=0.99
sigma=1
m2=generate(rho,x0,c1,sigma,nsample,trial)
rho=0.9
m3=generate(rho,x0,c1,sigma,nsample,trial)
rho=0.5
m4=generate(rho,x0,c1,sigma,nsample,trial)
bins=np.linspace(np.min(m1),np.max(m1),1000)
plt.hist(m1,bins=bins,label='rw')
plt.hist(m2,bins=bins,label='ar1 rho=0.99',alpha=0.5)
plt.hist(m3,bins=bins,label='ar1 rho=0.9',alpha=0.5)
plt.hist(m4,bins=bins,label='ar1 rho=0.5',alpha=0.5)
plt.legend()
plt.show()

[Figure: distribution of the final value for the random walk and AR(1) with rho = 0.99, 0.9, 0.5 (n = 250)]

In the first-order autoregressive model, you can see that as $\rho$ falls below 1 the final value wanders around the initial value. As $\rho$ approaches 1, it becomes difficult to distinguish from a random walk. The following figure is for n = 7; over about 7 days the spread is similar for any $\rho$. The sketch after the figure reproduces this comparison.

[Figure: distribution of the final value for n = 7]
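The n = 7 figure can be reproduced by rerunning the same comparison with nsample = 7, reusing generate, x0, c1, sigma and trial defined in the block above (a sketch, not the exact code behind the figure):

# sketch reproducing the n=7 comparison, reusing generate() and the settings above
nsample=7
m1=[np.cumsum(np.random.normal(0, 1, nsample))[-1]+x0 for i in range(trial)]
m2=generate(0.99,x0,c1,sigma,nsample,trial)
m3=generate(0.9,x0,c1,sigma,nsample,trial)
m4=generate(0.5,x0,c1,sigma,nsample,trial)
bins=np.linspace(np.min(m1),np.max(m1),1000)
for m,l in zip([m1,m2,m3,m4],['rw','ar1 rho=0.99','ar1 rho=0.9','ar1 rho=0.5']):
    plt.hist(m,bins=bins,label=l,alpha=0.5)
plt.legend()
plt.show()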

Power (an introduction to statistical hypothesis testing with statsmodels)

A hypothesis test sets up a null hypothesis and an alternative hypothesis. The null hypothesis is stated in terms of population parameters. Since population parameters are generally unknown, estimates may be used, and the sample is treated as if it were drawn from this population. In the following figure, the distribution under the population parameter is shown by the blue dotted line, and the sampling distribution of the statistic computed from the sample is shown by the orange line. Compare the two: if the distributions are close it is hard to conclude that they differ, and if they are far apart it can be judged that they differ.

[Figure: null (population) distribution in blue and sampling distribution in orange]

This judgment is made using the significance level. The significance level $\alpha$ is the area in the right tail of the null distribution; 10%, 5%, and so on are used. It is the area to the right of the blue vertical line in the blue distribution. The blue vertical line marks the boundary at which the null hypothesis is rejected; the region inside this line is called the acceptance region, and the region outside it the rejection region.

Once the sample is available, statistics such as the mean and variance can be calculated. The p-value is the tail area of the null distribution beyond the observed statistic; in the figure, it is the area to the right of the red vertical line. If this value is smaller than the chosen significance level, we judge that the sample (statistic) was obtained from a distribution different from the population (parameter). In the figure the red vertical line lies to the left of the blue one, but if the mean of the orange sampling distribution lies to the right of the blue vertical line, its p-value is clearly smaller than the significance level.

[Figure: the p-value as the tail area beyond the observed statistic]

The ability to correctly reject a false null hypothesis is called power. The power, written $\beta$ here (it is more commonly written $1-\beta$), corresponds to the area of the orange distribution to the right of the blue vertical line. If this area is large, the probability that the alternative hypothesis is correct when the null hypothesis is rejected increases.

[Figure: power as the area of the sampling distribution beyond the rejection boundary]

To increase this power, either the two distributions must be far apart, that is, the centers of the distributions must differ substantially, or the sample must be large. A larger sample narrows the distributions, and if the sample is so large that each distribution can be regarded as a line, then two straight lines are being compared.

[Figure: the distributions narrow as the sample size grows]
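To make this concrete, here is a small sketch (my addition) that computes the power of a one-sided z-test for a shift in the mean; the rejection boundary comes from the null distribution, and the power is the area of the sampling distribution under the alternative that lies beyond it:

import numpy as np
from scipy.stats import norm

# power of a one-sided z-test for H0: mu=0 vs H1: mu=mu1, with known sigma
def power(mu1, sigma, n, alpha=0.05):
    crit = norm.ppf(1-alpha) * sigma/np.sqrt(n)                  # rejection boundary under H0
    return 1 - norm.cdf(crit, loc=mu1, scale=sigma/np.sqrt(n))   # area beyond it under H1

for n in [10, 50, 250]:
    print(n, power(0.2, 1.0, n))   # power grows with the sample size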

Dickey-Fuller test

Take a simple AR(1) process $y_t=\rho y_{t-1}+u_t$ and subtract $y_{t-1}$ from both sides:

$y_t-y_{t-1}=(\rho-1)y_{t-1}+u_t$

Writing $\delta=\rho-1$, the null hypothesis is that the AR model has a unit root:

$H_0:\delta=0$
$H_1:\delta<0$

There are three types.

  1. Random walk without drift: $\Delta y_t=\delta y_{t-1}+u_t$
  2. Random walk with drift: $\Delta y_t=c+\delta y_{t-1}+u_t$
  3. Random walk with drift and time trend: $\Delta y_t=c+d\cdot t+\delta y_{t-1}+u_t$

Because the left-hand side is a first difference, these specifications mean that the differenced series has a drift or a time trend. A minimal single-series example of adfuller is shown below, followed by the simulation function used for the experiments.
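Here is a minimal example (my addition) of calling adfuller directly on one simulated random walk and one stationary AR(1) series; element [1] of the returned tuple is the p-value, which the simulation function below also uses:

import numpy as np
import statsmodels.api as sm

# minimal adfuller example: element [1] of the returned tuple is the p-value
np.random.seed(0)
e = np.random.normal(0, 1, 250)
rw = np.cumsum(e)                      # random walk: unit root, expect a large p-value
ar = np.empty(250)                     # stationary AR(1) with rho=0.5
ar[0] = e[0]
for t in range(1, 250):
    ar[t] = 0.5*ar[t-1] + e[t]
print(sm.tsa.adfuller(rw, maxlag=0, regression='c')[1])  # typically well above 0.05
print(sm.tsa.adfuller(ar, maxlag=0, regression='c')[1])  # typically well below 0.05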
# Dickey-Fuller test
def adfcheck(rho,x0,c1,d,sigma,nsample,trial):
    prw=0
    prwc=0
    prwct=0
    for i in range(trial):
        y=ar1(rho,x0,c1,d,sigma,nsample)
        rw=sm.tsa.adfuller(y,maxlag=0,regression='nc')[1]
        rwc=sm.tsa.adfuller(y,maxlag=0,regression='c')[1]
        rwct=sm.tsa.adfuller(y,maxlag=0,regression='ct')[1]
        if rw>0.05:
            prw+=1
        if rwc>0.05:
            prwc+=1
        if rwct>0.05:
            prwct+=1
    print('nsample',nsample,prw/trial,prwc/trial,prwct/trial)

Random walk test

Random walk without drift

rho=1
c1=1
c1=(c1-1)/250+1
d=0
d=d/250
sigma=0.4/np.sqrt(250)
trial=10000    
for i in [7,20,250]:
    adfcheck(rho,x0,c1,d,sigma,i,trial)

# nsample 7 0.9448 0.8878 0.8528
# nsample 20 0.9635 0.9309 0.921
# nsample 250 0.9562 0.9509 0.9485

Random walk with drift

rho=1
c1=1.1
c1=(c1-1)/250+1
d=0
d=d/250
sigma=0.4/np.sqrt(250)
trial=10000    
for i in [7,20,250]:
    adfcheck(rho,x0,c1,d,sigma,i,trial)

# nsample 7 0.9489 0.8889 0.8524
# nsample 20 0.9649 0.9312 0.921
# nsample 250 0.9727 0.9447 0.9454

Random walk + drift + deterministic trend

rho=1
c1=1.1
c1=(c1-1)/250+1
d=0.9
d=d/250
sigma=0.4/np.sqrt(250)
trial=10000    
for i in [7,20,250]:
    adfcheck(rho,x0,c1,d,sigma,i,trial)

# nsample 7 0.9971 0.948 0.8581
# nsample 20 1.0 1.0 0.983
# nsample 250 1.0 1.0 0.9998

Cases where the random walk should be rejected

AR (1) model without drift

rho=0.99
c1=rho
#c1=(c1-1)/250+1
d=0
d=d/250
sigma=0.4/np.sqrt(250)
trial=10000    
for i in [7,20,250]:
    adfcheck(rho,x0,c1,d,sigma,i,trial)

# nsample 7 0.7797 0.9012 0.8573
# nsample 20 0.6188 0.9473 0.9225
# nsample 250 0.0144 0.7876 0.944

AR (1) model with drift

rho=0.90
c1=1.1
c1=(c1-1)/250+1
d=0
d=d/250
sigma=0.4/np.sqrt(250)
trial=10000    
for i in [7,20,250]:
    adfcheck(rho,x0,c1,d,sigma,i,trial)

# nsample 7 0.9715 0.8868 0.8539
# nsample 20 0.9989 0.9123 0.912
# nsample 250 1.0 0.0306 0.1622

AR (1) model + drift + deterministic trend

rho=0.90
c1=1.1
c1=(c1-1)/250+1
d=1
d=d/250
sigma=0.4/np.sqrt(250)
trial=10000    
for i in [7,20,250]:
    adfcheck(rho,x0,c1,d,sigma,i,trial)

# nsample 7 0.999 0.9404 0.8576
# nsample 20 1.0 1.0 0.918
# nsample 250 1.0 1.0 0.0044

We can see that, to reject the null hypothesis correctly, we should use a model with the correct specification, the sample size should be large, and ρ should be far from 1.

About the influence of noise

Let's use the Cauchy distribution to generate fat-tailed noise and build a time series from it. First, let's look at the shape of the Cauchy distribution.

from scipy.stats import cauchy
trial=1000000
m1=[]
for i in range(trial):
    s = np.random.normal(0, 1, 250)
    m1.append(np.mean(s))
m2=[]
for i in range(trial):
    s=[]
    i=1
    while i<=250:
        s0 = (cauchy.rvs(0,1,1))
        if abs(s0)<10:
            s.append(s0)
            i+=1
    m2.append(np.mean(s))
bins=np.linspace(np.min(m2),np.max(m2),1000)
plt.hist(m1,bins=bins,label='normal')
plt.hist(m2,bins=bins,label='abs(cauchy) <10',alpha=0.5)
plt.legend()
plt.show()

[Figure: distribution of sample means, normal vs truncated Cauchy (abs < 10)]

You can see that the tails are much fatter than those of the normal distribution. We will use this as the noise.

Next, form the sum of random numbers from the Cauchy distribution, and look at the distribution of the last value.

from scipy.stats import cauchy
m1=[]
trial=1000000
for i in range(trial):
    s = np.cumsum(np.random.normal(0, 1, 250))
    m1.append(s[-1])
m2=[]
for i in range(trial):
    s=[]
    i=1
    while i<=250:
        s0 = (cauchy.rvs(0,1,1))
        if abs(s0)<10:
            s.append(s0)
            i+=1
    m2.append(np.cumsum(s)[-1])
bins=np.linspace(np.min(m2),np.max(m2),1000)
plt.hist(m1,bins=bins,label='normal')
plt.hist(m2,bins=bins,label='cauchy',alpha=0.5)
plt.legend()
plt.show()

[Figure: distribution of the final cumulative value, normal vs truncated Cauchy]

Again, you can see that the spread is larger than for the normal distribution.

Next, let's create an AR1 function that uses the Cauchy distribution.

# AR(1) driven by Cauchy noise (note: sigma is not applied to the noise here)
def ar1_c(rho,x0,c1,d,sigma,nsample):
    e = cauchy.rvs(loc=0,scale=1,size=nsample)  # fat-tailed shocks
    c=c1-rho # constant of the autoregressive model
    y=[]
    y.append(e[0]+x0*rho+c+d)
    for i,ee in enumerate(e[1:]):
        y.append(rho*y[i]+ee+c+d*(i+1))
    return y

The DF test is performed on the generated first-order autoregressive series.

def adfcheck_c(rho,x0,c1,d,sigma,nsample,trial):
    prw=0
    prwc=0
    prwct=0
    for i in range(trial):
        y=ar1_c(rho,x0,c1,d,sigma,nsample)
        rw=sm.tsa.adfuller(y,maxlag=0,regression='nc')[1]
        rwc=sm.tsa.adfuller(y,maxlag=0,regression='c')[1]
        rwct=sm.tsa.adfuller(y,maxlag=0,regression='ct')[1]
        if rw>0.05:
            prw+=1
        if rwc>0.05:
            prwc+=1
        if rwct>0.05:
            prwct+=1
    print('nsample',nsample,prw/trial,prwc/trial,prwct/trial)
rho=0.90
c1=1.1
c1=(c1-1)/250+1
d=1
d=d/250
sigma=0.4/np.sqrt(250)
trial=10000    
for i in [7,20,250]:
    adfcheck_c(rho,x0,c1,d,sigma,i,trial)

# nsample 7 0.973 0.8775 0.8377
# nsample 20 0.9873 0.9588 0.9143
# nsample 250 0.8816 0.8425 0.0897

Due to the influence of the fat tails, even in the drift + time trend case the null hypothesis cannot be rejected at the 5% significance level when the sample size is 250. With rho = 0.85, the results are as follows.

nsample 7 0.9717 0.8801 0.8499
nsample 20 0.987 0.9507 0.903
nsample 250 0.8203 0.7057 0.0079

When the sample size is 250, the null hypothesis can be rejected at the 5% significance level.

Next, let's cut off the fat tail part of the Cauchy distribution.

def cau(cut,nsample):
    s=[]
    i=1
    while i<=nsample:
        s0 = cauchy.rvs(0,1,1)
        if abs(s0)<cut:
            s.append(s0[0])
            i+=1
    return s

def ar1_c2(rho,x0,c1,d,sigma,nsample):
    e = cau(10,nsample)
    c=c1-rho # constant of the autoregressive model
    y=[]
    y.append(e[0]*sigma+x0*rho+c+d)
    for i,ee in enumerate(e[1:]):
        y.append(rho*y[i]+ee*sigma+c+d*(i+1))
    return y

def adfcheck_c2(rho,x0,c1,d,sigma,nsample,trial):
    prw=0
    prwc=0
    prwct=0
    for i in range(trial):
        y=ar1_c2(rho,x0,c1,d,sigma,nsample)
        rw=sm.tsa.adfuller(y,maxlag=0,regression='nc')[1]
        rwc=sm.tsa.adfuller(y,maxlag=0,regression='c')[1]
        rwct=sm.tsa.adfuller(y,maxlag=0,regression='ct')[1]
        if rw>0.05:
            prw+=1
        if rwc>0.05:
            prwc+=1
        if rwct>0.05:
            prwct+=1
    print('nsample',nsample,prw/trial,prwc/trial,prwct/trial)
rho=0.90
c1=1.1
c1=(c1-1)/250+1
d=1
d=d/250
sigma=0.4/np.sqrt(250)
trial=10000    
for i in [7,20,250]:
    adfcheck_c2(rho,x0,c1,d,sigma,i,trial)

# nsample 7 0.9969 0.9005 0.8462
# nsample 20 1.0 0.9912 0.9182
# nsample 250 1.0 1.0 0.0931

The result is different from before. Only the drift + time trend specification can reject the null hypothesis, and only at the 10% significance level. With rho = 0.85, the results are as follows.

nsample 7 0.9967 0.9012 0.8466
nsample 20 1.0 0.9833 0.9121
nsample 250 1.0 1.0 0.0028

With the smaller rho, the drift + time trend specification can now be rejected at the 5% significance level as well.

Next, let's try the augmented Dickey-Fuller (ADF) test.

$\Delta y_t=c+d\cdot t+\delta y_{t-1}+\delta_1\Delta y_{t-1}+\cdots+\delta_{p-1}\Delta y_{t-p+1}+u_t$

By adding lagged first differences, structural effects such as autocorrelation are removed.

def adfcheck_c3(rho,x0,c1,d,sigma,nsample,trial):
    prw=0
    prwc=0
    prwct=0
    for i in range(trial):
        y=ar1_c2(rho,x0,c1,d,sigma,nsample)
        rw=sm.tsa.adfuller(y,regression='nc')[1]
        rwc=sm.tsa.adfuller(y,regression='c')[1]
        rwct=sm.tsa.adfuller(y,regression='ct')[1]
        if rw>0.05:
            prw+=1
        if rwc>0.05:
            prwc+=1
        if rwct>0.05:
            prwct+=1
    print('nsample',nsample,prw/trial,prwc/trial,prwct/trial)
rho=0.90
c1=1.1
c1=(c1-1)/250+1
d=1
d=d/250
sigma=0.4/np.sqrt(250)
trial=10000    
for i in [7,20,250]:
    adfcheck_c3(rho,x0,c1,d,sigma,i,trial)

# nsample 7 0.94 0.8187 0.8469
# nsample 20 1.0 0.873 0.7969
# nsample 250 1.0 1.0 0.1253

With rho = 0.90 the test cannot judge correctly. With rho = 0.85, the result is:

nsample 7 0.9321 0.8109 0.8429
nsample 20 1.0 0.8512 0.7942
nsample 250 1.0 1.0 0.0252

Now the drift + time trend specification can reject the null hypothesis.

Drift and time trend detection when standard deviation is zero or small

#drift only, standard deviation zero
rho=0
c1=1.1
c1=(c1-1)/250+1
d=0
d=d/250
sigma=0#.4/np.sqrt(250)
trial=1#0000    
for i in [7,20,250]:
    adfcheck(rho,x0,c1,d,sigma,i,trial)    

nsample 7 0.0 0.0 0.0
nsample 20 0.0 0.0 0.0
nsample 250 0.0 0.0 0.0

The null hypothesis is rejected in all cases when there is only drift.

#Drift and time trend, standard deviation is zero
rho=0
c1=1.1
c1=(c1-1)/250+1
d=1
d=d/250
sigma=0#.4/np.sqrt(250)
trial=1#0000    
for i in [7,20,250]:
    adfcheck(rho,x0,c1,d,sigma,i,trial)    

nsample 7 1.0 1.0 0.0
nsample 20 1.0 1.0 0.0
nsample 250 1.0 1.0 0.0

With drift + time trend and zero standard deviation, the drift + time trend ('ct') specification rejects the null hypothesis regardless of the sample size.

Next, add some standard deviation to the drift.

# drift+ very small sigma
rho=0
c1=1.1
c1=(c1-1)/250+1
d=0
d=d/250
sigma=0.004#.4/np.sqrt(250)
trial=10000    
for i in [7,20,250]:
    adfcheck(rho,x0,c1,d,sigma,i,trial)    

nsample 7 0.9986 0.6379 0.7223
nsample 20 1.0 0.0396 0.1234
nsample 250 1.0 0.0 0.0

Next, add some standard deviation to the drift + time trend.

# drift + time trend + very small sigma
rho=0
c1=1.1
c1=(c1-1)/250+1
d=1
d=d/250
sigma=0.004#.4/np.sqrt(250)
trial=10000    
for i in [7,20,250]:
    adfcheck(rho,x0,c1,d,sigma,i,trial)    

nsample 7 1.0 0.9947 0.6877
nsample 20 1.0 1.0 0.1095
nsample 250 1.0 1.0 0.0

reference

Campbell, J. Y.; Perron, P. (1991). "Pitfalls and Opportunities: What Macroeconomists Should Know about Unit Roots"

Stock J. Unit Roots, Structural Breaks, and Trends. In: Engle R, McFadden D Handbook of Econometrics. Amsterdam: Elsevier ; 1994. pp. 2740-2843.

MacKinnon, J.G. 2010. “Critical Values for Cointegration Tests.” Queen’s University, Dept of Economics, Working Papers.

statsmodels.tsa.stattools.adfuller

Unit root

[Dickey-Fuller Test](https://ja.wikipedia.org/wiki/%E3%83%87%E3%82%A3%E3%83%83%E3%82%AD%E3%83%BC%E2%80%93%E3%83%95%E3%83%A9%E3%83%BC%E6%A4%9C%E5%AE%9A)

Stochastic drift

http://web.econ.ku.dk/metrics/Econometrics2_05_II/Slides/08_unitroottests_2pp.pdf
