Introduction

Because of the novel coronavirus, the Shanghai Composite Index is having a big bargain sale: 7% off. After a drop this large the government will probably step in with money, so I suspect it will bounce back a little in the short term (sorry if I turn out to be wrong). My wife (who is Chinese and loves a bargain sale) is pushing hard for me to buy, so let's analyze it a little quantitatively. What I want to know is whether buying the day after a crash and riding the rebound pays off over horizons from the next day to one month.

Quantitative model framework

First, compute the z-score of each daily return and keep only the days whose z-score is at or below -2, i.e. the days that fell more than 2σ below the mean.

t\in\{t~|~{\rm zscore}(r_t)\leq -2\},~{\rm zscore}(r_t)=\frac{r_t - \mu}{\sigma}
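A minimal pandas sketch of this filter, assuming `ret` is a Series of simple daily returns and that μ and σ are the full-sample mean and standard deviation. The helper name `crash_days` is hypothetical, not from the original code.

```python
import pandas as pd


def crash_days(ret: pd.Series, threshold: float = -2.0) -> pd.Series:
    """Return only the daily returns whose z-score is at or below `threshold`."""
    # zscore(r_t) = (r_t - mu) / sigma, using full-sample mean and std (ddof=1)
    z = (ret - ret.mean()) / ret.std()
    # keep t such that zscore(r_t) <= -2
    return ret[z <= threshold]
```

For example, in a series of ±1% days with a single -8% day, only the -8% day survives the filter.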

Then, for those days, look at how the crash-day return relates to the returns over the next day, one week, two weeks, three weeks, and one month (d = 1, 5, 10, 15, and 20 trading days).

E[r_{t+d}]=f(r_t), ~d\in\{1, 5, 10, 15, 20\}
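One way to build the forward-return columns from closing prices is sketched below. It assumes each column holds the average daily return over t+1 through t+d, and uses hypothetical column names such as `'5d'`, chosen so that `int(span[:-1])` in the Python code section recovers the horizon. This is an illustration, not the author's preprocessing.

```python
import pandas as pd

HORIZONS = (1, 5, 10, 15, 20)  # d in {1, 5, 10, 15, 20} trading days


def add_forward_returns(close: pd.Series, horizons=HORIZONS) -> pd.DataFrame:
    """Daily return r_t plus the average daily return over t+1..t+d for each d."""
    ret = close.pct_change()
    out = pd.DataFrame({'ret': ret})
    for d in horizons:
        # rolling(d).mean() at row i averages rows i-d+1..i;
        # shifting by -d moves that window to rows t+1..t+d relative to row t
        out[f'{d}d'] = ret.rolling(d).mean().shift(-d)
    return out
```

The last rows of each column are NaN because their forward window runs past the end of the data.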

Model ① Autocorrelation

First, let's look at autocorrelation. The Shanghai Composite is full of outliers (as you might expect from China), and the ordinary Pearson product-moment correlation is easily distorted by outliers, so I used Spearman's rank correlation instead: because it works on ranks, it is much less affected by outliers.

\rho=1-\frac{6\sum D^2}{N^3-N}

where D is the difference between the ranks of each corresponding X and Y pair, and N is the number of pairs (see Wikipedia for details). This closed form is exact when there are no tied ranks.
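To check the formula, here is a small sketch that computes ρ from the rank differences and compares it with pandas. The helper `spearman_by_formula` and the toy numbers are made up for illustration; with ties, pandas (average ranks, then Pearson on the ranks) and this closed form can differ slightly.

```python
import pandas as pd


def spearman_by_formula(x, y) -> float:
    """rho = 1 - 6 * sum(D^2) / (N^3 - N); exact only when there are no ties."""
    x = pd.Series(x, dtype=float)
    y = pd.Series(y, dtype=float)
    d = x.rank() - y.rank()  # D: rank difference for each (X, Y) pair
    n = len(x)               # N: number of pairs
    return 1 - 6 * (d ** 2).sum() / (n ** 3 - n)


x = [0.5, -1.2, 3.0, 0.1, -0.4]
y = [-0.3, 0.8, -2.0, 0.4, 0.2]
rho_formula = spearman_by_formula(x, y)
rho_pandas = pd.DataFrame({'x': x, 'y': y}).corr(method='spearman').loc['x', 'y']
print(rho_formula, rho_pandas)  # both -0.9 up to floating-point rounding
```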

Computing rank correlation in Excel is a pain, but in pandas it's a one-liner. Wonderful!

rho_spearman = df.corr(method='spearman')

The figure shows the result of computing this correlation with pandas: overall, the autocorrelation is negative (that is, measured around a mean of zero, returns tend to rebound after a crash).
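The correlation here is conditional: it is computed only over the crash days. Assuming a DataFrame with a `'ret'` column plus forward-return columns, the z-score filter and the pandas one-liner can be combined roughly as follows. The helper `conditional_spearman` and the column layout are hypothetical.

```python
import pandas as pd


def conditional_spearman(df: pd.DataFrame, threshold: float = -2.0) -> pd.Series:
    """Spearman correlation of 'ret' with each other column, on crash days only."""
    z = (df['ret'] - df['ret'].mean()) / df['ret'].std()
    crash = df[z <= threshold]               # rows where zscore(r_t) <= threshold
    rho = crash.corr(method='spearman')      # rank correlation matrix of the subset
    return rho['ret'].drop('ret')            # r_t versus each forward horizon
```

A negative value for a horizon means that, among crash days, bigger drops tended to be followed by better returns over that horizon.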

Model ② Regression analysis

Next, let's use regression analysis to predict the returns from the next day out to one month after today's crash (-7.72% at the close). As usual, to limit the influence of outliers on the regression coefficients, first winsorize both the crash-day return and the subsequent returns to within ±2σ.

{\rm winsorize}(r_{t+d})={\rm min}({\rm max}(r_{t+d},~\mu-2\sigma),~\mu+2\sigma)
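In pandas, this min/max is just `clip`. A minimal sketch, assuming μ and σ come from the unclipped column (the same choice as the `np.clip` call in the Python code section); the helper name `winsorize_2sigma` is hypothetical.

```python
import pandas as pd


def winsorize_2sigma(x: pd.Series, k: float = 2.0) -> pd.Series:
    """min(max(x, mu - k*sigma), mu + k*sigma), with mu and sigma from the raw data."""
    mu, sigma = x.mean(), x.std()
    return x.clip(lower=mu - k * sigma, upper=mu + k * sigma)
```

Note that clipping replaces extreme values with the boundary rather than dropping them, so the sample size is unchanged.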

Then run a linear regression with x as the return on the crash day and y as the return over the next d days, and obtain the predicted y for today's return x = -7.72% for each d.

{\rm winsorize}(r_{t+d})=\beta_d\,{\rm winsorize}(r_{t})+\alpha_d+\epsilon_{t+d}
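The fit-and-predict step for one horizon d might look like the sketch below, using scikit-learn's `LinearRegression`. The helper `fit_and_predict` is hypothetical, and the toy data in the usage note are exactly linear so the recovered coefficients are easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_and_predict(x_crash, y_forward, x_today: float):
    """OLS fit of y = beta_d * x + alpha_d; returns (beta_d, alpha_d, prediction at x_today)."""
    X = np.asarray(x_crash, dtype=float).reshape(-1, 1)  # shape (n_samples, 1 feature)
    y = np.asarray(y_forward, dtype=float)
    reg = LinearRegression().fit(X, y)
    y_hat = reg.predict(np.array([[x_today]]))[0]
    return reg.coef_[0], reg.intercept_, y_hat
```

For instance, with x = [-0.05, -0.06, -0.07, -0.08] and y = -0.1·x - 0.002, this returns β ≈ -0.1, α ≈ -0.002 and a prediction of about 0.00572 at x = -0.0772. The script in the Python code section additionally multiplies the prediction by d.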

Running the regression above with LinearRegression from scikit-learn's sklearn.linear_model gives the predicted return over the next d days shown here. As the figure shows, even if the index rebounds, the rebound seems to last only about one to two weeks.

One caveat: some readers may find it odd that the one-month prediction is negative even though the one-month autocorrelation is also negative (which points to a rebound). This happens because, over the days that meet the condition (days whose return fell below -2σ), the average return over the following month was itself substantially negative. Correlation measures co-movement around each variable's mean, effectively treating the mean as 0, whereas regression also captures the mean through the intercept term. Average returns are often non-zero, so it's risky to interpret the correlation alone.
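To see how a negative correlation and a negative prediction can coexist, here is a toy example on made-up numbers (not market data). The slope is negative, so bigger crashes are followed by better returns, but the intercept is negative enough that the prediction for a -7.72% day is still below zero.

```python
import numpy as np

# Made-up crash-day data for illustration only
x = np.array([-0.05, -0.06, -0.07, -0.08, -0.09, -0.10])           # crash-day return
y = np.array([-0.0025, -0.004, -0.0015, -0.003, -0.0005, -0.002])  # later average daily return

corr = np.corrcoef(x, y)[0, 1]         # negative: bigger crashes, stronger rebound
beta, alpha = np.polyfit(x, y, deg=1)  # slope and intercept of y = beta * x + alpha
pred = beta * -0.0772 + alpha          # prediction for a -7.72% day

print(f"corr={corr:.3f} beta={beta:.4f} alpha={alpha:.5f} mean_y={y.mean():.5f} pred={pred:.5f}")
```

Here the correlation, the slope, the intercept, the mean of y, and the prediction all come out negative: the rebound effect captured by the slope is smaller than the negative baseline captured by the intercept.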

The figure below is a scatter plot of x: the return on the day (crash days only) against y: the cumulative return over the next n days (n = 1, 5, 10, 15, 20). seaborn's sns.regplot() draws the regression line and its confidence band on the scatter plot in a single call. Convenient!

Python code

I downloaded historical data for the Shanghai Composite Index from Yahoo Finance: https://finance.yahoo.com/quote/000001.SS/history?p=000001.SS Using this, I formatted the data into a table and read it into pandas.

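import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

sns.set()
reg = LinearRegression()

# Read the formatted table from the clipboard.
# Assumed layout: a 'Date' column, then 'ret' (the crash-day return), then forward-return
# columns named like '1d', '5d', ... (int(span[:-1]) gives the number of days).
# The last column is excluded by columns[1:-1] below.
# As the plot titles indicate, the rows are assumed to be crash days only (ret < -2σ).
df = pd.read_clipboard()
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date').astype('float')

# Winsorize each column to [mean - 2σ, mean + 2σ] to limit the influence of outliers
df_clip = df.apply(lambda x: np.clip(x, x.mean() - x.std() * 2, x.mean() + x.std() * 2), axis='index')

span_list = df_clip.columns[1:-1]           # forward-return horizons
pred = pd.Series(index=span_list, dtype=float)
X_pred = -0.0772                            # today's return at the close (-7.72%)
X = df_clip[['ret']].values                 # scikit-learn expects a 2-D feature matrix
for span in span_list:
    Y = df_clip[span].values
    reg.fit(X, Y)
    # predicted average daily return, multiplied by the horizon length in days
    pred[span] = reg.predict(np.array([[X_pred]]))[0] * int(span[:-1])
    plt.clf()
    sns.regplot(x=df_clip['ret'], y=df_clip[span])
    plt.title('x: ret(t), y:average_ret(t+1:t+' + span[:-1] + ')')
    plt.savefig(span + '.png')

plt.clf()
fig, [ax1, ax2] = plt.subplots(ncols=1, nrows=2)
# Spearman rank correlation of 'ret' (row 0) with each forward-return column
df_clip.corr(method='spearman').iloc[0, 1:-1].plot(kind='bar', ax=ax1)
ax1.set_title('conditional autocorrelation when ret < -2σ')
pred.plot(kind='bar', ax=ax2)
ax2.set_title('conditional predicted return when ret < -2σ')
plt.tight_layout()
plt.savefig('pred_ret.png')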

Predict the Shanghai Composite Index immediately after the crash with python