Predicting the Shanghai Composite Index immediately after a crash with Python

Introduction.

Because of the new coronavirus, the Shanghai Composite Index is on a bargain sale at roughly 7% off. After a drop this large the government will probably step in with money, so I have a feeling the index will recover a little in the short term (apologies if that turns out to be wrong). My wife (who is Chinese) loves bargain sales and is strongly recommending that we buy, so let's analyze this a little quantitatively. What I want to know is whether it is profitable to bet on the rebound between the day after a crash and one month later.

Quantitative model framework

First, convert the daily returns to z-scores and extract only the days whose z-score is at or below -2, that is, days that fell by at least 2σ.

t\in\{t~|~{\rm zscore}(r_t)\leq -2\},~{\rm zscore}(r_t)=\frac{r_t - \mu}{\sigma}
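The filter above can be sketched in pandas roughly as follows. This is a minimal illustration on synthetic data, not the data used in this post; the function name `crash_days` and the injected crash values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

def crash_days(returns: pd.Series, threshold: float = -2.0) -> pd.Series:
    """Keep only the days whose z-scored return is at or below `threshold`."""
    z = (returns - returns.mean()) / returns.std()
    return returns[z <= threshold]

# Synthetic daily returns (hypothetical), with two crash days injected by hand
rng = np.random.default_rng(0)
ret = pd.Series(rng.normal(0.0, 0.01, 500))
ret.iloc[100] = -0.08
ret.iloc[300] = -0.06

crashes = crash_days(ret)
print(crashes)
```

Note that μ and σ here are computed over the whole sample, so the set of crash days depends on the period you load.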

Then, find the relationship between the return on that day and the returns over the following day, week, two weeks, three weeks, and month.

E[r_{t+d}]=f(r_t), ~d\in\{1, 5, 10, 15, 20\}
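One way to build the forward-looking columns is sketched below. It assumes that the target for horizon d is the average daily return over t+1 to t+d, as the plot titles in the code at the end suggest, and that the columns are named like '1d' and '5d'. Both are assumptions, since the post does not show how its input table was prepared.

```python
import pandas as pd

def forward_mean_returns(returns: pd.Series, horizons=(1, 5, 10, 15, 20)) -> pd.DataFrame:
    """For each day t, compute the average daily return over t+1 .. t+d."""
    out = pd.DataFrame({'ret': returns})
    for d in horizons:
        # rolling(d).mean() at position t+d covers t+1 .. t+d;
        # shift(-d) moves that value back to row t
        out[f'{d}d'] = returns.rolling(d).mean().shift(-d)
    return out

# Tiny hypothetical series to show the alignment
demo = forward_mean_returns(pd.Series([0.01, 0.02, 0.03, 0.04, 0.05, 0.06]), horizons=(1, 2))
print(demo)
```

The last d rows of each column are NaN because the future returns are not yet known there.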

Model ① Autocorrelation

First, let's look at autocorrelation. The Shanghai Composite, much like China itself, is full of outliers, and the usual Pearson product-moment correlation is easily distorted by outliers. So here I computed Spearman's rank correlation instead (because it works on ranks, it is much less affected by outliers).

\rho=1-\frac{6\sum D^2}{N^3-N}

Where D is the difference between the corresponding X and Y ranks, and N is the number of value pairs (see Wiki for details).
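As a quick check of the formula, the sketch below computes ρ by hand from the ranks and compares it with what pandas returns. The numbers are made up for illustration, and the closed-form formula is exact only when there are no tied values, which holds for this toy data.

```python
import pandas as pd

# Hypothetical value pairs without ties
x = pd.Series([0.5, -1.2, 3.1, 0.0, -7.7, 2.4])
y = pd.Series([-0.3, 0.8, -2.0, 0.1, 1.5, -9.0])

# D: difference between the ranks of corresponding x and y values
D = x.rank() - y.rank()
N = len(x)
rho_formula = 1 - 6 * (D ** 2).sum() / (N ** 3 - N)

# The same quantity via pandas
rho_pandas = pd.DataFrame({'x': x, 'y': y}).corr(method='spearman').loc['x', 'y']

print(rho_formula, rho_pandas)
```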

Computing a rank correlation in Excel is a pain, but with pandas it takes a single line. Wonderful!

rho_spearman = df.corr(method='spearman')

The figure shows the correlations actually computed with pandas. The autocorrelation is negative across the board (that is, relative to a mean of zero, returns tend to rebound). [Figure: conditional autocorrelation when ret < -2σ]

Model ② Regression analysis

Next, let's use regression analysis to predict the returns from the next day out to one month later, given today's crash (-7.72% at the close). As usual, to keep outliers from distorting the regression coefficients, first winsorize the returns (both the crash-day return and the subsequent returns) to within ±2σ.

{\rm winsorize}(r_{t+d})={\rm min}({\rm max}(r_{t+d},~\mu-2\sigma),~\mu+2\sigma)
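A minimal sketch of this winsorization with pandas follows. The helper name `winsorize` and the synthetic series are assumptions for the example; the full code at the end of the post does the same thing with `np.clip` applied column by column.

```python
import numpy as np
import pandas as pd

def winsorize(r: pd.Series, n_sigma: float = 2.0) -> pd.Series:
    """Clip values to the range [mean - n_sigma * std, mean + n_sigma * std]."""
    mu, sigma = r.mean(), r.std()
    return r.clip(lower=mu - n_sigma * sigma, upper=mu + n_sigma * sigma)

# Synthetic returns (hypothetical) with one extreme day
rng = np.random.default_rng(1)
r = pd.Series(rng.normal(0.0, 0.01, 250))
r.iloc[0] = -0.0772

w = winsorize(r)
print(r.iloc[0], w.iloc[0])  # the extreme value is pulled in to the lower bound
```

Unlike dropping outliers, winsorizing keeps every observation and only caps its magnitude, so the sample size for the regression is unchanged.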

Then I ran a linear regression with x as the return on the crash day and y as the return over the following d days, and obtained the predicted y at today's return x = -7.72% for each d.

{\rm winsorize}(r_{t+d})=\beta_d*{\rm winsorize}(r_{t})+\alpha_d+\epsilon_{t+d}

Running the regression above with LinearRegression from sklearn.linear_model gives the following predicted returns after d days (expressed as a daily rate). [Figure: conditional predicted return when ret < -2σ]

As the figure shows, even if the index does rebound, the rebound seems to last only about one to two weeks. One caveat: some readers may find it odd that the predicted value one month out is negative even though the autocorrelation was negative at that horizon too. This happens because, within the sample that meets the condition (days on which the return fell below -2σ), the average return over the following month was significantly negative. Correlation is computed after removing the means, whereas the regression also captures the mean through its intercept term. Average returns are often non-zero, so it is dangerous to draw conclusions from the correlation alone.
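The point about the intercept can be illustrated with synthetic numbers. The sketch below is not the post's data: the slope, intercept, and noise level are made-up assumptions, and it uses `np.corrcoef` and `np.polyfit` in place of Spearman correlation and scikit-learn just to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic crash-day returns and subsequent returns (hypothetical):
# negative slope (rebound tendency) but a negative average level
x = rng.uniform(-0.08, -0.03, 200)
y = -0.1 * x - 0.01 + rng.normal(0.0, 0.001, 200)

corr = np.corrcoef(x, y)[0, 1]      # ignores the mean level of y
beta, alpha = np.polyfit(x, y, 1)   # slope and intercept
y_hat = beta * (-0.0772) + alpha    # prediction at x = -7.72%

print(f'corr={corr:.2f}, beta={beta:.3f}, alpha={alpha:.4f}, y_hat={y_hat:.4f}')
```

Here the correlation and the slope are both negative, yet the prediction is still negative because the intercept outweighs the rebound term β·x.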

Furthermore, the figure below shows scatter plots of x: the return on the day (crash days only) against y: the cumulative return over the next n days (n = 1, 5, 10, 15, 20). seaborn's sns.regplot() draws the regression line and its confidence band on the scatter plot in a single call. Convenient! [Figure: scatter plots of ret(t) against the return over t+1 to t+n, with regression lines]

Python code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

sns.set()
reg = LinearRegression()

# Read the table copied to the clipboard; it needs a 'Date' column and a 'ret' column
df = pd.read_clipboard()
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date').astype('float')

# Winsorize every column to mean ± 2σ
df_clip = df.apply(lambda x: np.clip(x, x.mean() - x.std() * 2, x.mean() + x.std() * 2), axis='index')

# Horizon columns; int(span[:-1]) below implies names such as '1d', '5d'
span_list = df_clip.columns[1:-1]
pred = pd.Series(index=span_list, dtype='float')
X_pred = -0.0772  # today's return
for span in span_list:
    X = df_clip['ret'].values.reshape(-1, 1)
    Y = df_clip[span].values.reshape(-1, 1)
    reg.fit(X, Y)
    # Prediction at X_pred, multiplied by the number of days in the span
    pred[span] = reg.predict(np.array(X_pred).reshape(-1, 1)).item() * int(span[:-1])
    plt.clf()
    sns.regplot(x=df_clip['ret'], y=df_clip[span])
    plt.title('x: ret(t), y:average_ret(t+1:t+' + span[:-1] + ')')
    plt.savefig(span + '.png')  # fixed the misplaced quote in the file name

plt.clf()
fig, [ax1, ax2] = plt.subplots(ncols=1, nrows=2)
df_clip.corr(method='spearman').iloc[0, 1:-1].plot(kind='bar', ax=ax1)
ax1.set_title('conditional autocorrelation when ret < -2σ')
pred.plot(kind='bar', ax=ax2)
ax2.set_title('conditional predicted return when ret < -2σ')
plt.tight_layout()
plt.savefig('pred_ret.png')
