[PYTHON] Predicting stock price changes using meta-labeling and two-step machine learning

1 What is this article about?

Chapter 3 of Advances in Financial Machine Learning introduces the Triple-Barrier Method and meta-labeling, which together form a two-step machine learning method. Starting from the current price, the method predicts whether, within a time limit, the price will cross the upper Triple-Barrier threshold, cross the lower Triple-Barrier threshold, or stay between the upper and lower thresholds. The article "Financial Machine Learning Part 1: Labels" gives an example of applying this method to Bitcoin to improve the F1 score. Since the concept itself is still difficult, I wrote this article to explain that one in plainer terms.

Reference article ・ Financial Machine Learning Part 1: Labels (improvement of performance metrics by a 2-step learning model)

2 What do you want to do?

I want to predict which of the following three cases will occur: ① the stock price rises above the upper threshold, ② it falls below the lower threshold, or ③ it stays between the upper and lower thresholds. The goal is not to predict the future stock price itself, only which of these three cases it will fall into.

119.JPG

3 How do you do it?

3-1 Use 2-step machine learning

Starting from an arbitrary time, record how the price of Bitcoin moves relative to the Triple-Barrier (it reached the upper threshold, it reached the lower threshold, or it stayed within the Triple-Barrier until the time limit) and use these records as training data. Input the training data to the 1st machine learning model (logistic regression) and train it. Then train the 2nd machine learning model using the output of the 1st machine learning model together with the training data.
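The two steps described above can be sketched roughly as follows. This is a minimal illustration, not the author's notebook: the data is synthetic, the target is simplified to a binary "price goes up / does not go up" label instead of the three Triple-Barrier labels, and the definition of the 2nd model's target (1 only when the 1st model predicted "up" and was right) is an assumption based on the usual meta-labeling formulation. A tiny hand-written logistic regression keeps the sketch dependency-free; in practice a library implementation such as scikit-learn's LogisticRegression would normally be used.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit a binary logistic regression with batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Return 0/1 predictions using a 0.5 probability cut-off."""
    w, b = model
    return [int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5)
            for xi in X]


# Synthetic data (hypothetical): two features and a noisy binary target.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
y = [int(x0 + 0.5 * x1 + random.gauss(0, 0.7) > 0) for x0, x1 in X]

# Step 1: the 1st model predicts the label from the features alone.
primary = fit_logistic(X, y)
side = predict(primary, X)

# Step 2: the 2nd model sees the features plus the 1st model's output and
# learns whether a positive call by the 1st model was correct (assumed target).
meta_X = [xi + [s] for xi, s in zip(X, side)]
meta_y = [int(s == 1 and t == 1) for s, t in zip(side, y)]
meta = fit_logistic(meta_X, meta_y)
confirm = predict(meta, meta_X)

# Act only when the 1st model says "up" and the 2nd model confirms it.
final = [s * c for s, c in zip(side, confirm)]
```

Because the final prediction is the 1st model's call filtered by the 2nd model, the 2nd step can only remove positive calls, never add new ones. A real evaluation would also score both models on held-out test data rather than on the training data used here.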

3-2 What are the benefits of 2-step machine learning?

Accuracy can be improved by using a two-step machine learning model.

128.JPG

4 What did you do?

By applying the "two-step machine learning model" to Bitcoin price data from December 4, 2018 to December 9, 2018, I was able to improve prediction accuracy.

● Bitcoin price from December 4, 2018 to December 9, 2018 129.JPG

● Accuracy and F1-score were improved by applying the "two-step machine learning model". (Prediction accuracy has improved.) In other words, applying the "two-step machine learning model" increased the frequency of the following outcomes:

[TN] Predicted that the price would stay within the Triple-Barrier → the price actually stayed within the Triple-Barrier
[TP] Predicted that the price would go up → the price actually went up
[TP] Predicted that the price would go down → the price actually went down

127.JPG

The training data and test data were each input to the trained machine learning models, and the Confusion matrix was calculated. TP (True Positive) is a profit pattern, FP (False Positive) is a loss-cut pattern, and TN (True Negative) and FN (False Negative) are patterns in which nothing is done, so it is preferable for the number of TPs to increase and the number of FPs to decrease.
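As a rough illustration of how these four counts, Accuracy, and F1-score can be computed from three-class labels (1, 0, -1), here is a small sketch. The allocation rule is an assumption inferred from the descriptions in this section (a prediction of 0 means "do nothing", so it can only be TN or FN; a prediction of 1 or -1 is TP when it matches the actual label and FP otherwise). The function names and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count TP/FP/TN/FN for labels in {1, 0, -1} (assumed allocation rule)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == 0:
            # No action was taken: correct if the price really stayed inside.
            counts["TN" if a == 0 else "FN"] += 1
        else:
            # A move was predicted: profit if the direction matched.
            counts["TP" if a == p else "FP"] += 1
    return counts


def accuracy(c):
    total = sum(c.values())
    return (c["TP"] + c["TN"]) / total if total else 0.0


def f1_score(c):
    denom = 2 * c["TP"] + c["FP"] + c["FN"]
    return 2 * c["TP"] / denom if denom else 0.0


# Hypothetical labels, for illustration only.
actual = [1, 0, -1, 1, 0, -1, 1, 0]
predicted = [1, 0, -1, -1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
acc = accuracy(counts)
f1 = f1_score(counts)
```

With these sample labels the counts come out as TP=3, FP=2, TN=2, FN=1, giving an Accuracy of 0.625 and an F1-score of about 0.667.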

130.JPG

Comparing the 2nd model with the 1st model:
・ On the training data: TP increased and FP decreased, so the probability of making a profit improved. Accuracy and F1-score also improved.
・ On the test data: both TP and FP decreased. Accuracy and F1-score improved, so the predictions of the machine learning model are more likely to be correct, but because TP and FP both decreased, which model is more profitable depends on the situation. (It could be evaluated as a reduction in the probability of a loss cut.)

131.JPG

5 Summary

--A two-step machine learning model was applied to predict future price movements of Bitcoin.
--Accuracy and F1-score were improved by applying the "two-step machine learning model". (Prediction accuracy has improved.)
--On the training data, the "two-step machine learning model" is preferable to the "one-step machine learning model" because the profit probability is higher and the loss-cut probability is lower.
--On the test data, however, both the profit probability and the loss-cut probability are lower for the "two-step machine learning model" than for the "one-step machine learning model", so which of the two is more profitable depends on the situation. The probability of losing money does decrease, though.

6 Code release

The Jupyter code that implements the above is available at the link below.

https://github.com/fdfpy/studyresult/tree/master/3-5

6-1 Calculation of the standard deviation of daily returns (preparation)

Calculate the standard deviation of daily returns according to the flow shown in the figure below.

120.JPG
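The exact flow is only given in the figure, so the following is a guess at one simple way to obtain such a volatility series, not the author's procedure. It assumes simple returns between consecutive prices and a plain rolling sample standard deviation. The function name `rolling_return_std`, the window length, and the sample prices are all hypothetical, and other choices (for example an exponentially weighted standard deviation) are equally plausible.

```python
import statistics


def rolling_return_std(prices, window=5):
    """Rolling sample standard deviation of simple returns.

    Element i is the std of the `window` most recent returns ending at i,
    or None while there is not enough history.
    """
    # returns[i] is the return from prices[i - 1] to prices[i]; index 0 has none.
    returns = [None] + [prices[i] / prices[i - 1] - 1
                        for i in range(1, len(prices))]
    out = []
    for i in range(len(prices)):
        if i < window:
            out.append(None)
        else:
            out.append(statistics.stdev(returns[i - window + 1:i + 1]))
    return out


# Hypothetical prices, for illustration only.
prices = [100, 101, 99, 102, 103, 101, 104, 105]
vol = rolling_return_std(prices, window=3)
```

The first `window` entries are None because a full window of returns is not yet available. The remaining entries could serve as volstd when setting the Triple-Barrier thresholds.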

6-2 Labeling method

Let volstd($t_{i}$) be the standard deviation of daily returns at time $t_{i}$. Also, let the stock price at time $t_{i}$ be $c[t_{i}]$. Define a Triple-Barrier. (In the figure below, the upper threshold is Vthu = volstd($t_{i}$), the lower threshold is Vthd = -volstd($t_{i}$), and the time limit is 15 minutes.) Calculate $vol = c[t] / c[t_{i}] - 1$. Label = 1, 0, or -1 is assigned according to the following.

\begin{eqnarray}
Label
=
\begin{cases}
1 & ( vol \geqq Vthu ) \\
0 & ( Vthd \lt vol \lt Vthu ) \\
-1 & (vol \lt Vthd)
\end{cases}
\end{eqnarray}

121.JPG
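The labeling rule above can be sketched as a small function. This is only an illustration under stated assumptions: the time limit is expressed as a number of bars (`horizon`) rather than 15 minutes of wall-clock time, the label is decided by whichever threshold is crossed first, and the function name and sample prices are hypothetical.

```python
def triple_barrier_label(prices, i, vthu, vthd, horizon):
    """Label the observation at index i with 1, 0 or -1.

    Walks forward at most `horizon` bars from i and returns 1 as soon as
    vol = c[t] / c[t_i] - 1 reaches vthu, -1 as soon as it drops below vthd,
    and 0 if neither happens before the time limit.
    """
    start = prices[i]
    last = min(i + horizon, len(prices) - 1)
    for t in range(i + 1, last + 1):
        vol = prices[t] / start - 1
        if vol >= vthu:
            return 1
        if vol < vthd:
            return -1
    return 0


# Hypothetical prices and thresholds (+/-1%), for illustration only.
prices = [100, 100.5, 101.2, 99.0, 98.0]
label_up = triple_barrier_label(prices, 0, vthu=0.01, vthd=-0.01, horizon=3)
label_down = triple_barrier_label(prices, 1, vthu=0.01, vthd=-0.01, horizon=3)
label_flat = triple_barrier_label([100, 100.2, 99.9, 100.1], 0,
                                  vthu=0.01, vthd=-0.01, horizon=3)
```

Here `label_up` is 1 (the price gains more than 1% at the second bar), `label_down` is -1 (the price drops more than 1% before the limit), and `label_flat` is 0 (the price stays within the thresholds for the whole horizon). In the article's setting, vthu and vthd would be +volstd($t_{i}$) and -volstd($t_{i}$) instead of fixed values.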

6-3 Definition of allocation to Confusion matrix

Assign to the Confusion matrix according to the figure below.

123.JPG

6-4 1st model training and application

124.JPG

6-5 Training and application of 2nd model

125.JPG
