[PYTHON] With deep learning, you can exceed 100% recovery rate in horse racing

4370674310_b118d0f62a_c.jpg (photo by Ronnie Macdonald)

It has been a while since people started saying that **AI will take away human jobs**, and now it is also said that **AI has entered a period of disillusionment**. Thanks to that, my job has not been taken from me, and I am still jostled on a crowded train every day. So much for all the talk of taking our jobs.

While it seems that AI of that kind will take a little longer to arrive, an environment for learning it has become easy to get. If I am going to try it, now feels like the time, just as everyone is starting to become disillusioned. So I decided to try deep learning in order to get to know **the power of AI**.

I tried various things, but the main point here is this result: **"Even starting from complete ignorance, you can have this much fun with deep learning"**. The program is not exemplary code, so I will publish it for a fee, only for those who want to see it.

Take a look at deep learning with Kaggle

First, I studied a little of the mathematics underlying deep learning. Well... it is quite disheartening. Formula after formula. You can write the code without knowing it, but **you will understand things better if you know the mathematical meaning, so I think it is worth learning at least the outline**.

So, after some light study, I decided to confirm the power of deep learning on the famous machine learning theme **"Titanic survivor prediction"**. The task is to predict who survived from attributes such as each passenger's age and gender.

The environment is Google Colaboratory, and I used TensorFlow, which seemed the easiest to implement with. It was easy to build just by following Google's tutorial. When you upload the prediction results to Kaggle, a score is returned.

titanic.jpg

**Accuracy: 76.5%**. Even a beginner's model gets close to 80% right. Tuning the parameters and the data should raise it further.

Originally, one would think through hypotheses such as "Were women and children given priority in the rescue?" and analyze by trial and error, but **deep learning did all of that for me**. Deep learning really is pretty amazing.

Great but not interesting

I wanted to play with it more, but while making this prediction I noticed a big problem.

It's boring... Not deep learning itself, but **the theme is boring. Predicting the life and death of passengers on a foreign ship that sank more than 100 years ago is not interesting at all!**

It is not as if I would react with **"James... I thought you were dead... You survived!"** or **"Reina... why did you have to die!!"**. I don't know any of these people. By now they are all dead anyway. And Kaggle doesn't tell me the correct answers in the first place.

I wanted a more exciting theme... So I chose a money-related theme that I had always wanted to try.

Try to exceed 100% recovery rate in horse racing using deep learning

Horse racing apparently has a takeout (deduction) rate of about 20%, so the average recovery rate starts at around 80%. Exceeding a 100% recovery rate should be fairly difficult. But there must be plenty of past data, so maybe deep learning can handle it somehow? I will try it with that expectation.
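
To make the 80% baseline concrete, here is a minimal sketch of the arithmetic. The pool size and the exact 20% takeout are illustrative assumptions, and `recovery_rate` is a name made up for this example, not something from the original program.

```python
def recovery_rate(total_payout, total_stake):
    """Recovery rate in percent: money returned divided by money bet."""
    return 100.0 * total_payout / total_stake


# Assumed figures, for illustration only.
takeout = 0.20                     # share of the pool kept by the organizer
pool = 1_000_000                   # total yen bet by everyone on a race
paid_back = pool * (1 - takeout)   # yen returned to all winning tickets

# Averaged over every bettor, the recovery rate equals 1 - takeout.
baseline = recovery_rate(paid_back, pool)
print(f"average recovery rate: {baseline:.1f}%")
```

A bettor who picks horses no better than the crowd should therefore expect to land near this baseline, which is why 100% is a demanding target.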

The goal is **"a recovery rate of over 100% on double-win (place) betting tickets"**.

Given how horse racing works, it may be more efficient to aim for ticket types with larger payouts than double wins, but it would not be much fun unless the tickets hit reasonably often, so I narrowed it down to double wins.

Target data

For learning: 2010-2017
For verification: 2018-2019 (until early November)

The aim is to exceed 100% on the verification data. First, I scraped the web and prepared the following data.

| Classification | Item |
|---|---|
| Horse information | Horse number |
| | Frame number |
| | Age |
| | Sex |
| | Weight (current) |
| | Weight (difference from previous run) |
| | Burden weight |
| Race information on the day | Race track |
| | Number of horses running |
| | Course distance |
| | Course type |
| | Course type (Dirt/Turf/Obstacle) |
| | Weather |
| | Going |
| Past race information of the horse (× 5 runs) | Odds |
| | Popularity |
| | Ranking |
| | Time (seconds) |
| | Difference |
| | Elapsed days from the previous run |
| | Course distance |
| | Course type |
| | Course type (Dirt/Turf/Obstacle) |
| | Weather |
| | Going |

Predict horses within 3rd place

Using this data as input, deep learning predicts **"whether or not the horse finishes within 3rd place"**. **The predicted value ranges from 0 to 100 (I will call this the "3rd place index")**. The larger this value, the more likely the horse is to finish within 3rd place.

By the way, the following is all the code there is for building the predictive model, which is the core of the deep learning part.

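```python
import tensorflow as tf

# train_df / valid_df hold the input features and train_labels / valid_labels
# hold the "within 3rd place" targets; all four are prepared beforehand.

# Two hidden layers of 300 units with L2 regularization and dropout,
# followed by a single sigmoid unit for the binary prediction.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(300, kernel_regularizer=tf.keras.regularizers.l2(0.001), activation=tf.nn.relu, input_dim=len(train_df.columns)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(300, kernel_regularizer=tf.keras.regularizers.l2(0.001), activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

model.compile(
    loss='binary_crossentropy',
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy'])

# Train for 30 epochs, reporting loss and accuracy on the validation data.
fit = model.fit(train_df,
    train_labels,
    validation_data=(valid_df, valid_labels),
    epochs=30,
    batch_size=32)
```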
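
The article does not show how the labels or the 0 to 100 index are derived, so the following is only a sketch under two assumptions: the label is 1 when the finishing position is 3rd or better, and the index is the sigmoid output multiplied by 100. Both function names are hypothetical.

```python
def within_third_label(finish_position):
    """Binary training label: 1 if the horse finished 1st-3rd, else 0."""
    return 1 if finish_position <= 3 else 0


def third_place_index(probability):
    """Scale a sigmoid output in [0, 1] to a 0-100 index (assumed mapping)."""
    return 100.0 * probability


finish_positions = [1, 4, 3, 12]       # made-up race results
labels = [within_third_label(p) for p in finish_positions]

sigmoid_outputs = [0.75, 0.5, 0.125]   # made-up model outputs
indexes = [third_place_index(p) for p in sigmoid_outputs]

print(labels)
print(indexes)
```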

Try to buy in all races

Using the model I built, I simulate buying a betting ticket on the horse **with the highest 3rd place index** in each race. Here is the result of the simulation over all races from 2018 onward.

| Item | Result |
|---|---|
| Number of target races (*) | 3639 |
| Number of target records | 41871 |
| Purchase number | 3639 |
| Hit number | 1976 |
| Hit rate | 54.3% |
| Recovery rate | 82.7% |
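
A simulation of this kind can be sketched roughly as follows. This is not the author's program: the record keys (`race_id`, `index`, `finish`, `place_payout`) and the sample rows are invented for illustration, and `place_payout` is assumed to be the yen paid per 100 yen staked when the horse places.

```python
from collections import defaultdict


def simulate_top_pick(records, stake=100):
    """Bet `stake` yen on the highest-index horse of every race."""
    races = defaultdict(list)
    for rec in records:
        races[rec["race_id"]].append(rec)

    bets = hits = 0
    payout = 0.0
    for horses in races.values():
        pick = max(horses, key=lambda h: h["index"])
        bets += 1
        if pick["finish"] <= 3:          # the ticket pays if the horse places
            hits += 1
            payout += pick["place_payout"] * stake / 100
    return {
        "bets": bets,
        "hits": hits,
        "hit_rate": 100.0 * hits / bets,
        "recovery_rate": 100.0 * payout / (bets * stake),
    }


# Made-up records: two races with two runners each.
records = [
    {"race_id": "A", "index": 80, "finish": 2, "place_payout": 130},
    {"race_id": "A", "index": 40, "finish": 1, "place_payout": 300},
    {"race_id": "B", "index": 65, "finish": 5, "place_payout": 0},
    {"race_id": "B", "index": 30, "finish": 3, "place_payout": 450},
]
result = simulate_top_pick(records)
print(result)
```

One ticket per race is bought, which matches the table above, where the purchase number equals the number of target races.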

The hit rate is decent, but **the recovery rate does not rise.** This makes me curious about the relationship between the 3rd place index, the hit rate (the rate of actually finishing within 3rd place), and the recovery rate.

Relationship between 3rd place index and hit rate / recovery rate

The relationship between the 3rd place index and the hit rate was like this. スクリーンショット 2019-11-01 14.10.12.png

**The higher the 3rd place index, the higher the hit rate**, so it does seem to work as a model for picking horses that finish within 3rd place. Then, **if you buy only horses with a high index, will the recovery rate exceed 100%?**

Let's add the average recovery rate to the graph above. スクリーンショット 2019-11-01 14.05.30.png

The recovery rate stays around 80% to 90% regardless of the hit rate. In other words, **the higher the 3rd place index (that is, the more likely the horse is to win), the smaller the return.**
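
The two curves in the graph could be computed along these lines. This is a sketch with an assumed bucket width of 10 and invented records using hypothetical keys (`index`, `finish`, `place_payout` in yen per 100 yen); the sample data is contrived so that the hit rate differs between buckets while the recovery rate does not.

```python
def rates_by_index_bucket(records, width=10, stake=100):
    """Return {bucket_start: (hit_rate_percent, recovery_rate_percent)}."""
    totals = {}   # bucket_start -> [bets, hits, payout]
    for rec in records:
        # Cap at the top bucket so an index of exactly 100 joins 90-100.
        start = min(int(rec["index"] // width) * width, 100 - width)
        t = totals.setdefault(start, [0, 0, 0.0])
        t[0] += 1
        if rec["finish"] <= 3:
            t[1] += 1
            t[2] += rec["place_payout"] * stake / 100
    return {
        start: (100.0 * hits / bets, 100.0 * payout / (bets * stake))
        for start, (bets, hits, payout) in sorted(totals.items())
    }


# Made-up records: favourites hit often but pay little, outsiders the reverse.
sample = [
    {"index": 85, "finish": 1, "place_payout": 110},
    {"index": 82, "finish": 2, "place_payout": 130},
    {"index": 88, "finish": 6, "place_payout": 0},
    {"index": 35, "finish": 3, "place_payout": 240},
    {"index": 31, "finish": 9, "place_payout": 0},
    {"index": 38, "finish": 7, "place_payout": 0},
]
buckets = rates_by_index_bucket(sample)
for start, (hit, rec) in buckets.items():
    print(f"index {start}-{start + 10}: hit {hit:.1f}%, recovery {rec:.1f}%")
```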

This relationship between the index and the hit rate feels similar to the **relationship between odds and hit rate** (the lower the odds, the smaller the return). So, for now, let's look at the relationship between the 3rd place index and the average odds... スクリーンショット 2019-11-01 18.58.38.png

The graph is inversely proportional. **Horses with a high 3rd place index also have low odds.** Even though the prediction is made without looking at the odds, it ends up like this. It is interesting, but this seems to be what makes horse racing difficult.

To exceed 100% recovery rate

I found that the 3rd place index and the odds are roughly inversely proportional. But that is only about the average odds. Looking at individual horses, there should be **"horses with high odds despite a high 3rd place index"**. To raise the recovery rate, it seems worth exploiting this kind of **odds distortion**.

However, for example, if you buy all betting tickets with a 3rd place index of 70 or more and odds of 100 or more, the result will be disastrous.

| Item | Result |
|---|---|
| Purchase number | 73 |
| Hit number | 1 |
| Hit rate | 1.37% |
| Recovery rate | 16.9% |

Horses with very high odds seem to have a low hit rate even when their index is high. The high odds may well be there for good reason. On the other hand, if the odds are too low, the dividend is naturally small and the recovery rate does not rise. If so, **the sweet spot is in the middle.** sanpu.png

(After publication, others pointed out a problem with the prediction target, so the results below have been revised using an expanded target.)

In the 2018 prediction results, narrowing down to a part of the range (around 55 to 60) where the 3rd place index is 60 or more and the odds are not too high gives **a recovery rate of 213%**. That looks pretty good.

| Item | Result |
|---|---|
| Purchase number | 44 |
| Hit number | 10 |
| Hit rate | 22.7% |
| Recovery rate | 213.6% |
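
Restricting purchases to a band of index and odds can be sketched as below. The thresholds passed in the example (index of 60 or more, odds between 10 and 50) are placeholders, not the author's actual cut-offs, and the record keys and sample rows are hypothetical.

```python
def simulate_filtered(records, min_index, min_odds, max_odds, stake=100):
    """Bet only on horses whose index and odds fall inside the given band."""
    picks = [
        r for r in records
        if r["index"] >= min_index and min_odds <= r["odds"] <= max_odds
    ]
    if not picks:
        return None   # nothing matched, so no rates can be computed
    hits = [r for r in picks if r["finish"] <= 3]
    payout = sum(r["place_payout"] * stake / 100 for r in hits)
    return {
        "bets": len(picks),
        "hits": len(hits),
        "hit_rate": 100.0 * len(hits) / len(picks),
        "recovery_rate": 100.0 * payout / (len(picks) * stake),
    }


# Made-up records; place_payout is yen per 100 yen when the horse places.
records = [
    {"index": 72, "odds": 25.0, "finish": 3, "place_payout": 520},
    {"index": 65, "odds": 18.0, "finish": 8, "place_payout": 0},
    {"index": 90, "odds": 1.8, "finish": 1, "place_payout": 110},   # odds too low
    {"index": 70, "odds": 150.0, "finish": 10, "place_payout": 0},  # odds too high
    {"index": 40, "odds": 30.0, "finish": 2, "place_payout": 600},  # index too low
]
result = simulate_filtered(records, min_index=60, min_odds=10, max_odds=50)
print(result)
```

Because the band is tuned on the same year it is evaluated on, a result like this says little by itself; the band has to be fixed first and then tried on data from another period.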

However, this range was chosen because it did well in 2018, so a good result there is only natural. But when I simulated 2019 with the same prediction method, it also showed a **recovery rate of 194%**. Below is the total for the two years (about 22 months).

| Item | Result |
|---|---|
| Purchase number | 99 |
| Hit number | 19 |
| Hit rate | 19.2% |
| Recovery rate | 202.63% |

With this, **the target of a recovery rate over 100% was successfully exceeded.** By the way, **the profit-and-loss graph from 2018 onward**, when buying 100 yen each time according to this prediction, is as follows. It rises steadily without any big drop. tmp.jpg

The possibility that the last two years just happened to go well is not zero, but with results like these the method seems credible to some extent. The number of purchases is small (about 4 per month), but **not picking fights you cannot win** may be a prerequisite for winning.
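
A profit-and-loss curve like the one in the graph is just a running sum of payout minus stake. Here is a minimal sketch, assuming a flat 100 yen per bet as in the article; the payout sequence is invented and `running_balance` is a hypothetical helper name.

```python
from itertools import accumulate


def running_balance(payouts, stake=100):
    """Cumulative profit in yen after each bet of `stake` yen.

    `payouts` lists the yen returned by each bet in chronological order
    (0 for a miss).
    """
    return list(accumulate(p - stake for p in payouts))


payouts = [0, 0, 450, 0, 380, 0]   # made-up sequence of results
balance = running_balance(payouts)
print(balance)
```

Plotting `balance` against the bet number gives the kind of curve shown above; long flat or falling stretches between hits are what a low hit rate looks like in practice.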

(Bonus) Predict horses with high dividends

I could stop here, but since I have come this far, I will also try predicting the **"expected dividend value"** as a different pattern from the 3rd place prediction.

I add the odds to the input data, give it to deep learning, and graph the predicted results in the same way as before. **Horses that are likely to win even at high odds** appear to have a higher expected value. (The horizontal axis of the graph continues further, but is omitted.) kitaichi_odds.png

This is the relationship between the expected value and the recovery rate / hit rate. kitaichi_return.png

Judging from the rising recovery rate, **it looks as though the recovery rate will rise if you keep buying tickets with a high expected value**, but here too **the number of cases is small, and over only 1 to 2 years the result seems unstable**.

If you buy all the horses in the well-performing range on the graph, **an expected value of 390-450**, the recovery rate does exceed 100% for now, but this looks like a local peak, so I do not expect it to be stable over time.

| Item | Result |
|---|---|
| Purchase number | 275 |
| Hit number | 37 |
| Hit rate | 13.5% |
| Recovery rate | 131.1% |

The balance also fluctuates more than with the 3rd place index. Because this approach aims at higher odds, the return on a hit is large. shushi2.jpg

That's all we have done.

Click here for the program

The program (Python) for building and verifying the prediction model for the 3rd place index is published for a fee, as an experiment, on the following page. It is not clean enough to serve as a textbook, so please look at it only if you are curious and have money and peace of mind to spare.

With deep learning, the recovery rate can exceed 100% in horse racing (program)

[Click here for the follow-up](https://note.mu/yossymura/n/na3d0a471193c)

Afterword

Recently, mobile payments have finally become widespread thanks to the flood of QR payment campaigns. Seeing cashback and coupons thrown around to popularize all sorts of things, not just payments, I was reminded that it is still **"money" that moves people**. I had been thinking that my next article should be money-related, so I am glad I got to write this one.

The input for this prediction does not include the horse's name or the jockey's name. In other words, taking into account **the horse's pedigree, the jockey's record, their compatibility**, and so on should allow more accurate predictions. I also think there is room to improve through **imputing missing data** and **batch normalization**, and simply through **parameter tuning** of the deep learning model, and **betting tickets other than double wins** look promising too. In short, there is still room to grow.

On the other hand, convenient as it was, the one thing that bothered me is that predicting horse racing with AI takes away **"the fun of predicting with your own head"**. I feel that AI cannot provide the joy of winning with a horse you chose through your own intuition, your attachment to the horse, or other feelings that well up from within.

When everyone uses AI for every prediction, will people still gather at the racetrack, lose themselves, and stay as enthusiastic as ever? How much longer will we be able to see that mass of betting tickets fluttering through the air? I want to make good use of AI so that the approaching wave takes away neither my work nor my heart.
