[Python & SQLite] I tried to analyze the expected value of a race with horses in the 1x win range ①

Overview

The topic of horse racing and AI has increased in the last few years, but many of them are for all races. I would like to analyze only the cases that I personally care about! I thought, I tried using netkeiba-scraper this time.

The theme to be narrowed down this time is "How to buy a race in which horses with 1x wins run". In 2019, there were many cases where horses with a win of 1x were sunk outside the betting ticket, such as Japan Derby, Hanshin JF, and Arima Kinen. The analysis proceeded with a winning betting ticket.

Data acquisition with netkeiba-scraper

For scraping horse racing data, I used the code of a great ancestor. Click here for gitHub

We have acquired the data for the past 10 years. I didn't use genfeature for feature extraction because it seemed to take about 40 days to process. Also, since the JRA class name has been changed (ex. 5 million under → 1 win class), it seems that it will not be possible to properly genfeature without fixing the Scala code.

Load the DL race.db file in Python and start data analysis.

Load SQLite with Jupyter Notebook

dbconnect


import sqlite3 as sq

conn = sq.connect('race.db')
cur = conn.cursor()

cur.execute("SELECT r.race_id, i.race_name, r.order_of_finish, r.odds \
             FROM race_result r INNER JOIN race_info i on r.race_id=i.id \
             WHERE r.odds<2.0")

rows = cur.fetchall()
for row in rows:
    print(row)

Move the race.db file to the same hierarchy as the Jupyter Notebook ipynb file and load it. Since the race data in race.db is stored in the race_info table and the race_result table, Get the information you want from each table.

Expected value of a horse that won with 1x popularity

Expected value by odds(All classes)


#Number of wins
cur.execute("SELECT r.odds, count(r.race_id) \
             FROM race_result r \
             WHERE r.odds<2.0 AND r.order_of_finish='1' \
             GROUP BY r.odds ORDER BY r.odds ASC")
rows = cur.fetchall()
#All races
cur.execute("SELECT r.odds, count(r.race_id) \
             FROM race_result r \
             WHERE r.odds<2.0 \
             GROUP BY r.odds ORDER BY r.odds")
rows2 = cur.fetchall()

#Expected value calculation I think I'll write more
try:
    t11 = rows[0][0] * rows[0][1] / (rows2[0][0] * rows2[0][1]) * 100
    t12 = rows[1][0] * rows[1][1] / (rows2[1][0] * rows2[1][1]) * 100
    t13 = rows[2][0] * rows[2][1] / (rows2[2][0] * rows2[2][1]) * 100
    t14 = rows[3][0] * rows[3][1] / (rows2[3][0] * rows2[3][1]) * 100
    t15 = rows[4][0] * rows[4][1] / (rows2[4][0] * rows2[4][1]) * 100
    t16 = rows[5][0] * rows[5][1] / (rows2[5][0] * rows2[5][1]) * 100
    t17 = rows[6][0] * rows[6][1] / (rows2[6][0] * rows2[6][1]) * 100
    t18 = rows[7][0] * rows[7][1] / (rows2[7][0] * rows2[7][1]) * 100
    t19 = rows[8][0] * rows[8][1] / (rows2[8][0] * rows2[8][1]) * 100
except:
    None

print('Win 1.Expected value of 1x betting ticket%d' %t11)
print('Win 1.Double betting ticket expected value%d' %t12)
print('Win 1.Expected value of triple betting ticket%d' %t13)
print('Win 1.Expected value of quadruple betting ticket%d' %t14)
print('Win 1.Expected value of 5 times betting ticket%d' %t15)
print('Win 1.Expected value of 6 times betting ticket%d' %t16)
print('Win 1.7 times the expected value of betting tickets%d' %t17)
print('Win 1.Expected value of 8 times betting ticket%d' %t18)
print('Win 1.9 times the expected value of betting tickets%d' %t19)

image.png Interestingly, we found that the lower the odds, the higher the expected win (= 1st place is likely). Is it because there are many historically famous horses when it becomes 1.1 times as much?

If we narrowed down to [Open class and above (including heavy prizes)] under similar conditions, the expected value of 1.1 times increased further.

OnlyOpenClass


cur.execute("SELECT r.odds, count(r.race_id) \
             FROM race_result r INNER JOIN race_info i on r.race_id=i.id \
             WHERE r.odds<2.0 AND r.order_of_finish='1' AND i.race_class like '%open%' \
             GROUP BY r.odds ORDER BY r.odds ASC")

image.png

Find out the results by popularity

Which horse is most likely to win in a race in which horses in the 1x win range run? Is the 2nd or 3rd popularity interesting?

Unlike the analysis up to the above, it can not be calculated unless it is [limited to races with horses with 1x wins]. I used a subquery in the notation WHERE ~ IN (SELECT ~~).

ResultByPopularity


#Number of wins by popularity among races with horses with 1x wins
cur.execute("SELECT r.popularity, count(r.race_id) \
             FROM race_result r \
             WHERE r.order_of_finish='1' AND r.race_id IN (SELECT race_id from race_result WHERE odds<2.0 AND popularity='1') \
             GROUP BY r.popularity ORDER BY r.popularity ASC")
rows = cur.fetchall()
#All races with horses in the 1x win
cur.execute("SELECT count(r.race_id) FROM race_result r WHERE r.odds<2.0 AND r.popularity='1'")
rows2 = cur.fetchall()

for row in rows:
    print(row)
print(rows2[0][0])

image.png Of all 7,619 races, about half are the most popular. The second most popular win is 6 times or more, and the third most popular is close to 10 times, or the expected value of 100 will not be exceeded.

image.png I calculated the expected value by multiplying the odds by the number of horses and dividing by the number of races, but it was terrible at 32.

Summary

The result was that if you didn't subdivide the racetrack, dirt / turf, horse age, horse number, and horse leg quality, you wouldn't make a profit by looking at the odds. In addition, since there are various types of betting tickets for horse racing, [Which betting ticket type has the highest expectation in a race in which horses with a win of 1x are running? 】Such, It seems that we can analyze it more deeply.

Now that I've figured out how to extract SQL with Python (Jupiter Notebook), I'll continue to analyze it.

Reference article

A story about whether netkeiba-scraper works as of June 2019 (Ubuntu 18.04.2 LTS)

Recommended Posts

[Python & SQLite] I tried to analyze the expected value of a race with horses in the 1x win range ①
[Python & SQLite] I analyzed the expected value of a race with horses with a win of 1x ②
I tried to create a Python script to get the value of a cell in Microsoft Excel
I wrote a doctest in "I tried to simulate the probability of a bingo game with Python"
I tried to display the altitude value of DTM in a graph
I tried to find the entropy of the image with python
I also tried to imitate the function monad and State monad with a generator in Python
I tried to create a list of prime numbers with python
I tried to improve the efficiency of daily work with Python
A story that didn't work when I tried to log in with the Python requests module
I tried to get the authentication code of Qiita API with Python.
(Python) I tried to analyze 1 million hands ~ I tried to estimate the number of AA ~
I tried to verify and analyze the acceleration of Python by Cython
I tried to analyze the negativeness of Nono Morikubo. [Compare with Posipa]
I tried to streamline the standard role of new employees with Python
I tried to get the movie information of TMDb API with Python
I made a program to check the size of a file in Python
I tried to implement a card game of playing cards in Python
I tried hundreds of millions of SQLite with python
I tried to open the latest data of the Excel file managed by date in the folder with Python
I tried to get and analyze the statistical data of the new corona with Python: Data of Johns Hopkins University
I tried to make a simple mail sending application with tkinter of Python
I installed Pygame with Python 3.5.1 in the environment of pyenv on OS X
To output a value even in the middle of a cell with Jupyter Notebook
I tried to predict the horses that will be in the top 3 with LightGBM
I tried to easily visualize the tweets of JAWS DAYS 2017 with Python + ELK
How to get a list of files in the same directory with python
I tried to create a model with the sample of Amazon SageMaker Autopilot
I tried to automatically send the literature of the new coronavirus to LINE with Python
I tried to graph the packages installed in Python
I tried to touch the CSV file with Python
I tried to draw a route map with Python
I tried to solve the soma cube with python
I tried to implement a pseudo pachislot in Python
I want to work with a robot in python.
I tried to automatically generate a password with Python3
In IPython, when I tried to see the value, it was a generator, so I came up with it when I was frustrated.
I tried to solve the problem with Python Vol.1
I tried to analyze J League data with Python
I wanted to know the number of lines in multiple files, so I tried to get it with a command
Introduction to AI creation with Python! Part 2 I tried to predict the house price in Boston with a neural network
I tried to summarize the string operations of Python
I tried to find out how to streamline the work flow with Excel x Python ④
Get the value of a specific key in a list from the dictionary type in the list with Python
I tried to find out how to streamline the work flow with Excel x Python ⑤
I tried to put out the frequent word ranking of LINE talk with Python
I tried to automate the article update of Livedoor blog with Python and selenium.
[Python] I tried to automatically create a daily report of YWT with Outlook mail
I tried to find out how to streamline the work flow with Excel x Python ①
I tried to find out how to streamline the work flow with Excel x Python ③
I tried to compare the processing speed with dplyr of R and pandas of Python
The 15th offline real-time I tried to solve the problem of how to write with python
I tried to make a function to retrieve data from database column by column using sql with sqlite3 of python [sqlite3, sql, pandas]
(Python: OpenCV) I tried to output a value indicating the distance between regions while binarizing the video in real time.
I'm tired of Python, so I tried to analyze the data with nehan (I want to go live even with corona sickness-Part 2)
I'm tired of Python, so I tried to analyze the data with nehan (I want to go live even with corona sickness-Part 1)
I tried "gamma correction" of the image with Python + OpenCV
I tried to simulate how the infection spreads with Python
I tried to analyze the whole novel "Weathering with You" ☔️
I tried the accuracy of three Stirling's approximations in python
I tried to find the average of the sequence with TensorFlow