The topic of horse racing and AI has increased in the last few years, but many of them are for all races. I would like to analyze only the cases that I personally care about! I thought, I tried using netkeiba-scraper this time.
The theme to be narrowed down this time is "How to buy a race in which horses with 1x wins run". In 2019, there were many cases where horses with a win of 1x were sunk outside the betting ticket, such as Japan Derby, Hanshin JF, and Arima Kinen. The analysis proceeded with a winning betting ticket.
For scraping horse racing data, I used the code of a great ancestor. Click here for gitHub
We have acquired the data for the past 10 years.
I didn't use genfeature
for feature extraction because it seemed to take about 40 days to process.
Also, since the JRA class name has been changed (ex. 5 million under → 1 win class), it seems that it will not be possible to properly genfeature
without fixing the Scala code.
Load the DL race.db file in Python and start data analysis.
dbconnect
import sqlite3 as sq
conn = sq.connect('race.db')
cur = conn.cursor()
cur.execute("SELECT r.race_id, i.race_name, r.order_of_finish, r.odds \
FROM race_result r INNER JOIN race_info i on r.race_id=i.id \
WHERE r.odds<2.0")
rows = cur.fetchall()
for row in rows:
print(row)
Move the race.db file to the same hierarchy as the Jupyter Notebook ipynb file and load it.
Since the race data in race.db is stored in the race_info
table and the race_result
table,
Get the information you want from each table.
Race_info table Information for each race is listed.
Race_result table The results of each horse are listed.
Expected value by odds(All classes)
#Number of wins
cur.execute("SELECT r.odds, count(r.race_id) \
FROM race_result r \
WHERE r.odds<2.0 AND r.order_of_finish='1' \
GROUP BY r.odds ORDER BY r.odds ASC")
rows = cur.fetchall()
#All races
cur.execute("SELECT r.odds, count(r.race_id) \
FROM race_result r \
WHERE r.odds<2.0 \
GROUP BY r.odds ORDER BY r.odds")
rows2 = cur.fetchall()
#Expected value calculation I think I'll write more
try:
t11 = rows[0][0] * rows[0][1] / (rows2[0][0] * rows2[0][1]) * 100
t12 = rows[1][0] * rows[1][1] / (rows2[1][0] * rows2[1][1]) * 100
t13 = rows[2][0] * rows[2][1] / (rows2[2][0] * rows2[2][1]) * 100
t14 = rows[3][0] * rows[3][1] / (rows2[3][0] * rows2[3][1]) * 100
t15 = rows[4][0] * rows[4][1] / (rows2[4][0] * rows2[4][1]) * 100
t16 = rows[5][0] * rows[5][1] / (rows2[5][0] * rows2[5][1]) * 100
t17 = rows[6][0] * rows[6][1] / (rows2[6][0] * rows2[6][1]) * 100
t18 = rows[7][0] * rows[7][1] / (rows2[7][0] * rows2[7][1]) * 100
t19 = rows[8][0] * rows[8][1] / (rows2[8][0] * rows2[8][1]) * 100
except:
None
print('Win 1.Expected value of 1x betting ticket%d' %t11)
print('Win 1.Double betting ticket expected value%d' %t12)
print('Win 1.Expected value of triple betting ticket%d' %t13)
print('Win 1.Expected value of quadruple betting ticket%d' %t14)
print('Win 1.Expected value of 5 times betting ticket%d' %t15)
print('Win 1.Expected value of 6 times betting ticket%d' %t16)
print('Win 1.7 times the expected value of betting tickets%d' %t17)
print('Win 1.Expected value of 8 times betting ticket%d' %t18)
print('Win 1.9 times the expected value of betting tickets%d' %t19)
Interestingly, we found that the lower the odds, the higher the expected win (= 1st place is likely). Is it because there are many historically famous horses when it becomes 1.1 times as much?
If we narrowed down to [Open class and above (including heavy prizes)] under similar conditions, the expected value of 1.1 times increased further.
OnlyOpenClass
cur.execute("SELECT r.odds, count(r.race_id) \
FROM race_result r INNER JOIN race_info i on r.race_id=i.id \
WHERE r.odds<2.0 AND r.order_of_finish='1' AND i.race_class like '%open%' \
GROUP BY r.odds ORDER BY r.odds ASC")
Which horse is most likely to win in a race in which horses in the 1x win range run? Is the 2nd or 3rd popularity interesting?
Unlike the analysis up to the above, it can not be calculated unless it is [limited to races with horses with 1x wins].
I used a subquery in the notation WHERE ~ IN (SELECT ~~)
.
ResultByPopularity
#Number of wins by popularity among races with horses with 1x wins
cur.execute("SELECT r.popularity, count(r.race_id) \
FROM race_result r \
WHERE r.order_of_finish='1' AND r.race_id IN (SELECT race_id from race_result WHERE odds<2.0 AND popularity='1') \
GROUP BY r.popularity ORDER BY r.popularity ASC")
rows = cur.fetchall()
#All races with horses in the 1x win
cur.execute("SELECT count(r.race_id) FROM race_result r WHERE r.odds<2.0 AND r.popularity='1'")
rows2 = cur.fetchall()
for row in rows:
print(row)
print(rows2[0][0])
Of all 7,619 races, about half are the most popular. The second most popular win is 6 times or more, and the third most popular is close to 10 times, or the expected value of 100 will not be exceeded.
I calculated the expected value by multiplying the odds by the number of horses and dividing by the number of races, but it was terrible at 32.
The result was that if you didn't subdivide the racetrack, dirt / turf, horse age, horse number, and horse leg quality, you wouldn't make a profit by looking at the odds. In addition, since there are various types of betting tickets for horse racing, [Which betting ticket type has the highest expectation in a race in which horses with a win of 1x are running? 】Such, It seems that we can analyze it more deeply.
Now that I've figured out how to extract SQL with Python (Jupiter Notebook), I'll continue to analyze it.
A story about whether netkeiba-scraper works as of June 2019 (Ubuntu 18.04.2 LTS)
Recommended Posts