[Python & SQLite] I analyzed the expected value of a race with horses with a win of 1x ②

Overview

This is a sequel to the previous post [Python & SQLite] I analyzed the expected value of a race with horses in the 1x win range ①. Last time, I summarized the scraped data into SQL and calculated the expected win value with Python (Jupiter Notebook).

Continuing from the last time, the theme is "How to buy a race in which horses with 1x wins run".

Last reflection

I couldn't find a way to buy a winning betting ticket that exceeds the expected value of 100 because I calculated it without narrowing down the conditions such as the racetrack and distance. This time, let's narrow down the racetrack, distance, turf / dirt, and ticket type for detailed analysis.

Search for conditions that are likely to have high expectations by narrowing down the racetrack and ticket types

This time, I will write the results focusing on "Tokyo Racecourse". Other racecourses are fine, but I chose Tokyo Racecourse because it is a historic racecourse and I think it is easy for strong horses to demonstrate their abilities.

In addition, the ticket type will be analyzed by Maren (winning if two horses selected for either 1st or 2nd place are included). As you can see in the previous post, the most popular wins in the 1x win range have a nearly 50% chance of winning. We hypothesized that Maren has a good balance between ease of winning and payouts. (There is also a reason that there may be noise from horses that are not popular in the triple system.)

TokyoRacecource


#Shiba course Search for delicious betting tickets at Maren
#Find out the solidarity rate of the most popular horses in the 1x win range
cur.execute("SELECT i.distance, count(r.race_id) FROM race_result r \
INNER JOIN race_info i on r.race_id=i.id \
WHERE r.odds<2.0 AND r.order_of_finish IN ('1','2') AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' \
AND r.race_id IN (SELECT race_id from race_result WHERE odds<2.0 AND popularity='1') \
GROUP BY i.distance ORDER BY i.distance")
rows = cur.fetchall()

print('Number of solidarity(Turf)')
for row in rows:
    print(row)
    
cur.execute("SELECT i.distance, count(r.race_id) FROM race_result r \
INNER JOIN race_info i on r.race_id=i.id \
WHERE r.odds<2.0 AND r.order_of_finish NOT IN ('1', '2') AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%'\
AND r.race_id IN (SELECT race_id from race_result WHERE odds<2.0 AND popularity='1') \
GROUP BY i.distance ORDER BY i.distance")
rows2 = cur.fetchall()

print('Number of 3 or less(Turf)')
for row2 in rows2:
    print(row2)

rentai1800 = round(rows[2][1] / (rows[2][1] + rows2[2][1]) * 100, 2)
print('1800m solidarity rate: %f percent' %rentai1800)
rentai2000 = round(rows[3][1] / (rows[3][1] + rows2[3][1]) * 100, 2)
print('2000m solidarity rate: %f percent' %rentai2000)

----------result----------
Number of solidarity(Turf)
(1400, 66)
(1600, 89)
(1800, 90)
(2000, 84)
(2300, 2)
(2400, 40)
(3400, 1)
Number of 3 or less(Turf)
(1400, 31)
(1600, 59)
(1800, 38)
(2000, 31)
(2300, 2)
(2400, 22)
1800m solidarity rate: 70.310000 percent
2000m solidarity rate: 73.040000 percent

The results of 1800m and 2000m are especially good. The 2400m where the Japan Derby, Oaks, and Japan Cup are held is delicate at about 64%.

Let's dig deeper into the conditions of turf 2000m, which has the highest solidarity rate.

Tokyo Shiba 2000m Let's dig deeper into the conditions when the most popular is solidarity

Another solidarity horse [winning popularity]

Another solidarity horse


#Try to squeeze at 2000m
#1 Another popularity when popularity is solidarity

cur.execute("SELECT r.popularity, count(r.race_id) FROM race_result r \
INNER JOIN race_info i on r.race_id=i.id \
WHERE r.order_of_finish IN ('1','2') AND r.popularity != '1' \
AND r.race_id IN (SELECT race_id from race_result r INNER JOIN race_info i on r.race_id=i.id WHERE odds<2.0 AND popularity='1' \
AND order_of_finish IN ('1','2') AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' AND i.distance=2000) \
GROUP BY r.popularity ORDER BY r.popularity")
rows = cur.fetchall()

print('(Popular,Number of times)')
for row in rows:
    print(row)

----------result----------
(Popular,Number of times)
(2, 21)
(3, 31)
(4, 12)
(5, 6)
(6, 7)
(7, 4)
(8, 2)
(11, 1)

Surprisingly, there were more solidarities of 1st and 3rd popularity than solidarity of 1st and 2nd popularity. Is it the result of other horses marking the second most popular horse, which is more likely to win than the most popular horse?

By the way, [when the most popular horse with a win of 1x was not solidarity] and [the most number of solidarity] was also the 3rd most popular. Whether the most popular wins or moss, buying from the second most seems to be less interesting.

Another solidarity horse [Position at the 4th corner]

4cornerPosition


cur.execute("SELECT substr(r.pass, -2), count(r.race_id) FROM race_result r \
INNER JOIN race_info i on r.race_id=i.id \
WHERE r.order_of_finish IN ('1','2') AND r.popularity != '1' \
AND r.race_id IN (SELECT race_id from race_result r INNER JOIN race_info i on r.race_id=i.id WHERE odds<2.0 AND popularity='1' \
AND order_of_finish IN ('1','2') AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' AND i.distance=2000) \
GROUP BY substr(r.pass, -2) ORDER BY substr(r.pass, -2)")
rows = cur.fetchall()

print('(Count,Number of times)')
for row in rows:
    print(row)

----------result----------
(Count,Number of times)
('-1', 14)
('-2', 15)
('-3', 14)
('-4', 4)
('-5', 10)
('-6', 6)
('-7', 4)
('-8', 4)
('-9', 2)
('10', 3)
('11', 3)
('12', 2)
('13', 3)

At the last straight entrance, horses that were in 5th place seem to be in solidarity. It is possible that this is because the horse with the leading leg is sticky, and because there was spare capacity in the first place, it is in the front.

A popular horse with a win of 1x is running in the Tokyo Shiba 2000m, and if the 3rd most popular horse is the leading leg, the expected value seems to be high. It reminds me of the 2019 Tenno Sho Autumn ...!

Try to calculate the expected value of Maren (information acquisition from 3 tables)

From here, we will utilize three tables obtained by netkeiba-scraper. Therefore, I did [Nesting subqueries to narrow down the conditions].

Winning No. 1 Popular-Winning No. 2 Popular Maren

First of all, it is the expected value when you purchase the [1st most popular-2nd most popular] Maren, which seems to have a low expected value.

1-2 Popular Maren Expected Value


# race_result race_id and payoff race_Calculate the expected value by associating the id
# ticket_type win 0,Double win 1,Frame Ren 2,Maren 3,Wide 4,Horse single 5,Triple 6,Triple single 7

#Using double subqueries, total horse reimbursement for [among the races in which one horse was solidarity] and [the race in which another horse was the second most popular]
cur.execute("SELECT DISTINCT p.race_id, p.payoff FROM payoff p \
INNER JOIN race_result r ON p.race_id=r.race_id INNER JOIN race_info i on p.race_id=i.id \
WHERE p.ticket_type=3 AND p.race_id IN \
        (SELECT r.race_id FROM race_result r \
        INNER JOIN race_info i on r.race_id=i.id \
        WHERE r.order_of_finish IN ('1','2') AND r.popularity='2' \
            AND r.race_id IN (SELECT race_id from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND order_of_finish IN ('1','2') AND i.surface like 'Turf%' \
            AND i.place_detail like '%Tokyo%' AND i.distance=2000))")
rows = cur.fetchall()
umaren_sum = 0
for row in rows:
    umaren_sum += row[1]

#Find out how many races a horse with a win of 1x has entered
cur.execute("SELECT count(race_id) from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' \
            AND i.distance=2000")
rows2 = cur.fetchall()

print('Total number of horses in a race in which horses in the 1x win:')
print(umaren_sum)

print('The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:')
print(rows2[0][0])

print('Maren expected value')
print(round(umaren_sum / rows2[0][0], 2))

-----result-----
Total number of horses in a race in which horses in the 1x win:
6740.0
The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:
115
Maren expected value
58.61

After all it is low. It doesn't make sense to buy another popular horse, as one horse seems to be tough. In this case, it is better to buy the most popular win.

Winning No. 1 Popular-Winning No. 3 Popular Maren

Next, it is the expected value when you purchase the [1st most popular-3rd most popular] Maren, which you can expect the most expected value.

1-3 Popular Maren Expected Value


# race_result race_id and payoff race_Calculate the expected value by associating the id
# ticket_type win 0,Double win 1,Frame Ren 2,Maren 3,Wide 4,Horse single 5,Triple 6,Triple single 7

#Using double subqueries, total horse reimbursement for [among the races in which one horse was solidarity] and [the race in which another horse was the third most popular]
cur.execute("SELECT DISTINCT p.race_id, p.payoff FROM payoff p \
INNER JOIN race_result r ON p.race_id=r.race_id INNER JOIN race_info i on p.race_id=i.id \
WHERE p.ticket_type=3 AND p.race_id IN \
        (SELECT r.race_id FROM race_result r \
        INNER JOIN race_info i on r.race_id=i.id \
        WHERE r.order_of_finish IN ('1','2') AND r.popularity='3' \
            AND r.race_id IN (SELECT race_id from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND order_of_finish IN ('1','2') AND i.surface like 'Turf%' \
            AND i.place_detail like '%Tokyo%' AND i.distance=2000))")
rows = cur.fetchall()
umaren_sum = 0
for row in rows:
    umaren_sum += row[1]

#Find out how many races a horse with a win of 1x has entered
cur.execute("SELECT count(race_id) from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' \
            AND i.distance=2000")
rows2 = cur.fetchall()

print('Total number of horses in a race in which horses in the 1x win:')
print(umaren_sum)

print('The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:')
print(rows2[0][0])

print('Maren expected value')
print(round(umaren_sum / rows2[0][0], 2))

-----result-----
Total number of horses in a race in which horses in the 1x win:
18460.0
The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:
115
Maren expected value
160.52

The expected value has finally exceeded 100! If you knew this data, you would have bought [Almond Eye and Danon Premium Maren] in the fall of the 2019 Tenno Sho.

Winning No. 1 Popular-Winning No. 4 Popular Maren

Just in case, let's take a look at the expected value when purchasing the [1st most popular-4th most popular] Maren.

1-4 Popular Maren Expected Value


# race_result race_id and payoff race_Calculate the expected value by associating the id
# ticket_type win 0,Double win 1,Frame Ren 2,Maren 3,Wide 4,Horse single 5,Triple 6,Triple single 7

#Using double subqueries, total horse reimbursement for [among the races in which 1x horses were solidarity] [race in which another horse was the 4th most popular]
cur.execute("SELECT DISTINCT p.race_id, p.payoff FROM payoff p \
INNER JOIN race_result r ON p.race_id=r.race_id INNER JOIN race_info i on p.race_id=i.id \
WHERE p.ticket_type=3 AND p.race_id IN \
        (SELECT r.race_id FROM race_result r \
        INNER JOIN race_info i on r.race_id=i.id \
        WHERE r.order_of_finish IN ('1','2') AND r.popularity='4' \
            AND r.race_id IN (SELECT race_id from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND order_of_finish IN ('1','2') AND i.surface like 'Turf%' \
            AND i.place_detail like '%Tokyo%' AND i.distance=2000))")
rows = cur.fetchall()
umaren_sum = 0
for row in rows:
    umaren_sum += row[1]

#Find out how many races 1x horses have run
cur.execute("SELECT count(race_id) from race_result r INNER JOIN race_info i on r.race_id=i.id \
            WHERE odds<2.0 AND popularity='1' AND i.surface like 'Turf%' AND i.place_detail like '%Tokyo%' \
            AND i.distance=2000")
rows2 = cur.fetchall()

print('Total number of horses in a race in which horses in the 1x win:')
print(umaren_sum)

print('The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:')
print(rows2[0][0])

print('Maren expected value')
print(round(umaren_sum / rows2[0][0], 2))

-----result-----
Total number of horses in a race in which horses in the 1x win:
10030.0
The number of races in which horses with a win of 1x entered the Tokyo Shiba 2000m:
115
Maren expected value
87.22

Although this does not exceed 100, it was found that the expected value is higher than [1-2 Popular Maren].

Summary

By narrowing down the racetracks, courses, and ticket types, the expected value calculation has become deeper. I'm getting used to handling SQL. If you change the conditions, you can analyze it at other racetracks, so why not try calculating the expected value of the racetrack near your place of residence?

The next goal is to capture WIN5 (a betting ticket that wins all the winning horses in the set 5 races. The maximum dividend is hundreds of millions of yen)! It will be more difficult because it is necessary to analyze the horse field reading and development reading of horse racing, but I will try it!

Thank you for reading this far.

Recommended Posts

[Python & SQLite] I analyzed the expected value of a race with horses with a win of 1x ②
I tried hundreds of millions of SQLite with python
I installed Pygame with Python 3.5.1 in the environment of pyenv on OS X
[Discode Bot] I created a bot that tells me the race value of Pokemon
A memo that I touched the Datastore with python
Get the value of a specific key in a list from the dictionary type in the list with Python
I thought about why Python self is necessary with the feeling of a Python interpreter
I tried to find the entropy of the image with python
I replaced the Windows PowerShell cookbook with a python script.
I tried "gamma correction" of the image with Python + OpenCV
I just changed the sample source of Python a little.
I wrote the basic grammar of Python with Jupyter Lab
I evaluated the strategy of stock system trading with Python.
I tried a stochastic simulation of a bingo game with Python
[Python] I want to make a 3D scatter plot of the epicenter with Cartopy + Matplotlib!
I tried to create a Python script to get the value of a cell in Microsoft Excel
I wrote a doctest in "I tried to simulate the probability of a bingo game with Python"
I compared the speed of Hash with Topaz, Ruby and Python
I tried scraping the ranking of Qiita Advent Calendar with Python
I wrote AWS Lambda, and I was a little addicted to the default value of Python arguments
Save the result of the life game as a gif with python
[python, ruby] fetch the contents of a web page with selenium-webdriver
[Introduction to StyleGAN] I played with "The Life of a Man" ♬
I want to output the beginning of the next month with Python
I made a fortune with Python.
I tried to create a list of prime numbers with python
The story of making a standard driver for db with python.
I liked the tweet with python. ..
I wanted to solve the ABC164 A ~ D problem with Python
The idea of feeding the config file with a python file instead of yaml
Tips: [Python] Calculate the average value of the specified area with bedgraph
I tried to improve the efficiency of daily work with Python
I made a daemon with Python
The story of making a module that skips mail with python
Create a compatibility judgment program with the random module of python.
I'm tired of Python, so I analyzed the data with nehan (corona related, is that word now?)
[Python] I analyzed the diary of a first-year member of society and made a positive / negative judgment on the life of a member of society.
I replaced the numerical calculation of Python with Rust and compared the speed
The story of making a university 100 yen breakfast LINE bot with Python
[AtCoder explanation] Control the A, B, C problems of ABC182 with Python!
Calculate the shortest route of a graph with Dijkstra's algorithm and Python
Get the number of searches with a regular expression. SeleniumBasic VBA Python
[Introduction to Python] How to sort the contents of a list efficiently with list sort
I tried to get the authentication code of Qiita API with Python.
Calculate the probability of being a squid coin with Bayes' theorem [python]
Hit a method of a class instance with the Python Bottle Web API
I get a Python No module named'encodings' error with the aws command
Receive a list of the results of parallel processing in Python with starmap
[Python, ObsPy] I drew a beach ball on the map with Cartopy + ObsPy.
I made a GAN with Keras, so I made a video of the learning process.
A reminder of what I got stuck when starting Atcoder with python
I tried to streamline the standard role of new employees with Python
[AtCoder explanation] Control the A, B, C problems of ABC187 with Python!
I tried to get the movie information of TMDb API with Python
I made a program to check the size of a file in Python
I made a mistake in fetching the hierarchy with MultiIndex of pandas
I tried to display the altitude value of DTM in a graph
Get the return value of an external shell script (ls) with python3
I measured the speed of list comprehension, for and while with python2.7.
Python: I want to measure the processing time of a function neatly
[AtCoder explanation] Control the A, B, C problems of ABC184 with Python!