[PYTHON] [Horse Racing] I tried to quantify the strength of racehorses

Hello, this is Aoki (@aoki_eng). This time, I tried to quantify the strength of racehorses, and this article summarizes what I did.

The code is on GitHub (https://github.com/katsuomi/keiba-BTmodel)

Introduction

I love horse racing. Every weekend, I watch all the big races, known as graded races, on TV and bet a small amount of money.

Because of that, I often look at the past race results of the horses entered, but I felt that I didn't really know how strong each horse was. For example:

[Image: image.png]

How strong is this horse? It has finished first in many races, so it seems reasonably strong!

How about this horse?

[Image: image.png]

It too has finished first in many races, so it seems reasonably strong as well!

I can tell intuitively whether a horse is strong or weak, but I can't say how strong it is.

I want to express a horse's strength as a concrete number! Out of that curiosity, I decided to quantify it this time.

Using the Bradley-Terry model

What is the Bradley-Terry model?

There are n elements (teams or individuals) that compete against each other. Each match is one-on-one, and its result is only a win or a loss for each element. Suppose we want to measure the "strength" of each element from the results of a number of matches. Let P_ij be the probability that element i beats element j. For every pair of elements, introduce positive values π_i such that

$$P_{ij} = \frac{\pi_i}{\pi_i + \pi_j} \tag{1}$$

The relation in equation (1) is called the Bradley-Terry (BT) model. In the BT model, π_i can be thought of as representing the strength of element i. The BT model is said to be able to judge which of two elements would win from their results against third parties, even if they have never faced each other directly. (Quoted from the Bradley-Terry reference listed at the end of this article.)
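As a small illustration of equation (1), the following sketch computes win probabilities in the standard BT form, P_ij = π_i / (π_i + π_j). The function name `bt_win_probability` and the strength values are hypothetical, chosen only for this example; they are not taken from the race data.

```python
def bt_win_probability(pi_i, pi_j):
    """Probability that element i beats element j under the BT model."""
    return pi_i / (pi_i + pi_j)


# Hypothetical strength values, chosen only for illustration.
strengths = {"A": 4.0, "B": 2.0, "C": 1.0}

# A and C may never have met, but their strengths still give a probability.
p_ac = bt_win_probability(strengths["A"], strengths["C"])
p_ca = bt_win_probability(strengths["C"], strengths["A"])

print(f"P(A beats C) = {p_ac:.3f}")
print(f"P(C beats A) = {p_ca:.3f}")
```

Only the ratio of the two strengths matters, so multiplying every π by the same constant leaves all probabilities unchanged.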

This article does not go into detail about the BT model. Put simply,

**It is a model that can reasonably express the strength of each element in things like one-on-one matches!**

(It cannot express matchup-dependent strengths and weaknesses, as in rock-paper-scissors.)

Common examples are things like:

・ Let's show the strength of the Central League and Pacific League teams!
・ Let's show the strength of the J League teams!

Now let's think about horse racing. For example, suppose the result of a race is as follows:

[Image: Screenshot 2020-05-12 18.11.15.png]

Focusing on the horse that finished second:

[Image: Screenshot 2020-05-12 18.08.41.png]

・ It lost to the horse that finished 1st
・ It won against the horses that finished 3rd to 18th

In this way, I treated a horse race as a set of one-on-one matches between racehorses and thought about applying the BT model to it.
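The idea of reading one race as many one-on-one matches can be sketched as follows. This is a minimal illustration, not the code from the repository: the function name `race_to_pairwise_results` and the horse names are hypothetical, and it assumes the finishing order has no dead heats.

```python
from itertools import combinations


def race_to_pairwise_results(finishing_order):
    """Turn a finishing order (1st place first) into (winner, loser) pairs.

    Every horse is treated as having beaten each horse that finished behind it.
    """
    # combinations() keeps the input order, so the first item of each pair
    # is always the horse that finished ahead.
    return list(combinations(finishing_order, 2))


# Hypothetical 18-horse race: "horse1" finished 1st, "horse18" finished last.
order = [f"horse{rank}" for rank in range(1, 19)]
pairs = race_to_pairwise_results(order)

second_wins = sum(1 for winner, _ in pairs if winner == "horse2")
second_losses = sum(1 for _, loser in pairs if loser == "horse2")
print(len(pairs), second_wins, second_losses)
```

For the second-place horse this gives one loss (to the winner) and 16 wins (over the horses in 3rd to 18th place), which matches the reading of the result table above.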

Implementation

・ Scrape and tabulate the race results from 2014 to 2018 from the official JRA website
・ Apply the BT model to the results

The specific implementation is posted here (https://github.com/katsuomi/keiba-BTmodel/blob/master/pointToHorseStrength.py)
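For readers who want a feel for the fitting step without opening the repository, here is a rough sketch of one common way to estimate BT strengths from (winner, loser) pairs: the iterative MM update (also known as Zermelo's iteration), π_i ← W_i / Σ_j n_ij / (π_i + π_j), followed by normalization. This is an assumption about how such a fit can be done, not necessarily what the linked script does; the function name `fit_bradley_terry` and the toy data are hypothetical.

```python
from collections import defaultdict


def fit_bradley_terry(pairs, n_iter=1000, tol=1e-12):
    """Estimate BT strengths from (winner, loser) pairs with the MM update."""
    wins = defaultdict(int)                          # W_i: total wins of i
    n_games = defaultdict(lambda: defaultdict(int))  # n_ij: meetings of i and j
    for winner, loser in pairs:
        wins[winner] += 1
        wins[loser] += 0          # register horses that only ever lost
        n_games[winner][loser] += 1
        n_games[loser][winner] += 1

    pi = {item: 1.0 for item in wins}                # start from equal strengths
    for _ in range(n_iter):
        new_pi = {}
        for i in pi:
            denom = sum(n / (pi[i] + pi[j]) for j, n in n_games[i].items())
            new_pi[i] = wins[i] / denom
        total = sum(new_pi.values())
        new_pi = {i: v / total for i, v in new_pi.items()}  # scale to sum 1
        delta = max(abs(new_pi[i] - pi[i]) for i in pi)
        pi = new_pi
        if delta < tol:
            break
    return pi


# Hypothetical toy data: A usually beats B, B usually beats C, A usually beats C.
toy_pairs = (
    [("A", "B")] * 3 + [("B", "A")]
    + [("B", "C")] * 3 + [("C", "B")]
    + [("A", "C")] * 3 + [("C", "A")]
)
toy_strengths = fit_bradley_terry(toy_pairs)
for name, value in sorted(toy_strengths.items(), key=lambda kv: -kv[1]):
    print(name, round(value, 3))
```

One caveat of this plain version: a horse that never won ends up with strength 0, and a horse that never lost absorbs almost all of the total, so real race data would likely need some smoothing or filtering before the numbers are meaningful.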

Result

[Image: Screenshot 2020-05-12 18.18.23.png]

The strongest racehorse among active horses is **Almond Eye**!

Conclusion

This time, I used the BT model to put a number on the strength of racehorses. As expected, the horses that are doing well now or did well in the past have high values, so there was no particularly new insight. (lol)

By the way, this weekend there is a race called the Victoria Mile, and Almond Eye, the strongest active racehorse, is scheduled to run. It's a race that tends to produce upsets every year, but ... !!!!

[reference]
・ About the Bradley-Terry model: https://www.gavo.t.u-tokyo.ac.jp/~mine/japanese/IT/2017/toukei171211.pdf
・ Horse performance: https://www.netkeiba.com/
・ Past race information: http://www.jra.go.jp/
