Apologies in advance: I will skip the explanation of horse racing terms, since I assume the people reading this are already interested in horse racing.
The information published on netkeiba.com (which I obtained by scraping) covers many things: pedigree, running time, distance, and so on. As a premise, I do not expect anything useful from fitting a model to the scraped data as it is. The information needs to be sorted, organized, and analyzed first.
Before deciding on a data analysis policy, I first adopt the following hypothesis:
**Whatever happens between the start and the final straight, each horse runs the last 3 furlongs with all of its remaining power, and the finishing order is simply the order in which the horses cross the finish line.**
This may seem obvious, but it narrows down the information to consider. The data I use for the analysis are:
・Lap times → evaluate the pace of the whole race and the grade of the race
・Race type (turf or dirt) and distance → subdivide the many kinds of races

Data for each horse
・Corner passing order and final 3F time for each horse → classify running style, and evaluate the final straight among horses of the same running style
Supplement 1: Jockey, horse name, pedigree, post position (frame order), running time, etc. are not considered.
Supplement 2: Debut ("Make Debut") races are excluded from the analysis, because pedigree and jockey are not considered.
That was a long preface.
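As a rough illustration of the running-style idea in the data list above, here is a minimal sketch. The thresholds, the style names, and the dictionary keys (`name`, `style`, `final_3f`) are my own assumptions for illustration, not the rules actually used for the predictions.

```python
from collections import defaultdict


def classify_running_style(corner_positions, field_size):
    """Label a running style from corner passing positions (1 = leading).

    The cut-offs below are illustrative assumptions only.
    """
    # A horse that leads at the first recorded corner is treated as a front-runner.
    if corner_positions[0] == 1:
        return "front-runner"
    # Average position over all corners, scaled to 0.0 (leader) .. 1.0 (last).
    mean_position = sum(corner_positions) / len(corner_positions)
    relative = (mean_position - 1) / max(field_size - 1, 1)
    if relative <= 1 / 3:
        return "stalker"
    if relative <= 2 / 3:
        return "closer"
    return "deep closer"


def rank_final_3f_within_style(horses):
    """Rank horses by final 3F time among horses of the same style (1 = fastest)."""
    groups = defaultdict(list)
    for horse in horses:
        groups[horse["style"]].append(horse)
    ranks = {}
    for members in groups.values():
        ordered = sorted(members, key=lambda h: h["final_3f"])
        for rank, horse in enumerate(ordered, start=1):
            ranks[horse["name"]] = rank
    return ranks
```

Comparing the final 3F only within the same style reflects the point that a closer and a front-runner reach the final straight under very different conditions.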
Now, let's create the training data for the AI to predict races. Data for horses that finished further down is useless, because those horses do not figure in any winning ticket. So I create the data from the horses that finished 1st through 6th in past races.
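Keeping only the 1st to 6th place finishers could look something like the sketch below. The column name `finish_position` is a hypothetical one chosen for this example; the real scraped columns may be named differently.

```python
def keep_top_finishers(rows, max_position=6):
    """Keep rows whose finishing position is between 1 and max_position.

    `rows` is a list of dicts, e.g. as produced by csv.DictReader.
    The "finish_position" key is an assumed column name.
    """
    kept = []
    for row in rows:
        try:
            position = int(row["finish_position"])
        except (KeyError, TypeError, ValueError):
            # Rows without a numeric result (for example, a horse that
            # did not finish) are skipped.
            continue
        if 1 <= position <= max_position:
            kept.append(row)
    return kept
```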
Example: scr.csv train.csv
Just a few lines.
scr is the raw data obtained by scraping. train is the training data, produced by tidying up the raw data and computing statistics such as standard deviations. Even if it is tedious, be sure to write the result out to a new CSV file.
If you build the training data properly, you will get good results without having to fiddle with the model parameters. (I do fiddle with them, though.)
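One way to turn raw rows into standardized training rows and write them out is sketched below. The column names (`race_id`, `horse`, `final_3f`) and the choice of a within-race z-score are assumptions for illustration; the actual features in train.csv are not shown in this article.

```python
import csv
import io
import statistics


def add_race_z_scores(rows, column="final_3f"):
    """Add a within-race z-score for `column` to every row.

    A negative z-score means the value is below the race average
    (for a final 3F time, that means faster than average).
    """
    by_race = {}
    for row in rows:
        by_race.setdefault(row["race_id"], []).append(row)
    out = []
    for race_rows in by_race.values():
        values = [float(r[column]) for r in race_rows]
        mean = statistics.mean(values)
        std = statistics.pstdev(values)  # population standard deviation
        for r, value in zip(race_rows, values):
            z = (value - mean) / std if std > 0 else 0.0
            out.append({**r, f"{column}_z": round(z, 3)})
    return out


def write_training_csv(rows, file_obj):
    """Write the rows to an open text file object as CSV with a header."""
    writer = csv.DictWriter(file_obj, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Small in-memory demonstration with made-up values.
sample = [
    {"race_id": "R1", "horse": "A", "final_3f": "34.0"},
    {"race_id": "R1", "horse": "B", "final_3f": "35.0"},
    {"race_id": "R1", "horse": "C", "final_3f": "36.0"},
]
train_rows = add_race_z_scores(sample)
buffer = io.StringIO()
write_training_csv(train_rows, buffer)
```

To write a real file instead of the in-memory buffer, pass a file opened with `open("train.csv", "w", newline="")`.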
Using the predictions from this model, I bought a 4-horse box and hit both the exacta and the trifecta. The forecast was published on a certain central horse racing prediction site.
Even in the races I missed, two of my picks finished in the top three, so I think the predictions were reasonably good.
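For reference, the number of tickets in a 4-horse box can be enumerated as below: an exacta box covers every ordered pair, and a trifecta box covers every ordered triple. The horse numbers are made up for the example.

```python
from itertools import permutations


def box_tickets(horses, places):
    """List every ordered combination of `places` horses from the box."""
    return list(permutations(horses, places))


box = [3, 7, 11, 14]  # hypothetical horse numbers
exacta_tickets = box_tickets(box, 2)    # 4 * 3 = 12 tickets
trifecta_tickets = box_tickets(box, 3)  # 4 * 3 * 2 = 24 tickets
```

This is why the box stays at four horses: each extra horse multiplies the number of trifecta tickets quickly (a 5-horse box already needs 60).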
Possible improvements:
・Pedigree
・Jockey
・Post position (frame order)
・Season
・Negative features
Building predictions from pedigree and jockey is the real thrill of horse racing, so I want to do it someday. But I cannot even imagine how to go about it. Would it work to count how often each jockey, sire, and dam finishes within the top three? I am still exploring.

As for post position, the inside draw is said to be advantageous on turf, though it may not actually matter. On dirt, the outside draw is said to be more advantageous because the track surface there is less rough. As for season, does it mean that the difference in performance between male and female horses changes during the estrus period? I have followed horse racing for less than a year, so I honestly do not know at all. I think this is an area where data scientists and horse racing enthusiasts will need to exchange opinions.
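The counting idea for jockeys and sires could start from something as simple as a top-three rate per name. This is only a sketch of that one idea, with hypothetical column names (`jockey`, `finish_position`), and it has not been tried in the predictions described here.

```python
from collections import Counter


def top3_rate(rows, key):
    """Return {name: share of starts that finished in the top three}.

    `key` is the column to group by, e.g. "jockey" or "sire"
    (assumed column names).
    """
    starts = Counter()
    top3 = Counter()
    for row in rows:
        name = row[key]
        starts[name] += 1
        if int(row["finish_position"]) <= 3:
            top3[name] += 1
    return {name: top3[name] / starts[name] for name in starts}
```

A rate computed from only a handful of starts is noisy, so in practice a minimum number of starts per name would probably be needed before using it as a feature.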
Negative features are about smoking out the horses that will almost certainly finish 4th or worse. The aim is to avoid wasting money on risky favorites and on horses that merely sneak into contention. Also, by making predictions from this other perspective, I should be able to have more confidence in the four horses I buy in the box.
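A negative-feature target could be built by simply flipping the label, so that 1 means "finished 4th or worse". The sketch below also flags well-backed horses that still missed the top three in past data; the keys `horse`, `popularity`, and `finish_position` are assumed names, and the popularity cut-off of 3 is an arbitrary choice for the example.

```python
def negative_label(finish_position):
    """Return 1 if the horse finished 4th or worse, else 0."""
    return 1 if finish_position >= 4 else 0


def risky_favorites(rows, max_popularity=3):
    """Names of horses that were among the favorites but finished 4th or worse."""
    return [
        row["horse"]
        for row in rows
        if row["popularity"] <= max_popularity
        and negative_label(row["finish_position"]) == 1
    ]
```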
The development language is Python, and the environment is AWS / Cloud9 / Jupyter Notebook. I would like to write an article on the detailed code when I have the time. The actual forecasts are available on the cashier Magu site, so please come and visit.