This post covers the actual development work (*) behind what I wrote about in the last blog post.
When I treat something as studying, I never keep it up.
Asking people only gets you fragments, which can leave you thinking you understood when you didn't. And when I ask an expert in the field, the explanation is often so difficult that I can't even follow what is being said (laughs)... It felt like having one formula explained with yet another new formula!
(*) About SIVA, which is under development: [facebook] https://www.facebook.com/AIkeiba/ [Twitter] https://twitter.com/Siva_keiba I post updates there from time to time, so please like and follow!
Now, on to the main subject!
When I started on machine learning for horse racing prediction, the first step was to gather information on the topic. I knew nothing about horse racing and had no interest in it, so I relied on write-ups on the internet by people who had actually tried this.
http://stockedge.hatenablog.com/entry/2016/01/03/103428 http://www.mm-lab.jp/analysis/expect_the_arima_kinen_in_multiple_regression_analysis/
When I asked around on Facebook, I was told that the weather, the horse's sex and age, and past race results are needed, so I decided to collect those.
Fortunately, I found someone who had been collecting data and got the data from them. I've put it on GitHub together with the program so that you can use it too.
https://github.com/tsunaki00/horse_racing
I wanted to classify the information by hand to some extent, but I gave up because I'm a horse racing amateur... Instead, the race information and horse parameters are converted to numbers inside the program using master (lookup) tables.
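As a rough illustration of what "quantifying with a master" means, here is a minimal sketch. The `MASTER` dictionary, the `encode` helper, and the column names are hypothetical and only mirror the idea; the category strings are examples and would have to match the strings in the real CSV.

```python
# Hypothetical master tables: one dict per categorical column.
# The keys are illustrative and must match the strings in the real CSV.
MASTER = {
    "course": {"Turf": 0, "dirt": 1, "Obstacle": 2},
    "sex": {"Male": 0, "Female": 1, "Sen": 2},
}


def encode(column, value, master=MASTER):
    """Return a numeric feature for one cell.

    Categorical columns are looked up in the master table; everything
    else is parsed as a float, with -1.0 standing in for a blank cell.
    """
    if column in master:
        return master[column][value]
    if value.strip() in ("", "-"):
        return -1.0
    return float(value)


# One made-up row, keyed by column name.
row = {"course": "dirt", "sex": "Female", "distance": "1800", "odds": ""}
features = [encode(name, cell) for name, cell in row.items()]
print(features)  # [1, 1, 1800.0, -1.0]
```

A category that is missing from the master table raises a `KeyError` here, which at least makes unexpected values in the data visible early.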
The data has the columns below. Prepare the program that follows and build a model.
Label name | Description |
---|---|
Event date | yyyy-mm-dd |
Racecourse | |
Race number | |
Race name | |
Course | Surface, e.g. dirt |
Direction | For dirt or turf, "right" for clockwise and "left" for counterclockwise |
Distance | [m] |
Going | Track condition, e.g. good (Ryo) |
Prize money | [10,000 yen] |
Number of runners | |
Order of arrival | |
Bracket number | |
Horse number | |
Horse name | |
Sex | |
Age | |
Jockey | |
Time | [s] |
Margin | Gap to the horse in front, e.g. a neck |
Passing order | |
Last 3F | Time over the last 600 m [s] |
Carried weight | [kg] |
Horse weight | [kg] |
Weight change | Change in horse weight since the previous race [kg] |
Popularity | Rank by odds (1 = lowest odds) |
Odds | |
Blinker | "B" if the horse wears blinkers (blinders) |
Trainer | |
Training comment | |
Training evaluation | |
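The program below addresses these columns by position (for example, column 10 is the order of arrival). As a hedged sketch, the positions can be given names so the magic numbers are easier to follow. The snake_case names in `COLUMNS` are hypothetical English labels chosen for this example, not the headers of the actual CSV; only the order follows the table.

```python
# Hypothetical snake_case names for the 30 columns, in table order.
COLUMNS = [
    "event_date", "racecourse", "race_number", "race_name", "course",
    "direction", "distance", "going", "prize_money", "runners",
    "finish_position", "bracket_number", "horse_number", "horse_name", "sex",
    "age", "jockey", "time", "margin", "passing_order",
    "last_3f", "carried_weight", "horse_weight", "weight_change", "popularity",
    "odds", "blinker", "trainer", "training_comment", "training_evaluation",
]

# Name -> position, so code can say INDEX["finish_position"] instead of 10.
INDEX = {name: position for position, name in enumerate(COLUMNS)}

print(len(COLUMNS))                  # 30
print(INDEX["finish_position"])      # 10
print(INDEX["training_evaluation"])  # 29
```

With such a mapping, the categorical columns handled by the master tables are `racecourse` (1), `course` (4), `direction` (5), `going` (7), `sex` (14), and `training_evaluation` (29).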
$ pip install numpy
$ pip install scikit-learn
# coding: utf-8
# Written for Python 3; joblib is installed together with scikit-learn.
import csv

import joblib  # only needed when serializing the model (see train())
from sklearn.ensemble import RandomForestClassifier


class Predict:
    def __init__(self):
        self.model = None
        self.horse_data = []
        self.train_data = []
        self.train_target = []
        # Index in train_data of the first test row (-1 until it is found)
        self.test_row_no = -1
        # Master tables: column index -> {category string: number}.
        # The keys must match the strings used in the CSV.
        self.master = {
            1: {
                "Fukushima": 0, "Kokura": 1, "Kyoto": 2, "Hakodate": 3,
                "Nakayama": 4, "Sapporo": 5, "Tokyo": 6,
                "Hanshin": 7, "Chukyo": 8, "Niigata": 9
            },
            4: {"Turf": 0, "dirt": 1, "Obstacle": 2},
            5: {"right": 0, "left": 1, "Turf": 2, "Straight line": 3, "right, 2 laps": 4},
            7: {"Bad": 0, "Heavy": 1, "Slightly heavy": 2, "Good": 3},
            14: {"Male": 0, "Female": 1, "Sen": 2},  # Sen = gelding
            29: {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "nan": -1}
        }

    def train(self):
        label = []
        with open("data/jra_race_result.csv", "r", newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            # Build the data set, skipping obstacle (hurdle) races
            for row in reader:
                if row[4] == 'Obstacle':
                    continue
                horse = []
                parameter = []
                # Feature names are collected once, from the first data row
                first_row = not self.train_data
                # Quantify with master data
                for i, col in enumerate(row):
                    if i in {3, 13, 16, 18, 19, 26, 27, 28}:
                        # Text columns: kept for display, not used as features
                        horse.append(col)
                        continue
                    elif i == 10:
                        # Order of arrival is the target, not a feature
                        horse.append(col)
                        self.train_target.append(col)
                        continue
                    if first_row:
                        label.append(header[i])
                    if i == 0:
                        # Rows from 2016-09-17 onward are held out for testing
                        if self.test_row_no == -1 and col == '2016-09-17':
                            self.test_row_no = len(self.train_data)
                        parameter.append(float(col.replace('-', '')))
                    elif i in self.master:
                        if i == 1:
                            horse.append(col)
                        parameter.append(self.master[i][col])
                    else:
                        if i in (2, 12):
                            horse.append(col)
                        if col == '' or col == ' - ':
                            col = -1
                        parameter.append(float(col))
                self.horse_data.append(horse)
                self.train_data.append(parameter)
        # Create the model (the algorithm is Random Forest)
        # Parameter example: n_estimators=xx, max_features="auto", n_jobs=-1
        self.model = RandomForestClassifier()
        # Train with fit (on the rows before 2016-09-17)
        self.model.fit(self.train_data[:self.test_row_no],
                       self.train_target[:self.test_row_no])
        # To serialize the model:
        # joblib.dump(self.model, 'model.pkl')
        # Feature importance (importance at the splits of the Random Forest)
        for name, importance in zip(label, self.model.feature_importances_):
            print('{0}\t{1:.1f}%'.format(name, importance * 100))

    def predict(self):
        test_rows = self.train_data[self.test_row_no:]
        # Start counting at test_row_no so horse_data[i] is the same row as val
        for i, val in enumerate(test_rows, start=self.test_row_no):
            predicted = self.model.predict([val])[0]
            # Only report horses predicted to finish first
            if int(predicted) == 1:
                horse = self.horse_data[i]
                result = "☓"
                if int(predicted) == int(horse[3]):
                    result = "○"
                print('{0} {1}R {2} {3} {4} Actual order of arrival: {5} {6}'.format(
                    horse[0], horse[1], horse[2], horse[4], horse[5], horse[3], result))


if __name__ == "__main__":
    predictor = Predict()
    predictor.train()
    predictor.predict()
It hardly hits at all (laughs): only 4 of the 52 predictions listed below are marked ○.
Nakayama 01R Sarah 3 years old unwinned 14 Diap Pira Actual order of arrival:16th ☓
Nakayama 02R Sarah 3 years old unwinned 1 Keiai Libra Actual order of arrival:1st ○
Nakayama 05R Sarah 3 years old unwinned 16 Mieno Wonder Actual order of arrival:1st ○
Nakayama 05R Sarah 3 years old unwinned 13 Belmule Actual order of arrival:12th ☓
Nakayama 06R Sarah 4 years old and under 5 million yen 14 Swift Swift Actual order of arrival:12th ☓
Nakayama 07R Sarah 4 years old and under 5 million yen 6 Symboli Sonne Actual order of arrival:12th ☓
Nakayama 08R Sarah 4 years old and under 10 million yen 13 Asakusa Marimba Actual order of arrival:11th ☓
Nakayama 09R First sunrise stakes 5 Hammer price Actual order of arrival:9th ☓
Nakayama 10R Junior Cup 14 Red Vivo Actual order of arrival:9th ☓
Nakayama 10R Junior Cup 11 Red Jive Actual order of arrival:10th ☓
Nakayama 11R Nikkan Sports Award Nakayama Kimpai (GIII) 9 Just Away Actual order of arrival:3rd ☓
Nakayama 11R Nikkan Sports Award Nakayama Kimpai (GIII) 16 Ike Dragon Actual order of arrival:15th ☓
Nakayama 12R Sarah 4 years old and under 10 million yen 2 Omega Blue Hawaii Actual order of arrival:3rd ☓
Kyoto 01R Sarah 3 years old unwinned 8 Lorraine Cross Actual order of arrival:3rd ☓
Kyoto 01R Sarah 3 years old unwinned 4 Denkou Showin Actual order of arrival:15th ☓
Kyoto 04R Sarah 4 years old and under 5 million yen 1 Mickey Kris S Actual order of arrival:8th ☓
Kyoto 05R Sarah 3 years old unwinned 13 Tanino black tie Actual order of arrival:8th ☓
Kyoto 06R Sarah 3 years old Shinma 11 Meishou Oniguma Actual order of arrival:8th ☓
Kyoto 07R Sarah 4 years old and under 5 million yen 5 Western Musashi Actual order of arrival:8th ☓
Kyoto 07R Sarah 4 years old and under 5 million yen 14 Soni Actual order of arrival:16th ☓
Kyoto 09R Fukujusou Special 12 Admire Dubai Actual order of arrival:2nd ☓
Kyoto 10R New Year Stakes 2 Taiki Percival Actual order of arrival:2nd ☓
Kyoto 11R Sports Nippon Award Kyoto Kimpai (GIII) 11 Sound of Your Heart Actual order of arrival:4th ☓
Kyoto 12R Sarah 4 years old and under 10 million yen 2 Suzuka Jonburu Actual order of arrival:4th ☓
Nakayama 02R Sarah 3 years old unwinned 9 Belmont Joey Actual order of arrival:7th ☓
Nakayama 03R Sarah 3 years old Shinma 8 Batting power Actual order of arrival:7th ☓
Nakayama 05R Sarah 3 years old unwinned 8 Ogon Chacha Actual order of arrival:9th ☓
Nakayama 05R Sarah 3 years old unwinned 9 Eaglemore Actual order of arrival:10th ☓
Nakayama 06R Sarah 3 years old Shinma 9 Macaroon Actual order of arrival:8th ☓
Nakayama 07R Sarah 4 years old and under 5 million yen 7 Torsen Airence Actual order of arrival:13th ☓
Nakayama 07R Sarah 4 years old and under 5 million yen 14 Cosmo dictat Actual order of arrival:14th ☓
Nakayama 08R Sarah 4 years old and under 10 million yen 4 Danone Schnapps Actual order of arrival:14th ☓
Nakayama 09R Kantake Award 12 Hikaru Pegasus Actual order of arrival:11th ☓
Nakayama 09R Kantake Award 4 Million Fresh Actual order of arrival:12th ☓
Nakayama 10R First Fuji Stakes 6 Stella Rossa Actual order of arrival:1st ○
Nakayama 10R First Fuji Stakes 2 Full Accelerator Actual order of arrival:2nd ☓
Nakayama 10R First Fuji Stakes 3 Maine Epona Actual order of arrival:11th ☓
Nakayama 11R January Stakes 5 Everest O Actual order of arrival:13th ☓
Nakayama 12R Sarah 4 years old and under 10 million yen 7 Symbolic Cardinal Actual order of arrival:13th ☓
Kyoto 02R Sarah 3 years old unwinned 6 Bamboo Baggio Actual order of arrival:14th ☓
Kyoto 03R Sarah 3 years old unwinned 3 Road Crosite Actual order of arrival:13th ☓
Kyoto 05R Sarah 3 years old Shinma 7 Bright Idea Actual order of arrival:6th ☓
Kyoto 05R Sarah 3 years old Shinma 3 Unshackled Actual order of arrival:7th ☓
Kyoto 06R Sarah 3 years old 5 million yen or less 4 Makoto Tannhäuser Actual order of arrival:12th ☓
Kyoto 07R Sarah 4 years old and under 5 million yen 9 Daisy Burrows Actual order of arrival:14th ☓
Kyoto 08R Sarah 4 years old and under 10 million yen 14 Nangoku Universe Actual order of arrival:14th ☓
Kyoto 09R Hatsuyume Stakes 2 Takao Noboru Actual order of arrival:1st ○
Kyoto 09R Hatsuyume Stakes 7 Albaton Actual order of arrival:11th ☓
Kyoto 09R Hatsuyume Stakes 5 Mickey Ballad Actual order of arrival:12th ☓
Kyoto 10R Manyo Stakes 5 Forgettable Actual order of arrival:6th ☓
Kyoto 10R Manyo Stakes 4 Seika Presto Actual order of arrival:9th ☓
Kyoto 11R Nikkan Sports Award Shinzan Kinen (GIII) 16 At Will Actual order of arrival:6th ☓
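To put a number on the ○ and ☓ marks above, one option is to compute the share of "predicted first" picks that actually finished first. The following is a minimal sketch; `hit_rate` is a hypothetical helper, and the two lists are toy data, not values taken from the results above.

```python
def hit_rate(predicted, actual):
    """Fraction of horses predicted to finish first that actually did."""
    picks = [(p, a) for p, a in zip(predicted, actual) if p == 1]
    if not picks:
        return 0.0
    hits = sum(1 for p, a in picks if a == 1)
    return hits / len(picks)


# Toy data, not taken from the results above.
predicted = [1, 1, 3, 1, 1]
actual = [16, 1, 1, 12, 1]
print(hit_rate(predicted, actual))  # 0.5
```

Applied to the listing above, 4 correct picks out of 52 would give roughly 0.08.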
That's all for today. I'll cover parameter tuning and data normalization in future updates.
This project started when a friend who had taken the information processing exam in mid-October contacted me, saying "I may have failed the test!".
So I decided to use machine learning, which I had long been meaning to learn, to predict next year's exam questions.
For the library I used scikit-learn, since there is plenty of information about it. [What I did] I categorized the questions from FY2013 to FY2016 and predicted which question categories would appear the following year.
When I first tried it, the programming itself felt easy, but I was at a loss when it came to choosing an algorithm. (I still don't know what's right (laughs).) I found some useful information, so I will share it.
I wanted to predict the exam questions and verify the predictions right away, but there is no exam until next spring, so the results won't be known until then... Because of that, I decided to pick a topic that gives results immediately, and that is how I arrived at SIVA.
I will write again! I post SIVA updates as they happen, so please like and follow. [facebook] https://www.facebook.com/AIkeiba/ [Twitter] https://twitter.com/Siva_keiba