[PYTHON] About the development contents of machine learning (Example)

Introduction

This post describes the actual development work (*) behind what I wrote about in my last blog post.

What matters when getting started with machine learning!!

・ Have a clear purpose for studying!

If you study just for the sake of studying, it won't last.

・ If you really want to do it, study on your own

Even if you ask someone, what you learn will be fragmentary, and you will only feel as if you had understood. Also, when I asked engineers who specialize in the field, the explanation was often so difficult that I couldn't even understand what they were saying (laughs)... It felt like one formula was being explained with yet another formula!

(*) About SIVA, which is under development: [facebook] https://www.facebook.com/AIkeiba/ [Twitter] https://twitter.com/Siva_keiba SIVA goes live from time to time, so please like and follow!

Now, on to the main subject!

The first step in machine learning (marketing)

When I started on machine learning for horse racing prediction, I first gathered information on the subject. I knew nothing about horse racing and had no interest in it, but there were articles on the internet by people who had actually tried this, so I used them as references.

http://stockedge.hatenablog.com/entry/2016/01/03/103428 http://www.mm-lab.jp/analysis/expect_the_arima_kinen_in_multiple_regression_analysis/

Horse racing data collection (data collection)

When I asked various people on Facebook, they told me that the weather, the horse's sex and age, and past race results are needed, so I decided to collect those.

Fortunately, I found someone who was already collecting data and got the data from them. I have put it on GitHub together with the program so that you can use it.

https://github.com/tsunaki00/horse_racing

Information cleansing, analysis and classification

I wanted to classify the information to some extent, but as a horse racing amateur I gave up... Instead, the race information and horse parameters are converted to numbers in the program using master tables.
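
The quantification step can be sketched roughly as follows. This is only a minimal illustration, not the actual program: the `MASTER` table and the `encode` helper are hypothetical names, and mapping unknown labels to -1 is an assumption.

```python
# Hypothetical master table: column name -> {category label: numeric code}.
MASTER = {
    "course": {"Turf": 0, "Dirt": 1},
    "sex": {"Male": 0, "Female": 1, "Gelding": 2},
}


def encode(column, value, unknown=-1):
    """Return the numeric code for a category, or `unknown` if it is not in the table."""
    return MASTER[column].get(value, unknown)


row = {"course": "Dirt", "sex": "Female"}
encoded = [encode(name, value) for name, value in row.items()]
print(encoded)  # [1, 1]
```

Returning a fixed code for unknown labels keeps the conversion from failing on unexpected values, at the cost of hiding data problems, so it may be worth logging such cases.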

Predictive modeling and experimentation

Prepare the following program and create a model

The prepared data is the following CSV (from JRA-VAN)

Label name           Description
event date           yyyy-mm-dd
Racecourse
Race number
Race name
course               e.g. "dirt"
Orbit                For dirt or turf: "right" for clockwise, "left" for counterclockwise
distance             [m]
Going                Track condition, e.g. "Good"
Prize money          [ten thousand yen]
Number of runners
Order of arrival
Bracket number
Horse number
Horse name
sex
age
Jockey
time                 [s]
Difference           Margin to the horse in front (e.g. "neck")
Passing order
Last 3F              Time over the last 600 m [s]
Weight carried       [kg]
Horse weight         [kg]
Increase / decrease  Change in horse weight from the previous race [kg]
Popularity           Rank by odds
Odds
Blinker              "B" if the horse wears blinkers (blindfold)
Trainer
Training comment
Training evaluation
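
As a rough sketch of how rows of such a CSV can be read and blank numeric cells handled: the sample below is made up, uses only four of the columns, and the `to_number` helper and the -1.0 placeholder are assumptions for illustration.

```python
import csv
import io

# Made-up sample with a few of the columns above; the real file has many more.
SAMPLE = """event date,Racecourse,distance,Horse weight
2016-09-17,Nakayama,1200,470
2016-09-17,Nakayama,1800,
"""


def to_number(text, missing=-1.0):
    """Convert a CSV cell to float, using `missing` for blank or '-' cells."""
    text = text.strip()
    return missing if text in ("", "-") else float(text)


reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # first line holds the label names
rows = [[row[0], row[1], to_number(row[2]), to_number(row[3])] for row in reader]
print(header)
print(rows)
```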

Package installation

  $ pip install numpy
  $ pip install scikit-learn

Forecasting program

# coding: utf-8
import csv

import joblib  # only needed if the model is serialized (see train())
from sklearn.ensemble import RandomForestClassifier

# Columns kept only for display (race name, horse name, jockey, margin,
# passing order, blinker, trainer, training comment); not used as features.
DISPLAY_ONLY = {3, 13, 16, 18, 19, 26, 27, 28}
TARGET_COLUMN = 10  # order of arrival
TEST_START_DATE = '2016-09-17'


class Predict:
    def __init__(self):
        self.model = None
        self.horse_data = []    # display columns, one list per horse
        self.train_data = []    # numeric features, one list per horse
        self.train_target = []  # order of arrival (the label to learn)
        # Index of the first test row (-1 until the test date is found)
        self.test_row_no = -1

        # Master tables: column index -> {category label: numeric code}.
        # The keys must match the strings stored in the CSV exactly; if the
        # CSV holds the original Japanese labels, use those as keys instead.
        self.master = {
            1: {
                "Fukushima": 0, "Kokura": 1, "Kyoto": 2, "Hakodate": 3,
                "Nakayama": 4, "Sapporo": 5, "Tokyo": 6,
                "Hanshin": 7, "Chukyo": 8, "Niigata": 9
            },
            4: {"Turf": 0, "dirt": 1, "Obstacle": 2},
            5: {"right": 0, "left": 1, "Turf": 2, "Straight line": 3, "right 2 laps": 4},
            7: {"Bad": 0, "Heavy": 1, "Slightly heavy": 2, "Good": 3},
            14: {"Male": 0, "Female": 1, "Gelding": 2},
            29: {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "nan": -1}
        }

    def train(self):
        # Assumes a UTF-8 file; change the encoding if the CSV differs.
        with open("data/jra_race_result.csv", "r", encoding="utf-8", newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            # Feature names, in the same order as the values in `parameter`
            labels = [name for i, name in enumerate(header)
                      if i not in DISPLAY_ONLY and i != TARGET_COLUMN]
            # Build the data, skipping obstacle (hurdle) races
            for row in reader:
                if row[4] == 'Obstacle':
                    continue
                horse = []
                parameter = []
                # Quantify each column, using the master tables for categories
                for i, col in enumerate(row):
                    if i in DISPLAY_ONLY:
                        horse.append(col)
                    elif i == 0:
                        # Remember the index of the first row of the test period
                        if self.test_row_no == -1 and col == TEST_START_DATE:
                            self.test_row_no = len(self.train_data)
                        parameter.append(int(col.replace('-', '')))
                    elif i == TARGET_COLUMN:
                        # Order of arrival is the target, not a feature
                        horse.append(col)
                        self.train_target.append(col)
                    elif i in self.master:
                        if i == 1:
                            horse.append(col)
                        parameter.append(self.master[i][col])
                    else:
                        if i in (2, 12):
                            horse.append(col)
                        if col == '' or col == ' - ':
                            col = -1
                        parameter.append(float(col))
                self.horse_data.append(horse)
                self.train_data.append(parameter)

        if self.test_row_no == -1:
            raise ValueError('No race found on ' + TEST_START_DATE)

        # Create the model (the algorithm is Random Forest)
        # Parameter example: n_estimators=xx, max_features="sqrt", n_jobs=-1
        self.model = RandomForestClassifier()
        # Learn with fit, using only the races before 9/17
        split = self.test_row_no
        self.model.fit(self.train_data[:split], self.train_target[:split])
        # To serialize the model:
        # joblib.dump(self.model, 'model.pkl')
        # Importance of each feature (as measured at the splits of the Random Forest)
        for name, importance in zip(labels, self.model.feature_importances_):
            print('{0}\t{1:.1f}%'.format(name, importance * 100))

    def predict(self):
        # Test rows and their display data share the same indices
        test_rows = self.train_data[self.test_row_no:]
        test_horses = self.horse_data[self.test_row_no:]
        for val, horse in zip(test_rows, test_horses):
            # Predict the order of arrival for one horse
            predicted = self.model.predict([val])[0]
            if str(predicted).strip() == '1':
                # Mark whether the horse predicted to win actually came first
                result = "○" if horse[3].strip() == '1' else "☓"
                print('{0} {1}R {2} {3} {4} Actual order of arrival: {5} {6}'.format(
                    horse[0], horse[1], horse[2], horse[4], horse[5], horse[3], result))


if __name__ == "__main__":
    predict = Predict()
    predict.train()
    predict.predict()
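
The split used in train() (learn from the races before 2016-09-17, predict the rest) can be illustrated in isolation. This is a minimal sketch with made-up rows; `split_by_date` is a hypothetical helper, and it compares dates with `>=` instead of looking for an exact match, which is a small variation on the program above.

```python
def split_by_date(rows, test_start):
    """Split chronologically sorted rows into (train, test) at the first row dated test_start or later."""
    for index, row in enumerate(rows):
        # ISO dates (yyyy-mm-dd) compare correctly as plain strings
        if row[0] >= test_start:
            return rows[:index], rows[index:]
    return rows, []


rows = [
    ("2016-09-10", "A"),
    ("2016-09-11", "B"),
    ("2016-09-17", "C"),
    ("2016-09-18", "D"),
]
train, test = split_by_date(rows, "2016-09-17")
print(len(train), len(test))  # 2 2
```

Splitting by date rather than at random keeps later races out of the training data, which is closer to how the predictions would be used in practice.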

Result

The predictions hardly hit at all (laughs)

Nakayama 01R Sarah 3 years old unwinned 14 Diap Pira Actual order of arrival:16th ☓
Nakayama 02R Sarah 3 years old unwinned 1 Keiai Libra Actual order of arrival:1st ○
Nakayama 05R Sarah 3 years old unwinned 16 Mieno Wonder Actual order of arrival:1st ○
Nakayama 05R Sarah 3 years old unwinned 13 Belmule Actual order of arrival:12th ☓
Nakayama 06R Sarah 4 years old and under 5 million yen 14 Swift Swift Actual order of arrival:12th ☓
Nakayama 07R Sarah 4 years old and under 5 million yen 6 Symboli Sonne Actual order of arrival:12th ☓
Nakayama 08R Sarah 4 years old and under 10 million yen 13 Asakusa Marimba Actual order of arrival:11th ☓
Nakayama 09R First sunrise stakes 5 Hammer price Actual order of arrival:9th ☓
Nakayama 10R Junior Cup 14 Red Vivo Actual order of arrival:9th ☓
Nakayama 10R Junior Cup 11 Red Jive Actual order of arrival:10th ☓
Nakayama 11R Nikkan Sports Award Nakayama Kimpai (GIII) 9 Just Away Actual order of arrival:3rd ☓
Nakayama 11R Nikkan Sports Award Nakayama Kimpai (GIII) 16 Ike Dragon Actual order of arrival:15th ☓
Nakayama 12R Sarah 4 years old and under 10 million yen 2 Omega Blue Hawaii Actual order of arrival:3rd ☓
Kyoto 01R Sarah 3 years old unwinned 8 Lorraine Cross Actual order of arrival:3rd ☓
Kyoto 01R Sarah 3 years old unwinned 4 Denkou Showin Actual order of arrival:15th ☓
Kyoto 04R Sarah 4 years old and under 5 million yen 1 Mickey Kris S Actual order of arrival:8th ☓
Kyoto 05R Sarah 3 years old unwinned 13 Tanino black tie Actual order of arrival:8th ☓
Kyoto 06R Sarah 3 years old Shinma 11 Meishou Oniguma Actual order of arrival:8th ☓
Kyoto 07R Sarah 4 years old and under 5 million yen 5 Western Musashi Actual order of arrival:8th ☓
Kyoto 07R Sarah 4 years old and under 5 million yen 14 Soni Actual order of arrival:16th ☓
Kyoto 09R Fukujusou Special 12 Admire Dubai Actual order of arrival:2nd ☓
Kyoto 10R New Year Stakes 2 Taiki Percival Actual order of arrival:2nd ☓
Kyoto 11R Sports Nippon Award Kyoto Kimpai (GIII) 11 Sound of Your Heart Actual order of arrival:4th ☓
Kyoto 12R Sarah 4 years old and under 10 million yen 2 Suzuka Jonburu Actual order of arrival:4th ☓
Nakayama 02R Sarah 3 years old unwinned 9 Belmont Joey Actual order of arrival:7th ☓
Nakayama 03R Sarah 3 years old Shinma 8 Batting power Actual order of arrival:7th ☓
Nakayama 05R Sarah 3 years old unwinned 8 Ogon Chacha Actual order of arrival:9th ☓
Nakayama 05R Sarah 3 years old unwinned 9 Eaglemore Actual order of arrival:10th ☓
Nakayama 06R Sarah 3 years old Shinma 9 Macaroon Actual order of arrival:8th ☓
Nakayama 07R Sarah 4 years old and under 5 million yen 7 Torsen Airence Actual order of arrival:13th ☓
Nakayama 07R Sarah 4 years old and under 5 million yen 14 Cosmo dictat Actual order of arrival:14th ☓
Nakayama 08R Sarah 4 years old and under 10 million yen 4 Danone Schnapps Actual order of arrival:14th ☓
Nakayama 09R Kantake Award 12 Hikaru Pegasus Actual order of arrival:11th ☓
Nakayama 09R Kantake Award 4 Million Fresh Actual order of arrival:12th ☓
Nakayama 10R First Fuji Stakes 6 Stella Rossa Actual order of arrival:1st ○
Nakayama 10R First Fuji Stakes 2 Full Accelerator Actual order of arrival:2nd ☓
Nakayama 10R First Fuji Stakes 3 Maine Epona Actual order of arrival:11th ☓
Nakayama 11R January Stakes 5 Everest O Actual order of arrival:13th ☓
Nakayama 12R Sarah 4 years old and under 10 million yen 7 Symbolic Cardinal Actual order of arrival:13th ☓
Kyoto 02R Sarah 3 years old unwinned 6 Bamboo Baggio Actual order of arrival:14th ☓
Kyoto 03R Sarah 3 years old unwinned 3 Road Crosite Actual order of arrival:13th ☓
Kyoto 05R Sarah 3 years old Shinma 7 Bright Idea Actual order of arrival:6th ☓
Kyoto 05R Sarah 3 years old Shinma 3 Unshackled Actual order of arrival:7th ☓
Kyoto 06R Sarah 3 years old 5 million yen or less 4 Makoto Tannhäuser Actual order of arrival:12th ☓
Kyoto 07R Sarah 4 years old and under 5 million yen 9 Daisy Burrows Actual order of arrival:14th ☓
Kyoto 08R Sarah 4 years old and under 10 million yen 14 Nangoku Universe Actual order of arrival:14th ☓
Kyoto 09R Hatsuyume Stakes 2 Takao Noboru Actual order of arrival:1st ○
Kyoto 09R Hatsuyume Stakes 7 Albaton Actual order of arrival:11th ☓
Kyoto 09R Hatsuyume Stakes 5 Mickey Ballad Actual order of arrival:12th ☓
Kyoto 10R Manyo Stakes 5 Forgettable Actual order of arrival:6th ☓
Kyoto 10R Manyo Stakes 4 Seika Presto Actual order of arrival:9th ☓
Kyoto 11R Nikkan Sports Award Shinzan Kinen (GIII) 16 At Will Actual order of arrival:6th ☓
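
To put a number on "hardly hits", the share of predicted winners that actually came first can be computed. The sketch below is only an illustration: `hit_rate` is a hypothetical helper, and the input is the first four lines of the output above.

```python
def hit_rate(picks):
    """picks: list of (predicted_position, actual_position) pairs.

    Returns the share of horses predicted to finish first that actually did.
    """
    actuals = [actual for predicted, actual in picks if predicted == 1]
    if not actuals:
        return 0.0
    return sum(1 for actual in actuals if actual == 1) / len(actuals)


# First four lines of the output: actual positions 16th, 1st, 1st, 12th
picks = [(1, 16), (1, 1), (1, 1), (1, 12)]
print(hit_rate(picks))  # 0.5
```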

Finally

That's all for today. I will cover parameter tuning and data normalization in a future update.

(Bonus) Trigger for SIVA development

This project started when a friend who took the information processing exam held in mid-October contacted me, saying "I may have failed the exam!".

So I decided to use machine learning, which I had wanted to learn for a long time, to predict next year's exam questions.

For the program I used scikit-learn, a library with plenty of information available. [What I did] I categorized the questions from FY2013 to FY2016 and predicted the question categories that will appear next year.

[Image: 14642235_1785738458343919_3303819700484393142_n.jpg]

When I first tried it, the programming itself did not feel difficult, but I was at a loss when it came to choosing an algorithm. (I still don't know which one is right (laughs)) I found some useful information, so I will share it.

[Image: 14721760_1785764201674678_6869920007789574956_n.jpg]

However, with predictions for a national exam, the results cannot be verified right away!!

I wanted to predict the questions of the information processing exam and verify the predictions right away, but there is no exam until next spring, so I would not know the result... Because of this problem, I decided to work on something where results come immediately, and that led to SIVA.

I will write again! SIVA goes live from time to time, so please like and follow! [facebook] https://www.facebook.com/AIkeiba/ [Twitter] https://twitter.com/Siva_keiba
