[PYTHON] I tried to get a database of horse racing using Pandas

Introduction

Horse racing interested me as a data analysis theme, so I tried collecting the data myself.

The site I referred to is here.

Flow to horse racing prediction

If you want to build a predictive model from scratch, you need to take the following steps:

  1. Scrape data from a horse racing site
  2. Preprocess the data
  3. Split it into training data and test data
  4. Build the model
  5. Predict the race you want to predict

This post briefly summarizes only the scraping part, step 1.

About scraping

I scraped the data from netkeiba.com (the race database at db.netkeiba.com).

Important point

Retrieving a large amount of data at once puts a load on the server. Calling time.sleep(1) inside the loop makes the scraper wait one second after each request for an ID in race_id_list. Reducing the server load this way is basic etiquette.

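import time

import pandas as pd
from tqdm.notebook import tqdm  # progress bar for Jupyter notebooks


def scrape_race_results(race_id_list):
    """Return a dict that maps each race_id to the first table on its page."""
    race_results = {}
    for race_id in tqdm(race_id_list):
        try:
            url = 'https://db.netkeiba.com/race/' + race_id
            # read_html returns a list of tables; keep the first one
            race_results[race_id] = pd.read_html(url)[0]
            time.sleep(1)  # wait one second before the next request
        except IndexError:
            # skip this race_id and move on to the next one
            continue
        except:
            # any other error, including a manual interrupt:
            # stop here and return what has been collected so far
            break
    return race_results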
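
The function above returns a dict of per-race tables, while the saving step later works on a single object. One possible way to bridge the two is sketched below. The helper name combine_race_results is hypothetical and not from the original post, and the column name in the usage note is invented for illustration only.

```python
import pandas as pd


def combine_race_results(race_results: dict) -> pd.DataFrame:
    """Stack per-race tables into one DataFrame indexed by race_id.

    Hypothetical helper: the original post does not show this step.
    """
    if not race_results:
        return pd.DataFrame()
    frames = []
    for race_id, table in race_results.items():
        table = table.copy()
        # Label every row with the race it belongs to
        table.index = [race_id] * len(table)
        frames.append(table)
    return pd.concat(frames)
```

For example, combine_race_results({'202009020611': pd.DataFrame({'rank': [1, 2]})}) gives a two-row DataFrame whose index is the race ID on both rows.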

Put the IDs of the races you want to look up in race_id_list. For example, take the ID 202009020611. Its digits break down as follows:

2020 → Year
09 → Venue (09 is Hanshin, 10 is Kokura, and so on)
02 → Meeting number (the 2nd meeting at that venue in the year)
06 → Day (the 6th day of that meeting)
11 → Race number
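
The ID layout above can be sketched in code. This is a minimal example, assuming the 12 digits split as 4 + 2 + 2 + 2 + 2 and that the third and fourth pairs are the meeting number and the day within the meeting. The function names parse_race_id and build_race_id_list are hypothetical and not from the original post.

```python
from itertools import product


def parse_race_id(race_id: str) -> dict:
    """Split a 12-digit race ID into its parts (layout is an assumption)."""
    if len(race_id) != 12 or not race_id.isdigit():
        raise ValueError(f"expected 12 digits, got {race_id!r}")
    return {
        "year": int(race_id[0:4]),
        "place": race_id[4:6],         # venue code, e.g. '09' for Hanshin
        "meeting": int(race_id[6:8]),  # assumed: meeting number
        "day": int(race_id[8:10]),     # assumed: day within the meeting
        "race": int(race_id[10:12]),
    }


def build_race_id_list(year, places, meetings, days, races):
    """Build candidate IDs for every combination of the given parts."""
    return [
        f"{year:04d}{place:02d}{meeting:02d}{day:02d}{race:02d}"
        for place, meeting, day, race in product(places, meetings, days, races)
    ]
```

Calling build_race_id_list(2020, [9], [2], [6], range(1, 13)) produces the 12 candidate IDs for that day at Hanshin, which can then be passed to scrape_race_results. Not every generated combination will correspond to a real race.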

As a trial, scraping this ID returns the result table for that race.

[Figure: image.png]

From here the data can be analyzed with basic pandas. To be safe, save it first as both a pickle file and a CSV file.

Assuming the acquired data is stored in a DataFrame named results_new, saving it looks like this:

results_new.to_pickle('results_new2017-2020')
results_new.to_csv('results_new2017-2020.csv',encoding="SHIFT-JIS")
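
To pick the analysis up again later, the saved files can be read back. The sketch below assumes the same file names and encoding as the save step. The helper name load_saved_results is hypothetical and not from the original post.

```python
import pandas as pd


def load_saved_results(pickle_path: str, csv_path: str):
    """Reload the pickle and the CSV written by the save step."""
    from_pickle = pd.read_pickle(pickle_path)
    # index_col=0 restores the index that to_csv wrote as the first column;
    # the encoding must match the one used when saving
    from_csv = pd.read_csv(csv_path, encoding="SHIFT-JIS", index_col=0)
    return from_pickle, from_csv
```

The pickle keeps dtypes and the index exactly as saved, while the CSV is re-parsed on load, so an index of race IDs may come back as integers rather than strings.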

Conclusion

This post briefly summarized how to acquire the data.
