Horse Racing Site Web Scraping with Python

How to web scrape with Python

About this article

This has surely been covered many times before, but in this article I describe how to extract information from a horse racing site and save it in CSV format. The language is Python 3.6, and the environment is Jupyter Notebook.

I'm new to Python, so the code may be verbose, and there are probably smarter techniques. The goal this time is not to write beautiful code, though, so please treat that as something to improve later.

Scraping destination

Information is extracted from the following site. Site name: netkeiba.com (https://www.netkeiba.com/?rf=logo)

Preliminary knowledge for URL generation

On netkeiba.com, each race has its own web page, and the URL of that page follows the rule below.

https://race.netkeiba.com/race/result.html?race_id=[year][racecourse code][meeting number][day][race number]&rf=race_list

Take the 1st race on day 4 of the 2nd Tokyo meeting, held on May 3, 2020 at Tokyo Racecourse, as an example.

- Year: 2020
- Racecourse code: 05 (Tokyo)
- Meeting number: 02 (2nd meeting)
- Day: 04 (4th day)
- Race number: 01 (1st race)

Racetrack name   Racetrack code
Sapporo          01
Hakodate         02
Fukushima        03
Niigata          04
Tokyo            05
Nakayama         06
Chukyo           07
Kyoto            08
Hanshin          09
Kokura           10

Applying this rule to the example gives the following URL.

https://race.netkeiba.com/race/result.html?race_id=202005020401&rf=race_list
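The rule above can be sketched as a small helper. This is only an illustration: the function name `build_result_url` and the dictionary `RACECOURSE_CODES` are hypothetical names introduced here, and the sketch assumes that every numeric part is zero-padded to two digits, as in the example.

```python
# Racecourse codes taken from the table above
RACECOURSE_CODES = {
    "Sapporo": "01", "Hakodate": "02", "Fukushima": "03", "Niigata": "04",
    "Tokyo": "05", "Nakayama": "06", "Chukyo": "07", "Kyoto": "08",
    "Hanshin": "09", "Kokura": "10",
}


def build_result_url(year, racecourse, meeting, day, race_no):
    """Build a result-page URL (hypothetical helper, not part of any library)."""
    # Assumption: meeting, day and race number are each padded to two digits
    race_id = "{:04d}{}{:02d}{:02d}{:02d}".format(
        int(year), RACECOURSE_CODES[racecourse],
        int(meeting), int(day), int(race_no),
    )
    return ("https://race.netkeiba.com/race/result.html?race_id="
            + race_id + "&rf=race_list")


# The example from the text: 2nd Tokyo meeting, day 4, race 1, 2020
print(build_result_url(2020, "Tokyo", 2, 4, 1))
```

Passing the racecourse by name rather than by code keeps the lookup table in one place; the strings used in the code further down ("2020", "06", "03", "05", "01") work the same way.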

From URL generation to information acquisition (code)

This is the code from URL generation to information acquisition.

web1.ipynb


# -*- coding: utf-8 -*-
import codecs
import re

import requests
from bs4 import BeautifulSoup

# Components of the race ID (see the URL rule above)
race_date = "2020"        # year
race_course_num = "06"    # racecourse code (06 = Nakayama)
race_info = "03"          # meeting number
race_count = "05"         # day of the meeting
race_no = "01"            # race number
url = ("https://race.netkeiba.com/race/result.html?race_id="
       + race_date + race_course_num + race_info + race_count + race_no
       + "&rf=race_list")

# Fetch the HTML of the corresponding URL
race_html = requests.get(url)
# Use the encoding detected from the content so that Japanese text is decoded correctly
race_html.encoding = race_html.apparent_encoding
race_soup = BeautifulSoup(race_html.text, 'html.parser')
print(url)

Running the above prints the generated URL.

Get table (code)

This is the code that extracts the table from the retrieved HTML. (It is appended to the code above.)

web1.ipynb


# Get only the rows of the race result table
HorseList = race_soup.find_all("tr", class_="HorseList")

# Format the race table
# List that will hold one cleaned-up string per horse
Race_lists = []
# Number of columns in the table = 15 (Finish position, Bracket, Horse number, Horse name, Sex/age, Weight carried, Jockey, Time, Margin, Popularity, Win odds, Last 3F, Corner passing order, Stable, Horse weight (increase/decrease))
Race_row = 15

# Count the number of runners
uma_num = len(HorseList)

# Remove unnecessary strings and store each row in the list
for i in range(uma_num):
    Race_lists.append(str(HorseList[i]))

    Race_lists[i] = re.sub(r"\n", "", Race_lists[i])         # remove newlines
    Race_lists[i] = re.sub(r" ", "", Race_lists[i])          # remove spaces
    Race_lists[i] = re.sub(r"</td>", ",", Race_lists[i])     # end of a cell becomes a comma
    Race_lists[i] = re.sub(r"<[^>]*?>", "", Race_lists[i])   # remove all remaining tags
    Race_lists[i] = re.sub(r"\[", "", Race_lists[i])         # remove square brackets
    Race_lists[i] = re.sub(r"\]", "", Race_lists[i])
    print(Race_lists[i])

When the above is executed, the output will be as follows.

1,1,1,Red Calm,Female 3,54.0,Shu Ishibashi,1:25.7,,3,4.6,37.1,,Takeshi Miho Okumura,512(-4),
2,6,12,Sanky West,Female 3,54.0,Iwabe,1:25.7,Hana,2,3.2,36.5,,Miho Kayano,442(-8),
(Omitted below)
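To see what each substitution contributes, the same chain can be run on a small hand-written row. The HTML below is a made-up, simplified sample (the real markup on netkeiba.com is more complex), and `clean_row` is a hypothetical helper name used only for this sketch.

```python
import re

# Hypothetical, simplified table row; not copied from the real site
SAMPLE_ROW = """<tr class="HorseList">
<td class="Result_Num"><div class="Rank">1</div></td>
<td class="Num Waku1"><div>1</div></td>
<td class="Horse_Info"><span class="Horse_Name"><a href="#">Sample Horse</a></span></td>
</tr>"""


def clean_row(row_html):
    """Apply the same substitutions as the loop above to one row."""
    text = re.sub(r"\n", "", row_html)      # join the row into a single line
    text = re.sub(r" ", "", text)           # drop every space, including those inside names
    text = re.sub(r"</td>", ",", text)      # each closing cell tag becomes a comma
    text = re.sub(r"<[^>]*?>", "", text)    # strip all remaining tags
    text = re.sub(r"\[", "", text)          # strip square brackets
    text = re.sub(r"\]", "", text)
    return text


print(clean_row(SAMPLE_ROW))  # prints: 1,1,SampleHorse,
```

Two side effects are visible here: every cell, including the last one, leaves a comma behind, so each line ends with a trailing comma, and spaces inside a value are removed as well.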

Other tables can be obtained in a similar way, with some differences.

Save in CSV format (code)

Now that you have the information you want, save it as a CSV file. (Added to the above code)

web1.ipynb


# Open the CSV file for writing
out = codecs.open("./this_race_table" + race_date + race_course_num + race_info + race_count + race_no + ".csv", "w")
# The column names are written to the CSV here for clarity (they are not strictly required)
out.write("Finish position,Bracket,Horse number,Horse name,Sex/age,Weight carried,Jockey,Time,Margin,Popularity,Win odds,Last 3F,Corner passing order,Stable,Horse weight (increase/decrease)\n")

# Write each row of the race table list to the CSV
for i in range(uma_num):
    out.write(Race_lists[i] + "\n")

out.close()

Running the above creates the CSV file in the folder that contains the source code file.
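As a possible alternative to writing the strings by hand, the standard `csv` module could be used. The following is a minimal sketch under some assumptions: `write_race_csv` is a hypothetical helper, the sample row is placeholder data modeled on the output shown earlier, and the trailing comma left by the cleaning step is dropped so that each row has exactly 15 fields.

```python
import csv
import io

HEADER = [
    "Finish position", "Bracket", "Horse number", "Horse name", "Sex/age",
    "Weight carried", "Jockey", "Time", "Margin", "Popularity", "Win odds",
    "Last 3F", "Corner passing order", "Stable", "Horse weight (change)",
]


def write_race_csv(cleaned_rows, fileobj):
    """Write the header and the cleaned rows to an open text file object."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(HEADER)
    for row in cleaned_rows:
        # Drop only the single trailing comma so empty fields in the middle survive
        if row.endswith(","):
            row = row[:-1]
        writer.writerow(row.split(","))


# Placeholder row, not real race data
sample = ["1,1,1,RedCalm,F3,54.0,ShuIshibashi,1:25.7,,3,4.6,37.1,,Miho,512(-4),"]
buffer = io.StringIO()
write_race_csv(sample, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of `io.StringIO`, the `csv` documentation recommends opening it with `newline=""`, for example `open(path, "w", newline="")`.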

This is the end of scraping.

Other

Please note that web scraping (crawling) can be illegal in some cases, as the Librahack incident shows.
