Predicting horse races with machine learning, aiming for a 100% recovery rate
In the previous article, I scraped the results of every 2019 race from netkeiba.com. This time I will also scrape per-race information: the race date and the race conditions (course type, distance, track condition and weather).
As last time, we write a function that takes a list of race_id values and returns the scraped information as a dictionary keyed by race_id.
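```python
import re
import time

import requests
from bs4 import BeautifulSoup
from tqdm.notebook import tqdm


def scrape_race_info(race_id_list):
    """Scrape the race conditions for each race_id and return {race_id: info_dict}."""
    race_infos = {}
    for race_id in tqdm(race_id_list):
        try:
            url = "https://db.netkeiba.com/race/" + race_id
            html = requests.get(url)
            html.encoding = "EUC-JP"
            soup = BeautifulSoup(html.text, "html.parser")

            # The first two <p> tags in div.data_intro hold the course/weather/going
            # line and the date line. Join them with a space so the last token of the
            # first line cannot fuse with the first token of the second.
            paragraphs = soup.find("div", attrs={"class": "data_intro"}).find_all("p")
            texts = paragraphs[0].text + " " + paragraphs[1].text
            info = re.findall(r"\w+", texts)

            # The page is in Japanese, so tokens are matched against Japanese labels.
            info_dict = {}
            for text in info:
                if text in ["芝", "ダート"]:  # turf, dirt
                    info_dict["race_type"] = text
                if "障" in text:  # obstacle (steeplechase) race
                    info_dict["race_type"] = "障害"
                if "m" in text:  # distance token such as "1800m"
                    info_dict["course_len"] = int(re.findall(r"\d+", text)[0])
                if text in ["良", "稍重", "重", "不良"]:  # good, slightly heavy, heavy, bad
                    info_dict["ground_state"] = text
                if text in ["曇", "晴", "雨", "小雨", "小雪", "雪"]:  # cloudy, fine, rain, light rain, light snow, snow
                    info_dict["weather"] = text
                if "年" in text:  # "年" (year) marks the date token, e.g. 2019年12月28日
                    info_dict["date"] = text
            race_infos[race_id] = info_dict
            time.sleep(1)  # wait 1 s between requests so we do not hammer the server
        except IndexError:
            # Page without the expected <p> tags: skip this race
            continue
        except Exception as e:
            # Any other error (e.g. a network failure): report it and stop,
            # returning whatever was scraped so far
            print(e)
            break
    return race_infos
```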
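Because the function above mixes network access with parsing, the parsing rules are hard to check without hitting the site. As a hedged sketch, the token-matching logic can be pulled out into a hypothetical helper `parse_race_info` that takes the text of the two `<p>` tags and returns the same kind of dictionary. The sample strings are made up in the style of the intro text; they are not copied from a real netkeiba.com page.

```python
import re


def parse_race_info(first_line: str, second_line: str) -> dict:
    """Hypothetical helper: apply the same token rules as scrape_race_info to plain text."""
    info_dict = {}
    # On str patterns \w is Unicode-aware, so runs of kanji/kana count as word tokens
    for text in re.findall(r"\w+", first_line + " " + second_line):
        if text in ["芝", "ダート"]:  # turf, dirt
            info_dict["race_type"] = text
        if "障" in text:  # obstacle race
            info_dict["race_type"] = "障害"
        if "m" in text:  # distance token such as "1800m"
            info_dict["course_len"] = int(re.findall(r"\d+", text)[0])
        if text in ["良", "稍重", "重", "不良"]:  # track condition (going)
            info_dict["ground_state"] = text
        if text in ["曇", "晴", "雨", "小雨", "小雪", "雪"]:  # weather
            info_dict["weather"] = text
        if "年" in text:  # date token
            info_dict["date"] = text
    return info_dict


# Made-up sample text shaped like the intro block (not real page output)
sample = parse_race_info(
    "芝右1800m / 天候 : 晴 / 芝 : 良 / 発走 : 15:35",
    "2019年12月28日 5回中山9日目 2歳オープン",
)
print(sample)
```

Keeping the parser pure like this means the Japanese label lists can be checked offline before spending an hour of one-second sleeps on a full scrape.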
Build race_id_list from the results scraped last time, convert the scraped dictionary into a DataFrame, and merge it into the original results data.
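```python
import pandas as pd

# `results` is the race-result DataFrame from the previous article, indexed by race_id
race_id_list = results.index.unique()
race_infos = scrape_race_info(race_id_list)

# One row per race_id, one column per scraped field (missing fields become NaN)
race_infos = pd.DataFrame.from_dict(race_infos, orient="index")

# Left join on the index so every result row is kept, even for races with no scraped info
results = results.merge(race_infos, left_index=True, right_index=True, how="left")
```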
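To see what the left merge does, here is a small sketch with a toy `results` frame; the horse names and race_ids are invented for illustration, and `merge_race_info` is a hypothetical wrapper around the same two pandas calls. Each race's info is copied onto every horse row in that race, and a race missing from `race_infos` keeps its rows with NaN in the new columns.

```python
import pandas as pd


def merge_race_info(results: pd.DataFrame, race_infos: dict) -> pd.DataFrame:
    """Hypothetical helper: attach per-race info to per-horse result rows by race_id."""
    # Outer keys (race_id) become the index, inner keys become columns
    info_df = pd.DataFrame.from_dict(race_infos, orient="index")
    # Left join on the index: keeps every result row and repeats race info for each horse
    return results.merge(info_df, left_index=True, right_index=True, how="left")


# Toy data: two horses in one race, one horse in a race with no scraped info
results = pd.DataFrame(
    {"horse": ["A", "B", "C"]},
    index=["201901010101", "201901010101", "201901010102"],
)
race_infos = {"201901010101": {"race_type": "芝", "course_len": 1800}}

merged = merge_race_info(results, race_infos)
print(merged)
```

Note that a column containing NaN is upcast to float, so `course_len` comes back as 1800.0 here; that is worth remembering before feeding the data to a model.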
The completed data looks like this.
A more detailed explanation is available in the video: Data analysis and machine learning starting with horse racing prediction