[PYTHON] Scraping the result of "Schedule-kun"

Overview

The schedule for the 2020 Meiji Yasuda Life J League was announced the other day (Release). The release provides the J1, J2, and J3 League fixtures in PDF format. Separately, the J League publishes a range of data on another site, organized by match, by team, and by player.

Jleague Data Site

Instead of scraping the PDF, this post uses `read_html` from `pandas` to read the table on the page returned by the "Schedule / Results" menu of the site above. This makes the data easy to get.

URL structure

https://data.j-league.or.jp/SFMS01/search?competition_years=2020&competition_frame_ids=1&competition_ids=477&tv_relay_station_name=
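
The URL above is a base path followed by four query parameters. As a minimal sketch, it can be assembled with `urlencode` from the standard library. The helper name `build_search_url` is hypothetical, and the reading of the parameters (league tier and per-season competition ID) is inferred from the code in this post, not from any documentation of the site.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.j-league.or.jp/SFMS01/search"


def build_search_url(year, frame_id, competition_id):
    """Return the search URL for one league in one season (hypothetical helper)."""
    params = {
        "competition_years": year,
        "competition_frame_ids": frame_id,   # assumed: 1 = J1, 2 = J2, 3 = J3
        "competition_ids": competition_id,   # assumed: ID that changes every season
        "tv_relay_station_name": "",         # left empty, as in the URL above
    }
    # urlencode joins the pairs with "&" and escapes values where needed
    return BASE_URL + "?" + urlencode(params)


j1_url = build_search_url(2020, 1, 477)
print(j1_url)
```

Building the query from a dict keeps the parameter names in one place, which helps when only the year or the competition ID changes.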

Code description

game_schedule.py


# coding: utf-8
import pandas as pd

yyyy = 2020  # season to fetch
url = 'https://data.j-league.or.jp/SFMS01/search?'
# competition_frame_ids ('1' = J1, '2' = J2, '3' = J3) -> competition_ids for that season
category = {'1': 477, '2': 478, '3': 479}
# Empty DataFrame that each league's table is appended to
schedule = pd.DataFrame(index=None, columns=['year', 'Tournament', 'section', 'Match day', 'K/O time', 'home', 'Score', 'Away', 'Stadium', 'Number of visitors', 'Internet broadcasting / TV broadcasting'])

Define the J1, J2, and J3 categories and their IDs for the year as a dict, then create an empty DataFrame.

game_schedule.py


for key, value in category.items():
    # Build the query string for one league
    para = 'competition_years=' + str(yyyy)
    para1 = '&competition_frame_ids=' + str(key)
    para2 = '&competition_ids=' + str(value)
    para3 = '&tv_relay_station_name='

    full_url = url + para + para1 + para2 + para3
    # print(full_url)
    # read_html returns a list of DataFrames, one per matching <table>
    df = pd.read_html(full_url, attrs={'class': 'table-base00 search-table'}, skiprows=0)
    # Append the first (and only expected) table to the overall schedule
    schedule = pd.concat([schedule, df[0]], sort=False)

The key is `pd.read_html(full_url, attrs={'class': 'table-base00 search-table'}, ...)`, which takes the target URL and the attributes of the `<table>` to extract. It returns a list of DataFrames, so the first element is concatenated onto `schedule`.
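
To see how `attrs` picks out one table, here is a small offline sketch that runs `read_html` on an inline HTML string instead of the live site. The HTML, team names, and column headers are made up for illustration. `read_html` also needs a parser library such as `lxml` (or `beautifulsoup4` with `html5lib`) to be installed.

```python
from io import StringIO

import pandas as pd

# Made-up page with two tables; only the second carries the class used in this post
html = """
<table class="other-table">
  <tr><th>x</th></tr>
  <tr><td>1</td></tr>
</table>
<table class="table-base00 search-table">
  <thead><tr><th>Section</th><th>Home</th><th>Away</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>Team A</td><td>Team B</td></tr>
    <tr><td>1</td><td>Team C</td><td>Team D</td></tr>
  </tbody>
</table>
"""

# attrs restricts the result to <table> elements whose attributes match
tables = pd.read_html(StringIO(html), attrs={"class": "table-base00 search-table"})

print(len(tables))  # number of tables that matched
print(tables[0])
```

Without `attrs`, both tables would come back in the list, and the index of the wanted one would depend on the page layout.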

game_schedule.py


# To replace NaN values:
# schedule = schedule.fillna({'K/O time': '● Undecided ●', 'Number of visitors': 0})
# The ./csv folder must already exist
schedule.to_csv('./csv/Game_Schedule_' + str(yyyy) + '.csv', index=False, sep=',')

Save the result as a CSV file in the specified folder.
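
The optional NaN replacement and the save step can be tried on a small sample first. The sketch below uses made-up rows and a temporary folder (both are assumptions, not part of the original script) and creates the output folder before writing, since `to_csv` does not create missing directories.

```python
import os
import tempfile

import pandas as pd

# Made-up sample: the second match has no kick-off time or attendance yet
schedule = pd.DataFrame({
    "K/O time": ["14:00", None],
    "Number of visitors": [12345.0, None],
})

# Replace missing values column by column
schedule = schedule.fillna({"K/O time": "Undecided", "Number of visitors": 0})

# Hypothetical output location; the original script writes to ./csv instead
out_dir = os.path.join(tempfile.mkdtemp(), "csv")
os.makedirs(out_dir, exist_ok=True)
out_path = os.path.join(out_dir, "Game_Schedule_2020.csv")

schedule.to_csv(out_path, index=False)

# Read the file back to check what was written
restored = pd.read_csv(out_path)
print(restored)
```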

Summary

Utilization of data

About "Schedule-kun"
