Try importing MLB data on Mac and Python

environment:

Introduction

In this article, I want to analyze baseball data like Sabermetrics myself! This is a summary of the process of bringing in the necessary data that I did and putting it in the database.

Trigger

As of October 29, 2015, Softbank won the Japan Series in Japan, and the Royals won the World Series for the second time in a row, which is a very exciting season for me as a baseball lover.

The reason why I was interested in Saber was that I wanted to study data analysis at first. Then, when I thought about what kind of data I should analyze and how to get the most out of it, I arrived at the idea of hacking my favorite baseball data with Python.

Therefore, I hope this article will be of some help to those who are thinking about the same thing.

Retrosheet Unfortunately, as for NPB, as detailed data as MLB is not published and cannot be downloaded (it can be collected by scraping).

So, this time, I will download the MLB data from Retrosheet (there are other Lahman etc.) and put it in the database.

(1) py-retrosheet, Chadwick Retrosheet data is very difficult to read as it is, and data cleaning is very troublesome. However, there are some people who have arranged such data neatly, so I will use that code.

$ brew update
$ brew install chadwick
$ git clone https://github.com/wellsoliver/py-retrosheet.git

Also download the libraries needed to use py-retrosheet. mysql-python required for sqlalchemy and MySQL used this time.

$ pip install sqlalchemy mysql-python

(2) MySQL Data download and import to SQL are also done with py-retrosheet, so I will prepare a database such as MySQL.

$ brew install mysql

Next, make various settings for MySQL. (This time, create and use a test user)

$ mysql_secure_installation #Security settings such as password
$ mysql.server start
$ mysql -u root -p
Enter password:What you set

mysql> CREATE DATABASE retrosheet #Create database
mysql> CREATE USER test IDENTIFIED BY "*****(Your Password)" #Create a test user named test
mysql> GRANT ALL ON retrosheet.* TO test #Set retrosheet database access to test users

After that, import the schema with the test user created earlier

$ mysql -u test[USER_NAME] -p retrosheet[DB_NAME] < sql/schema.sql

This completes the MySQL settings.

parse.py[line58-59,61]


sql = 'SELECT * FROM rosters WHERE year = %s AND player_id = %s AND team_tx = %s'
res = conn.execute(sql, [row[0], row[1], row[6]])

if results.rowcount() == 1:

parse.py[Revised]


sql = 'SELECT * FROM rosters WHERE year = ? AND player_id = ? AND team_tx = ?'
res = conn.execute(sql, [row[0], row[1], row[6]])

if len(results.fetchall()) == 1:

(3) Edit config.ini

Finally, edit confit.ini and set up the database you just created

$ cd scripts
$ cp config.ini.dist config.ini
$ vim config.ini  # config.ini edit

config.ini


[database]
engine = mysql # or sqlite, postgresql
host = localhost
database = retrosheet
schema = retrosheet
user = test
password = ****** # your password

(4) Finally

Then download the data with download.py and import it into SQL with parse.py!

$ python download.py -y 2014  #Specify the year to download in the Christian era, no option is possible
$ python parse.py -y 2014  #Import into SQL

Check the contents of SQL with this, and if it can be imported properly, it ends. All you have to do is analyze the data as you like!

mysql> select * from retrosheet.teams;

+---------+-------+---------------+--------------+
| TEAM_ID | LG_ID | LOC_TEAM_TX   | NAME_TEAM_TX |
+---------+-------+---------------+--------------+
| ANA     | A     | Anaheim       | Angels       |
| BAL     | A     | Baltimore     | Orioles      |
| BOS     | A     | Boston        | Red Sox      |
| CHA     | A     | Chicago       | White Sox    |
| CLE     | A     | Cleveland     | Indians      |
| DET     | A     | Detroit       | Tigers       |
| HOU     | A     | Houston       | Astros       |
| KCA     | A     | Kansas City   | Royals       |
| MIN     | A     | Minnesota     | Twins        |
| NYA     | A     | New York      | Yankees      |
| OAK     | A     | Oakland       | Athletics    |
| SEA     | A     | Seattle       | Mariners     |
| TBA     | A     | Tampa Bay     | Rays         |
| TEX     | A     | Texas         | Rangers      |
| TOR     | A     | Toronto       | Blue Jays    |
| ARI     | N     | Arizona       | Diamondbacks |
| ATL     | N     | Atlanta       | Braves       |
| CHN     | N     | Chicago       | Cubs         |
| CIN     | N     | Cincinnati    | Reds         |
| COL     | N     | Colorado      | Rockies      |
| LAN     | N     | Los Angeles   | Dodgers      |
| MIA     | N     | Miami         | Marlins      |
| MIL     | N     | Milwaukee     | Brewers      |
| NYN     | N     | New York      | Mets         |
| PHI     | N     | Philadelphia  | Phillies     |
| PIT     | N     | Pittsburgh    | Pirates      |
| SDN     | N     | San Diego     | Padres       |
| SFN     | N     | San Francisco | Giants       |
| SLN     | N     | St. Louis     | Cardinals    |
| WAS     | N     | Washington    | Nationals    |
+---------+-------+---------------+--------------+

Recommended Posts

Try importing MLB data on Mac and Python
Install selenium on Mac and try it with python
python on mac
Notes on building Python and pyenv on Mac
Install Python on Mac
Install Python 3 on Mac
Try working with Mongo in Python on Mac
Install Python 3.4 on Mac
Send and receive binary data via serial communication with python3 (on mac)
Try deepdream on Mac
Install Python3 on Mac and build environment [Definitive Edition]
Installing Python 3 on Mac and checking basic operation Part 1
Python on Ruby and angry Ruby on Python
Install pygame on python3.4 on mac
[Python] Notes on data analysis
Handling of python on mac
Try "100 knocks on data science" ①
Update python on Mac to 3.7-> 3.8
Install pandas 0.14 on python3.4 [on Mac]
Notes on installing Python on Mac
Catalina on Mac and pyenv
Notes on importing data from MySQL or CSV with Python
Building a Python environment on a Mac and using Jupyter lab
Build a python data analysis environment on Mac (El Capitan)
Building a Python environment on Mac
If python on mac goes missing
Notes on Python and dictionary types
Python 3.6 on Windows ... and to Xamarin.
MQTT on Raspberry Pi and Mac
Create a Python environment on Mac (2017/4)
Python environment construction memo on Mac
Install Python 3.7 Anaconda on MAC, but Python 2
Install python3 on Mac (El Capitan)
Integrate Modelica and Python on Windows
Hashing data in R and Python
Environment construction of python3.8 on mac
Install fabric on Ubuntu and try
If Python 3.5.0 installation fails on Mac
Try running Jupyter Notebook on Mac
Call C / C ++ from Python on Mac
Update Python on Mac from 2 to 3
Build a Python environment on your Mac with Anaconda and PyCharm
Error and solution when installing python3 with homebrew on mac (catalina 10.15)
Get data from MySQL on a VPS with Python 3 and SQLAlchemy
A python script that deletes ._DS_Store and ._ * files created on Mac
Data pipeline construction with Python and Luigi
Python and Bash on Cisco Catalyst IOS-XE
Build Python3 and OpenCV environment on Ubuntu 18.04
Python virtual environment and packages on Ubuntu
Try working with binary data in Python
Introducing MAMP + Python + SQLAlchemy (Mysql-Python) on Mac
Create a python environment on your Mac
Try python
A memo with Python2.7 and Python3 on CentOS
Python data structure and internal implementation ~ List ~
Follow active applications on Mac with Python
How to erase Python 2.x on Mac.
Python data structure and operation (Python learning memo ③)
Build Python environment with Anaconda on Mac
Put MeCab binding for Python with pip on Windows, mac and Linux
Easily graph data in shell and Python