**The Python script that used to take a little over 20 minutes to fetch and build one year of game data can now build 15 years of data in about 10 minutes.**
The results are in this repository:

[Shinichi-Nakagawa/py-retrosheet](https://github.com/Shinichi-Nakagawa/py-retrosheet)
That's roughly a 20x performance improvement over the original!

That said, I didn't use any especially fancy techniques, so opinions and suggestions are welcome.
This post is a sequel to my baseball talk at PyCon JP 2015.

It's also my first post on Qiita. Nice to meet you.
At PyCon JP 2015 I gave the talk "[Baseball Hack! ~ Data analysis and visualization using Python](http://www.slideshare.net/shinyorke/hackpython-pyconjp)", where I told the story of downloading MLB data, migrating it to a database, and analyzing and visualizing it with IPython Notebook and pandas.
Among the things I introduced was "[py-retrosheet](https://github.com/wellsoliver/py-retrosheet)", a library that **downloads the MLB game and player performance data "[RETROSHEET](http://retrosheet.org/)" and migrates it into your favorite database!**
I introduced it with a pretty smug look on my face, but once I actually used it, I found a number of unpleasant things. They aren't bugs, exactly, but they're problems all the same.

Without py-retrosheet I could never have made it to PyCon, nor played with Joey Votto's or Adam Dunn's data, so I am very grateful to the library's authors and contributors. Even so, when I tried using it, I hit the following non-negligible problems.
Retrosheet data is only about 190,000 to 200,000 rows per season, yet somewhere the library takes a huge detour:

**It took 20 minutes from downloading the data to having the database built.**

That works out to a full minute per 10,000 rows, which is absurdly slow.
Skimming the code made the causes obvious: it did needless scraping, issued mysterious SQL for every single record (such as a SELECT with a unique-key check even when building the database from a fresh state), and inserted rows one at a time.

**And on top of that, much of the code was near copy-and-paste!**
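To make the problem concrete, here is a paraphrase of the pattern the hot loop boiled down to (a sketch, not the actual original code):

```python
# Paraphrased sketch of the original anti-pattern: one SELECT plus one INSERT
# round trip per CSV row -- roughly 190,000 round trips for a single season.
import csv


def insert_rows_one_by_one(connection, csv_file):
    """connection is e.g. a SQLAlchemy connection; csv_file is a Retrosheet event CSV."""
    with open(csv_file) as f:
        for row in csv.reader(f):
            # unique-key existence check, even against a freshly created database
            result = connection.execute(
                "SELECT * FROM events WHERE game_id = '{0}'".format(row[0]))
            if result.rowcount > 0:
                continue
            # one INSERT statement per row
            connection.execute(
                'INSERT INTO events VALUES ("{0}")'.format('", "'.join(row)))
```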
Also, since I personally want to do my main development in Python 3, a library that only supports Python 2 (so-called "Legacy Python") just didn't feel cool, so I decided to improve it.

So I immediately forked the original repository as [Shinichi-Nakagawa/py-retrosheet](https://github.com/Shinichi-Nakagawa/py-retrosheet) and set about improving it myself. The improvements went in the following order.
The slow processing was the most serious problem, and by rights I should have started there. But in order to understand the code accurately and pin down the bottleneck, I first ported the original code to Python 3. Besides forcing me to grasp the specification, this gave me a clear test condition: if it behaves the same as the original, it passes.

After that, the plan was to refactor the original code, improve its maintainability, and then attack the speed bottleneck.

**The code turned out to be such a mess that I ended up deciding to remake it.**
I converted the code with 2to3, the standard tool when migrating from Python 2 to 3, and then worked through the remaining exceptions one by one. While I was at it, I also fixed PEP 8 violations wherever fixing them seemed worthwhile (and feasible).
The painful part was that the import statements inevitably ballooned into things like this:
scripts/download.py
```python
import os
import requests

try:
    # Python 3.x
    from configparser import ConfigParser
    import queue
except ImportError:
    # Python 2.x (ugh, I didn't want to write this... orz)
    from ConfigParser import ConfigParser
    import Queue as queue
```
I managed to get by by changing the unit of import and aliasing the Python 2 modules to their Python 3 names.
Once this conversion was done, I sent a pull request to the original author.
After the Python 3 support landed, I intended to write tests and then refactor, but the code was so hard to test, and frankly so poor to begin with, that I got fed up and decided to rewrite it.
What py-retrosheet does is really simple, yet the code felt like it was going out of its way to complicate that simple task. So, while reviewing the structure itself, I decided to also make use of features added in Python 3.5 (because I personally wanted to try them) and made several improvements.
In addition, the original script could only fetch and build data for a single season (one year) per run, but since in practice I wanted two or more years of data, I added support for that too.

First, the download. This is actually almost the same as the original code.
scripts/retrosheet_download.py
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Download.
Python 3.5.0+ (don't know about 3.4- and 2.x, sorry)
MySQL 5.6.0+ (don't know about 5.5- , sorry)
"""
import os
import click
from configparser import ConfigParser
from classes.fetcher import Fetcher
from queue import Queue

__author__ = 'Shinichi Nakagawa'


class RetrosheetDownload(object):

    FILES = (
        {
            'name': 'eventfiles',
            'config_flg': 'dl_eventfiles',
            'url': 'eventfiles_url',
            'pattern': r'({year})eve\.zip',
            'download_url': 'http://www.retrosheet.org/events/{year}eve.zip',
        },
        {
            'name': 'gamelogs',
            'config_flg': 'dl_gamelogs',
            'url': 'gamelogs_url',
            'pattern': r'gl({year})\.zip',
            'download_url': 'http://www.retrosheet.org/gamelogs/gl{year}.zip',
        }
    )

    def __init__(self, configfile: str):
        """
        initialize
        :param configfile: configuration file
        """
        # configuration
        self.config = ConfigParser()
        self.config.read(configfile)
        self.num_threads = self.config.getint('download', 'num_threads')
        self.path = self.config.get('download', 'directory')
        self.absolute_path = os.path.abspath(self.path)

    def download(self, queue):
        """
        Download & extract archives
        :param queue: Queue of download urls
        """
        threads = []
        for i in range(self.num_threads):
            t = Fetcher(queue, self.absolute_path, {'verbose': self.config.get('debug', 'verbose')})
            t.start()
            threads.append(t)
        for thread in threads:
            thread.join()

    @classmethod
    def run(cls, from_year: int, to_year: int, configfile: str):
        """
        :param from_year: Season(from)
        :param to_year: Season(to)
        :param configfile: Config file
        """
        client = RetrosheetDownload(configfile)

        if not os.path.exists(client.absolute_path):
            print("Directory %s does not exist, creating..." % client.absolute_path)
            os.makedirs(client.absolute_path)

        urls = Queue()
        for year in range(from_year, to_year + 1):
            for _file in RetrosheetDownload.FILES:
                urls.put(_file['download_url'].format(year=year))

        client.download(urls)


@click.command()
@click.option('--from_year', '-f', default=2001, help='From Season')
@click.option('--to_year', '-t', default=2014, help='To Season')
@click.option('--configfile', '-c', default='config.ini', help='Config File')
def download(from_year, to_year, configfile):
    """
    :param from_year: Season(from)
    :param to_year: Season(to)
    :param configfile: Config file
    """
    # from <= to check
    if from_year > to_year:
        print('not From <= To({from_year} <= {to_year})'.format(from_year=from_year, to_year=to_year))
        raise SystemExit
    RetrosheetDownload.run(from_year, to_year, configfile)


if __name__ == '__main__':
    download()
```
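The `Fetcher` imported from `classes.fetcher` isn't reproduced here; conceptually it's a worker thread that keeps pulling URLs off the shared queue, downloading each archive and extracting it. A minimal sketch of that idea (not the actual repo code):

```python
# Minimal sketch of a Fetcher-style worker thread (the real classes/fetcher.py differs).
import os
import threading
import zipfile
from queue import Empty, Queue
from urllib.request import urlretrieve


class Fetcher(threading.Thread):

    def __init__(self, queue: Queue, path: str, options: dict):
        super().__init__()
        self.queue = queue      # shared queue of download urls
        self.path = path        # download directory
        self.options = options  # e.g. {'verbose': 'True'}

    def run(self):
        while True:
            try:
                url = self.queue.get_nowait()  # stop once the queue is drained
            except Empty:
                break
            filename = os.path.join(self.path, url.split('/')[-1])
            if self.options.get('verbose'):
                print('Fetching {url}'.format(url=url))
            urlretrieve(url, filename)
            # Retrosheet archives arrive zipped; extract them in place
            if zipfile.is_zipfile(filename):
                with zipfile.ZipFile(filename) as z:
                    z.extractall(self.path)
            self.queue.task_done()
```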
I stopped fetching and scraping pages that weren't needed.

I also addressed a maintenance problem that ran through py-retrosheet in general: convoluted handling of command-line arguments and config values. The former was solved with click, the latter by consolidating all config reads into the class's initialization.

The same goes for the CSV generation and database-load scripts introduced below.
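For reference, here is a minimal `config.ini` sketch covering the keys these scripts read. The section and option names come straight from the code; the values are just examples, and the actual repo config may hold more options:

```ini
[download]
num_threads = 4
directory = files

[debug]
verbose = True

[chadwick]
directory = /usr/local/bin/

[database]
host = localhost
database = retrosheet
user = root
password = secret

[retrosheet_mysql]
record_check = False
multiple_insert_rows = 1000
```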
Next, the part that extracts the data into CSV using Chadwick.
scripts/parse_csv.py
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Parse Event Files, Game Logs and Rosters to CSV.
Python 3.5.0+ (don't know about 3.4- and 2.x, sorry)
"""
import os
import subprocess
import click
from configparser import ConfigParser

__author__ = 'Shinichi Nakagawa'


class ParseCsv(object):

    CW_EVENT = '{chadwick_path}cwevent'
    CW_GAME = '{chadwick_path}cwgame'
    CW_EVENT_CMD = '{chadwick_path}cwevent -q -n -f 0-96 -x 0-62 -y {year} {year}*.EV* > {csvpath}/events-{year}.csv'
    CW_GAME_CMD = '{chadwick_path}cwgame -q -n -f 0-83 -y {year} {year}*.EV* > {csvpath}/games-{year}.csv'
    EV_FILE_PATTERN = '{path}/{year}*.EV*'
    EVENT_FILE = '{csvpath}/events-{year}.csv'
    GAME_FILE = '{csvpath}/games-{year}.csv'
    CSV_PATH = 'csv'

    def __init__(self):
        pass

    @classmethod
    def exists_chadwick(cls, chadwick_path: str):
        """
        exists chadwick binary
        :param chadwick_path: chadwick path
        :return: True or False
        """
        if os.path.exists(chadwick_path) \
                and os.path.exists(cls.CW_EVENT.format(chadwick_path=chadwick_path)) \
                and os.path.exists(cls.CW_GAME.format(chadwick_path=chadwick_path)):
            return True
        return False

    @classmethod
    def generate_files(
            cls,
            year: int,
            cmd_format: str,
            filename_format: str,
            chadwick_path: str,
            verbose: bool,
            csvpath: str
    ):
        """
        Generate CSV file
        :param year: Season
        :param cmd_format: Command format
        :param filename_format: Filename format
        :param chadwick_path: Chadwick Command Path
        :param verbose: Debug flg
        :param csvpath: csv output path
        """
        cmd = cmd_format.format(csvpath=csvpath, year=year, chadwick_path=chadwick_path)
        filename = filename_format.format(csvpath=csvpath, year=year)
        if os.path.isfile(filename):
            os.remove(filename)
        if verbose:
            print('calling {cmd}'.format(cmd=cmd))
        subprocess.call(cmd, shell=True)

    @classmethod
    def generate_retrosheet_files(
            cls,
            from_year: int,
            to_year: int,
            chadwick_path: str,
            verbose: bool,
            csvpath: str
    ):
        """
        Generate CSV files
        :param from_year: Season(from)
        :param to_year: Season(to)
        :param chadwick_path: Chadwick Command Path
        :param verbose: Debug flg
        :param csvpath: csv output path
        """
        # generate files
        for year in range(from_year, to_year + 1):
            # game
            ParseCsv.generate_files(
                year=year,
                cmd_format=ParseCsv.CW_GAME_CMD,
                filename_format=ParseCsv.GAME_FILE,
                chadwick_path=chadwick_path,
                verbose=verbose,
                csvpath=csvpath
            )
            # event
            ParseCsv.generate_files(
                year=year,
                cmd_format=ParseCsv.CW_EVENT_CMD,
                filename_format=ParseCsv.EVENT_FILE,
                chadwick_path=chadwick_path,
                verbose=verbose,
                csvpath=csvpath
            )

    @classmethod
    def run(cls, from_year: int, to_year: int, configfile: str):
        """
        :param from_year: Season(from)
        :param to_year: Season(to)
        :param configfile: Config file
        """
        config = ConfigParser()
        config.read(configfile)
        verbose = config.getboolean('debug', 'verbose')
        chadwick = config.get('chadwick', 'directory')
        path = os.path.abspath(config.get('download', 'directory'))
        csv_path = '{path}/csv'.format(path=path)

        # command exists check
        if not cls.exists_chadwick(chadwick):
            print('chadwick does not exist in {chadwick} - exiting'.format(chadwick=chadwick))
            raise SystemExit

        # make directory
        os.chdir(path)
        if not os.path.exists(ParseCsv.CSV_PATH):
            os.makedirs(ParseCsv.CSV_PATH)

        # generate files
        cls.generate_retrosheet_files(
            from_year=from_year,
            to_year=to_year,
            chadwick_path=chadwick,
            verbose=verbose,
            csvpath=csv_path
        )

        # change directory
        os.chdir(os.path.dirname(os.path.abspath(__file__)))


@click.command()
@click.option('--from_year', '-f', default=2001, help='From Season')
@click.option('--to_year', '-t', default=2014, help='To Season')
@click.option('--configfile', '-c', default='config.ini', help='Config File')
def create_retrosheet_csv(from_year, to_year, configfile):
    """
    :param from_year: Season(from)
    :param to_year: Season(to)
    :param configfile: Config file
    """
    # from <= to check
    if from_year > to_year:
        print('not From <= To({from_year} <= {to_year})'.format(from_year=from_year, to_year=to_year))
        raise SystemExit
    ParseCsv.run(from_year, to_year, configfile)


if __name__ == '__main__':
    create_retrosheet_csv()
```
Originally, the game data (game logs) and the in-game event data (at bats, stolen bases, and so on) were processed by separate code paths even though they did exactly the same thing; only the command invoked and the output destination differed. I folded the shared processing into a single method and moved the differences out into external definitions. The database-load implementation below follows the same pattern.
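For example, the event-file command template expands for a single season like this (the paths are illustrative):

```python
# How one command definition expands for a given season (illustrative paths).
CW_EVENT_CMD = '{chadwick_path}cwevent -q -n -f 0-96 -x 0-62 -y {year} {year}*.EV* > {csvpath}/events-{year}.csv'

cmd = CW_EVENT_CMD.format(chadwick_path='/usr/local/bin/', year=2014, csvpath='files/csv')
print(cmd)
# /usr/local/bin/cwevent -q -n -f 0-96 -x 0-62 -y 2014 2014*.EV* > files/csv/events-2014.csv
```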
scripts/retrosheet_mysql.py
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Migrate Database, Game Logs and Roster.
Python 3.5.0+ (don't know about 3.4- and 2.x, sorry)
MySQL 5.6.0+ (don't know about 5.5- , sorry)
"""
import os
import csv
import click
import sqlalchemy
from glob import glob
from configparser import ConfigParser, NoOptionError

__author__ = 'Shinichi Nakagawa'


class RetrosheetMySql(object):

    DATABASE_ENGINE = 'mysql+pymysql'
    ENGINE = '{engine}://{user}:{password}@{host}/{database}'
    TABLES = (
        {
            'name': 'teams',
            'header': False,
            'year': False,
            'mask': '{path}/TEAM{year}*',
            'select': "SELECT * FROM teams WHERE team_id = '{key_0}'",
            'where_index': [0],
            'insert': 'INSERT INTO teams VALUES {values}',
        },
        {
            'name': 'rosters',
            'header': False,
            'year': True,
            'mask': '{path}/*{year}*.ROS',
            'select': "SELECT * FROM rosters WHERE year = {year} AND player_id = '{key_0}' AND team_tx = '{key_1}'",
            'where_index': [0, 5],
            'insert': 'INSERT INTO rosters VALUES {values}',
        },
        {
            'name': 'games',
            'header': True,
            'year': False,
            'mask': '{path}/csv/games-{year}*.csv',
            'select': "SELECT * FROM games WHERE game_id = '{key_0}'",
            'where_index': [0],
            'insert': 'INSERT INTO games({columns}) VALUES {values}',
        },
        {
            'name': 'events',
            'header': True,
            'year': False,
            'mask': '{path}/csv/events-{year}*.csv',
            'select': "SELECT * FROM events WHERE game_id = '{key_0}' AND event_id = '{key_1}'",
            'where_index': [0, 96],
            'insert': 'INSERT INTO events({columns}) VALUES {values}',
        },
    )

    def __init__(self, configfile: str):
        """
        initialize
        :param configfile: configuration file
        """
        # configuration
        config = ConfigParser()
        config.read(configfile)
        self.path = os.path.abspath(config.get('download', 'directory'))
        self.record_check = config.getboolean('retrosheet_mysql', 'record_check')
        self.multiple_insert_rows = config.getint('retrosheet_mysql', 'multiple_insert_rows')
        # connection
        self.connection = self._generate_connection(config)

    @classmethod
    def _generate_connection(cls, config: ConfigParser):
        """
        generate database connection
        :param config: ConfigParser object
        :return:
        """
        try:
            database_engine = cls.DATABASE_ENGINE
            database = config.get('database', 'database')
            host = config.get('database', 'host')
            user = config.get('database', 'user')
            password = config.get('database', 'password')
        except NoOptionError:
            print('Need to define engine, user, password, host, and database parameters')
            raise SystemExit

        db = sqlalchemy.create_engine(
            cls.ENGINE.format(
                engine=database_engine,
                user=user,
                password=password,
                host=host,
                database=database,
            )
        )
        return db.connect()

    @classmethod
    def _exists_record(cls, year: int, table: dict, csv_row: list, connection):
        where = {'key_{i}'.format(i=i): csv_row[v] for i, v in enumerate(table['where_index'])}
        where['year'] = year
        sql = table['select'].format(**where)
        res = connection.execute(sql)
        if res.rowcount > 0:
            return True
        return False

    def _multiple_insert(self, query: str, columns: list, values: list):
        params = {
            'columns': columns,
            'values': ', '.join(values),
        }
        sql = query.format(**params)
        try:
            self.connection.execute(sql)
        except Exception as e:
            print(e)
            raise SystemExit

    def _create_record(self, year: int, csv_file: str, table: dict):
        reader = csv.reader(open(csv_file))
        headers = []
        values = []
        if table['header']:
            headers = next(reader)
        columns = ', '.join(headers)
        for row in reader:
            # Record Exists Check
            if self.record_check and RetrosheetMySql._exists_record(year, table, row, self.connection):
                continue
            # append record values
            if table['year']:
                row.insert(0, str(year))
            values.append('("{rows}")'.format(rows='", "'.join(row)))
            if len(values) == self.multiple_insert_rows:
                # inserts
                self._multiple_insert(table['insert'], columns, values)
                values = []
        if len(values) > 0:
            # inserts
            self._multiple_insert(table['insert'], columns, values)

    def execute(self, year: int):
        for table in self.TABLES:
            for csv_file in glob(table['mask'].format(path=self.path, year=year)):
                self._create_record(year, csv_file, table)

    @classmethod
    def run(cls, from_year: int, to_year: int, configfile: str):
        """
        :param from_year: Season(from)
        :param to_year: Season(to)
        :param configfile: Config file
        """
        client = RetrosheetMySql(configfile)
        for year in range(from_year, to_year + 1):
            client.execute(year)
        client.connection.close()


@click.command()
@click.option('--from_year', '-f', default=2001, help='From Season')
@click.option('--to_year', '-t', default=2014, help='To Season')
@click.option('--configfile', '-c', default='config.ini', help='Config File')
def migration(from_year, to_year, configfile):
    """
    :param from_year: Season(from)
    :param to_year: Season(to)
    :param configfile: Config file
    """
    # from <= to check
    if from_year > to_year:
        print('not From <= To({from_year} <= {to_year})'.format(from_year=from_year, to_year=to_year))
        raise SystemExit
    RetrosheetMySql.run(from_year, to_year, configfile)


if __name__ == '__main__':
    migration()
```
The biggest improvements in this implementation:

- The SELECT-based record check is still implemented, but it is no longer used by default (you can switch it back on in the config).
- Simply changing the INSERTs, which had politely gone row by row, to multi-row INSERTs of 1,000 rows at a time (also adjustable in the config) sped things up dramatically (see the toy example below).
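For illustration, this is the shape of the SQL that `_create_record` and `_multiple_insert` build; the team rows here are made-up toy data, not actual Retrosheet records:

```python
# Toy illustration of the multi-row INSERT batching (made-up rows, not repo output).
rows = [
    ['ANA', 'AL', 'Anaheim', 'Angels'],
    ['ARI', 'NL', 'Arizona', 'Diamondbacks'],
    ['ATL', 'NL', 'Atlanta', 'Braves'],
]
values = ['("{rows}")'.format(rows='", "'.join(row)) for row in rows]
sql = 'INSERT INTO teams VALUES {values}'.format(values=', '.join(values))
print(sql)
# INSERT INTO teams VALUES ("ANA", "AL", "Anaheim", "Angels"), ("ARI", "NL", "Arizona", "Diamondbacks"), ("ATL", "NL", "Atlanta", "Braves")
```

One round trip to MySQL now carries up to `multiple_insert_rows` rows instead of one.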
The code as a whole is also much easier to survey now, so adding the as-yet-unused All-Star and playoff data should be straightforward!
Having improved things this far, I ran a benchmark against the 2014 data (about 190,000 rows). The targets were the scripts before and after the rewrite (download, CSV generation, and database load), and I measured each run with the `time` command.
```
# Before improvement
$ time python download.py -y 2014
Queuing up Event Files for download (2014 only).
Queuing up Game Logs for download (2014 only).
Fetching 2014eve.zip
Fetching gl2014.zip
Zip file detected. Extracting gl2014.zip
Zip file detected. Extracting 2014eve.zip

real    0m5.816s
user    0m0.276s
sys     0m0.066s

$ time python parse.py -y 2014
calling '/usr/local/bin//cwevent -q -n -f 0-96 -x 0-62 -y 2014 2014*.EV* > /Users/shinyorke_mbp/PycharmProjects/py-retrosheet/scripts/files/csv/events-2014.csv'
calling '/usr/local/bin//cwgame -q -n -f 0-83 -y 2014 2014*.EV* > /Users/shinyorke_mbp/PycharmProjects/py-retrosheet/scripts/files/csv/games-2014.csv'
processing TEAM2014
processing ANA2014.ROS
processing ARI2014.ROS
processing ATL2014.ROS
processing BAL2014.ROS
processing BOS2014.ROS
processing CHA2014.ROS
processing CHN2014.ROS
processing CIN2014.ROS
processing CLE2014.ROS
processing COL2014.ROS
processing DET2014.ROS
processing HOU2014.ROS
processing KCA2014.ROS
processing LAN2014.ROS
processing MIA2014.ROS
processing MIL2014.ROS
processing MIN2014.ROS
processing NYA2014.ROS
processing NYN2014.ROS
processing OAK2014.ROS
processing PHI2014.ROS
processing PIT2014.ROS
processing SDN2014.ROS
processing SEA2014.ROS
processing SFN2014.ROS
processing SLN2014.ROS
processing TBA2014.ROS
processing TEX2014.ROS
processing TOR2014.ROS
processing WAS2014.ROS
processing /Users/shinyorke_mbp/PycharmProjects/py-retrosheet/scripts/files/csv/games-2014.csv
processing /Users/shinyorke_mbp/PycharmProjects/py-retrosheet/scripts/files/csv/events-2014.csv

real    19m53.224s
user    13m16.074s
sys     0m26.294s
```
```
# After improvement
$ time python retrosheet_download.py -f 2014 -t 2014
Fetching 2014eve.zip
Fetching gl2014.zip
Zip file detected. Extracting gl2014.zip
Zip file detected. Extracting 2014eve.zip

real    0m4.630s
user    0m0.217s
sys     0m0.055s

$ time python parse_csv.py -f 2014 -t 2014
calling /usr/local/bin/cwgame -q -n -f 0-83 -y 2014 2014*.EV* > /Users/shinyorke_mbp/PycharmProjects/py-retrosheet/scripts/files/csv/games-2014.csv
calling /usr/local/bin/cwevent -q -n -f 0-96 -x 0-62 -y 2014 2014*.EV* > /Users/shinyorke_mbp/PycharmProjects/py-retrosheet/scripts/files/csv/events-2014.csv

real    0m8.713s
user    0m8.321s
sys     0m0.221s

$ time python retrosheet_mysql.py -f 2014 -t 2014

real    0m21.435s
user    0m3.580s
sys     0m0.416s
```
**A process that originally took 20 minutes now finishes in under a minute!**

With this, even when you want a large amount of game or plate-appearance data, you can fetch and build it easily.
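For example, once the data is in MySQL, pulling it into pandas for analysis is one query away. A sketch, assuming the connection settings from the sample `config.ini` above, that `bat_id` holds the cwevent `BAT_ID` batter id, and that `'dunna001'` is Adam Dunn's Retrosheet id:

```python
# Sketch: load one batter's 2014 plate-appearance events into pandas.
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('mysql+pymysql://root:secret@localhost/retrosheet')
# A Retrosheet game_id looks like 'ANA201404010', so characters 4-7 are the year.
query = ("SELECT * FROM events "
         "WHERE bat_id = 'dunna001' AND SUBSTR(game_id, 4, 4) = '2014'")
df = pd.read_sql(query, engine)
print(df.shape)
```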
It was getting fun, so I decided to build data for the 15 seasons from 2000 through 2014.

That comes to roughly 2.9 million records in total, just shy of 3 million.

I wrote a new script that runs the whole pipeline in one shot.
scripts/migration.py
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Migrate Retrosheet Database.
Python 3.5.0+ (don't know about 3.4- and 2.x, sorry)
MySQL 5.6.0+ (don't know about 5.5- , sorry)
"""
import logging
import click
from retrosheet_download import RetrosheetDownload
from retrosheet_mysql import RetrosheetMySql
from parse_csv import ParseCsv

__author__ = 'Shinichi Nakagawa'


@click.command()
@click.option('--from_year', '-f', default=2010, help='From Season')
@click.option('--to_year', '-t', default=2014, help='To Season')
@click.option('--configfile', '-c', default='config.ini', help='Config File')
def main(from_year, to_year, configfile):
    """
    :param from_year: Season(from)
    :param to_year: Season(to)
    :param configfile: Config file
    """
    # from <= to check
    if from_year > to_year:
        print('not From <= To({from_year} <= {to_year})'.format(from_year=from_year, to_year=to_year))
        raise SystemExit

    # logging setting
    logging.basicConfig(
        level=logging.INFO,
        format="Time:%(asctime)s.%(msecs)03d\t message:%(message)s",
        datefmt="%Y-%m-%d %H:%M:%S"
    )

    # Download
    logging.info('File Download Start({from_year}-{to_year})'.format(from_year=from_year, to_year=to_year))
    RetrosheetDownload.run(from_year, to_year, configfile)
    logging.info('File Download End')

    # Parse Csv
    logging.info('Csv Parse Start({from_year}-{to_year})'.format(from_year=from_year, to_year=to_year))
    ParseCsv.run(from_year, to_year, configfile)
    logging.info('Csv Parse End')

    # Migrate MySQL Database
    logging.info('Migrate Database Start({from_year}-{to_year})'.format(from_year=from_year, to_year=to_year))
    RetrosheetMySql.run(from_year, to_year, configfile)
    logging.info('Migrate Database End')


if __name__ == '__main__':
    main()
```
Here are the results.
```
$ python migration.py -f 2000 -t 2014
Time:2015-11-15 15:23:48.291 message:File Download Start(2000-2014)
Directory /Users/shinyorke_mbp/PycharmProjects/hatteberg/py-retrosheet/scripts/files does not exist, creating...
Time:2015-11-15 15:23:59.673 message:File Download End
Time:2015-11-15 15:23:59.673 message:Csv Parse Start(2000-2014)
Time:2015-11-15 15:26:37.219 message:Csv Parse End
Time:2015-11-15 15:26:37.220 message:Migrate Database Start(2000-2014)
Time:2015-11-15 15:32:50.070 message:Migrate Database End
```
**From downloading the data to finishing the migration took less than 10 minutes!!!**

This should make baseball hacking that much more fun!