[PYTHON] The transition of baseball as seen from the data

As an exercise, I used data to examine how baseball has changed over time.

Data

Library loading

Load the required libraries.

import pandas as pd
import scipy as sp
import matplotlib.pyplot as plt
import seaborn as sns

Read data

Place the CSV files in the same directory as the notebook. If you are using Google Colaboratory, upload them first.

teams = pd.read_csv('Teams.csv')

Feature selection

Plotting every column with seaborn would produce an unwieldy figure, so select only the features of interest first.

teams = teams[['yearID', 'teamID', 'Rank', 'W', 'L', 'H', 'SB', 'SO', 'RA', 'SV', 'BB', 'HBP', 'AB', 'SF', 'HR', '2B', '3B']]

Correlation matrix

Visualize the correlation matrix with a seaborn heatmap.

colormap = plt.cm.RdBu
plt.figure(figsize=(16,10))
plt.title('Pearson Correlation of Features', y=1.05, size=15)
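# Correlate numeric columns only (teamID is a string and cannot be correlated)
sns.heatmap(teams.select_dtypes('number').corr(), linewidths=0.1, vmax=1.0,
            square=True, cmap=colormap, linecolor='white', annot=True)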

[Figure: heatmap of the Pearson correlations between the selected features]
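The heatmap is dense, so it can help to read off only the row that matters for the question of change over time. The following is a minimal sketch, not part of the original analysis: the helper name `correlations_with_year` is hypothetical, and it assumes `yearID` is still an ordinary column (that is, it runs before the data shaping step further down).

```python
import pandas as pd


def correlations_with_year(df: pd.DataFrame, year_col: str = "yearID") -> pd.Series:
    """Return each numeric column's Pearson correlation with the year.

    Hypothetical helper for illustration. Non-numeric columns such as
    teamID are dropped before the correlation is computed.
    """
    numeric = df.select_dtypes(include="number")
    # Take the yearID column of the correlation matrix and drop the
    # trivial self-correlation.
    corr = numeric.corr()[year_col].drop(year_col)
    return corr.sort_values(ascending=False)
```

Calling `correlations_with_year(teams)` returns a Series sorted from the features that rise most strongly with the year to those that fall most strongly.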

Discussion

The correlations with yearID show that HR (home runs) and SO (strikeouts) have increased over the years. One possible explanation is the flyball revolution, which is widely credited with raising both home run and strikeout totals, and this data is consistent with that. SV (saves) has also increased, which suggests that the division of labor among pitchers became established over time.
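A correlation coefficient only summarizes the trend in a single number, so it can be useful to look at the season-by-season values directly. The sketch below is an illustration rather than the original method: the helper name `yearly_team_average` is hypothetical. It averages over teams instead of summing, on the assumption that league totals would otherwise partly reflect the changing number of teams.

```python
import pandas as pd


def yearly_team_average(df: pd.DataFrame, columns, year_col: str = "yearID") -> pd.DataFrame:
    """Average the given columns over all teams in each season.

    Hypothetical helper for illustration. Season length is not
    accounted for here; that would need a games-played column.
    """
    return df.groupby(year_col)[list(columns)].mean()
```

For example, `yearly_team_average(teams, ['HR', 'SO', 'SV']).plot(subplots=True)` draws one line chart per statistic, which makes it easier to see when each increase actually took place.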

Next, we will consider the relationship between a team's total annual salary and its number of wins.

By clarifying the relationship between total team salary and the number of wins, I want to investigate whether the past and the present differ in how efficiently teams turn payroll into wins.

Read the salary data

salaries = pd.read_csv('Salaries.csv')

Data shaping

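# Index by season and team so that the join matches on both keys
teams = teams.set_index(['yearID', 'teamID'])
teams.head()
# Total payroll per team per season
salaries_by_yearID_teamID = salaries.groupby(['yearID', 'teamID'])['salary'].sum()
teams = teams.join(salaries_by_yearID_teamID)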
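Joins like this one are easy to get subtly wrong: if the two sides match on the season alone, every team is paired with every payroll of that season. One way to guard against that is a column-based merge with pandas' `validate` option. This is a sketch of an alternative, not the code used above; the function name `attach_payroll` is hypothetical, and it assumes both frames still have `yearID` and `teamID` as ordinary columns.

```python
import pandas as pd


def attach_payroll(teams: pd.DataFrame, salaries: pd.DataFrame) -> pd.DataFrame:
    """Add a total 'salary' column to a team-season table.

    Hypothetical helper for illustration. Expects yearID and teamID as
    columns in both frames, and player-level rows in `salaries`.
    """
    # Sum the player salaries into one payroll figure per team and season.
    payroll = salaries.groupby(["yearID", "teamID"], as_index=False)["salary"].sum()
    # validate="one_to_one" raises MergeError if either side has duplicate
    # (yearID, teamID) keys, so an accidental many-to-many match cannot
    # pass silently. how="left" keeps team-seasons without salary data.
    return teams.merge(payroll, on=["yearID", "teamID"], how="left", validate="one_to_one")
```

Team-seasons with no salary records keep their row and get a missing value in `salary`, which matches the behavior of the left join above.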

Visualization

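plt.figure(figsize=(12, 5))
# Left panel: total salary vs. wins in 2001
plt.subplot(1,2,1)
plt.scatter(teams['salary'][2001], teams['W'][2001])
plt.title('2001')
plt.xlabel('Total salary')
plt.ylabel('Wins')
# Right panel: total salary vs. wins in 2013
plt.subplot(1,2,2)
plt.scatter(teams['salary'][2013], teams['W'][2013])
plt.title('2013')
plt.xlabel('Total salary')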

[Figure: scatter plots of total team salary against wins, for 2001 (left) and 2013 (right)]

Discussion

The positive correlation between total salary and wins appears stronger in 2013 than in 2001. This may be because the methods for evaluating players' abilities have advanced since 2001.
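Comparing two scatter plots by eye is subjective, so the comparison can be backed by the correlation coefficient for each season. The following is a minimal sketch under stated assumptions: the helper name `salary_win_correlation` is hypothetical, and it expects a table with `salary` and `W` columns and with `yearID` available either as a column or as an index level.

```python
import pandas as pd


def salary_win_correlation(df: pd.DataFrame, years, year_col: str = "yearID") -> pd.Series:
    """Pearson correlation between total salary and wins for each season.

    Hypothetical helper for illustration.
    """
    # Move any index levels (e.g. yearID, teamID) back into columns so the
    # same code works before and after set_index.
    flat = df.reset_index()
    result = {}
    for year in years:
        season = flat[flat[year_col] == year]
        # Series.corr ignores pairs in which either value is missing.
        result[year] = season["salary"].corr(season["W"])
    return pd.Series(result, name="salary_vs_W")
```

For example, `salary_win_correlation(teams, [2001, 2013])` gives one coefficient per season. With only about 30 teams per season the estimate is noisy, so passing a longer range of years shows whether the difference is part of a trend or just year-to-year variation.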

Summary

Analyzing the data let me retrace the history of sabermetrics. The results are consistent with the flyball revolution having had a major impact on baseball.
