[Python] I tried to analyze the pitcher who threw a no-hitter

Overview

On June 18, 2014, Dodgers pitcher Clayton Kershaw pitched all nine innings against the Colorado Rockies, striking out 15 and throwing a no-hitter. In this article, I compare his pitching with that of the opposing Rockies pitchers and analyze why Kershaw was able to throw a no-hitter.

Environment

・ Python 3.7.5
・ Windows 10
・ Jupyter Notebook (Anaconda3)

Start analysis (play ball)

First, launch Jupyter Notebook from the Anaconda Prompt.

$ jupyter notebook 

Then import the required libraries.

baseball_analysis.ipynb


%matplotlib inline  
import requests
import xml.etree.ElementTree as ET
import os
import pandas as pd

Next, create the data frame that will hold the pitch-by-pitch data for the analysis.

baseball_analysis.ipynb


#Data frame creation
pitchDF = pd.DataFrame(columns = ['pitchIdx', 'inning', 'frame', 'ab', 'abIdx', 'batter', 'stand', 'speed', 
                                       'pitchtype', 'px', 'pz', 'szTop', 'szBottom', 'des'], dtype=object)

#Creating a ball type dictionary
pitchDictionary = { "FA":"fastball", "FF":"4-seam fb", "FT": "2-seam fb", "FC": "fb-cutter", "":"unknown", None: "none",
                    "FS":"fb-splitter", "SL":"slider", "CH":"changeup","CU":"curveball","KC":"knuckle-curve",
                    "KN":"knuckleball","EP":"eephus", "UN":"unidentified", "PO":"pitchout", "SI":"sinker", "SF":"split-finger"
                    }

# top = top half of the inning (visiting Rockies bat), bottom = bottom half (home Dodgers bat)
frames = ["top", "bottom"]

Getting player information

baseball_analysis.ipynb


#Read player information distributed by MLB Advanced Media
#(gd2.mlb.com is the old Gameday feed; it may no longer be online)
url = 'https://gd2.mlb.com/components/game/mlb/year_2014/month_06/day_18/gid_2014_06_18_colmlb_lanmlb_1/players.xml'
resp = requests.get(url)
resp.raise_for_status()  # stop here if the download failed
xmlfile = "myplayers.xml"

with open(xmlfile, mode='wb') as f:
    f.write(resp.content)

#Parse xml file
tree = ET.parse(xmlfile)
game = tree.getroot()
teams = game.findall("./team")
playerDict = {}

for team in teams:
    players = team.findall("./player")
    for player in players:
        #Add player ID and player name to dictionary
        playerDict[ player.attrib.get("id") ] = player.attrib.get("first") + " " + player.attrib.get("last") 
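The loop above assumes that players.xml has `<team>` elements whose `<player>` children carry `id`, `first` and `last` attributes. As a minimal sketch under that assumption, the same logic can run on the downloaded text directly with `ET.fromstring`, without writing a temporary file. The function name `build_player_dict` and the sample XML (IDs and names included) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape the loop above expects:
# <game><team><player id=... first=... last=.../></team></game>
SAMPLE_PLAYERS_XML = """
<game>
  <team type="away">
    <player id="1001" first="Test" last="Batter"/>
  </team>
  <team type="home">
    <player id="2001" first="Test" last="Pitcher"/>
  </team>
</game>
"""

def build_player_dict(xml_text):
    """Map player id -> 'first last' for every <player> under every <team>."""
    game = ET.fromstring(xml_text.strip())
    player_dict = {}
    for team in game.findall("./team"):
        for player in team.findall("./player"):
            player_dict[player.get("id")] = f'{player.get("first")} {player.get("last")}'
    return player_dict

print(build_player_dict(SAMPLE_PLAYERS_XML))
```

In the notebook, `build_player_dict(resp.content)` should give the same dictionary as the loop, since `ET.fromstring` also accepts bytes.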

Getting the pitch data for each inning

baseball_analysis.ipynb


#Read the data for each inning distributed by MLB Advanced Media
url = 'https://gd2.mlb.com/components/game/mlb/year_2014/month_06/day_18/gid_2014_06_18_colmlb_lanmlb_1/inning/inning_all.xml'
resp = requests.get(url)
resp.raise_for_status()  # stop here if the download failed
xmlfile = "mygame.xml"

with open(xmlfile, 'wb') as f:
    f.write(resp.content)

#Parse xml file
tree = ET.parse(xmlfile)
root = tree.getroot()
innings = root.findall("./inning")

def attr_float(elem, name):
    """Numeric attribute rounded to 2 decimals, or 0.0 if it is missing."""
    value = elem.attrib.get(name)
    return 0.0 if value is None else round(float(value), 2)

totalPitchCount = 0
topPitchCount = 0
bottomPitchCount = 0

for inning in innings:
    inn = inning.attrib.get("num")
    for frame in frames:
        fr = inning.find(frame)
        if fr is None:
            continue
        for ab in fr.iter('atbat'):
            battername = playerDict.get(ab.get('batter'), ab.get('batter'))
            standside = ab.get('stand')
            abIdx = ab.get('num')      # at-bat number within the game
            abPitchCount = 0           # pitch number within this at-bat
            for pitch in ab.findall("pitch"):
                # Use 0.0 when the feed has no speed for this pitch
                start_speed = pitch.attrib.get("start_speed")
                speed = 0.0 if start_speed is None else float(start_speed)

                pxFloat = attr_float(pitch, "px")
                pzFloat = attr_float(pitch, "pz")
                szTop = attr_float(pitch, "sz_top")
                szBot = attr_float(pitch, "sz_bot")

                abPitchCount += 1
                totalPitchCount += 1
                if frame == 'top':
                    topPitchCount += 1
                else:
                    bottomPitchCount += 1

                # Fall back to the raw code for pitch types missing from the dictionary
                verbosePitch = pitchDictionary.get(pitch.get("pitch_type"), pitch.get("pitch_type"))
                desPitch = pitch.get("des")

                #Add to data frame
                pitchDF.loc[totalPitchCount] = [float(totalPitchCount), inn, frame, abIdx, abPitchCount, battername, standside, speed,
                                                verbosePitch, pxFloat, pzFloat, szTop, szBot, desPitch]

# Let pandas turn the numeric object columns (speed, px, pz, ...) into real numbers
pitchDF = pitchDF.infer_objects()
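Appending one row at a time with `pitchDF.loc[...]` works, but it gets slower as the frame grows. A common alternative is to collect each pitch as a dict and build the DataFrame once at the end. The sketch below shows the parsing half on a tiny, made-up `inning_all.xml`-style string using only the standard library; element and attribute names follow the loop above, `collect_pitch_rows` is a hypothetical name, and the strike-zone columns are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample in the shape the loop above reads.
SAMPLE_INNINGS_XML = """
<game>
  <inning num="1">
    <top>
      <atbat num="1" batter="1001" stand="L">
        <pitch des="Called Strike" start_speed="93.1" px="0.12" pz="2.71" pitch_type="FF"/>
        <pitch des="In play, out(s)" start_speed="88.0" px="-0.4" pz="1.9" pitch_type="SL"/>
      </atbat>
    </top>
  </inning>
</game>
"""

def attr_float(elem, name, default=0.0):
    """Numeric attribute rounded to 2 decimals, or `default` if it is missing."""
    value = elem.get(name)
    return default if value is None else round(float(value), 2)

def collect_pitch_rows(xml_text, frames=("top", "bottom")):
    """Return one dict per pitch, in game order."""
    rows = []
    root = ET.fromstring(xml_text.strip())
    for inning in root.findall("./inning"):
        for frame in frames:
            half = inning.find(frame)
            if half is None:
                continue
            for ab in half.iter("atbat"):
                for pitch_no, pitch in enumerate(ab.findall("pitch"), start=1):
                    rows.append({
                        "pitchIdx": len(rows) + 1,   # serial number in the game
                        "inning": inning.get("num"),
                        "frame": frame,
                        "ab": ab.get("num"),
                        "abIdx": pitch_no,           # pitch number in the at-bat
                        "batter": ab.get("batter"),
                        "stand": ab.get("stand"),
                        "speed": attr_float(pitch, "start_speed"),
                        "pitchtype": pitch.get("pitch_type"),
                        "px": attr_float(pitch, "px"),
                        "pz": attr_float(pitch, "pz"),
                        "des": pitch.get("des"),
                    })
    return rows

rows = collect_pitch_rows(SAMPLE_INNINGS_XML)
print(len(rows), rows[1]["abIdx"], rows[1]["px"])
```

With the real data, `pd.DataFrame(rows)` would then create the whole frame in one call, and batter IDs could be turned into names afterward with `df['batter'].map(playerDict)`.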

Checking the data frame

baseball_analysis.ipynb


pitchDF

# pitchIdx  = serial number of the pitch in the game
# inning    = inning number
# frame     = half inning: 'top' (Rockies batting) or 'bottom' (Dodgers batting)
# ab        = at-bat number within the game
# abIdx     = pitch number within the at-bat
# batter    = batter name
# stand     = batter's side (R = right-handed, L = left-handed)
# speed     = pitch speed (mph)
# pitchtype = pitch type
# px        = horizontal position as the pitch crosses home plate, in feet from the middle of the plate (catcher's view: right = positive, left = negative)
# pz        = height of the pitch as it crosses home plate, in feet above the ground
# szTop     = height of the top of the batter's strike zone, in feet
# szBottom  = height of the bottom of the batter's strike zone, in feet
# des       = result of the pitch

Drawing the strike zone

baseball_analysis.ipynb


import matplotlib.pyplot as plt
import matplotlib.patches as patches

#Draw a new window
fig1 = plt.figure()
#Add subplot
ax1 = fig1.add_subplot(111, aspect='equal')

#Strike zone width = width of home plate = 17 inches ≈ 1.42 feet
#Strike zone height is taken as 1.5 to 3.5 feet here
#A baseball is about 3 inches ≈ 0.25 feet in diameter
#feet = inches / 12

#Strike zone (the blue rectangle)
platewidthInFeet = 17 / 12
szHeightInFeet = 3.5 - 1.5

#Zone widened by half a ball on every side (the light blue rectangle):
#a pitch whose center lands here still has part of the ball over the zone
expandedPlateInFeet = 20 / 12   # 17 + 3 inches: half a ball added on each side
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2

ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, 1.5 - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, 1.5), platewidthInFeet, szHeightInFeet))

plt.ylim(0, 5)
plt.xlim(-2, 2)
plt.show()
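The light blue rectangle reflects the rule that a pitch is a strike if any part of the ball passes through the zone, so the ball's center can sit up to one ball radius (half a ball) outside it. A minimal sketch of that test for a single pitch, assuming the same 17-inch plate and 3-inch ball used above and treating the zone as a flat rectangle, as the plots do; `in_zone` is a hypothetical helper name.

```python
PLATE_WIDTH_FT = 17 / 12   # width of home plate
BALL_FT = 3 / 12           # approximate ball diameter used above
HALF_BALL_FT = BALL_FT / 2

def in_zone(px, pz, sz_top, sz_bot):
    """True if a ball centered at (px, pz) touches the rectangular zone.

    px: horizontal position in feet (0 = middle of the plate)
    pz: height above the ground in feet
    sz_top / sz_bot: top and bottom of the batter's zone in feet
    """
    half_width = PLATE_WIDTH_FT / 2 + HALF_BALL_FT
    return (abs(px) <= half_width
            and sz_bot - HALF_BALL_FT <= pz <= sz_top + HALF_BALL_FT)

print(in_zone(0.0, 2.5, 3.5, 1.5))   # middle of the zone -> True
print(in_zone(0.80, 2.5, 3.5, 1.5))  # center off the plate, ball still clips it -> True
print(in_zone(1.0, 2.5, 3.5, 1.5))   # clearly outside -> False
```

Real umpire calls do not follow this rectangle exactly, so it is best treated as a geometric reference only.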

strike.png

Adding the ball-strike count to the data frame

baseball_analysis.ipynb


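uniqDesList = pitchDF.des.unique()  # the distinct result strings, handy for checking the matches below
ballColList = []
strikeColList = []
ballCount = 0
strikeCount = 0

for index, row in pitchDF.iterrows():
    des = row['des']
    # First pitch of a new at-bat: reset the count
    if row['abIdx'] == 1:
        ballCount = 0
        strikeCount = 0

    # Record the count *before* this pitch
    ballColList.append(ballCount)
    strikeColList.append(strikeCount)

    # Then update it with this pitch's result
    if 'Ball' in des:
        ballCount = ballCount + 1
    elif 'Foul' in des:
        if strikeCount < 2:   # a plain foul with two strikes leaves the count unchanged
            strikeCount = strikeCount + 1
    elif 'Strike' in des:
        strikeCount = strikeCount + 1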
#Add to data frame
pitchDF['ballCount'] = ballColList
pitchDF['strikeCount'] = strikeColList
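The loop records the count before each pitch and only then updates it; a foul adds a strike only when there are fewer than two. A minimal sketch of the same logic as a standalone function over one at-bat's result strings (`counts_before_each_pitch` is a hypothetical name, and the strings are examples in the style of the Gameday `des` field):

```python
def counts_before_each_pitch(descriptions):
    """Return the (balls, strikes) count before each pitch of a single at-bat."""
    balls = strikes = 0
    counts = []
    for des in descriptions:
        counts.append((balls, strikes))   # count the pitch was thrown in
        if "Ball" in des:
            balls += 1
        elif "Foul" in des:
            if strikes < 2:               # a foul with two strikes changes nothing
                strikes += 1
        elif "Strike" in des:
            strikes += 1
    return counts

at_bat = ["Ball", "Called Strike", "Foul", "Foul", "Ball", "Swinging Strike"]
print(counts_before_each_pitch(at_bat))
# [(0, 0), (1, 0), (1, 1), (1, 2), (1, 2), (2, 2)]
```

A two-strike foul tip that is caught ends the at-bat, so treating it as a plain foul does not affect any recorded pre-pitch count. Strings such as "Missed Bunt" match none of the three checks and are ignored, as in the original loop.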

Checking the data frame

baseball_analysis.ipynb


pitchDF

Clayton Kershaw (Dodgers) pitching tendency

baseball_analysis.ipynb


df= pitchDF.loc[pitchDF['frame']=='top']

ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title("Clayton Kershaw's pitching tendency")
ax1.set_aspect(aspect=1)

platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2

outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2 
rect.zorder=-1 
    
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()

pitch_dodgers.png
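The location plots in this article repeat the same dozen lines with only the filter and the title changed. A hypothetical helper, `plot_pitch_locations`, could wrap them. This is a sketch that assumes the `px`, `pz`, `szTop` and `szBottom` columns built above and, like the original cells, sizes the zone from the first row's batter.

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import pandas as pd

PLATE_FT = 17 / 12      # plate width
EXPANDED_FT = 20 / 12   # plate width plus half a ball on each side
BALL_FT = 3 / 12        # ball diameter

def plot_pitch_locations(df, title):
    """Scatter px/pz over the strike zone taken from the first row's szTop/szBottom."""
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.scatter(df["px"], df["pz"], marker="o", color="red")
    sz_top = df["szTop"].iloc[0]
    sz_bottom = df["szBottom"].iloc[0]
    height = sz_top - sz_bottom
    # Light blue: zone widened by half a ball on every side (drawn lowest)
    ax.add_patch(patches.Rectangle((-EXPANDED_FT / 2, sz_bottom - BALL_FT / 2),
                                   EXPANDED_FT, height + BALL_FT,
                                   color="lightblue", zorder=-2))
    # Blue: the strike zone itself, drawn under the pitch markers
    ax.add_patch(patches.Rectangle((-PLATE_FT / 2, sz_bottom), PLATE_FT, height, zorder=-1))
    ax.set_xlim(-2.5, 2.5)
    ax.set_ylim(0, 5)
    ax.set_aspect("equal")
    ax.set_xlabel("horizontal location")
    ax.set_ylabel("vertical location")
    ax.set_title(title)
    return ax

# Toy rows with made-up values; in the notebook pass a filtered pitchDF instead.
sample = pd.DataFrame({"px": [0.1, -0.9], "pz": [2.4, 1.2],
                       "szTop": [3.4, 3.4], "szBottom": [1.6, 1.6]})
ax = plot_pitch_locations(sample, "sample")
```

In the notebook, each location cell would then shrink to one call, for example `plot_pitch_locations(pitchDF[pitchDF['frame'] == 'top'], "Clayton Kershaw's pitching tendency")` followed by `plt.show()`.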

Rockies pitching tendency

baseball_analysis.ipynb


df= pitchDF.loc[pitchDF['frame']=='bottom']

ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title('Rockies pitching tendency')
ax1.set_aspect(aspect=1)
        
platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2

outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2 
rect.zorder=-1 
    
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()

pitch_rockies.png

Comparing the two, Kershaw's strike rate was **65%** versus **56%** for the Rockies pitchers. Kershaw also seems to miss left and right less often than the Rockies pitchers. His pitches vary more up and down; I wonder whether that comes from his slider or from a fastball that "hops" (rises).
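The article does not show how the strike rates were calculated. One common convention, used here as an assumption, counts every pitch that is not a ball as a strike: called and swinging strikes, fouls and balls put in play. A small sketch of that calculation over the `des` strings, with `strike_rate` and `BALL_WORDS` as hypothetical names:

```python
# Assumption: a pitch is a "ball" if its description contains one of these.
# Counting "Hit By Pitch" as a ball is a judgment call.
BALL_WORDS = ("Ball", "Hit By Pitch", "Pitchout")

def strike_rate(descriptions):
    """Share of pitches that were not balls (strikes, fouls, balls in play)."""
    descriptions = list(descriptions)
    if not descriptions:
        return 0.0
    balls = sum(1 for d in descriptions if any(w in d for w in BALL_WORDS))
    return 1 - balls / len(descriptions)

sample = ["Ball", "Called Strike", "Foul", "In play, out(s)"]
print(f"{strike_rate(sample):.0%}")   # 75%
```

Applied to the notebook data it would be something like `strike_rate(pitchDF.loc[pitchDF['frame'] == 'top', 'des'])`; whether it reproduces the percentages above depends on the definition the author actually used.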

Next, let's look at the first pitch of each at-bat.

Clayton Kershaw (Dodgers) first-pitch tendency

baseball_analysis.ipynb


df = pitchDF.loc[(pitchDF['frame']=='top') & (pitchDF['abIdx']==1)]

ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title("Clayton Kershaw's first-pitch tendency")
ax1.set_aspect(aspect=1)
        
platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2

outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2 
rect.zorder=-1 
    
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()

pitch_dodgers_firstball.png

Rockies' first-pitch tendency

baseball_analysis.ipynb


df = pitchDF.loc[(pitchDF['frame']=='bottom') & (pitchDF['abIdx']==1)]

ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title("Rockies' first-pitch tendency")
ax1.set_aspect(aspect=1)
        
platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2

outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2 
rect.zorder=-1 
    
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()

pitch_rockies_firstball.png

Comparing the two, Kershaw's first-pitch strike rate was **71%** versus **64%** for the Rockies pitchers.

Kershaw throws few balls and gets ahead in the count.
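The first-pitch strike rate can be computed for both sides at once by keeping only rows with `abIdx == 1` and grouping by `frame`. A minimal pandas sketch, assuming the columns built above and the same "not a ball" definition of a strike (an assumption, since the article does not state its definition); `first_pitch_strike_rate` is a hypothetical name and the toy rows are made up.

```python
import pandas as pd

def first_pitch_strike_rate(df):
    """Share of first pitches that were not balls, per 'frame' ('top'/'bottom')."""
    first = df[df["abIdx"] == 1]
    is_ball = first["des"].str.contains("Ball|Hit By Pitch|Pitchout", na=False)
    return (~is_ball).astype(float).groupby(first["frame"]).mean()

# Toy data with made-up outcomes, just to show the shape of the result.
toy = pd.DataFrame({
    "frame": ["top", "top", "top", "bottom", "bottom"],
    "abIdx": [1, 2, 1, 1, 1],
    "des":   ["Called Strike", "Ball", "Ball", "Foul", "In play, out(s)"],
})
print(first_pitch_strike_rate(toy))
```

Here the "top" side gets 0.5 (one strike, one ball among its first pitches; the `abIdx == 2` row is ignored) and "bottom" gets 1.0.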

Next, let's compare pitch speeds.

Clayton Kershaw (Dodgers) pitch speed

baseball_analysis.ipynb


df = pitchDF.loc[(pitchDF['frame']=='top')]

speed = df['speed']
print('Average speed:', sum(speed) / len(speed))
print('Fastest:', max(speed))
print('Slowest:', min(speed))
print('Difference:', max(speed) - min(speed))

ax = df.plot(x='pitchIdx', y='speed', color='blue', figsize=[12,6])
ax.set_ylabel('speed (mph)')
ax.set_title("Clayton Kershaw's pitch speed")
plt.savefig('pitch_dodgers_speed.png')
plt.show()
# Output:
# Average speed: 87.88504672897201
# Fastest: 95.0
# Slowest: 72.4
# Difference: 22.599999999999994

pitch_dodgers_speed.png

Rockies pitch speed

baseball_analysis.ipynb


df = pitchDF.loc[(pitchDF['frame']=='bottom')]

speed = df['speed']
print('Average speed:', sum(speed) / len(speed))
print('Fastest:', max(speed))
print('Slowest:', min(speed))
print('Difference:', max(speed) - min(speed))

ax = df.plot(x='pitchIdx', y='speed', color='blue', figsize=[12,6])
ax.set_ylabel('speed (mph)')
ax.set_title('Rockies pitch speed')
plt.savefig('pitch_rockies_speed.png')
plt.show()
# Output:
# Average speed: 89.13599999999998
# Fastest: 96.3
# Slowest: 71.8
# Difference: 24.5

pitch_rockies_speed.png

Comparing the two:

- Kershaw: average **87.9 mph**, fastest **95.0 mph**, slowest **72.4 mph**, difference **22.6 mph**
- Rockies: average **89.1 mph**, fastest **96.3 mph**, slowest **71.8 mph**, difference **24.5 mph**

The Rockies used five pitchers in this game, so their numbers naturally mix several pitchers' tendencies.
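The two speed cells above can be replaced by one grouped summary. A minimal pandas sketch, assuming `speed` holds numbers (if it ended up with object dtype, convert it first with `pd.to_numeric`); `speed_summary` is a hypothetical name and the toy speeds are made up.

```python
import pandas as pd

def speed_summary(df):
    """Average, fastest, slowest and their difference of 'speed' for each 'frame'."""
    summary = df.groupby("frame")["speed"].agg(["mean", "max", "min"])
    summary["range"] = summary["max"] - summary["min"]
    return summary.round(1)

# Toy speeds (made up) just to show the output shape.
toy = pd.DataFrame({
    "frame": ["top", "top", "bottom", "bottom"],
    "speed": [94.0, 74.0, 96.0, 72.0],
})
print(speed_summary(toy))
```

Grouping by an additional pitcher column would separate the five Rockies pitchers. The Gameday `atbat` element appears to carry a `pitcher` ID attribute, but the loop in this article does not read it, so that column would have to be added first.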

Next, let's look at Kershaw's pitch mix.

Clayton Kershaw (Dodgers) pitch type ratio

baseball_analysis.ipynb


df = pitchDF.loc[(pitchDF['frame']=='top')]

df.pitchtype.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('Pitch type ratio')
plt.show()

pitch_dodgers_ball.png

Clayton Kershaw (Dodgers) 4-seam results

baseball_analysis.ipynb


df = pitchDF.loc[(pitchDF['pitchtype']=='4-seam fb') & (pitchDF['frame']=='top')]

df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('4-seam event results')
plt.show()

pitch_dodgers_4seam.png

Clayton Kershaw (Dodgers) slider results

baseball_analysis.ipynb


df = pitchDF.loc[(pitchDF['pitchtype']=='slider') & (pitchDF['frame']=='top')]

df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('slider event result')
plt.show()

pitch_dodgers_slider.png

Clayton Kershaw (Dodgers) curve results

baseball_analysis.ipynb


df = pitchDF.loc[(pitchDF['pitchtype']=='curveball') & (pitchDF['frame']=='top')]

df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('curveball event result')
plt.show()

pitch_dodgers_curve.png

Clayton Kershaw (Dodgers) change-up results

baseball_analysis.ipynb


df = pitchDF.loc[(pitchDF['pitchtype']=='changeup') & (pitchDF['frame']=='top')]

df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('changeup event result')
plt.show()

pitch_dodgers_changeup.png

Comparing the out rates by pitch type:

- 4-seam fastball: **35.7%**
- Slider: **18.8%**
- Curveball: **22.3%**
- Changeup: **0%**

The 4-seam fastball, which makes up about half of his pitches, was especially effective.
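The article does not show how these out rates were computed. One possible definition, sketched below as an assumption, is the share of each pitch type's pitches that ended with a ball put in play for an out; the numbers it gives will not necessarily match the percentages above. `out_rate_by_pitch_type` is a hypothetical name and the toy rows are made up.

```python
import pandas as pd

def out_rate_by_pitch_type(df):
    """Share of each pitch type's pitches whose result starts with 'In play, out'.

    Assumption: only balls put in play for an out are counted; strikeouts are not.
    """
    is_out = df["des"].str.startswith("In play, out", na=False)
    return is_out.astype(float).groupby(df["pitchtype"]).mean().sort_values(ascending=False)

toy = pd.DataFrame({
    "pitchtype": ["4-seam fb", "4-seam fb", "slider", "slider", "curveball"],
    "des": ["In play, out(s)", "Foul", "Swinging Strike", "In play, out(s)", "Ball"],
})
print(out_rate_by_pitch_type(toy))
```

With the notebook data, the call would be `out_rate_by_pitch_type(pitchDF[pitchDF['frame'] == 'top'])`.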

Next, let's look at his pitch mix at each ball-strike count.

Clayton Kershaw (Dodgers) pitch mix by count

baseball_analysis.ipynb


titleList = []
dataList = []

fig, axes = plt.subplots(4, 3, figsize=(12,16))

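#One subplot per count: rows = balls (0-3), columns = strikes (0-2)
for b in range(4):
    for s in range(3):
        df = pitchDF.loc[(pitchDF['ballCount']==b) & (pitchDF['strikeCount']==s) & (pitchDF['frame']=='top')]
        title = "Count:" + str(b) + "-" + str(s) + " (" + str(len(df)) + ")"
        titleList.append(title)
        dataList.append(df)

for i, ax in enumerate(axes.flatten()):
    #value_counts() is sorted by frequency, so take the labels from its own index;
    #unique() lists them in order of appearance, which does not match the wedges
    counts = dataList[i].pitchtype.value_counts()
    ax.set_title(titleList[i])
    if counts.empty:
        ax.axis('off')  # no pitches at this count
        continue
    ax.pie(counts, autopct="%.1f%%", pctdistance=0.9, labels=counts.index)

plt.show()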

a.png

Almost everything is a 4-seam fastball.

Next, let's look at the results at each count.

Clayton Kershaw (Dodgers) results by count

baseball_analysis.ipynb


titleList = []
dataList = []

fig, axes = plt.subplots(4, 3, figsize=(12,16))

for b in range(4):
    for s in range(3):
        df = pitchDF.loc[(pitchDF['ballCount']==b) & (pitchDF['strikeCount']==s) & (pitchDF['frame']=='top')]
        title = "Count:" + str(b) + "-" + str(s) + " (" + str(len(df)) + ")"
        titleList.append(title)
        dataList.append(df)

for i, ax in enumerate(axes.flatten()):
    #Take the labels from value_counts() itself so they line up with the wedge sizes
    counts = dataList[i].des.value_counts()
    ax.set_title(titleList[i])
    if counts.empty:
        ax.axis('off')  # no pitches at this count
        continue
    ax.pie(counts, autopct="%.1f%%", pctdistance=0.9, labels=counts.index)

plt.show()

result.png

At every count, the most common outcomes are a strike (called or swinging) and an out on a ball put in play ("In play, out(s)").
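Twelve pie charts are hard to compare side by side. As an alternative view, pandas' `crosstab` with `normalize='index'` puts the same breakdown into one table with one row per count. A minimal sketch on made-up rows, assuming the `ballCount`, `strikeCount` and `des` columns built above; `results_by_count` is a hypothetical name.

```python
import pandas as pd

def results_by_count(df, column="des"):
    """Row-normalized table: for each (balls, strikes) count, the share of each value."""
    return pd.crosstab(
        index=[df["ballCount"], df["strikeCount"]],
        columns=df[column],
        normalize="index",   # each row sums to 1
    )

toy = pd.DataFrame({
    "ballCount":   [0, 0, 0, 1],
    "strikeCount": [0, 0, 1, 1],
    "des": ["Called Strike", "Ball", "In play, out(s)", "Foul"],
})
table = results_by_count(toy)
print(table.round(2))
```

Passing `column="pitchtype"` instead gives the pitch mix by count from the previous section as a single table.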

Conclusion

- He throws many first-pitch strikes and works in favorable counts (and he retired quite a few batters even before getting ahead in the count).
- He relies heavily on the 4-seam fastball.
- Watch out for the slider he mixes in unexpectedly (probably one with a lot of vertical break).

Summary

This analysis gave me some idea of Kershaw's characteristics, but without comparing this game with his other starts I could not pin down why he threw a no-hitter in this one. His past record against each batter would also be needed. Kershaw had good control and needed only 107 pitches in this game. MLB teams play more games than NPB teams, and starters pitch on four days' rest, so even an excellent starter tends to come out at around 120 pitches because of pitch-count limits. Good control may therefore be the most important factor in throwing a no-hitter in the major leagues.

This got long, but thank you for reading this far. If you find any mistakes, I would be very grateful if you could point them out in the comments.
