Overview

On June 18, 2014, Dodgers pitcher Clayton Kershaw pitched all nine innings of the Colorado Rockies vs. Los Angeles Dodgers game, striking out 15 batters and throwing a no-hitter. In this post, we compare him with the opposing Rockies pitchers and analyze why Kershaw was able to achieve the no-hitter.

Environment

・ Python 3.7.5
・ Windows 10
・ Jupyter Notebook (Anaconda3)

Start analysis (play ball)

First, launch Jupyter Notebook from Anaconda Prompt

$ jupyter notebook

Then import the required libraries

`baseball_analysis.ipynb`


%matplotlib inline  
import requests
import xml.etree.ElementTree as ET
import os
import pandas as pd

Next, create the data frame that will hold the pitch data

`baseball_analysis.ipynb`


#Data frame creation
pitchDF = pd.DataFrame(columns = ['pitchIdx', 'inning', 'frame', 'ab', 'abIdx', 'batter', 'stand', 'speed', 
                                       'pitchtype', 'px', 'pz', 'szTop', 'szBottom', 'des'], dtype=object)

#Dictionary that maps pitch type codes to readable names
pitchDictionary = { "FA":"fastball", "FF":"4-seam fb", "FT": "2-seam fb", "FC": "fb-cutter", "":"unknown", None: "none",
                    "FS":"fb-splitter", "SL":"slider", "CH":"changeup","CU":"curveball","KC":"knuckle-curve",
                    "KN":"knuckleball","EP":"eephus", "UN":"unidentified", "PO":"pitchout", "SI":"sinker", "SF":"split-finger"
                    }

# top=top of the inning (visiting team bats), bottom=bottom of the inning (home team bats)
frames = ["top", "bottom"]

Acquisition of player information

`baseball_analysis.ipynb`


#Read player information distributed by MLB Advanced Media
url = 'https://gd2.mlb.com/components/game/mlb/year_2014/month_06/day_18/gid_2014_06_18_colmlb_lanmlb_1/players.xml'
resp = requests.get(url) 
xmlfile = "myplayers.xml"

with open(xmlfile, mode='wb') as f:
    f.write(resp.content)
statinfo = os.stat(xmlfile)

#Parse xml file
tree = ET.parse(xmlfile)
game = tree.getroot()
teams = game.findall("./team")
playerDict = {}

for team in teams:
    players = team.findall("./player")
    for player in players:
        #Add player ID and player name to dictionary
        playerDict[ player.attrib.get("id") ] = player.attrib.get("first") + " " + player.attrib.get("last")
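The structure that the loop above relies on (a root element containing `team` elements, each containing `player` elements with `id`, `first`, and `last` attributes) can be checked offline. The following is a minimal sketch under that assumption; the sample XML, the IDs, the names, and the helper `build_player_dict` are all made up for illustration and are not taken from the real `players.xml`.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for players.xml. IDs and names are placeholders.
SAMPLE_PLAYERS_XML = """
<game venue="Example Park">
  <team type="away" id="AAA">
    <player id="1001" first="Alex" last="Away" position="CF"/>
    <player id="1002" first="Blake" last="Batter" position="1B"/>
  </team>
  <team type="home" id="HHH">
    <player id="2001" first="Casey" last="Home" position="P"/>
  </team>
</game>
"""


def build_player_dict(xml_text):
    """Map player ID -> "First Last" for every <team>/<player> element."""
    root = ET.fromstring(xml_text)
    names = {}
    for player in root.findall("./team/player"):
        # Fall back to an empty string so a missing attribute does not raise.
        first = player.get("first", "")
        last = player.get("last", "")
        names[player.get("id")] = f"{first} {last}".strip()
    return names


player_names = build_player_dict(SAMPLE_PLAYERS_XML)
print(player_names)
```

Using `player.get("first", "")` instead of concatenating the raw attributes avoids a `TypeError` if one of the name attributes happens to be absent.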

Data acquisition for each inning

`baseball_analysis.ipynb`


#Read the data for each inning distributed by MLB Advanced Media
url = 'https://gd2.mlb.com/components/game/mlb/year_2014/month_06/day_18/gid_2014_06_18_colmlb_lanmlb_1/inning/inning_all.xml'
resp = requests.get(url) 
xmlfile = "mygame.xml"

with open(xmlfile, 'wb') as f: 
    f.write(resp.content)
statinfo = os.stat(xmlfile) 

#Parse xml file
tree = ET.parse(xmlfile)
root = tree.getroot()
innings = root.findall("./inning")

totalPitchCount = 0
topPitchCount = 0
bottomPitchCount = 0

for inning in innings:
    for i in range(len(frames)):
        fr = inning.find(frames[i])
        if fr is not None:
            for ab in fr.iter('atbat'):
                battername = playerDict[ab.get('batter')]
                standside = ab.get('stand')
                abIdx = ab.get('num')
                abPitchCount = 0
                pitches = ab.findall("pitch")
                for pitch in pitches:
                    if pitch.attrib.get("start_speed") is None:
                        #No speed was recorded for this pitch, so store 0
                        speed = 0.0
                    else:
                        speed = float(pitch.attrib.get("start_speed"))

                    pxFloat = 0.0 if pitch.attrib.get("px") is None else float('{0:.2f}'.format(float(pitch.attrib.get("px"))))
                    pzFloat = 0.0 if pitch.attrib.get("pz") is None else float('{0:.2f}'.format(float(pitch.attrib.get("pz"))))
                    szTop = 0.0 if pitch.attrib.get("sz_top") is None else float('{0:.2f}'.format(float(pitch.attrib.get("sz_top"))))
                    szBot = 0.0 if pitch.attrib.get("sz_bot") is None else float('{0:.2f}'.format(float(pitch.attrib.get("sz_bot"))))

                    abPitchCount = abPitchCount + 1
                    totalPitchCount = totalPitchCount + 1
                    
                    if frames[i]=='top':
                        topPitchCount = topPitchCount + 1
                    else:
                        bottomPitchCount = bottomPitchCount + 1
                                  
                    inn = inning.attrib.get("num")
                    
                    #Fall back to "unknown" for pitch type codes that are not in the dictionary
                    verbosePitch = pitchDictionary.get(pitch.get("pitch_type"), "unknown")

                    desPitch = pitch.get("des")
                    
                    #Add to data frame
                    pitchDF.loc[totalPitchCount] = [float(totalPitchCount), inn, frames[i], abIdx, abPitchCount, battername, standside, speed,
                                               verbosePitch, pxFloat, pzFloat, szTop, szBot, desPitch]
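The nesting that the loop above walks through (inning, then top/bottom, then at-bat, then pitch) is easier to see on a tiny input. The sketch below parses a hand-written sample into a list of dictionaries using the same column names as `pitchDF`. The sample XML and the helpers `to_float` and `parse_pitches` are assumptions made for illustration; only the attribute names (`start_speed`, `px`, `pz`, `sz_top`, `sz_bot`, `des`, `pitch_type`) come from the code above.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for inning_all.xml. All values are placeholders.
SAMPLE_INNING_XML = """
<game>
  <inning num="1">
    <top>
      <atbat num="1" batter="1001" stand="L">
        <pitch des="Called Strike" pitch_type="FF" start_speed="93.1"
               px="0.12" pz="2.48" sz_top="3.4" sz_bot="1.6"/>
        <pitch des="Ball" pitch_type="SL" px="1.31" pz="1.02"/>
      </atbat>
    </top>
    <bottom>
      <atbat num="2" batter="2001" stand="R">
        <pitch des="In play, out(s)" pitch_type="CU" start_speed="74.0"
               px="-0.40" pz="2.00" sz_top="3.5" sz_bot="1.5"/>
      </atbat>
    </bottom>
  </inning>
</game>
"""


def to_float(value, default=0.0):
    """Convert an attribute string to a float rounded to 2 places."""
    return default if value is None else round(float(value), 2)


def parse_pitches(xml_text):
    """Return one dict per pitch, in game order."""
    rows = []
    root = ET.fromstring(xml_text)
    for inning in root.findall("./inning"):
        for frame in ("top", "bottom"):
            half = inning.find(frame)
            if half is None:  # e.g. no bottom half was played
                continue
            for ab in half.iter("atbat"):
                for pitch_no, pitch in enumerate(ab.findall("pitch"), start=1):
                    rows.append({
                        "pitchIdx": len(rows) + 1,
                        "inning": inning.get("num"),
                        "frame": frame,
                        "ab": ab.get("num"),
                        "abIdx": pitch_no,
                        "batter": ab.get("batter"),
                        "stand": ab.get("stand"),
                        "speed": to_float(pitch.get("start_speed")),
                        "pitchtype": pitch.get("pitch_type"),
                        "px": to_float(pitch.get("px")),
                        "pz": to_float(pitch.get("pz")),
                        "szTop": to_float(pitch.get("sz_top")),
                        "szBottom": to_float(pitch.get("sz_bot")),
                        "des": pitch.get("des"),
                    })
    return rows


rows = parse_pitches(SAMPLE_INNING_XML)
for row in rows:
    print(row["frame"], row["ab"], row["abIdx"], row["pitchtype"], row["speed"])
```

If pandas is available, `pd.DataFrame(rows)` turns this list into a frame in one step, which is usually faster than assigning rows one at a time with `.loc`.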

Data frame confirmation

`baseball_analysis.ipynb`


pitchDF

# pitchIdx=Serial number of the pitch within the game
# inning=Inning
# frame=Top or bottom of the inning
# ab=At-bat number
# abIdx=Pitch number within the at-bat
# batter=Batter name
# stand=Batter's stance (R → right-handed, L → left-handed)
# speed=Pitch speed (mph)
# pitchtype=Pitch type
# px=Horizontal location where the pitch crosses home plate, in feet (right → positive, left → negative)
# pz=Vertical location where the pitch crosses home plate, in feet
# szTop=Height of the top of the batter's strike zone above the ground, in feet
# szBottom=Height of the bottom of the batter's strike zone above the ground, in feet
# des=Result of the pitch

Strike zone creation

`baseball_analysis.ipynb`


import matplotlib.pyplot as plt
import matplotlib.patches as patches

#Create a new figure
fig1 = plt.figure()
#Add subplot
ax1 = fig1.add_subplot(111, aspect='equal')

#Strike zone width is 17 inches = about 1.42 feet
#Strike zone height runs from 1.5 to 3.5 feet
#A baseball is about 3 inches = 0.25 feet across
#feet = inches / 12

#Strike zone creation
#The blue frame is the strike zone
platewidthInFeet = 17 / 12
szHeightInFeet = 3.5 - 1.5

#Create an expanded zone that adds half a ball on every side
#The light blue frame is this expanded zone (one ball wider and one ball taller in total)
expandedPlateInFeet = 20 / 12
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2

ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, 1.5 - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, 1.5), platewidthInFeet, szHeightInFeet))

plt.ylim(0, 5)
plt.xlim(-2, 2)
plt.show()
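The same dimensions can also be used to test numerically whether a pitch location falls inside the zone, rather than judging it by eye on the plot. This is a minimal sketch; the function name `in_strike_zone` and the `margin` parameter are my own assumptions, and the default heights of 1.5 and 3.5 feet are the fixed values used in the plot above, not a batter-specific zone.

```python
PLATE_WIDTH_FT = 17 / 12  # width of home plate in feet
BALL_FT = 3 / 12          # approximate ball diameter in feet


def in_strike_zone(px, pz, sz_top=3.5, sz_bot=1.5, margin=0.0):
    """Return True if (px, pz) lies inside the zone widened by `margin` feet on every side."""
    half_width = PLATE_WIDTH_FT / 2 + margin
    inside_horizontally = abs(px) <= half_width
    inside_vertically = (sz_bot - margin) <= pz <= (sz_top + margin)
    return inside_horizontally and inside_vertically


# A pitch 0.75 ft off center is outside the plate itself,
# but inside the expanded (light blue) zone of half a ball per side.
print(in_strike_zone(0.75, 2.5))
print(in_strike_zone(0.75, 2.5, margin=BALL_FT / 2))
```

Passing `margin=BALL_FT / 2` reproduces the light blue rectangle, whose total width is 20 inches.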

Add ball and strike counts to the data frame

`baseball_analysis.ipynb`


uniqDesList = pitchDF.des.unique()
ballColList = [] 
strikeColList = []
ballCount = 0
strikeCount = 0

for index, row in pitchDF.iterrows():
    des = row['des']
    if row['abIdx'] == 1:
        ballCount = 0
        strikeCount = 0
    
    ballColList.append(ballCount)
    strikeColList.append(strikeCount)

    if 'Ball' in des:
        ballCount = ballCount + 1
    elif 'Foul' in des:
        #A foul only adds a strike when there are fewer than two strikes
        if strikeCount != 2:
            strikeCount = strikeCount + 1
    elif 'Strike' in des:
        strikeCount = strikeCount + 1

#Add to data frame
pitchDF['ballCount'] = ballColList
pitchDF['strikeCount'] = strikeColList
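The counting rule used above can be tried on a short, made-up pitch sequence to confirm that the stored count is the count before each pitch. This is a sketch of the same logic as a plain function; the name `counts_before_each_pitch` and the sample sequence are assumptions for illustration only.

```python
def counts_before_each_pitch(pitches):
    """pitches: iterable of (pitch number within the at-bat, description).

    Returns the (balls, strikes) count in effect before each pitch.
    """
    balls = strikes = 0
    counts = []
    for pitch_no, des in pitches:
        if pitch_no == 1:          # a new at-bat resets the count
            balls = strikes = 0
        counts.append((balls, strikes))
        if "Ball" in des:
            balls += 1
        elif "Foul" in des:
            if strikes != 2:       # a foul cannot produce the third strike
                strikes += 1
        elif "Strike" in des:
            strikes += 1
    return counts


sample = [
    (1, "Called Strike"),
    (2, "Foul"),
    (3, "Foul"),
    (4, "Ball"),
    (5, "Swinging Strike"),
    (1, "Ball"),
    (2, "In play, out(s)"),
]
print(counts_before_each_pitch(sample))
```

Note that the second foul leaves the count at two strikes, and the first pitch of the next at-bat starts again from 0-0.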

Data frame confirmation

`baseball_analysis.ipynb`


pitchDF

Clayton Kershaw (Dodgers) pitching tendency

`baseball_analysis.ipynb`


df= pitchDF.loc[pitchDF['frame']=='top']

ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title("Clayton Kershaw's pitching tendency")
ax1.set_aspect(aspect=1)

platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2

outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2 
rect.zorder=-1 
    
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()

Rockies pitching tendency

`baseball_analysis.ipynb`


df= pitchDF.loc[pitchDF['frame']=='bottom']

ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title('Rockies pitching tendency')
ax1.set_aspect(aspect=1)
        
platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2

outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2 
rect.zorder=-1 
    
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()

Comparing the two sides, I found that **Kershaw's strike rate was 65%** and **the Rockies pitchers' strike rate was 56%**. Kershaw seems to miss horizontally less often than the Rockies pitchers. There is a lot of vertical spread, though; is that the effect of his slider, or of a four-seam fastball that appears to rise?
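The post does not show how the strike rates were calculated, so the following is only one plausible way to do it. It assumes that every pitch counts as a strike unless its description contains "Ball", is "Hit By Pitch", or starts with "Pitchout"; that rule, the helper names `is_strike` and `strike_rate`, and the sample list are all my assumptions.

```python
def is_strike(des):
    """Assumed rule: called/swinging strikes, fouls, and balls in play are strikes."""
    if des is None:
        return False
    non_strike = (
        "Ball" in des
        or des == "Hit By Pitch"
        or des.startswith("Pitchout")
    )
    return not non_strike


def strike_rate(descriptions):
    """Share of pitches that count as strikes (0.0 for an empty input)."""
    descriptions = list(descriptions)
    if not descriptions:
        return 0.0
    return sum(is_strike(d) for d in descriptions) / len(descriptions)


sample_des = [
    "Called Strike", "Ball", "Foul", "In play, out(s)",
    "Ball In Dirt", "Swinging Strike", "Ball", "Hit By Pitch",
]
print(f"{strike_rate(sample_des):.0%}")
```

Because the function accepts any iterable of descriptions, it can be applied to a column directly, for example `strike_rate(pitchDF.loc[pitchDF['frame']=='top', 'des'])` for the top halves of the innings.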

Next, let's look at the first pitch of each at-bat.

Clayton Kershaw (Dodgers) first ball tendency

`baseball_analysis.ipynb`


df = pitchDF.loc[(pitchDF['frame']=='top') & (pitchDF['abIdx']==1)]

ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title("Clayton Kershaw's first ball tendency")
ax1.set_aspect(aspect=1)
        
platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2

outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2 
rect.zorder=-1 
    
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()

Rockies' first ball tendency

`baseball_analysis.ipynb`


df = pitchDF.loc[(pitchDF['frame']=='bottom') & (pitchDF['abIdx']==1)]

ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title("Rockies' first ball tendency")
ax1.set_aspect(aspect=1)
        
platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2

outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2 
rect.zorder=-1 
    
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()

Comparing the two sides, I found that **Kershaw's first-pitch strike rate was 71%** and **the Rockies pitchers' first-pitch strike rate was 64%**.

Kershaw throws few balls and gets ahead in the count with strikes.

Next, let's look at how pitch speed changed over the game.

Clayton Kershaw (Dodgers) ball speed change

`baseball_analysis.ipynb`


df = pitchDF.loc[(pitchDF['frame']=='top')]

speed = df['speed']
print(sum(speed) / len(speed))
print(max(speed))
print(min(speed))
print(max(speed) - min(speed))

ax = df.plot(x='pitchIdx', y='speed', color='blue', figsize=[12,6])
ax.set_ylabel('speed')
ax.set_title('Clayton Kershaw ball speed change')
plt.savefig('pitch_kershaw_speed.png')
plt.show()
>>>>>>>>>>>>>>>>>>>>>>>>>
#Average ball speed (mph): 87.88504672897201
#Fastest: 95.0
#Slowest: 72.4
#Difference between fastest and slowest: 22.599999999999994

Rockies ball speed change

`baseball_analysis.ipynb`


df = pitchDF.loc[(pitchDF['frame']=='bottom')]

speed = df['speed']
print(sum(speed) / len(speed))
print(max(speed))
print(min(speed))
print(max(speed) - min(speed))

ax = df.plot(x='pitchIdx', y='speed', color='blue', figsize=[12,6])
ax.set_ylabel('speed')
ax.set_title('Rockies ball speed change')
plt.savefig('pitch_rockies_speed.png')
plt.show()
>>>>>>>>>>>>>>>>>>>>>>>>>
#Average ball speed (mph): 89.13599999999998
#Fastest: 96.3
#Slowest: 71.8
#Difference between fastest and slowest: 24.5
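One caveat with these summary numbers: the parsing step stores a speed of 0 when a pitch has no recorded `start_speed`, and a single such row would pull the minimum down to 0 and lower the average. The sketch below shows one way to summarize speeds while skipping those placeholder values. The function name `speed_summary` and the sample values are assumptions for illustration.

```python
from statistics import mean


def speed_summary(speeds):
    """Summarize pitch speeds, ignoring missing values stored as 0 or None."""
    measured = [s for s in speeds if s and s > 0]
    if not measured:
        return None
    fastest = max(measured)
    slowest = min(measured)
    return {
        "mean": round(mean(measured), 1),
        "fastest": fastest,
        "slowest": slowest,
        "spread": round(fastest - slowest, 1),
    }


# The 0 stands for a pitch whose speed was not recorded.
sample_speeds = [93.0, 0, 74.5, 88.5]
print(speed_summary(sample_speeds))
```

With the data frame, the same function can be called as `speed_summary(df['speed'])`. Rounding the spread also avoids floating-point tails such as `22.599999999999994`.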

Comparing the two sides, I found the following.

Kershaw: **Average: 87.9 mph**, **Fastest: 95.0 mph**, **Slowest: 72.4 mph**, **Difference: 22.6 mph**

Rockies: **Average: 89.1 mph**, **Fastest: 96.3 mph**, **Slowest: 71.8 mph**, **Difference: 24.5 mph**

The Rockies used five pitchers, so it is natural that their numbers differ from those of a single pitcher.

Next, let's look at the pitch type ratio.

Clayton Kershaw (Dodgers) ball type ratio

`baseball_analysis.ipynb`


df = pitchDF.loc[(pitchDF['frame']=='top')]

df.pitchtype.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('Ball type ratio')
plt.show()

Clayton Kershaw (Dodgers) 4-seam results

`baseball_analysis.ipynb`


df = pitchDF.loc[(pitchDF['pitchtype']=='4-seam fb') & (pitchDF['frame']=='top')]

df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('4-seam event results')
plt.show()

Clayton Kershaw (Dodgers) slider results

`baseball_analysis.ipynb`


df = pitchDF.loc[(pitchDF['pitchtype']=='slider') & (pitchDF['frame']=='top')]

df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('slider event result')
plt.show()

Clayton Kershaw (Dodgers) curve results

`baseball_analysis.ipynb`


df = pitchDF.loc[(pitchDF['pitchtype']=='curveball') & (pitchDF['frame']=='top')]

df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('curveball event result')
plt.show()

Clayton Kershaw (Dodgers) change-up results

`baseball_analysis.ipynb`


df = pitchDF.loc[(pitchDF['pitchtype']=='changeup') & (pitchDF['frame']=='top')]

df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('changeup event result')
plt.show()

Comparing the out rate for each pitch type, I found **four-seam fastball: 35.7%**, **slider: 18.8%**, **curveball: 22.3%**, and **changeup: 0%**.
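These percentages are read off the pie charts. Assuming that "out rate" means the share of a pitch type's results that are "In play, out(s)", the same figure can be computed directly. The sketch below uses a made-up list of rows; the helper name `result_shares` and the sample data are assumptions, and the numbers it prints are not the game's real values.

```python
from collections import Counter


def result_shares(rows, pitch_type):
    """Return {result description: share} for one pitch type."""
    results = Counter(r["des"] for r in rows if r["pitchtype"] == pitch_type)
    total = sum(results.values())
    if total == 0:
        return {}
    return {des: n / total for des, n in results.items()}


# Placeholder rows with the same keys as the pitchDF columns.
sample_rows = [
    {"pitchtype": "4-seam fb", "des": "Called Strike"},
    {"pitchtype": "4-seam fb", "des": "In play, out(s)"},
    {"pitchtype": "4-seam fb", "des": "Ball"},
    {"pitchtype": "4-seam fb", "des": "In play, out(s)"},
    {"pitchtype": "slider", "des": "Swinging Strike"},
    {"pitchtype": "slider", "des": "Ball"},
]

shares = result_shares(sample_rows, "4-seam fb")
print(f"{shares.get('In play, out(s)', 0.0):.1%}")
```

With the data frame, `pitchDF[pitchDF['frame']=='top'].to_dict('records')` produces rows in the format this function expects.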

The four-seam fastball, which accounts for about half of his pitches, produced good results.

Next, let's look at pitch selection by count.

Clayton Kershaw (Dodgers) ball distribution by count

`baseball_analysis.ipynb`


titleList = []
dataList = []

fig, axes = plt.subplots(4, 3, figsize=(12,16))

#Count creation
for b in range(4):
    for s in range(3):
        df = pitchDF.loc[(pitchDF['ballCount']==b) & (pitchDF['strikeCount']==s) & (pitchDF['frame']=='top')]
        title = "Count:" + str(b) + "-" + str(s) + " (" + str(len(df)) + ")"
        titleList.append(title)
        dataList.append(df)

for i, ax in enumerate(axes.flatten()):
    x = dataList[i].pitchtype.value_counts()
    #Take the labels from the same Series so that they line up with the counts
    l = x.index

    ax.pie(x, autopct="%.1f%%", pctdistance=0.9, labels=l)
    ax.set_title(titleList[i])

plt.show()
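The same breakdown can be produced as a table of numbers, which is easier to check than twelve small pie charts. The sketch below groups pitches by ball-strike count and tallies pitch types with a `Counter`; the helper name `pitch_mix_by_count` and the sample rows are assumptions for illustration, not data from the game.

```python
from collections import Counter, defaultdict


def pitch_mix_by_count(rows):
    """Return {(balls, strikes): Counter of pitch types}."""
    mix = defaultdict(Counter)
    for r in rows:
        mix[(r["ballCount"], r["strikeCount"])][r["pitchtype"]] += 1
    return mix


# Placeholder rows with the same keys as the pitchDF columns.
sample_count_rows = [
    {"ballCount": 0, "strikeCount": 0, "pitchtype": "4-seam fb"},
    {"ballCount": 0, "strikeCount": 0, "pitchtype": "4-seam fb"},
    {"ballCount": 0, "strikeCount": 0, "pitchtype": "curveball"},
    {"ballCount": 0, "strikeCount": 1, "pitchtype": "slider"},
    {"ballCount": 1, "strikeCount": 2, "pitchtype": "slider"},
    {"ballCount": 1, "strikeCount": 2, "pitchtype": "4-seam fb"},
    {"ballCount": 1, "strikeCount": 2, "pitchtype": "slider"},
]

mix = pitch_mix_by_count(sample_count_rows)
for (balls, strikes), types in sorted(mix.items()):
    # most_common() yields (pitch type, count) pairs, so labels and counts stay paired.
    print(f"{balls}-{strikes}:", types.most_common())
```

Because each label and its count come from the same `Counter` entry, they cannot drift out of order the way two separately computed lists can.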

As expected, it is almost all four-seam fastballs.

Next, let's look at the results by count.

Clayton Kershaw (Dodgers) Count Results

`baseball_analysis.ipynb`


titleList = []
dataList = []

fig, axes = plt.subplots(4, 3, figsize=(12,16))

for b in range(4):
    for s in range(3):
        df = pitchDF.loc[(pitchDF['ballCount']==b) & (pitchDF['strikeCount']==s) & pitchDF['des'].notna() & (pitchDF['frame']=='top')]
        title = "Count:" + str(b) + "-" + str(s) + " (" + str(len(df)) + ")"
        titleList.append(title)
        dataList.append(df)

for i, ax in enumerate(axes.flatten()):
    x = dataList[i].des.value_counts()
    #Take the labels from the same Series so that they line up with the counts
    l = x.index

    ax.pie(x, autopct="%.1f%%", pctdistance=0.9, labels=l)
    ax.set_title(titleList[i])

plt.show()

You can see that at every count a large share of pitches end in a strike call or in "In play, out(s)" (the ball is put in play and results in an out).

Conclusion

--He throws many first-pitch strikes and works in favorable counts (he also gets many outs before the count even turns in his favor).

--His pitch selection leans heavily on the four-seam fastball.

--Batters need to watch for the occasional unexpected slider (probably one with vertical break).

Summary

I now have some feel for Kershaw's characteristics as a pitcher, but without comparing this game with his other games I could not pin down why he was able to throw a no-hitter. Records of his past matchups against each batter would also be needed.

Kershaw had good control and threw only 107 pitches in this game. MLB teams play more games than NPB teams and starters work on four days' rest throughout the season, so even a good pitcher tends to be taken out at around 120 pitches because of pitch limits. Good control may therefore be the most important factor in throwing a no-hitter in the major leagues.

This post ran long, but thank you for reading this far. If you find any mistakes, I would be very grateful if you could point them out in the comments.

[Python] I tried to analyze the pitcher who achieved no hit no run
