[PYTHON] It seems that a new lottery with a total score will start in the sports lottery,

Background

Type      Target matches  Voting choices  Odds of winning  Probability
BIG       14              1, 0, 2         1/4,782,969      0.00000020907
MEGA BIG  12              1, 2, 3, 4      1/16,777,216     0.0000000596
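
The denominators in the table follow from the number of choices per match raised to the number of target matches. A short check, assuming every combination is equally likely to be drawn (the variable names are only for illustration):

```python
# BIG: 14 matches, 3 possible results each (1, 0, 2)
# MEGA BIG: 12 matches, 4 possible total-score classes each (1, 2, 3, 4)
big_combinations = 3 ** 14
mega_big_combinations = 4 ** 12

# Probability of one specific combination being the winning one
big_probability = 1 / big_combinations
mega_big_probability = 1 / mega_big_combinations

print(f"BIG:      1/{big_combinations:,}")
print(f"MEGA BIG: 1/{mega_big_combinations:,}")
print(f"MEGA BIG has {mega_big_combinations / big_combinations:.2f}x as many combinations")
```

Going from 3 choices over 14 matches to 4 choices over 12 matches makes a single ticket roughly three and a half times less likely to win.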

Main subject

Process flow

  1. Get match data for 2014 to 2019 from the schedule/results page of the J.League Data Site.
  2. Read the CSV files obtained in step 1 and create a DataFrame.
  3. Create separate DataFrames for J1, J2, and J3.
  4. Create four graphs (J1, J2, J3, and all matches) with matplotlib.
  5. Group the "Total score" into the MEGA BIG classes: 1 point or less, 2 points, 3 points, and 4 points or more.
  6. Graph the aggregated results.

Code

① Get match data for 2014 to 2019 from the schedule/results page of the J.League Data Site

   Year  Tournament  Section          Match day    K/O time  Home     Score  Away       Stadium    Attendees  TV broadcast
0  2014  J1          Section 1 Day 1  03/01 (Sat)  14:04     C Osaka  0-1    Hiroshima  Yanmar     37079      SKY PerfecTV! / SKY PerfecTV! Premium Service / NHK General
1  2014  J1          Section 1 Day 1  03/01 (Sat)  14:04     Nagoya   2-3    Shimizu    Toyota Su  21657      SKY PerfecTV! / SKY PerfecTV! Premium Service / NHK Nagoya / NHK Shizuoka
2  2014  J1          Section 1 Day 1  03/01 (Sat)  14:05     Tosu     5-0    Tokushima  Bearsta    14296      SKY PerfecTV! / SKY PerfecTV! Premium Service / NHK Tokushima / NHK Saga
3  2014  J1          Section 1 Day 1  03/01 (Sat)  14:05     Kofu     0-4    Kashima    National   13809      SKY PerfecTV! / SKY PerfecTV! Premium Service / NHK Kofu / NHK Mito
4  2014  J1          Section 1 Day 1  03/01 (Sat)  14:05     Sendai   1-2    Niigata    Yurtec     15852      SKY PerfecTV! / SKY PerfecTV! Premium Service / NHK Sendai / NHK Niigata

② Read the CSV files obtained in ① and create a DataFrame.

import pandas as pd

col_name = ['Year','Tournament','Section','Match day','K/O time','Home','Score','Away','Stadium','Attendees','TV broadcast']

# files: list of the CSV file paths saved in ①
# DataFrame.append was removed in pandas 2.0, so read every file first and concatenate once.
frames = [pd.read_csv(f, sep=',', encoding='utf-8') for f in files]
results = pd.concat(frames, ignore_index=True, sort=False).reindex(columns=col_name)
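
The code in ③ uses a score_data DataFrame with a 'Total score' column, but the step that creates it is not shown. A minimal sketch of how the total could be derived, assuming 'Score' holds strings such as '0-1' (home goals, hyphen, away goals) as in the sample rows; the helper name total_score is hypothetical:

```python
def total_score(score):
    """Return home goals + away goals for a score string such as '2-3'.

    Returns None when the value is not in 'H-A' form
    (for example, an empty cell for a match without a result).
    """
    try:
        home, away = str(score).split('-')
        return int(home) + int(away)
    except ValueError:
        return None


# The five scores from the sample rows in ①
sample_scores = ['0-1', '2-3', '5-0', '0-4', '1-2']
totals = [total_score(s) for s in sample_scores]
print(totals)
```

Applied to the DataFrame from ②, this could look like score_data = results.copy() followed by score_data['Total score'] = score_data['Score'].apply(total_score), with rows that return None dropped before counting.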

③ Create separate DataFrames for J1, J2, and J3.

# Count the matches per total score for J1, J2, and J3 separately
score_J1 = score_data[score_data['Tournament'] == 'J1']
idx_J1 = sorted(score_J1['Total score'].unique())
scoreJ1 = pd.DataFrame({'Total score':idx_J1, 'cnt':score_J1['Total score'].value_counts()}, index=idx_J1)
scoreJ1 = scoreJ1.reset_index().drop('index', axis=1)

score_J2 = score_data[score_data['Tournament'] == 'J2']
idx_J2 = sorted(score_J2['Total score'].unique())
scoreJ2 = pd.DataFrame({'Total score':idx_J2, 'cnt':score_J2['Total score'].value_counts()}, index=idx_J2)
scoreJ2 = scoreJ2.reset_index().drop('index', axis=1)

score_J3 = score_data[score_data['Tournament'] == 'J3']
idx_J3 = sorted(score_J3['Total score'].unique())
scoreJ3 = pd.DataFrame({'Total score':idx_J3, 'cnt':score_J3['Total score'].value_counts()}, index=idx_J3)
scoreJ3 = scoreJ3.reset_index().drop('index', axis=1)
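
The three blocks above differ only in the tournament name, and the score_all DataFrame used in ④ is not defined in the code shown. A minimal sketch of a shared helper, assuming score_all is the same count taken over every row of score_data; the function name count_total_scores and the sample data are made up for illustration:

```python
import pandas as pd


def count_total_scores(df, tournament=None):
    """Count matches per total score, optionally for one tournament only."""
    if tournament is not None:
        df = df[df['Tournament'] == tournament]
    # value_counts() sorts by frequency, so re-sort by the total score itself
    counts = df['Total score'].value_counts().sort_index()
    return pd.DataFrame({'Total score': counts.index, 'cnt': counts.to_numpy()})


# Tiny made-up sample, only to show the shape of the result
score_data = pd.DataFrame({
    'Tournament': ['J1', 'J1', 'J1', 'J2', 'J3'],
    'Total score': [1, 5, 1, 3, 2],
})

scoreJ1, scoreJ2, scoreJ3 = (count_total_scores(score_data, t) for t in ('J1', 'J2', 'J3'))
score_all = count_total_scores(score_data)
print(score_all)
```

The result has the same two columns ('Total score' and 'cnt') and a 0-based index, so it can be passed to the plotting code in ④ unchanged.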

④ Create 4 graphs with matplotlib.

import numpy as np
import matplotlib.pyplot as plt

# Original graph style settings
# Applied before the figure is created so that figure-level settings also take effect.
# "mystyle" is the author's own style sheet and must be installed for this line to work.
plt.style.use("mystyle")
plt.rcParams["font.family"] = "IPAexGothic"

# Graph J1, J2, J3, and the whole
fig = plt.figure(figsize=(16, 9), dpi=144)
fig.subplots_adjust(hspace=0.4)

# For storing graph objects
axes = []
score_list = [scoreJ1['Total score'], scoreJ2['Total score'], scoreJ3['Total score'], score_all['Total score']]
cnt_list = [scoreJ1['cnt'], scoreJ2['cnt'], scoreJ3['cnt'], score_all['cnt']]
cat_list = ['J1', 'J2', 'J3', 'ALL']

# Loop through 4 graphs of J1, J2, J3, ALL
for i in range(4):
    axes.append(fig.add_subplot(4, 1, i + 1))
    axes[i].bar(score_list[i], cnt_list[i])
    # Write each count in red just above its bar
    for score, cnt in zip(score_list[i], cnt_list[i]):
        axes[i].text(score, cnt + 25, str(cnt), size=12, color='r', ha='center')
    axes[i].set_xticks(np.arange(0, 16, 1))
    axes[i].set_ylabel(cat_list[i])
    axes[i].set_ylim(0, 1500)
    # Number of matches in the upper-right corner
    axes[i].text(15 - 1, 1500 - 200, 'n:' + str(sum(cnt_list[i])))

plt.xlabel('Total score')

txt1 = 'Visualizing the total score per match in the J.League'
fig.text(.05, .9, txt1, fontsize=32, horizontalalignment="left")
txt2 = "Source: J.League Data Site"
fig.text(.9, .05, txt2, fontsize=14, horizontalalignment="right")

plt.savefig('./img/score.png')  # the ./img directory must already exist
plt.show()
[Figure: score.png]

⑤ Add a "Mega" column that maps "Total score" to the MEGA BIG classes: 1 point or less, 2 points, 3 points, 4 points or more

# Map a total score to its MEGA BIG class (1: 1 point or less, 2, 3, 4: 4 points or more)
def mega(total):
    if total in (2, 3):
        return total
    elif total <= 1:
        return 1
    elif total >= 4:
        return 4

score_data['Mega'] = score_data['Total score'].apply(mega)
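
For integer totals, the classification above is the same as clamping the value to the range 1 to 4. A small sketch of that equivalence; the name mega_clamp is hypothetical and not part of the original code:

```python
def mega_clamp(total):
    """Clamp an integer total score into the MEGA BIG classes 1 to 4."""
    return min(max(total, 1), 4)


# Class for every total score from 0 to 15 (the x-axis range used in ④)
mapping = {total: mega_clamp(total) for total in range(16)}
print(mapping)
```

On a pandas Series the same clamping should be possible without apply, using score_data['Total score'].clip(lower=1, upper=4).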

⑥ Graph the aggregated results.

[Figure: mega_plot.png]
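
The code for this step is not shown. A minimal sketch of one way to do it, assuming the graph is a bar chart of the number of matches per Mega class with the percentage written above each bar; the function names, the labels, and the sample values are all made up for illustration and may differ from what the author actually plotted:

```python
from collections import Counter


def mega_shares(mega_values):
    """Return {class: (count, percentage)} for the MEGA BIG classes 1 to 4."""
    counts = Counter(mega_values)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no matches to aggregate")
    return {c: (counts[c], 100 * counts[c] / n) for c in (1, 2, 3, 4)}


def plot_mega(shares, path):
    """Draw the counts as a bar chart and save it to path (requires matplotlib)."""
    import matplotlib.pyplot as plt

    labels = ['1 or less', '2', '3', '4 or more']
    fig, ax = plt.subplots()
    ax.bar(labels, [shares[c][0] for c in (1, 2, 3, 4)])
    # Write the percentage just above each bar
    for x, c in enumerate((1, 2, 3, 4)):
        ax.text(x, shares[c][0], f'{shares[c][1]:.1f}%', ha='center', va='bottom')
    ax.set_xlabel('Mega')
    ax.set_ylabel('Matches')
    fig.savefig(path)
    plt.close(fig)


# Made-up values standing in for score_data['Mega']
sample = [1, 1, 2, 2, 2, 3, 4, 4]
shares = mega_shares(sample)
print(shares)
```

With the real data, mega_shares(score_data['Mega']) followed by plot_mega(shares, './img/mega_plot.png') would produce the file, provided the ./img directory exists.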

Summary
