[PYTHON] Basic calculation of pandas to enjoy Hakone Ekiden while competing with the best members of all time

Trigger

"Speaking of the fun of Japanese New Year, Hakone Ekiden seen in the living room! Isn't everyone excited about this New Year? " Not really for Hakone Ekiden fans.

Still, while watching TV with my enthusiastic family, "A fast university changes every year. It's difficult for one university to have 10 fast players in the same year." I was a little intrigued by that point.

This year, I happened to be at XX University (hidden words to prevent spoilers), but The fastest members of all time, who are called "mountain gods" or "somehow gods" of each university, If you gathered in the same age and ran the Hakone Ekiden against the university Which university will be number one?

I would like to search for "Taraba" from past data.

environment

Libraries that were taken care of


import sys
import os
import glob
import requests
from bs4 import BeautifulSoup
from collections import defaultdict

import numpy
import pandas

Way of thinking

For each university, calculate the fastest record in history for each section at [^ 1] per hour, If you apply it to the course of the latest 92nd competition, you will find the fastest university! ??

[^ 1]: The section distance changes for each tournament, so for comparison.

Target

-"31st-91st Tournament" --Calculate the maximum speed for each university and section with "Section distance" / "Time required" ↓ ――Run the 92nd tournament at the speed calculated for each section

procedure

Get html

From "Hakone Ekiden Official Website"

--Obtain the html of the record page and check the participating schools for each tournament

get_html.py


#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import glob
import requests
from bs4 import BeautifulSoup

# User-Agent
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0'
    #'From': '[email protected]'  # This is another valid field
}

"""
# URL example
http://www.hakone-ekiden.jp/data/data_race.php?racenum=31
"""

def crawl_site(site_url):
    #Generate crawl target URL
    tocrawl = [site_url+'?racenum='+str(i) for i in xrange(31, 92)]
    return tocrawl

def prepare_html(tocrawl, html_path, identifier_prefix):
    crawled = []
    for i,site_url in enumerate(tocrawl):
        print 'prepare_html:', site_url
        r = requests.get(site_url, headers=headers)
        filename = html_path + identifier_prefix + site_url.split('=')[1] + '.html'
        f = open(filename, 'w')
        f.write(r.content)
        f.close()

if __name__ == '__main__':

    # settings
    site_url = 'http://www.hakone-ekiden.jp/data/data_race.php'
    html_path = './html/'
    identifier_prefix = 'hakone'

    if not os.path.exists(html_path):
        os.mkdir(html_path)
  
    # prepare html files  
    num_html_files = len(glob.glob(html_path + identifier_prefix + '*.html'))
    print 'num_html_files: ' + str(num_html_files)
    if num_html_files == 0:
        tocrawl = crawl_site(site_url)
        prepare_html(tocrawl, html_path, identifier_prefix)

From the html that was acquired and downloaded locally, the participating schools for each tournament are requested. Next, download the html of "List of players by university" and

Example_31st Tournament_1955_Waseda University


http://www.hakone-ekiden.jp/data/data_univ.php?race=31&univ=50

From there, relying a little on Beautiful Soup,

――Obtain successive section records for each participating school

Correspondence between university and id, participating universities for each competition


print universities
print len(universities)
print
print universities_each_year
print len(universities_each_year)

{4: u'\u6771\u4eac\u6559\u80b2\u5927\u5b66', 9: u'\u95a2\u6771\u5b66\u9023\u9078\u629c', 10: u'\u65e5\u672c\u5b66\u9023\u9078\u629c', 11: u'\u9752\u5c71\u5b66\u9662\u5927\u5b66', 12: u'\u4e9c\u7d30\u4e9c\u5927\u5b66', 13: u'\u795e\u5948\u5ddd\u5927\u5b66', 15: u'\u95a2\u6771\u5b66\u9662\u5927\u5b66', 16: u'\u6176\u5fdc\u7fa9\u587e\u5927\u5b66', 17: u'\u570b\u5b78\u9662\u5927\u5b66', 18: u'\u56fd\u58eb\u8218\u5927\u5b66', 19: u'\u99d2\u6fa4\u5927\u5b66', 20: u'\u57fc\u7389\u5927\u5b66', 21: u'\u9806\u5929\u5802\u5927\u5b66', 22: u'\u57ce\u897f\u5927\u5b66', 24: u'\u5c02\u4fee\u5927\u5b66', 25: u'\u5927\u6771\u6587\u5316\u5927\u5b66', 26: u'\u62d3\u6b96\u5927\u5b66', 27: u'\u4e2d\u592e\u5b66\u9662\u5927\u5b66', 28: u'\u4e2d\u592e\u5927\u5b66', 29: u'\u7b51\u6ce2\u5927\u5b66', 30: u'\u5e1d\u4eac\u5927\u5b66', 31: u'\u6771\u6d77\u5927\u5b66', 32: u'\u6771\u4eac\u5b66\u82b8\u5927\u5b66', 33: u'\u6771\u4eac\u5927\u5b66', 34: u'\u6771\u4eac\u8fb2\u696d\u5927\u5b66', 36: u'\u6771\u6d0b\u5927\u5b66', 38: u'\u65e5\u672c\u5927\u5b66', 39: u'\u65e5\u672c\u4f53\u80b2\u5927\u5b66', 40: u'\u798f\u5ca1\u5927\u5b66', 41: u'\u5e73\u6210\u56fd\u969b\u5927\u5b66', 42: u'\u9632\u885b\u5927\u5b66', 43: u'\u6cd5\u653f\u5927\u5b66', 44: u'\u660e\u6cbb\u5927\u5b66', 45: u'\u5c71\u68a8\u5b66\u9662\u5927\u5b66', 46: u'\u6a2a\u6d5c\u5e02\u7acb\u5927\u5b66', 47: u'\u6a2a\u6d5c\u56fd\u7acb\u5927\u5b66', 48: u'\u7acb\u6559\u5927\u5b66', 49: u'\u7acb\u547d\u9928\u5927\u5b66', 50: u'\u65e9\u7a32\u7530\u5927\u5b66', 78: u'\u4e0a\u6b66\u5927\u5b66', 94: u'\u5275\u4fa1\u5927\u5b66', 95: u'\u95a2\u6771\u5b66\u751f\u9023\u5408'}
42

defaultdict(<type 'list'>, {31: [28, 38, 50, 4, 43, 24, 36, 48, 34, 39, 13, 32, 46, 47, 26], 32: [28, 38, 4, 43, 39, 50, 48, 24, 36, 34, 32, 13, 46, 44, 47], 33: [38, 28, 48, 4, 50, 43, 39, 24, 34, 36, 32, 18, 46, 44, 13], 34: [38, 28, 4, 39, 43, 50, 48, 36, 24, 21, 34, 18, 32, 13, 46], 35: [28, 38, 4, 48, 43, 50, 39, 36, 24, 34, 18, 21, 32, 20, 13, 16], 36: [28, 38, 36, 4, 50, 24, 48, 43, 39, 44, 18, 34, 21, 32, 13], 37: [28, 38, 24, 39, 44, 36, 50, 43, 34, 21, 18, 4, 32, 48, 42], 38: [28, 44, 39, 24, 38, 48, 34, 43, 36, 18, 21, 50, 4, 16, 13], 39: [28, 44, 38, 39, 21, 18, 43, 4, 24, 50, 36, 34, 16, 48, 42], 40: [28, 38, 18, 36, 21, 39, 50, 43, 44, 24, 4, 48, 34, 16, 46, 49, 40], 41: [38, 28, 21, 18, 39, 50, 24, 44, 36, 4, 48, 43, 34, 11, 13], 42: [21, 38, 39, 28, 18, 36, 50, 24, 43, 44, 48, 4, 11, 13, 16], 43: [38, 21, 18, 28, 39, 36, 24, 43, 4, 50, 12, 13, 19, 11, 44], 44: [38, 39, 21, 36, 28, 18, 11, 12, 43, 50, 48, 4, 24, 19, 25], 45: [39, 38, 21, 18, 36, 24, 25, 43, 11, 12, 28, 4, 19, 50, 44], 46: [39, 21, 38, 18, 25, 24, 36, 12, 28, 19, 4, 11, 43, 13, 26], 47: [39, 21, 38, 18, 12, 36, 25, 28, 19, 24, 4, 11, 50, 43, 16], 48: [39, 38, 25, 21, 18, 28, 24, 36, 19, 12, 34, 4, 11, 26, 44], 49: [39, 25, 38, 21, 18, 28, 12, 34, 19, 24, 36, 4, 11, 31, 50], 50: [38, 25, 21, 34, 39, 18, 28, 36, 31, 12, 24, 19, 4, 11, 43, 50, 44, 26, 16, 13], 51: [25, 21, 39, 34, 38, 28, 18, 36, 19, 31, 29, 12, 24, 11, 50], 52: [25, 39, 34, 28, 21, 38, 19, 18, 12, 36, 29, 24, 31, 43, 11], 53: [39, 34, 25, 21, 38, 31, 19, 28, 36, 24, 12, 18, 50, 43, 26], 54: [39, 21, 25, 34, 38, 50, 28, 43, 36, 31, 24, 18, 19, 26, 29], 55: [21, 39, 25, 50, 38, 36, 34, 26, 24, 18, 31, 19, 28, 43, 12], 56: [39, 21, 50, 25, 38, 34, 36, 29, 24, 31, 19, 18, 26, 43, 28], 57: [21, 39, 25, 34, 50, 29, 38, 19, 24, 36, 28, 31, 18, 43, 26], 58: [21, 39, 38, 25, 50, 36, 29, 28, 24, 18, 31, 12, 19, 26, 34], 59: [39, 50, 21, 38, 31, 25, 36, 29, 34, 28, 18, 19, 12, 24, 43], 60: [50, 39, 21, 25, 38, 31, 34, 36, 29, 24, 28, 19, 18, 12, 26, 43, 33, 44, 32, 16], 61: [50, 21, 39, 38, 25, 28, 34, 29, 36, 24, 19, 18, 31, 12, 44], 62: [21, 50, 25, 19, 34, 39, 31, 28, 18, 24, 29, 38, 36, 12, 44], 63: [21, 39, 28, 38, 25, 31, 24, 50, 29, 36, 18, 19, 34, 44, 45], 64: [21, 25, 39, 38, 28, 34, 18, 31, 50, 44, 45, 19, 36, 24, 29], 65: [21, 39, 28, 25, 38, 19, 45, 31, 34, 50, 44, 18, 29, 36, 43], 66: [25, 38, 28, 45, 21, 39, 18, 31, 50, 43, 24, 34, 36, 19, 12], 67: [25, 45, 28, 38, 39, 21, 34, 31, 19, 43, 50, 24, 18, 44, 36], 68: [45, 38, 21, 28, 25, 50, 24, 19, 34, 31, 39, 43, 18, 13, 12], 69: [50, 45, 28, 24, 38, 19, 43, 13, 21, 39, 36, 34, 12, 31, 25], 70: [45, 50, 21, 28, 31, 24, 13, 39, 38, 43, 19, 12, 34, 18, 36, 27, 15, 25, 16, 29], 71: [45, 50, 28, 38, 39, 13, 24, 31, 34, 36, 12, 27, 19, 25, 21], 72: [28, 50, 21, 31, 25, 43, 12, 34, 39, 24, 36, 19, 38, 45, 13], 73: [13, 45, 25, 28, 50, 19, 36, 31, 21, 39, 24, 12, 26, 43, 34], 74: [13, 19, 45, 28, 21, 50, 38, 26, 25, 36, 39, 24, 15, 31, 30], 75: [21, 19, 13, 28, 31, 45, 25, 38, 36, 50, 26, 39, 27, 43, 30], 76: [19, 21, 28, 30, 38, 50, 31, 13, 45, 43, 39, 25, 15, 26, 36], 77: [21, 19, 28, 43, 13, 25, 30, 38, 45, 50, 39, 26, 41, 17, 31], 78: [19, 21, 50, 28, 25, 13, 12, 30, 45, 38, 39, 15, 24, 31, 43], 79: [19, 45, 38, 25, 28, 36, 31, 21, 39, 27, 13, 26, 30, 17, 50, 43, 12, 15, 24, 9], 80: [19, 31, 12, 43, 21, 36, 28, 13, 39, 38, 27, 45, 25, 30, 34, 50, 18, 15, 22, 10], 81: [19, 39, 38, 28, 21, 31, 12, 43, 27, 13, 50, 25, 36, 45, 22, 30, 24, 44, 26, 9], 82: [12, 45, 38, 21, 19, 31, 43, 28, 39, 36, 22, 25, 50, 17, 24, 13, 27, 44, 18, 9], 83: [21, 38, 31, 39, 36, 50, 19, 28, 24, 12, 22, 45, 27, 25, 43, 44, 13, 17, 18, 9], 84: [19, 50, 27, 9, 12, 45, 28, 30, 38, 36, 22, 39, 18, 24, 13, 43, 34, 31, 25, 21], 85: [36, 50, 39, 25, 27, 45, 38, 44, 9, 28, 18, 34, 19, 24, 13, 12, 26, 31, 21, 30, 78, 11, 22], 86: [36, 19, 45, 28, 34, 22, 50, 11, 39, 44, 30, 31, 27, 78, 38, 9, 24, 25, 43, 12], 87: [50, 36, 19, 31, 44, 28, 26, 39, 11, 17, 22, 45, 30, 34, 13, 27, 24, 9, 78, 38], 88: [36, 19, 44, 50, 11, 22, 21, 28, 45, 17, 18, 31, 30, 26, 13, 78, 9, 27, 39, 34], 89: [39, 36, 19, 30, 50, 21, 44, 11, 43, 27, 45, 25, 9, 17, 38, 13, 34, 78, 22, 28], 90: [36, 19, 39, 50, 11, 44, 38, 30, 26, 25, 43, 27, 31, 34, 28, 21, 17, 13, 22, 78, 24, 18, 45], 91: [11, 19, 36, 44, 50, 31, 22, 27, 45, 25, 30, 21, 38, 17, 39, 26, 13, 78, 28, 94, 95]})
61

It seems that there are 42 unique participating schools in the past (* However, the universities before and after the name change are counted separately (because it seems that a different id is assigned on the official page)).

Convert a character string such as "1 hour 06 minutes 49 seconds" read from the table. (There seems to be a good library, but ...)

python


def convert_hour(hour_in_ja):
    minutes_per_hour = 60.0
    seconds_per_minute = 60.0
    try:
        hour = int(hour_in_ja.split(u'time')[0])
        minute = int(hour_in_ja.split(u'time')[1].split(u'Minutes')[0])
        second = int(hour_in_ja.split(u'time')[1].split(u'Minutes')[1].split(u'Seconds')[0])
    except:
        print 'inside except:', hour_in_ja
        return float('NaN')

    hour_in_num = hour + minute/minutes_per_hour + second/(minutes_per_hour*seconds_per_minute)
    return hour_in_num

Store the historical section record for each participating school in the dict of pandas.DataFrame (unit: h)

python


dict_of_dataframe = {}
for a_univ_id in universities.keys():
    # make data frame for each univ
    df = pandas.DataFrame(df_array, index=df_index, columns=df_columns)
    ....
    
    # assign df to a univ
    dict_of_dataframe[a_univ_id] = df

For example

python


#Meiji University
print dict_of_dataframe[44]

          1         2         3         4         5         6         7   \
32  1.280556  1.247500  1.321389  1.288056  1.772222  1.436111  1.356944   
33  1.241389  1.172500  1.278889  1.241389  1.689167  1.406944  2.146944   
36  1.190556  1.114167  1.371389  1.180278  1.515556  1.280833  1.136111   
37  1.150556  1.313333  1.094444  1.145833  1.498611  1.265556  1.157778   
38  1.139444  1.417778  1.135000  1.184444  1.479722  1.229444  1.154722   
39  1.133056  1.362500  1.168889  1.096667  1.470000  1.188056  1.073889   
40  1.117222  1.338611  1.129444  1.082222  1.419444  1.169167  1.142222   
41  1.150278  1.333611  1.095556  1.122222  1.500278  1.178056  1.121389   
42  1.109444  1.344444  1.132778  1.145556  1.453333  1.114167  1.174722   
43  1.145833  1.365556  1.128333  1.648889  1.328889  1.017222  1.265278   
45  1.256667  1.367778  1.303611  1.499167  1.517500  1.093889  1.183056   
48  1.226667  1.473611  1.197500  1.200556  1.325556  1.087222  1.261667   
50  1.160000  1.403889  1.173889  1.216111  1.417222  1.161944  1.229444   
60  1.121667  1.316111  1.211667  1.204444  1.348889  1.084722  1.224444   
61  1.145278  1.203889  1.190833  1.168611  1.346389  1.053611  1.207500   
62  1.158056  1.244722  1.177222  1.150833  1.299444  1.135278  1.173889   
63  1.094444  1.224444  1.125833  1.173333  1.296111  1.079444  1.155833   
64  1.093611  1.221667  1.125556  1.152778  1.318611  1.052222  1.133889   
65  1.124722  1.233056  1.129167  1.153333  1.304722  1.039167  1.150833   
67  1.099722  1.148333  1.088889  1.135833  1.282778  1.134444  1.198889   
81  1.070556  1.179167  1.093889  1.078611  1.267500  1.024444  1.133889   
82  1.066667  1.206111  1.088611  0.952778  1.399444  1.030556  1.094167   
83  1.097778  1.156667  1.095000  0.971111  1.344167  1.018056  1.120278   
85  1.081389  1.163056  1.081389  0.924444  1.399167  0.993333  1.086111   
86  1.040833  1.148889  1.052222  0.932500  1.454722  1.020000  1.108611   
87  1.075278  1.126667  1.056389  0.946944  1.331111  1.028889  1.091111   
88  1.046111  1.146389  1.068333  0.914722  1.326111  1.005278  1.069722   
89  1.062778  1.195278  1.107500  0.972500  1.405556  0.971944  1.083611   
90  1.033889  1.154167  1.057500  0.926667  1.428056  0.971111  1.078611   
91  1.035278  1.132222  1.044722  0.916667  1.353611  0.998333  1.095833   

          8         9         10  
32  1.347222  1.340278  1.350833  
33  1.424444  1.257500  1.465556  
36  1.361667  1.160833  1.299167  
37  1.130833  1.319167  1.218056  
38  1.139722  1.383333  1.173333  
39  1.136667  1.332222  1.112500  
40  1.136389  1.353889  1.141111  
41  1.145556  1.322778  1.122778  
42  1.156111  1.324167  1.155000  
43  1.234167  1.436944  1.178889  
45  1.140278  1.263889  1.221944  
48  1.343056  1.547500  1.305833  
50  1.321667  1.518611  1.300833  
60  1.213889  1.373889  1.220556  
61  1.168889  1.273611  1.151389  
62  1.295833  1.277778  1.170556  
63  1.155833  1.248611  1.167778  
64  1.150556  1.270556  1.120833  
65  1.191389  1.258889  1.137222  
67  1.207778  1.233056  1.273889  
81  1.129444  1.206111  1.289444  
82  1.161389  1.216111  1.244722  
83  1.165278  1.244722  1.252778  
85  1.124167  1.231944  1.219444  
86  1.139722  1.229722  1.238611  
87  1.116944  1.186667  1.177500  
88  1.090278  1.194167  1.186111  
89  1.127222  1.258056  1.236111  
90  1.089167  1.192778  1.237222  
91  1.120556  1.149444  1.185833

Obtain successive section distances (unit: km)

--Create pandas.DataFrame manually from "Transition of Hakone Ekiden Distance" --pandas.DataFrame. Row: 61 tournaments (31st-91st tournament), columns: 10 sections (1-10 wards)

load_distance.py


#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy
import pandas

df_array = numpy.array([
 [22.3, 21.2, 22.1, 21.0, 25.1, 25.1, 21.0, 22.1, 21.2, 22.3],
 [22.3, 21.2, 22.1, 21.0, 25.1, 25.1, 21.0, 22.1, 21.2, 22.3],
 [22.3, 21.2, 22.1, 21.0, 25.1, 25.1, 21.0, 22.1, 21.2, 22.3],
 [22.3, 21.2, 22.1, 21.0, 25.1, 25.1, 21.0, 22.1, 21.2, 22.3],
 [22.3, 21.2, 22.1, 21.0, 25.1, 25.1, 21.0, 22.1, 21.2, 22.3],
 [22.3, 20.5, 24.7, 20.1, 25.1, 25.1, 20.1, 24.7, 20.5, 22.3],
 [22.7, 23.7, 24.7, 20.1, 25.1, 25.1, 20.1, 24.7, 23.7, 22.7],
 [22.7, 23.7, 19.9, 20.7, 25.1, 25.1, 20.7, 19.9, 23.7, 22.7],
 [22.7, 23.7, 21.4, 19.4, 25.1, 25.1, 19.4, 21.4, 23.7, 22.7],
 [22.7, 23.7, 21.4, 20.4, 24.7, 24.7, 20.2, 21.4, 23.7, 22.7],
 [22.7, 23.7, 21.4, 20.4, 24.7, 24.7, 20.2, 21.4, 23.7, 22.7],
 [22.7, 23.7, 21.4, 21.4, 23.7, 23.7, 21.2, 21.4, 23.7, 22.7],
 [22.7, 23.7, 21.4, 21.4, 21.9, 21.9, 21.2, 21.4, 23.7, 22.7],
 [22.7, 23.7, 21.4, 21.4, 21.9, 21.9, 21.2, 21.4, 23.7, 22.7],
 [21.7, 24.7, 20.2, 23.2, 21.9, 21.9, 23.0, 20.2, 24.7, 21.7],
 [21.6, 24.7, 21.5, 23.2, 22.0, 22.0, 21.7, 21.5, 24.7, 21.6],
 [21.6, 24.7, 21.5, 23.2, 22.0, 22.0, 21.7, 21.5, 24.7, 21.6],
 [21.8, 25.2, 22.2, 21.9, 21.4, 21.4, 21.9, 22.2, 25.2, 21.8],
 [21.8, 25.2, 22.2, 21.9, 21.4, 21.4, 21.9, 22.2, 25.2, 21.8],
 [21.8, 25.2, 22.2, 21.9, 21.4, 21.4, 21.9, 22.2, 25.2, 21.8],
 [21.8, 25.2, 22.2, 21.9, 21.4, 21.4, 21.9, 22.2, 25.2, 21.8],
 [21.8, 25.2, 22.2, 21.9, 21.4, 21.4, 21.9, 22.2, 25.2, 21.8],
 [21.8, 25.2, 22.2, 21.9, 21.4, 21.4, 21.9, 22.2, 25.2, 21.8],
 [21.8, 25.2, 22.2, 21.9, 21.4, 21.4, 21.9, 22.2, 25.2, 21.8],
 [21.8, 24.4, 22.2, 21.9, 21.4, 21.4, 21.9, 22.2, 24.4, 21.8],
 [21.8, 24.4, 22.2, 21.9, 21.4, 21.4, 21.9, 22.2, 24.4, 21.8],
 [21.8, 24.4, 22.2, 21.9, 21.4, 21.4, 21.9, 22.2, 24.4, 21.8],
 [21.8, 24.4, 22.2, 21.9, 21.4, 21.4, 21.9, 22.2, 24.4, 21.8],
 [21.8, 24.4, 22.2, 21.0, 20.5, 21.4, 21.9, 22.2, 24.4, 21.8],
 [21.8, 24.4, 22.2, 21.0, 20.5, 20.5, 21.3, 22.2, 24.4, 21.8],
 [21.8, 24.4, 22.2, 21.0, 20.5, 20.5, 21.3, 22.2, 24.4, 21.8],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.8],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.8],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.8],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.8],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.8],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.3],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.3],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.3],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.3],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.3],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.3],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.3],
 [21.8, 22.7, 22.2, 21.0, 20.6, 20.6, 21.3, 22.2, 22.7, 21.3],
 [21.3, 23.0, 21.3, 20.9, 20.7, 20.7, 21.2, 21.3, 23.0, 23.0],
 [21.3, 23.0, 21.3, 20.9, 20.7, 20.7, 21.2, 21.3, 23.0, 23.0],
 [21.3, 23.0, 21.3, 20.9, 20.7, 20.7, 21.2, 21.3, 23.0, 23.0],
 [21.3, 23.0, 21.3, 20.9, 20.7, 20.7, 21.2, 21.3, 23.0, 23.0],
 [21.3, 23.0, 21.3, 20.9, 20.7, 20.7, 21.2, 21.3, 23.0, 23.0],
 [21.3, 23.0, 21.3, 20.9, 20.7, 20.7, 21.2, 21.3, 23.0, 23.0],
 [21.3, 23.0, 21.3, 21.0, 20.9, 20.7, 21.2, 21.3, 23.0, 23.0],
 [21.4, 23.2, 21.5, 21.0, 23.4, 20.7, 21.2, 21.5, 23.2, 23.1],
 [21.4, 23.2, 21.5, 21.0, 23.4, 20.7, 21.2, 21.5, 23.2, 23.1],
 [21.4, 23.2, 21.5, 21.0, 23.4, 20.7, 21.2, 21.5, 23.2, 23.1],
 [21.4, 23.2, 21.5, 21.0, 23.4, 20.7, 21.2, 21.5, 23.2, 23.1],
 [21.4, 23.2, 21.5, 21.0, 23.4, 20.7, 21.2, 21.5, 23.2, 23.1],
 [21.4, 23.2, 21.5, 21.0, 23.4, 20.7, 21.2, 21.5, 23.2, 23.1],
 [21.4, 23.2, 21.5, 21.0, 23.4, 20.7, 21.2, 21.5, 23.2, 23.1],
 [21.4, 23.2, 21.5, 21.0, 23.4, 20.7, 21.2, 21.5, 23.2, 23.1],
 [21.4, 23.2, 21.5, 21.0, 23.4, 20.7, 21.2, 21.5, 23.2, 23.1],
 [21.3, 23.1, 21.4, 18.5, 23.2, 20.8, 21.3, 21.4, 23.1, 23.0]
])


#print df_array

def load_distance_df():
    df_index = numpy.arange(31,92)
    df_columns = numpy.arange(1,11)
    df = pandas.DataFrame(df_array, index=df_index, columns=df_columns)
    return df

Calculate the section distance / speed for each element and apply the max () function to obtain the highest speed in history for each section for each participating school (unit: km / h)

--pandas.Series. Line: 10 sections (1-10 wards) --Pandas. Convenient division between DataFrames --Treat elements for which no record exists as NaN (Not a Number) --Convert elements with infinite distance / time results to NaN --NaN can be ignored when taking the maximum value for each column (= interval)

make_pandas.py


df_distance = load_distance.load_distance_df()

for univ in dict_of_dataframe:
    # replace inf and -inf
    dict_of_dataframe_velocity[univ] = (df_distance / dict_of_dataframe[univ] ).replace([numpy.inf, -numpy.inf], numpy.nan)

Aside from post-processing, basically df_distance / dict_of_dataframe[univ] It is convenient to be able to calculate the section distance / required time for all tournaments for each university in one line.

make_pandas.py


########
#92nd
# km/(km/h) = h
########

#92nd Tournament Distance for each section(km)
distance_92th = numpy.array([21.3, 23.1, 21.4, 18.5, 23.2, 20.8, 21.3, 21.4, 23.1, 23.0])
series_dist = pandas.Series(distance_92th, index = numpy.arange(1,11))

########
# km/h
########

hour_ranking = {}

for univ in universities:
    max_velocity_for_each_section = dict_of_dataframe_velocity[univ].max()
    #University
    print 'University name: ', universities[univ], 'id: ', univ
    print 'Maximum speed(km/h): ', max_velocity_for_each_section
    print 'The shortest travel time for each section(h): ', series_dist / max_velocity_for_each_section
    print 'Total time required(h): ', sum(series_dist / max_velocity_for_each_section)
    print

    # add data
    hour_ranking[univ] = sum(series_dist / max_velocity_for_each_section)

Excerpt from the print result ...

python


University name:Kanto Gakuren Selection id:  9
Maximum speed(km/h):  1     20.380952
2     20.222760
3     20.145757
4     22.540250
5     17.379823
6     20.891505
7     19.624582
8     19.383922
9     19.473071
10    19.562456
dtype: float64
The shortest travel time for each section(h):  1     1.045093
2     1.142277
3     1.062258
4     0.820754
5     1.334881
6     0.995620
7     1.085373
8     1.104008
9     1.186254
10    1.175722
dtype: float64
Total time required(h):  10.9522406546

Compete in total travel time

For each participating school, using the highest speed ever in each section Run the course of the 92nd tournament (2016) and compare the total required time.

make_pandas.py


for k, v in sorted(hour_ranking.items(), key=lambda x:x[1]):
    print 'University name: ', universities[k], 'Total time required(h): ', v

result

python


University name:Nihon University Total Time required(h):  10.2453132331
University name:Chuo University Total Time required(h):  10.3591586209
University name:Nippon Sport Science University Total Time required(h):  10.3759579539
University name:Waseda University Total time required(h):  10.3766071561
University name:Juntendo University Total Time required(h):  10.4208490227
University name:Senshu University Total time required(h):  10.4760277397
University name:Toyo University Total Time required(h):  10.4816180955
University name:Kokushikan University Total time required(h):  10.5138632405
University name:Meiji University Total Time required(h):  10.5169890222
University name:Hosei University Total Time required(h):  10.6055941866
University name:Tokyo University of Agriculture Total time required(h):  10.6470086075
University name:Daito Bunka University Total Time required(h):  10.6603316782
University name:Komazawa University Total time required(h):  10.6967026896
University name:Yamanashi Gakuin University Total time required(h):  10.7075345827
University name:Tokai University Total Time required(h):  10.7491609299
University name:Kanagawa University Total time required(h):  10.8168044546
University name:Chuo Gakuin University Total time required(h):  10.8412400185
University name:Asia University Total Time required(h):  10.85684661
University name:Josai University Total time required(h):  10.8746325761
University name:Teikyo University Total Time required(h):  10.9042462208
University name:Takushoku University Total Time required(h):  10.9080803745
University name:Kokugakuin University Total time required(h):  10.90929883
University name:Aoyama Gakuin University Total time required(h):  10.9442540793
University name:Kanto Gakuren Selection Total Time required(h):  10.9522406546
University name:University of Tsukuba Total Time required(h):  10.9629683463
University name:Jobu University Total Time required(h):  11.069379971
University name:Tokyo University of Education Total time required(h):  11.1392697521
University name:Rikkyo University Total Time required(h):  11.2065269496
University name:Kanto Gakuin University Total time required(h):  11.2972643755
University name:Kanto Student Union Total time required(h):  11.32
University name:Total time required for selection by Japanese studies(h):  11.3482872531
University name:Keio University Total time required(h):  11.5006612565
University name:Soka University Total Time required(h):  11.5277777778
University name:Heisei Kokusai University Total time required(h):  11.6349612092
University name:Ritsumeikan University Total time required(h):  11.7104078624
University name:Fukuoka University Total time required(h):  11.7658274859
University name:Tokyo Gakugei University Total time required(h):  11.8272461166
University name:The University of Tokyo Total Time required(h):  12.1214438077
University name:Yokohama City University Total time required(h):  12.4901800886
University name:National Defense Academy Total Time required(h):  12.542251172
University name:Saitama University Total time required(h):  13.028733025
University name:Yokohama National University Total time required(h):  13.1778038298

--Is Nihon University 1st? --Aoyama Gakuin is in the middle?

However, I don't feel like actively considering it because of the following problems.

Excuse or problem with the name TODO

--There is a certain amount of data that could not be read into pandas due to garbled html characters. --Some html encoded with CP51932 --Insufficient data? -Record of the 83rd tournament, Kanto Gakuren selection is Kawauchi in the 6th ward (that famous citizen) Runner!) Only

Caution

――This article was not written with the intention of manipulating the honor of successive participating schools. -Since there are problems as pointed out in [Problems], please consider the obtained results as a reference only.

Link

-Hakone Ekiden Official Website -Transition of Hakone Ekiden distance --For the 31st and subsequent tournaments, where all the distances of all sections are recorded. -Character code problem

footnote

Recommended Posts

Basic calculation of pandas to enjoy Hakone Ekiden while competing with the best members of all time
Reformat the timeline of the pandas time series plot with matplotlib
I wrote the basic operation of Pandas with Jupyter Lab (Part 1)
I wrote the basic operation of Pandas with Jupyter Lab (Part 2)
What you want to memorize with the basic "string manipulation" grammar of python
Type conversion of multiple columns of pandas DataFrame with astype at the same time
Make it easy to specify the time of AWS CloudWatch Events with CDK.
Note that the calculation of average pairwise correlation was very easy with pandas