[Python] Looking back at the M-1 Grand Prix 2020 through data

Hi, I'm Miyuki Platinum, a comedian here on Qiita.

The Advent calendar is nearing its end, and it's Christmas Eve, so I imagine many people in the IT world are buying themselves M1 Macs as presents. Speaking of M1, though, the M-1 Grand Prix 2020, the tournament that decides the best manzai act in Japan, was held in the human world on Sunday, December 20th.

I once considered entering the R-1 Grand Prix myself (I gave up after spending two minutes thinking about a stage name and material), so I look forward to the M-1 Grand Prix every year. This year's M-1 Grand Prix 2020, however, featured many acts we don't usually see in the finals, perhaps because Wagyu and Kamaitachi were absent and comedians had fewer live stage appearances due to corona.

Be that as it may, the M-1 Grand Prix sparks arguments every year over the divergence in scoring between judges and the gap between the judges' and the viewers' evaluations.

In this article, I examine these divergences from a data perspective.

Contents

  1. Standardization of judges' scores
  2. Classifying similar judges by correlation coefficient
  3. Viewer tweet statistics on Twitter


First, I would like to introduce the entertainers and judges who participated in this tournament.

Contestants

| # | Name | Formed | Agency | Final material |
|---|------|--------|--------|----------------|
| 1 | Indians | 2010 | Yoshimoto Kogyo | Their old yankee days |
| 2 | Tokyo Hoteison | 2015 | Grape Company | Solving a mystery |
| 3 | New York | 2010 | Yoshimoto Kogyo | Petty crime |
| 4 | Sketch | 2007 | Yoshimoto Kogyo | Manager |
| 5 | Come on, Yasuko | 2019 | Yoshimoto Kogyo | Song material |
| 6 | Magical Lovely | 2007 | Yoshimoto Kogyo | Luxury French restaurant |
| 7 | Oswald | 2014 | Yoshimoto Kogyo | Renaming |
| 8 | Akina | 2012 | Yoshimoto Kogyo | Favorite child |
| 9 | Nishikigoi | 2012 | Sony Music Artists | Pachinko |
| 10 | Westland | 2008 | Titan | Revenge |

Judges

| # | Name | Age | Agency | Birthplace |
|---|------|-----|--------|------------|
| 1 | All Kyojin | 69 | Yoshimoto Kogyo | Osaka City, Osaka |
| 2 | Sandwichman Tomizawa | 46 | Grape Company | Sendai, Miyagi |
| 3 | Knights | 42 | Maseki Geinosha | Saga City, Saga |
| 4 | Shiraku Tatekawa | 57 | Watanabe Entertainment | Setagaya, Tokyo |
| 5 | Reiji Nakagawa | 48 | Yoshimoto Kogyo | Moriguchi, Osaka |
| 6 | Hitoshi Matsumoto | 57 | Yoshimoto Kogyo | Amagasaki, Hyogo |
| 7 | Emiko Kaminuma | 65 | Kaminuma Office | Mihara District, Hyogo |

1. Standardization of judges' scores

First, let's look at the score table of this tournament.

m1_202001.png Source: M1 Grand Prix 2020 Score Table

Judging from the results, the winner is Magical Lovely and the bottom is Tokyo Hoteison, but this scoring table raises one question: the divergence in scores between judges. For example, Knights' scores span 10 points between minimum and maximum, while Emiko Kaminuma's span only 3 points. In that case, one point carries a different weight for each judge. To normalize these differences, we compute basic statistics for the table above. Since the data set is small, the table below was put together quickly in Excel.

Table 1. Score table m1_202002.png

You can see that Knights has the largest standard deviation (4.18) and Emiko Kaminuma the smallest (1.04). Now let's standardize the scores and see whether the ranking changes.
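
The article computes this in Excel, but the same standardization is easy to sketch in pandas. A minimal sketch, using illustrative placeholder scores rather than the actual M-1 2020 table:

import pandas as pd

# Placeholder scores (rows: acts, columns: judges);
# substitute the actual M-1 2020 scoring table here
scores = pd.DataFrame(
    {
        'All Kyojin': [88, 92, 90, 89],
        'Knights': [96, 86, 91, 94],
        'Emiko Kaminuma': [95, 94, 95, 93],
    },
    index=['Magical Lovely', 'Sketch', 'Oswald', 'New York'],
)

# Standardize each judge's column to mean 0 / std 1 (z-score) so that
# a point from a narrow scorer weighs the same as one from a wide scorer
z = (scores - scores.mean()) / scores.std(ddof=0)

# Re-rank acts by the sum of standardized scores (1 = best)
ranking = z.sum(axis=1).rank(ascending=False, method='min').astype(int)
print(ranking.sort_values())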

Table 2. Standardized score table m1_202003.png

It's a little hard to read, but please look at the ranking column on the right: there are small changes. Let's check where the rankings differ.

  1. Sketch: 3rd → 2nd; Magical Lovely: 2nd → 3rd

The set of acts advancing to the final round did not change, but the top three shuffled. This is likely because Emiko Kaminuma gave her highest score to Sketch. Sketch was also rated highly by All Kyojin, so the Kansai judges may simply have a strong attachment to talk-style manzai.

  2. New York: 5th → 4th; Oswald: 5th → 6th; Nishikigoi: 4th → 5th

New York overtook Nishikigoi to move up to 4th place. Oswald and Nishikigoi received their highest marks from Knights, so standardization reshuffled the middle of the pack. We will take a closer look later at the carp-streamer-like momentum of Nishikigoi's Masanori, who looks primed for a breakout next year. ("Masanori" for short.)

  3. Akina: 8th → 9th; Westland: 8th → 8th

Finally, movement at the bottom. Akina and Westland were tied on raw score, but after standardization Akina ended up lower. I like Akina, but to be honest their set fell a little flat this time. They're usually funnier than this.

So, the next time you argue with a friend over which was funnier, Akina or Westland, and you're on Team Westland, you can claim the high ground: "Even with the same score, there are statistics! Statistics beat the judges! Judges, statistics, statistics. Statistics! That's why! Nobody can stop statistics!" (statistics win, as do self-awareness and ego)

The running-order problem

Although not analyzed here, there is a clearly significant effect of running order on M-1 results. An existing article analyzes this issue; its findings are quoted below.

・ Acts performing in the first half finish almost two ranks lower than acts in the second half. ・ The first-half disadvantage and second-half advantage have been especially pronounced since 2007, and the advantage of the last slot (≒ the loser-revival act) is overwhelming. (omitted) With that in mind, the Emikuji lottery system introduced this year seems to have been a pretty good attempt. It does nothing to fix the first-half disadvantage, but it eliminates the loser-revival advantage, which is fair.

Losing two ranks for performing in the first half is a very large effect. With that in mind, Indians and New York might have advanced to the final round had their slots come later. The running-order issue has been debated for years, and considering that only Nakagawa-ke, in the very first tournament, ever won from the opening slot, you can see how hard it is for first-half acts to break into the upper ranks. As mentioned above, it is the Emikuji that equalizes this unfairness by leaving the order to luck. Indeed, luck is part of ability.

For details on the running-order problem, see: Mathematical counterargument to the M-1 Grand Prix 2017 review: the "1-point gap" and "the opening act is disadvantaged".

2. Classifying similar judges by correlation coefficient

Continuing from Chapter 1, I classified judges with similar tastes using the correlation coefficients between their scores. The data set is again small, so I computed it quickly in Excel.

Table 3. Correlation coefficient table between judges m1_202004.png

Since the correlation matrix is symmetric, half of the entries are omitted. The three pairs with the highest correlation are marked in red, and the three pairs with the lowest correlation are marked in blue.
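
For reference, the same computation is nearly a one-liner in pandas. A minimal sketch, again with placeholder scores (one column per judge, one row per act) instead of the real table:

import numpy as np
import pandas as pd

# Placeholder scoring table; replace with the actual M-1 2020 scores
scores = pd.DataFrame({
    'All Kyojin': [88, 92, 90, 89, 91],
    'Knights': [96, 86, 91, 94, 90],
    'Shiraku Tatekawa': [95, 88, 92, 93, 89],
    'Emiko Kaminuma': [95, 94, 95, 93, 94],
})

# Pearson correlation between every pair of judges
corr = scores.corr()

# Keep only the upper triangle so each pair appears once,
# then sort to find the most and least similar judges
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack()
print(pairs.sort_values(ascending=False).head(3))  # most similar pairs
print(pairs.sort_values().head(3))                 # least similar pairs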

First, the most similar pairs, in descending order of correlation:

  1. [Knights, Reiji Nakagawa]
  2. [Knights, Shiraku Tatekawa]
  3. [Hitoshi Matsumoto, Emiko Kaminuma]

Next, the least similar pairs, in ascending order:

  1. [All Kyojin, Sand Tomizawa]
  2. [All Kyojin, Knights]
  3. [Shiraku Tatekawa, Hitoshi Matsumoto]

From this result, it can be said that evaluations split roughly between Kansai and Kanto. It is often said that "Kansai comedy doesn't land in Tokyo" or "the Kansai dialect is an advantage in comedy". Among the pairs above, the Kanto pair [Knights, Shiraku Tatekawa] and the Kansai pair [Hitoshi Matsumoto, Emiko Kaminuma] correlate highly, while the cross-regional pairs [All Kyojin, Knights] and [Shiraku Tatekawa, Hitoshi Matsumoto] correlate weakly. These seem to be good indications of the different sensibilities of east and west. Also, overall, All Kyojin and Sand Tomizawa correlated relatively weakly with everyone. Since Sand Tomizawa was born in Sendai, an east-west split in his evaluations is to be expected, but it is surprising that All Kyojin correlates weakly even with the other Kansai judges.

3. Viewer Tweet Statistics on Twitter

Finally, let's examine the divergence between the judges' and the viewers' evaluations using Twitter statistics. A past article did something similar with the Twitter API, and I use it as a reference. Reference: The M-1 Grand Prix 2017 seen through data: which manzai was really the most interesting?

Before starting the tweet analysis: just before the event, GYAO! polled the popularity of each act with a top-three ranking prediction campaign, so let's look at that first.

m1_202005.png Source: Triple Single Ranking Forecast Campaign

In terms of advance popularity, you can see that New York and Akina, acts with relatively heavy media exposure, ranked high. Many people presumably expected New York, the pre-event favorite, to contend for the title.

Conversely, the least popular was Westland, followed by Magical Lovely and Nishikigoi. Since Westland's and Nishikigoi's agencies are not Yoshimoto, I wonder if that also meant less media exposure. As for Magical Lovely (hereinafter "Magi-Love"), they finished at the bottom of the tournament three years ago after a harsh evaluation from Emiko Kaminuma, so it can be inferred that advance expectations for them were low.

Now that we've confirmed the advance popularity, let's get back to the main subject. First, the working environment and code:

Environment

- Language: Python 3.9.0
- Data wrangling: pandas 1.1.5
- Plotting: Matplotlib 3.3.3

All code is stored in the repository below, so have a look if you're interested. github.com/KamiHitoe/m12020

3-1. Analysis of tweet volume during the broadcast

As a way to measure viewers' evaluations, I first collected tweets containing each act's name posted up to 22:05 on December 20th, when the M-1 Grand Prix 2020 ended (keyword search via the Twitter API, which can only reach back about one week), and saved them as CSV files.

get_search.py




from requests_oauthlib import OAuth1Session
import json
import datetime, time, sys
from abc import ABCMeta, abstractmethod
import pandas as pd
from pandas import Series, DataFrame
from dateutil.parser import parse
import config
from tqdm import tqdm

CK = config.CONSUMER_KEY
CS = config.CONSUMER_SECRET
AT = config.ACCESS_TOKEN
ATS = config.ACCESS_TOKEN_SECRET

class TweetsGetter(object):
    __metaclass__ = ABCMeta

    def __init__(self):
        self.session = OAuth1Session(CK, CS, AT, ATS)

    @abstractmethod
    def specifyUrlAndParams(self, keyword):
        '''
Return the request URL and parameters
        '''

    @abstractmethod
    def pickupTweet(self, res_text):
        '''
        Extract tweets from res_text and return them as a list
        '''

    @abstractmethod
    def getLimitContext(self, res_text):
        '''
Get rate-limit information (used at startup)
        '''

    def collect(self, total = -1, onlyText = False, includeRetweet = False):
        '''
Start getting tweets
        '''

        #----------------
        #Check the rate limit
        #----------------
        self.checkLimit()

        #----------------
        #URL, parameters
        #----------------
        url, params = self.specifyUrlAndParams()
        params['include_rts'] = str(includeRetweet).lower()
        # include_rts is a parameter of statuses/user_timeline; it is ignored by search/tweets

        #----------------
        #Get Tweets
        #----------------
        cnt = 0
        unavailableCnt = 0
        while True:
            res = self.session.get(url, params = params)
            if res.status_code == 503:
                # 503 : Service Unavailable
                if unavailableCnt > 10:
                    raise Exception('Twitter API error %d' % res.status_code)

                unavailableCnt += 1
                print ('Service Unavailable 503')
                self.waitUntilReset(time.mktime(datetime.datetime.now().timetuple()) + 30)
                continue

            unavailableCnt = 0

            if res.status_code != 200:
                raise Exception('Twitter API error %d' % res.status_code)

            tweets = self.pickupTweet(json.loads(res.text))
            if len(tweets) == 0:
                # We would like to check len(tweets) != params['count'], but count
                # is only an upper bound, so it cannot be used for this judgment.
                # ⇒ check "== 0" instead
                # https://dev.twitter.com/discussions/7513
                break

            for tweet in tweets:
                if (('retweeted_status' in tweet) and (includeRetweet is False)):
                    pass
                else:
                    if onlyText is True:
                        yield tweet['text']
                    else:
                        yield tweet

                    cnt += 1
                    if cnt % 100 == 0:
                        print ('%d cases' % cnt)

                    if total > 0 and cnt >= total:
                        return

            params['max_id'] = tweet['id'] - 1

            # Check the response headers for the rate limit
            # (X-Rate-Limit-Remaining is occasionally missing, hence the explicit check)
            if ('X-Rate-Limit-Remaining' in res.headers and 'X-Rate-Limit-Reset' in res.headers):
                if (int(res.headers['X-Rate-Limit-Remaining']) == 0):
                    self.waitUntilReset(int(res.headers['X-Rate-Limit-Reset']))
                    self.checkLimit()
            else:
                print ('not found  -  X-Rate-Limit-Remaining or X-Rate-Limit-Reset')
                self.checkLimit()

    def checkLimit(self):
        '''
Query the rate limit and wait until the API becomes accessible again
        '''
        unavailableCnt = 0
        while True:
            url = "https://api.twitter.com/1.1/application/rate_limit_status.json"
            res = self.session.get(url)

            if res.status_code == 503:
                # 503 : Service Unavailable
                if unavailableCnt > 10:
                    raise Exception('Twitter API error %d' % res.status_code)

                unavailableCnt += 1
                print ('Service Unavailable 503')
                self.waitUntilReset(time.mktime(datetime.datetime.now().timetuple()) + 30)
                continue

            unavailableCnt = 0

            if res.status_code != 200:
                raise Exception('Twitter API error %d' % res.status_code)

            remaining, reset = self.getLimitContext(json.loads(res.text))
            if (remaining == 0):
                self.waitUntilReset(reset)
            else:
                break

    def waitUntilReset(self, reset):
        '''
Sleep until the rate-limit reset time
        '''
        seconds = reset - time.mktime(datetime.datetime.now().timetuple())
        seconds = max(seconds, 0)
        print ('\n     =====================')
        print ('     == waiting %d sec ==' % seconds)
        print ('     =====================')
        sys.stdout.flush()
        time.sleep(seconds + 10)  # +10 seconds just in case

    @staticmethod
    def bySearch(keyword):
        return TweetsGetterBySearch(keyword)

    @staticmethod
    def byUser(screen_name):
        # Note: TweetsGetterByUser is not defined in this excerpt (unused here)
        return TweetsGetterByUser(screen_name)

class TweetsGetterBySearch(TweetsGetter):
    '''
Search for tweets by keyword
    '''
    def __init__(self, keyword):
        super(TweetsGetterBySearch, self).__init__()
        self.keyword = keyword
        
    def specifyUrlAndParams(self):
        '''
Return the request URL and parameters
        '''
        url = 'https://api.twitter.com/1.1/search/tweets.json?'
        params = {'q':self.keyword, 'count':100}
        return url, params

    def pickupTweet(self, res_text):
        '''
        Extract tweets from res_text and return them as a list
        '''
        results = []
        for tweet in res_text['statuses']:
            results.append(tweet)

        return results

    def getLimitContext(self, res_text):
        '''
Get rate-limit information (used at startup)
        '''
        remaining = res_text['resources']['search']['/search/tweets']['remaining']
        reset     = res_text['resources']['search']['/search/tweets']['reset']

        return int(remaining), int(reset)


# Only five acts listed here; presumably the remaining CSVs were produced by
# running the same script with the other names
keyword_list = ['Akina','Oswald','Sketch','Come on, Yasuko','Nishikigoi']

for keyword in keyword_list:
    #Get by keyword
    getter = TweetsGetter.bySearch(keyword+' AND until:2020-12-20_22:05:00_JST')

    #Get by specifying user (screen_name)
    #getter = TweetsGetter.byUser('AbeShinzo')

    cnt = 0
    created_at = []
    text = []
    for tweet in getter.collect(total = 1000000):
        #cnt += 1
        #print ('------ %d' % cnt)
        #print ('{} {} {}'.format(tweet['id'], tweet['created_at'], '@'+tweet['user']['screen_name']))
        #print (tweet['text'])
        created_at.append(tweet['created_at'])
        text.append(tweet['text'])

    created_at = Series(created_at)
    text = Series(text)

    # Combine the series into a single DataFrame
    m1_df = pd.concat([created_at, text],axis=1)

    # Set the column names
    m1_df.columns=['created_at','text']

    #Save as csv file
    m1_df.to_csv('data/m12020_'+keyword+'.csv', sep = '\t',encoding='utf-16')


Next, extract the tweets posted during the broadcast window (19:00-22:05) from the saved CSV files and plot them per act.

resample.py




import codecs
import shutil
import pandas as pd
import time
import datetime
import pytz
from matplotlib import pyplot as plt
from matplotlib import dates as mdates
from matplotlib.dates import date2num
from matplotlib.dates import DateFormatter
from matplotlib import rcParams

rcParams['font.family'] = 'sans-serif'
rcParams['font.sans-serif'] = ['Hiragino Maru Gothic Pro', 'Yu Gothic', 'Meirio', 'Takao', 'IPAexGothic', 'IPAPGothic', 'VL PGothic', 'Noto Sans CJK JP']

keyword_list = ['Indians','Tokyo Hoteison','New York','Sketch','Come on, Yasuko','Magical Lovely','Oswald','Akina','Nishikigoi','Westland']

# str -> datetime

def typechange(x):
    st = time.strptime(x, '%a %b %d %H:%M:%S +0000 %Y')
    utc_time = datetime.datetime(st.tm_year, st.tm_mon,st.tm_mday, st.tm_hour,st.tm_min,st.tm_sec, tzinfo=datetime.timezone.utc)
    jst_time = utc_time.astimezone(pytz.timezone('Asia/Tokyo'))
    # str_time = jst_time.strftime('%a %b %d %H:%M:%S +0900 %Y')
    return jst_time
    # return datetime.datetime.strptime(x, '%a %b %d %H:%M:%S +0000 %Y')

def make_df_re(keyword):
    df = pd.read_csv('data/m12020_'+keyword+'.csv', encoding='utf-16', sep='\t', header=0)
    df['count'] = 1

    df['datetime'] = df['created_at'].map(typechange)

    # Resample to tweet counts per minute
    df_date = pd.concat([df['datetime'], df['count']], axis=1)

    df_re = df_date.reset_index().set_index('datetime').resample('T').sum()
    # df_re.to_csv('data/re_'+keyword+'.csv', encoding='utf-16', sep='\t')
    df_re = df_re.reset_index()
    return df_re

df_list = []
for keyword in keyword_list:
    df_re = make_df_re(keyword)
    df_list.append(df_re)

#Graph creation
with plt.style.context('seaborn-darkgrid', after_reset=True):
    plt.rcParams['font.family'] = 'Noto Sans CJK JP'
    figure = plt.figure(1, figsize=(8,4))
    axes = figure.add_subplot(111)

    colors = ['#d52f25', '#691c0d', '#fff000', '#f0821e', '#00a0dc',
              '#ff2599', '#ffcc00', '#193278', '#9944cc', '#d3c1af']
    # One line per act: tweets per minute over the broadcast
    for df_re, keyword, color in zip(df_list, keyword_list, colors):
        axes.plot(df_re['datetime'], df_re['count'], color=color, label=keyword)

    # 10:00 UTC corresponds to 19:00 JST (start of the broadcast)
    start_time = datetime.datetime(2020, 12, 20, 10, 0)
    axes.set_xlim(date2num([start_time, df_list[-1]['datetime'].max()]))
    axes.set_ylabel('Number of tweets / minute')

    # Fix the tick positions (UTC datetimes) and label them in JST
    xticks = [start_time + datetime.timedelta(minutes=30 * i) for i in range(7)]
    axes.set_xticks(date2num(xticks))
    axes.set_xticklabels(['19:00', '19:30', '20:00', '20:30', '21:00', '21:30', '22:00'])
    axes.legend(bbox_to_anchor=(1, 1), loc='upper left', borderaxespad=0, fontsize=10)
    # axes.xaxis.set_major_formatter(DateFormatter('%H:%M'))
    plt.savefig('data/fig.png')
    plt.show()


The resulting graph is shown below.

fig01.png Figure 1. Tweets per minute containing each act's name, 12/20 19:00-22:05

So, what can we see? Quite a lot, I think. First, regarding the maximum momentary tweet count: the spike for Indians around 19:20 coincides with the announcement of the loser-revival winner, so it is excluded as an outlier.

Excluding that, the act with the highest momentary tweet count is Sketch, who performed fourth in the first round. Sketch was only second in the advance popularity poll, but clearly drew the most attention during the actual performance.

Now for the real question: the second-highest momentary tweet count belongs to Nishikigoi.

Nishikigoi, of all acts. And by an overwhelming margin.

That is a level of attention you would not expect from an act ranked only 8th in advance popularity. Magi-Love (9th) and Westland (10th) both peaked around 2,500 tweets per minute, while Nishikigoi recorded 5,500, more than double.

Momentum like a carp streamer. Masanori, for short.

m1_202006.jpeg Figure 2. Masanori Hasegawa, Nishikigoi's boke

"As expected, Masanori-san!" "I can do things that we can't do in a straightforward manner. I'm afraid there! I long for it!"

Incidentally, apart from Sketch and Nishikigoi, the graph roughly tracks advance popularity: those two are followed by the pre-event favorites New York and Akina.

Looking at this graph, it seems no coincidence that Nishikigoi finished 4th in the first round, just one step short of the final round.

Maybe it's about time Masanori got his own pachinko machine? I can't wait for the release of "CR Masanori".

3-2. Computing a pseudo approval rating from positive words

The absolute tweet volume tells us about attention, but not which acts viewers actually found funny or likeable. Let's measure that as the ratio of positive words, a kind of pseudo approval rating. Here, positive words are favorable expressions appearing in tweets that contain an act's name; the following are defined as positive words.

Positive words: was funny | was interesting | was good | laughed | like (the apparent duplicates in the original list are alternative Japanese spellings of the same words)

Plot, for each act, the number of tweets containing these positive words during the first round, together with their share of the act's total tweets during the broadcast.

process_lan.py




import pandas as pd
import time
import datetime
import pytz
from matplotlib import pyplot as plt
from matplotlib import rcParams
import matplotlib.ticker as ticker

rcParams['font.family'] = 'Noto Sans CJK JP'
rcParams['font.sans-serif'] = 'Noto Sans CJK JP'
keyword_list = ['Indians','Tokyo Hoteison','New York','Sketch','Come on, Yasuko','Magical Lovely','Oswald','Akina','Nishikigoi','Westland']

def typechange(x):
    st = datetime.datetime.strptime(x, '%a %b %d %H:%M:%S +0000 %Y')
    # utc_time = datetime.datetime(st.tm_year, st.tm_mon,st.tm_mday, st.tm_hour,st.tm_min,st.tm_sec, tzinfo=datetime.timezone.utc)
    # jst_time = utc_time.astimezone(pytz.timezone('Asia/Tokyo'))
    # str_time = jst_time.strftime('%a %b %d %H:%M:%S +0900 %Y')
    return st

def replace(x):
    y = x.replace(tzinfo=None)
    return y

sum_list = []
rate_list = []
for keyword in keyword_list:
    # def make_df_re(keyword):
    df = pd.read_csv('data/m12020_'+keyword+'.csv', encoding='utf-16', sep='\t', header=0)
    df['count'] = 1
    df['datetime'] = df['created_at'].map(typechange)
    # print(df.info())

    # 10:20-12:40 UTC corresponds to 19:20-21:40 JST (the first round)
    from_dt = datetime.datetime(2020, 12, 20, 10, 20)
    to_dt = datetime.datetime(2020, 12, 20, 12, 40)

    df = df[from_dt <= df['datetime']]
    df = df[df['datetime'] <= to_dt]
    df_cut = pd.concat([df['datetime'], df['text'], df['count']], axis=1)

    # df_cut.to_csv('data/cut_'+keyword+'.csv', encoding='utf-16', sep='\t')

    # The duplicated alternatives reflect different Japanese spellings in the original tweets
    df_result = df[df_cut.text.str.contains('Was funny|Was interesting|It was interesting|Was good|Was good|Laughed|Like|Like')]
    total = len(df_result)  # renamed from "sum" to avoid shadowing the built-in
    print('sum :', keyword, total)
    rate = round(total / len(df_cut), 2)
    print('rate', keyword, rate)
    sum_list.append(total)
    rate_list.append(rate)

#Graph creation
with plt.style.context('seaborn-darkgrid', after_reset=True):
    plt.rcParams['font.family'] = 'Noto Sans CJK JP'
    figure = plt.figure(1, figsize=(8,4))
    axes1 = figure.add_subplot(111)

    x = range(1, 11)
    colors = ['#d52f25', '#691c0d', '#fff000', '#f0821e', '#00a0dc',
              '#ff2599', '#ffcc00', '#193278', '#9944cc', '#d3c1af']
    # Bars: total number of tweets per act during the broadcast
    for xi, total, color in zip(x, sum_list, colors):
        axes1.bar(xi, width=0.5, height=total, color=color)

    # Second y-axis: positive-word ratio (the pseudo approval rating)
    axes2 = axes1.twinx()
    for xi, rate in zip(x, rate_list):
        axes2.plot(xi, rate, 's', ms=7, color='#7acbe1')

    axes1.set_ylabel('Total number of tweets')
    axes2.set_ylabel('Positive ratio')
    axes1.set_axisbelow(True)
    axes2.set_axisbelow(True)
    xticks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    xaxis = axes1.xaxis
    xaxis.set_major_locator(ticker.FixedLocator(xticks))
    xaxis.set_ticklabels(['Indians','Tokyo Hoteison','New York','Sketch','Come on, Yasuko','Magical Lovely','Oswald','Akina','Nishikigoi','Westland'], rotation=45)
    # xaxis.set_ticklabels(['Indians','Tokyo Hoteison','New York','Sketch','Come on, Yasuko','Magical Lovely','Oswald','Akina','Nishikigoi','Westland'], rotation=45)
    # axes1.legend(('Indians','Tokyo Hoteison','New York','Sketch','Come on, Yasuko','Magical Lovely','Oswald','Akina','Nishikigoi','Westland'),
    # bbox_to_anchor=(1, 1), loc='upper left', borderaxespad=0, fontsize=10)
    # plt.savefig('data/fig.png')
    plt.show()


The resulting graph is as follows.

fig02.png Figure 3. Total tweets per act and pseudo approval rating, 12/20 19:20-21:40

The bars show each act's total tweets during the broadcast window; the light-blue squares show the positive-word ratio, i.e. the pseudo approval rating.

Well, what do we see? Once again, I'm not sure the Indians data is fair. Taken at face value, Indians has the highest pseudo approval rating and looks like the act most loved by viewers, but we cannot tell whether Indians were simply that funny or whether goodwill from their loser-revival win spilled over. So, apologies again, but I'll set Indians aside once more (if they can take being excluded this often, they really can pull off the yankee act).

If we set Indians aside, two acts stand out: Tokyo Hoteison and Oswald.

Tokyo Hoteison unfortunately finished at the bottom, partly due to a poor slot in the running order, but viewers do not seem to have found them all that uninteresting. Also, while Nishikigoi is the oldest act in this tournament, Tokyo Hoteison is the youngest; formed only five years ago, they can be expected to flourish from here.

Next, Oswald. My impression was that, at 7th in advance popularity, they were somewhat hidden behind New York and Sketch, but the data turns out surprisingly positive about them. I was also impressed by their style of digging deep into the single theme of renaming. So, could it be that everyone actually likes Oswald more than New York and Sketch? Let's call this the "everyone secretly loves Oswald" hypothesis and check it against different data below.

3-3. A post-hoc look at YouTube view counts

By the way, all of the M-1 Grand Prix 2020 final material is up on YouTube: the M-1 Grand Prix Channel.

Lastly, let's look at the YouTube view counts as of December 24 and see how well the analysis so far holds up.
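
I tallied the view counts by hand from the channel page, but they could also be fetched programmatically. Below is a minimal sketch using the YouTube Data API v3 videos endpoint; the API key and the video IDs are placeholders you would have to fill in from the M-1 Grand Prix Channel.

import requests

API_KEY = 'YOUR_YOUTUBE_DATA_API_KEY'  # placeholder; issue a key in the Google Cloud Console
# Hypothetical mapping of act name -> YouTube video ID of its first-round neta
VIDEO_IDS = {
    'Magical Lovely': 'XXXXXXXXXXX',
    'Nishikigoi': 'YYYYYYYYYYY',
}

resp = requests.get(
    'https://www.googleapis.com/youtube/v3/videos',
    params={'part': 'statistics', 'id': ','.join(VIDEO_IDS.values()), 'key': API_KEY},
)
resp.raise_for_status()
views = {item['id']: int(item['statistics']['viewCount'])
         for item in resp.json()['items']}

# Print acts sorted by view count, descending
for name, vid in sorted(VIDEO_IDS.items(), key=lambda kv: -views.get(kv[1], 0)):
    print(name, views.get(vid, 'n/a'))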

Now let's take a look at the aggregated results.

m1_202008.png Figure 4. YouTube view counts (as of December 24)

First, you can see that the three acts that advanced to the final round occupy 1st, 2nd, and 3rd place. But look: who is tied for 3rd, right alongside Kansai's formidable Sketch?

Yes, it's Nishikigoi!

Nishikigoi, already overwhelming in tweet counts, is overwhelming in YouTube views as well. Now I'm certain they will break out next year.

Below them, New York, Indians, Tokyo Hoteison, and Oswald form the mid-tier from 5th place down.

Tokyo Hoteison sits right next to New York, the pre-event favorite. It suggests it really is too early to brand Tokyo Hoteison with the stigma of last place.

And then, on the other hand, there's Oswald... currently 8th in YouTube views.

8th place

Huh? What happened to that pseudo approval rating? Were the positive words in the Oswald tweets actually addressed to the actress, tsukkomi Ito's sister, rather than to the comedians themselves??

Sketch's Moriyama: "And what happened to the 'everyone secretly loves Oswald' hypothesis!?"

m1_202007jpg.jpg Figure 5. Sairi Ito, actress and sister of Oswald's tsukkomi Ito

In conclusion

Whew, that wore me out lol. That's all!

Not a bad bit of analysis, if I may say so. If this article helps you enjoy the M-1 Grand Prix 2020 twice over, I'll be delighted.

For next year, I think it would also be interesting to analyze viewers' birthplace, age, and gender, though that may be difficult since it is personal information. Also, this aggregation window was cut off when M-1 ended, but aggregating through the day after the broadcast might capture the true approval rating better.

Well then, thank you for staying with me this long.

I'll be hoping some kind Santa brings me an M1 Mac, and spend Christmas taking it easy.

Then, everyone, have a good year!

The end

References

[1] Mathematical counterargument to the M-1 Grand Prix 2017 review: the "1-point gap" and "the opening act is disadvantaged"
[2] The M-1 Grand Prix 2017 seen through data: which manzai was really the most interesting?
[3] Looking back at the M-1 Grand Prix with principal component analysis

Bonus (TL;DR)

From here on, these are some favorites from the collected tweets, which I stayed up reading until morning.

・ Akina

"Akina was relaxed when she saw Yamana."

"Isn't the air obviously heavier after Akina?"

・ Indians

"Mr. Indians in the final had a lot of momentum. The pre-program of M-1 said," The repechage has nothing to lose, "but this year's Indians was exactly that, and it was cool. shelf"

"Ad lib, tempo, bokeh, smile are all recommended. Only Indians won."

・ Come on, Yasuko

"Come on, Yasuko caused a miracle!"

・ Oswald

"By the way, Oswald is also a curse that you have to serve sushi in the final."

"People like Oswald Ito tend to be friends of friends"

·New York

"Maybe I should have done something to say hello to her parents' house in New York."

"Think, Keith We can be homo in a few hours! No one notices when you enter K MART with leather bread! If I wore leather bread in New York, it would be gay, but all the hot cowboys there are leather bread ... "

・ Magi-Love

"Magical Lovely Guess from Undressing Street Fighter"

"I want you to grab Magical Lovely and say,'I'm a Manzai King.'"

・ Nishikigoi

"It's a little interesting that only Nishikigoi is included in the trend."

"Nishikigoi can't be helped because it looks bad."

"Masanori Nishikigoi, I'm using it in the mouth without a sentence It's really interesting because I've actually made a million debts and experienced my own bankruptcy. "

・ Sketch

"One of the sketches has long hair and the wrap is amazing."

"By the way, why is the sketch a name?

・ Tokyo Hoteison

"I love Koike, I love Tokyo Hoteison."

"Ahn Mika Dragon attracted attention, and BEAMS shirts were sold out, but why Tokyo Hoteison was at the bottom?"

・ Westland

"Westland is nice 8th place in terms of fun, but 1st place in terms of ego. "
