[PYTHON] Compare the scores of the M-1 Grand Prix in terms of deviation values

Did you watch last year's M-1 Grand Prix? I hear social media was buzzing with debate over whether that was really manzai. Setting that aside, Milkboy made headlines at the M-1 Grand Prix two years ago (2019) by recording the highest score in M-1 history. With an overwhelming 681 out of 700 points, every judge having awarded 97 points or more, they went on to take 6 of the 7 judges' votes in the final round and won the championship. The previous record was Untouchable's 673 out of 700 points in 2004.

That made me wonder whether 681 and 673 can simply be compared as they are. The two finals are 15 years apart and the judging panels are completely different, so it is hard to say which act was funnier. The scoring standards also differ (in the early years of the M-1 Grand Prix, scores tended to be lower overall), as does the spread of the scores. So rather than comparing the raw scores, I would like to compare them using another index: the "deviation value".

The purpose of this article is not to compare Untouchable and Milkboy, but to introduce the deviation value.


Definition of deviation value

Deviation value is a statistical term for a value obtained by rescaling data so that the mean is 50 and the standard deviation is 10. Because the mean and the standard deviation (the spread of the data) are aligned to common values, it can be used to compare data sets whose means and standard deviations differ. Note, however, that it assumes the data are roughly normally distributed, so it is not a suitable index for data with an extreme distribution. The M-1 Grand Prix final, the subject of this article, has only 9 to 10 entrants, so it is not really an ideal setting, but please overlook that detail here.

Now, the deviation value can be expressed by a mathematical formula as follows.

$$
T_{i} = \frac{10 \left( x_{i} - \mu \right)}{\sigma} + 50, \quad \mu = \frac{1}{N} \sum_{i} x_{i}, \quad \sigma = \sqrt{\frac{1}{N} \sum_{i} \left( x_{i} - \mu \right)^{2}} .
$$

Here $x_{i}$ is a score, $\mu$ is the mean of all scores, $\sigma$ is the standard deviation of all scores, and $N$ is the number of scores. $T_{i}$ is the deviation value obtained by converting $x_{i}$. If a score exactly equals the mean, its deviation value is 50. If a score is higher than the mean by one standard deviation $\sigma$, its deviation value is 60, and if it is lower by $\sigma$, it is 40.
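As a quick check of these three cases, here is a minimal sketch. The function name `deviation_value` and the mean and standard deviation used (600 and 20) are made up for illustration and are not M-1 data.

```python
def deviation_value(x, mu, sigma):
    """Convert a score x to a deviation value, given the mean mu and standard deviation sigma."""
    return 10 * (x - mu) / sigma + 50


# Hypothetical values, chosen only to make the arithmetic easy to follow.
mu, sigma = 600.0, 20.0

at_mean = deviation_value(mu, mu, sigma)            # score equal to the mean
one_above = deviation_value(mu + sigma, mu, sigma)  # one standard deviation above
one_below = deviation_value(mu - sigma, mu, sigma)  # one standard deviation below

print(at_mean, one_above, one_below)  # 50.0 60.0 40.0
```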

Convert M-1 Grand Prix score to deviation value

Now, let's actually convert the M-1 Grand Prix scores into deviation values. We start with the 2019 results, sorted in descending order.

scores = [681, 660, 654, 652, 649, 639, 638, 637, 632, 616]

In Python this is easy with the numpy library, but first let's calculate it ourselves, following the definition.

Average value

avg = 0.0
for score in scores:
  avg += float(score)

avg /= len(scores) 

The results are as follows.

>>> avg
645.8

Standard deviation

Then calculate the standard deviation using the calculated mean. We need to take the square root, so we use the numpy library.

import numpy as np

var = 0.0
for score in scores:
  var += (score - avg) ** 2

var /= len(scores)
std = np.sqrt(var)

The results are as follows.

>>> std
16.791664598841887
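The square root is the only reason numpy is imported here, so as a cross-check the same value can be obtained with the standard library alone. This is only a sketch: `math.sqrt` and `statistics.pstdev` are swapped in for `np.sqrt`, and the variable names are my own. Note that the definition divides by N (the population standard deviation), whereas `statistics.stdev` divides by N - 1 and so gives a slightly larger value.

```python
import math
import statistics

scores = [681, 660, 654, 652, 649, 639, 638, 637, 632, 616]

# Same calculation as above, with math.sqrt in place of np.sqrt.
avg = sum(scores) / len(scores)
var = sum((score - avg) ** 2 for score in scores) / len(scores)
std_manual = math.sqrt(var)

# statistics.pstdev divides by N, matching the definition of sigma.
std_population = statistics.pstdev(scores)

# statistics.stdev divides by N - 1 (sample standard deviation), so it is larger.
std_sample = statistics.stdev(scores)

print(std_manual, std_population, std_sample)
```

The first two values should agree at about 16.79, while the sample version comes out at about 17.70. With only 9 or 10 scores the choice matters, so it is worth being explicit about which one is used.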

Deviation value

Now that we have calculated the mean and standard deviation, we can calculate the deviation value according to the definition.

std_scores = []
for score in scores:
  value = 10 * (score - avg) / std + 50
  std_scores.append(value)

The result is as follows.

>>> std_scores
[70.96278173780804, 58.45657672377485, 54.88337529119394, 53.692308147000304, 51.905707430709846, 45.95037170974166, 45.35483813764484, 44.75930456554802, 41.78163670506393, 32.253099551514836]
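One way to sanity-check the conversion is to confirm that the converted values themselves have a mean of 50 and a standard deviation of 10, as the definition requires. The sketch below repeats the calculation in compact form using only the standard library (`statistics.fmean` and `statistics.pstdev` are my choice here, not the article's; the names `check_mean` and `check_std` are illustrative).

```python
import statistics

scores = [681, 660, 654, 652, 649, 639, 638, 637, 632, 616]

avg = statistics.fmean(scores)
std = statistics.pstdev(scores)  # population standard deviation (divides by N)
std_scores = [10 * (score - avg) / std + 50 for score in scores]

# By construction these should be 50 and 10, up to floating-point error.
check_mean = statistics.fmean(std_scores)
check_std = statistics.pstdev(std_scores)

print(round(check_mean, 6), round(check_std, 6))  # 50.0 10.0
```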

Result

Next, calculate the 2004 scores in the same way and compare the results. The 2004 scores are below. As an aside, the first tournament in 2001 had 10 teams in the final, but from 2002 the final had 9 entrants, including the team that came back through the losers' revival round.

scores = [673, 639, 634, 615, 615, 603, 587, 583, 582]

The result of conversion to the deviation value is as follows.

>>> std_scores
[70.46491218, 58.55946897, 56.8086685, 50.15562671, 50.15562671, 45.95370558, 40.35114407, 38.95050369, 38.6003436]
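The code for the 2004 conversion is not shown, so here is one possible way to repeat the steps: a small helper that bundles the mean, the standard deviation, and the conversion. The name `to_deviation_values` is hypothetical, and the standard library is used in place of numpy.

```python
import statistics


def to_deviation_values(scores):
    """Return the deviation value of every score in the list."""
    avg = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # divides by N, as in the definition
    return [10 * (score - avg) / std + 50 for score in scores]


scores_2004 = [673, 639, 634, 615, 615, 603, 587, 583, 582]
std_scores_2004 = to_deviation_values(scores_2004)

# Print each raw score next to its deviation value.
for score, value in zip(scores_2004, std_scores_2004):
    print(score, round(value, 2))
```

Wrapping the steps in a function avoids overwriting `scores`, `avg`, and `std` by hand each time a different year is tried.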

Comparing the deviation values to one decimal place (rounded at the second decimal place) gives the following.

|  | 2004 | 2019 |
|---|---|---|
| Score | 673 | 681 |
| Deviation value | 70.5 | 71.0 |

In 2004 the average score was low, so 673 points looks overwhelming, but once the spread of the scores is taken into account by converting to a deviation value, it comes out at 70.5. The difference is very small, but Milkboy's score, the highest in M-1 history, still exceeds Untouchable's when both are converted to deviation values.


Bonus

I compared only the 2004 and 2019 M-1 Grand Prix results, but the top score of some other year may well turn out to be number one in terms of deviation value. If you are interested, please check for yourself (~~leaving it all to you~~).
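For anyone who wants to try this, a sketch along the following lines might be a starting point. The dictionary `finals_by_year` holds only the two years quoted in this article; scores for other years would have to be added by hand, and all names here are illustrative.

```python
import statistics

# Only the two finals quoted in this article; add other years to extend the comparison.
finals_by_year = {
    2004: [673, 639, 634, 615, 615, 603, 587, 583, 582],
    2019: [681, 660, 654, 652, 649, 639, 638, 637, 632, 616],
}


def top_deviation_value(scores):
    """Deviation value of the highest score within its own final."""
    avg = statistics.fmean(scores)
    std = statistics.pstdev(scores)  # population standard deviation
    return 10 * (max(scores) - avg) / std + 50


top_by_year = {year: top_deviation_value(s) for year, s in finals_by_year.items()}

# List the years from the highest top deviation value to the lowest.
for year, value in sorted(top_by_year.items(), key=lambda item: item[1], reverse=True):
    print(year, max(finals_by_year[year]), round(value, 1))
```

With just these two years it prints 2019 (681, 71.0) ahead of 2004 (673, 70.5), matching the comparison above.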

To be clear, I have no intention of using deviation values to claim that one act is funnier than the other. I watched both 2004 and 2019 live, and both routines were very funny. By the way, I was really happy that Untouchable got back together as a duo [^1].

[^1]: I wonder if the radio show will come back.

Finally, note that the mean and standard deviation can be calculated easily with the numpy library.

import numpy as np

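# 2019 scores, the same data used in the step-by-step calculation
scores = np.array([681, 660, 654, 652, 649, 639, 638, 637, 632, 616])
# np.std divides by N by default (ddof=0), matching the definition of sigma
std_scores = 10 * (scores - np.average(scores)) / np.std(scores) + 50

>>> std_scores
array([70.96278174, 58.45657672, 54.88337529, 53.69230815, 51.90570743, 45.95037171, 45.35483814, 44.75930457, 41.78163671, 32.25309955])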
