[Python] Seriously thinking about how to win the M-1

1. Contents

2. This year's M-1 data

2019 M-1 data
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Rows are the ten finalists, columns are the seven judges.
data = pd.DataFrame([[98,97,96,97,97,99,97],
                     [95,95,94,93,95,95,93],
                     [96,94,92,94,91,94,93],
                     [92,92,93,91,96,96,92],
                     [94,91,93,91,94,92,94],
                     [94,90,93,90,89,90,93],
                     [94,90,94,91,89,89,91],
                     [92,89,91,90,92,91,92],
                     [94,88,92,90,87,89,92],
                     [90,82,88,88,90,91,87]],
                   columns=['kaminuma','matsumoto','reiji','tomizawa','shiraku','hanawa','kyojin'],
                   index=['milkboy','kamaitachi','pekopa','wagyu','mitorizu','karashi','ozuwarudo','suwe','indians','newyork'])

3. Data overview

Mean and standard deviation

data.describe()
       kaminuma  matsumoto    reiji  tomizawa    shiraku     hanawa     kyojin
count 10.000000  10.000000  10.0000  10.00000  10.000000  10.000000  10.000000
mean  93.900000  90.800000  92.6000  91.50000  92.000000  92.600000  92.400000
std    2.233582   4.184628   2.1187   2.54951   3.366502   3.306559   2.503331
min   90.000000  82.000000  88.0000  88.00000  87.000000  89.000000  87.000000
25%   92.500000  89.250000  92.0000  90.00000  89.250000  90.250000  92.000000
50%   94.000000  90.500000  93.0000  91.00000  91.500000  91.500000  92.500000
75%   94.750000  93.500000  93.7500  92.50000  94.750000  94.750000  93.000000
max   98.000000  97.000000  96.0000  97.00000  97.000000  99.000000  97.000000

Correlation coefficient

sns.heatmap(data.corr(), annot=True)
plt.show()

[Figure: heatmap of the correlation coefficients between judges]

data.corr().sum(axis=0)
judge      total correlation coefficient
kaminuma   5.251470
matsumoto  5.876156
reiji      5.272178
tomizawa   5.791959
shiraku    4.487326
hanawa     5.002257
kyojin     5.535278
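Note that each total above includes the judge's correlation with themselves, which is always 1. A minimal sketch of the same ranking with that self-correlation removed is shown below. The names scores, total_with_others and mean_with_others are illustrative and do not appear in the original code.

```python
import numpy as np
import pandas as pd

# Scores copied from the 2019 M-1 table in this article
# (rows: acts, columns: judges).
judges = ['kaminuma', 'matsumoto', 'reiji', 'tomizawa',
          'shiraku', 'hanawa', 'kyojin']
scores = pd.DataFrame([[98, 97, 96, 97, 97, 99, 97],
                       [95, 95, 94, 93, 95, 95, 93],
                       [96, 94, 92, 94, 91, 94, 93],
                       [92, 92, 93, 91, 96, 96, 92],
                       [94, 91, 93, 91, 94, 92, 94],
                       [94, 90, 93, 90, 89, 90, 93],
                       [94, 90, 94, 91, 89, 89, 91],
                       [92, 89, 91, 90, 92, 91, 92],
                       [94, 88, 92, 90, 87, 89, 92],
                       [90, 82, 88, 88, 90, 91, 87]],
                      columns=judges)

corr = scores.corr()

# Subtract the diagonal (self-correlation) from each column total.
total_with_others = corr.sum(axis=0) - np.diag(corr)

# Average correlation with the six other judges, highest first.
mean_with_others = total_with_others / (len(judges) - 1)
ranking = mean_with_others.sort_values(ascending=False)
print(ranking)
```

Because the same constant is removed from every judge, the order of the judges is unchanged. Only the scale becomes easier to read as an average correlation.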

Pair plot

sns.pairplot(data)
plt.show()

[Figure: pair plot of the judges' scores]

4. Model

Strategy

Implementation

Regression analysis code
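from sklearn.linear_model import LinearRegression

X = data.values                       # shape: (10 acts, 7 judges)
n_acts, n_judges = X.shape
coefs = np.zeros((n_judges, n_judges))
V = np.zeros(n_judges)
model = LinearRegression()
for i in range(n_judges):
    # Regress every other judge's score on judge i's score.
    residuals = np.zeros((n_judges - 1, n_acts))
    coef = np.zeros(n_judges - 1)
    x = X[:, i].reshape(-1, 1)
    y = np.delete(X, i, axis=1)
    for j in range(y.shape[1]):
        y_ = y[:, j]
        model.fit(x, y_)
        coef[j] = model.coef_[0]      # slope of judge j on judge i
        residuals[j, :] = model.predict(x) - y_
    # The regression of judge i on themselves has slope 1.
    coefs[i, :] = np.insert(coef, i, 1)
    # Sum of all entries of the residual covariance matrix,
    # i.e. the variance of the summed residuals.
    V[i] = np.cov(residuals, bias=True).sum()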

A brief explanation:

  • In the for i in ... loop, the scores of the other judges are regressed on one base judge at a time: first on Kaminuma's score, then on Matsumoto's score, and so on.
  • In the for j in ... loop, each of the other judges is regressed in turn on the base judge. With Kaminuma as the base, for example, Matsumoto's score is regressed first, then Reiji's score. The regression coefficient (the $\beta_{person}$ of the regression model) is stored in an array called coef, and the regression residuals are stored in an array called residuals.
  • After the for j in ... loop ends, np.insert sets the coefficient of the base judge regressed on themselves (for example, Kaminuma on Kaminuma) to 1, and the full row of coefficients is stored in an array called coefs.
  • The variance of the summed residuals is stored in an array called V. It equals the sum of all entries of the residual covariance matrix, so it includes the covariances between judges as well as each judge's own residual variance.
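The two quantities described above can be checked on a small example. The sketch below uses hypothetical random scores (not the M-1 data). It confirms that the slope of a one-variable regression equals cov(x, y) / var(x), and that summing the residual covariance matrix gives the variance of the summed residuals. The names base, others, slopes and closed_form are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Hypothetical scores: 10 acts, 4 judges (not the M-1 data).
X = rng.normal(90, 3, size=(10, 4))

base = X[:, 0]       # scores of the base judge
others = X[:, 1:]    # scores of the remaining judges
n_others = others.shape[1]

residuals = np.zeros((n_others, X.shape[0]))
slopes = np.zeros(n_others)
for j in range(n_others):
    model = LinearRegression().fit(base.reshape(-1, 1), others[:, j])
    slopes[j] = model.coef_[0]
    residuals[j] = model.predict(base.reshape(-1, 1)) - others[:, j]

# Check 1: the fitted slope equals cov(x, y) / var(x).
closed_form = np.array([
    np.cov(base, others[:, j], bias=True)[0, 1] / base.var()
    for j in range(n_others)
])
print(np.allclose(slopes, closed_form))

# Check 2: summing every entry of the residual covariance matrix
# gives the variance of the residuals summed over judges.
v_from_cov = np.cov(residuals, bias=True).sum()
v_from_sum = residuals.sum(axis=0).var()
print(np.isclose(v_from_cov, v_from_sum))
```

Both checks use the population (ddof=0) convention, matching the bias=True argument in the article's code.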

Result

coefs_df = pd.DataFrame(coefs, 
                        columns=['kaminuma','matsumoto','reiji','tomizawa','shiraku','hanawa','kyojin'],
                        index=['kaminuma','matsumoto','reiji','tomizawa','shiraku','hanawa','kyojin'])
sns.heatmap(coefs_df, annot=True)
plt.title('regression coefficient')
plt.show()

[Figure: heatmap of the regression coefficients]

# Residual standard deviation per base judge: square root of V.
std_df = pd.DataFrame(np.sqrt(V),
                      columns=['residual std'],
                      index=['kaminuma','matsumoto','reiji','tomizawa','shiraku','hanawa','kyojin'])
std_df
           residual std
kaminuma       9.332082
matsumoto      4.644780
reiji          9.057326
tomizawa       5.586553
shiraku       10.673552
hanawa         8.872448
kyojin         7.665711

5. Consideration

# Sensitivity: each judge's score spread times the summed regression slopes from that judge.
sensitivity_df = (data.std(axis=0) * coefs.sum(axis=1)).to_frame('sensitivity')
sensitivity_df = pd.concat([sensitivity_df, std_df], axis=1)
sensitivity_df
           sensitivity  residual std
kaminuma     14.714779      9.332082
matsumoto    17.009344      4.644780
reiji        14.904319      9.057326
tomizawa     16.691662      5.586553
shiraku      13.664036     10.673552
hanawa       15.027370      8.872448
kyojin       15.747906      7.665711
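Read off the code, the two columns in the table above can be written as follows. The notation is an assumption introduced here for illustration: $\sigma_i$ is the sample standard deviation of judge $i$'s scores, $\beta_{i \to j}$ is the slope from regressing judge $j$ on judge $i$, and $\varepsilon_{i \to j}$ is the corresponding residual.

```latex
\[
\text{sensitivity}_i \;=\; \sigma_i \sum_{j=1}^{7} \beta_{i \to j},
\qquad \beta_{i \to i} = 1
\]
\[
\text{residual std}_i \;=\; \sqrt{V_i}
\;=\; \sqrt{\operatorname{Var}\Bigl(\sum_{j \neq i} \varepsilon_{i \to j}\Bigr)}
\]
```

As a rough check against the tables, Kaminuma's sensitivity of 14.714779 divided by her standard deviation of 2.233582 gives a summed slope of about 6.59. In other words, a one-point change in her score goes with a change of about 6.6 points in the predicted seven-judge total.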

6. Summary
