I only had a vague understanding of evaluation functions (evaluation metrics), so I have summarized the typical ones: what an evaluation function is, what each metric means, and sample code for actually using them.
An evaluation function is an **index that measures how good a trained model is**.
When studying machine learning, you will come across various terms such as objective function, loss function, and cost function. First, let's check how the evaluation function differs from the objective function.
- Objective function
  - The function optimized during model training
  - Needs to be differentiable
In other words, the objective function is what gets optimized during training, while the evaluation function is the index used to check how good the model is after training.
The loss function, cost function, and error function are said to be kinds of objective functions. (There seems to be some debate, but apparently they can be treated as almost the same thing.)
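As a rough illustration (a minimal sketch with made-up data, not from the original article): the model below minimizes squared error internally during training, while we check its quality afterwards with MAE.

```python
# Minimal sketch with made-up data: the objective function (squared error)
# is what LinearRegression minimizes during fit(); the evaluation function
# (MAE here) is computed afterwards to check the trained model.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

X = [[1], [2], [3], [4]]
y = [1.1, 1.9, 3.2, 4.1]

model = LinearRegression().fit(X, y)               # optimizes squared error
print(mean_absolute_error(y, model.predict(X)))    # evaluates the result with MAE
```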
First, I will summarize the evaluation functions for regression problems.
MAE(Mean Absolute Error)
Mean absolute error. Represents the average of the absolute differences between the true values and the predicted values.
- Evaluates the effect of outliers as small (less sensitive to outliers)
\textrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}|y_{i}-\hat{y}_{i}|
N :Number of records\\
y_{i} :The true value of the i-th record\\
\hat{y}_{i} :Predicted value of the i-th record
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)
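For example (values are made up), the scikit-learn call agrees with the formula above:

```python
# Quick check with made-up values that mean_absolute_error matches the formula.
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

print(mean_absolute_error(y_true, y_pred))   # 0.75
print(np.mean(np.abs(y_true - y_pred)))      # 0.75, same as the formula
```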
MSE(Mean Squared Error)
Mean squared error. Represents the average of the squared differences between the true values and the predicted values.
- Evaluates the effect of outliers as large (sensitive to outliers)
- The unit is the square of the objective variable's unit
\textrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_{i}-\hat{y}_{i})^{2}
from sklearn.metrics import mean_squared_error
mean_squared_error(y_true, y_pred)
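To see the difference from MAE, here is a small made-up example where a single large error inflates MSE much more strongly:

```python
# Made-up example: one badly missed prediction dominates MSE because the
# error is squared, while MAE grows only linearly.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 17.0])    # the last prediction is off by 10

print(mean_absolute_error(y_true, y_pred))  # 3.0
print(mean_squared_error(y_true, y_pred))   # 25.625
```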
RMSE(Root Mean Squared Error)
Root mean squared error. Represents the square root of the average of the squared differences between the true values and the predicted values.
- Evaluates the effect of outliers as large (sensitive to outliers)
- The unit is the same as the objective variable
\textrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_{i}-\hat{y}_{i})^{2}}
There is no dedicated function for it here, so it is calculated from mean_squared_error.
from sklearn.metrics import mean_squared_error
import numpy as np
np.sqrt(mean_squared_error(y_true, y_pred))
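Depending on the scikit-learn version you have installed, there may also be a more direct way (treat the following as a version-dependent sketch): squared=False was added to mean_squared_error around version 0.22, and recent versions provide a dedicated root_mean_squared_error function.

```python
from sklearn.metrics import mean_squared_error

# scikit-learn >= 0.22 (note: the squared argument is removed in the newest versions)
rmse = mean_squared_error(y_true, y_pred, squared=False)

# recent versions instead provide a dedicated function:
# from sklearn.metrics import root_mean_squared_error
# rmse = root_mean_squared_error(y_true, y_pred)
```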
RMSLE(Root Mean Squared Logarithmic Error)
Represents the square root of the average of the squared differences between the logarithms of the true values and the predicted values (with 1 added before taking the logarithm).
- Used when the objective variable can take a wide range of values
- Evaluates the difference as a ratio rather than as an absolute amount
\textrm{RMSLE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(\log (1+y_{i})-\log (1+\hat{y}_{i}))^{2}}
from sklearn.metrics import mean_squared_log_error
import numpy as np
np.sqrt(mean_squared_log_error(y_true, y_pred))
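Because RMSLE is simply RMSE computed on log1p-transformed values, predictions that are off by the same ratio contribute similar errors regardless of scale. A small made-up check:

```python
# Made-up values: every prediction is 20% too high, so the log-scale errors
# are similar even though the absolute errors differ by two orders of magnitude.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_squared_log_error

y_true = np.array([10.0, 100.0, 1000.0])
y_pred = np.array([12.0, 120.0, 1200.0])

rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))
rmse_of_logs = np.sqrt(mean_squared_error(np.log1p(y_true), np.log1p(y_pred)))
print(rmsle, rmse_of_logs)   # the two values are identical
```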
R2(Coefficient of Determination)
- Can be evaluated without depending on the scale of the objective variable
- Takes a value of at most 1, and the closer to 1 the better (it can become negative for a very poor model)
\textrm{R}^{2} = 1 - \frac{\sum_{i=1}^{N}(y_{i}-\hat{y}_{i})^{2}}{\sum_{i=1}^{N}(y_{i}-\bar{y})^{2}}
\bar{y} = \frac{1}{N}\sum_{i=1}^{N}y_{i}
from sklearn.metrics import r2_score
r2_score(y_true, y_pred)
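A quick check with made-up values that r2_score matches the formula above:

```python
# Made-up values: compute R^2 from the formula and compare with r2_score.
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
print(1 - ss_res / ss_tot)
print(r2_score(y_true, y_pred))                  # same value
```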
For classification, we will first summarize the evaluation functions used in problems where we predict whether each record is a positive or negative example.
Confusion matrix
Although the confusion matrix is not an evaluation metric itself, it is used to calculate the metrics for predicting positive and negative examples, so I will explain it first.
The confusion matrix tabulates the classification results into the following four categories.
- TP (True Positive): Predicted positive, and the true value is positive (correct)
- TN (True Negative): Predicted negative, and the true value is negative (correct)
- FP (False Positive): Predicted positive, but the true value is negative (incorrect)
- FN (False Negative): Predicted negative, but the true value is positive (incorrect)
from sklearn.metrics import confusion_matrix
confusion_matrix(y_true, y_pred)
It is output in the following format.
|  | Predicted value is negative | Predicted value is positive |
|---|---|---|
| True value is negative | TN | FP |
| True value is positive | FN | TP |
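For binary labels 0/1, the four counts can be unpacked from this matrix with ravel() (labels below are made up):

```python
# ravel() flattens the 2x2 matrix in the order TN, FP, FN, TP.
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)   # 2 1 1 2
```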
Accuracy
Correct answer rate. Represents the proportion of all predictions that are correct.
\textrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_pred)
Precision
Conformity rate. Represents the proportion of records predicted as positive that are actually positive.
\textrm{Precision} = \frac{TP}{TP + FP}
from sklearn.metrics import precision_score
precision_score(y_true, y_pred)
Recall
Recall. Represents the proportion of actual positive examples that are correctly predicted as positive.
\textrm{Recall} = \frac{TP}{TP + FN}
from sklearn.metrics import recall_score
recall_score(y_true, y_pred)
F1-score
F value. An index that balances precision and recall.
It is calculated as the harmonic mean of precision and recall.
\textrm{F1-score} = \frac{2 \times \textrm{recall} \times \textrm{precision}}{\textrm{recall} + \textrm{precision}}
from sklearn.metrics import f1_score
f1_score(y_true, y_pred)
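As a small check (same made-up labels as in the confusion matrix example above), precision, recall, and F1 can be reproduced from the TP/FP/FN counts:

```python
# Reproduce the formulas by hand and compare with scikit-learn.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]
tp, fp, fn = 2, 1, 1   # counts for these labels

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * recall * precision / (recall + precision)

print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```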
Next, we will summarize the evaluation functions used when the classification problem asks for the probability that each record is a positive example.
logloss
Also known as cross entropy. It measures how close the predicted probabilities are to the true labels.
- Takes a value of 0 or more (0 when every prediction is perfect)
- The better the predictions, the smaller the value
\begin{align}
\textrm{logloss} &= -\frac{1}{N}\sum_{i=1}^{N}(y_{i}\log p_{i} + (1 - y_{i})\log (1 - p_{i})) \\
&= -\frac{1}{N}\sum_{i=1}^{N} \log \acute{p}_{i}
\end{align}
y_{i} :Label indicating whether record i is a positive example (1 if positive, 0 if negative)\\
p_{i} :Predicted probability that record i is a positive example\\
\acute{p}_{i} :Predicted probability of the true class of record i
from sklearn.metrics import log_loss
log_loss(y_true, y_prob)
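A made-up example showing how logloss blows up when the model is confidently wrong:

```python
# Made-up probabilities: one confidently wrong prediction makes logloss jump.
from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1]
good_prob = [0.9, 0.1, 0.8, 0.7]   # mostly confident and correct
bad_prob = [0.4, 0.6, 0.9, 0.1]    # the last prediction is confidently wrong

print(log_loss(y_true, good_prob))  # about 0.20
print(log_loss(y_true, bad_prob))   # about 1.06
```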
AUC
Represents the area under the ROC curve (described below).
- A random prediction gives 0.5
- 1.0 when everything is predicted correctly
- Often used for classification on imbalanced data
- Evaluates the relationship between the predicted probabilities and the correct values (1 or 0)
from sklearn.metrics import roc_auc_score
roc_auc_score(y_true, y_prob)
ROC curve
A graph that plots the relationship between the true positive rate and the false positive rate as the threshold for treating a prediction as positive is moved from 0 to 1. It shows how the true positive rate and the false positive rate change when the threshold changes.
- True positive rate: the proportion of all positive examples that are predicted as positive
- False positive rate: the proportion of all negative examples that are predicted as positive
- (1.0, 1.0): everything is predicted as positive
- (0.0, 0.0): everything is predicted as negative
- (0.0, 1.0): everything is predicted correctly
- The diagonal straight line corresponds to a random prediction (AUC = 0.5)
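The points of the ROC curve can be obtained with roc_curve (a minimal sketch with made-up labels and probabilities):

```python
# fpr and tpr are the coordinates of the ROC curve at each threshold.
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, y_prob)
print(fpr, tpr)                        # points that make up the curve
print(roc_auc_score(y_true, y_prob))   # 0.75 = area under those points
```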
Multi-class accuracy
It is an index that extends the accuracy of binary classification to multi-class classification. Represents the percentage of records that are correctly predicted.
from sklearn.metrics import accuracy_score
accuracy_score(y_true, y_pred)
mean-F1/macro-F1/micro-F1
It is an index that extends F1-score to multi-class classification.
- mean-F1: the average of the F1-scores computed per record
- macro-F1: the average of the F1-scores computed per class
- micro-F1: compute TP/TN/FP/FN over every record x class pair and calculate a single F1-score from them
from sklearn.metrics import f1_score
f1_score(y_true, y_pred, average='samples')
f1_score(y_true, y_pred, average='macro')
f1_score(y_true, y_pred, average='micro')
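Note that, as far as I know, average='samples' (mean-F1) is only defined for multilabel data, so y_true and y_pred must be in multilabel indicator format (one column per class); macro and micro also work with ordinary 1-d class labels. A sketch with made-up multilabel data:

```python
# Made-up multilabel data in indicator format (rows = records, columns = classes).
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

print(f1_score(y_true, y_pred, average='samples'))  # mean-F1 per record
print(f1_score(y_true, y_pred, average='macro'))    # macro-F1 per class
print(f1_score(y_true, y_pred, average='micro'))    # micro-F1 over all pairs
```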
Multi-class logloss
It is an index that extends the log loss of binary classification to multi-class classification.
\textrm{multiclass logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{m=1}^{M}y_{i,m}\log p_{i,m}
M :Number of classes\\
y_{i,m} :Label indicating whether record i belongs to class m\\
p_{i,m} :Predicted probability that record i belongs to class m\\
from sklearn.metrics import log_loss
log_loss(y_true, y_prob)
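For the multi-class case, y_prob is a matrix with one column per class, each row summing to 1 (values below are made up):

```python
# Made-up example: three records, three classes.
from sklearn.metrics import log_loss

y_true = [0, 2, 1]                      # class labels
y_prob = [[0.7, 0.2, 0.1],              # predicted probabilities of classes 0, 1, 2
          [0.1, 0.3, 0.6],
          [0.2, 0.6, 0.2]]

print(log_loss(y_true, y_prob))
```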
**Added on June 25, 2020** The confusion matrix table was incorrect and has been fixed. Thank you, @Fizunimo.