[PYTHON] Performance evaluation index

What is a confusion matrix?

Let's look at the evaluation indices that determine how good a model trained on training data is.

First, the confusion matrix. It is a table that summarizes the model's prediction results on the test data, counting how many predictions fall into each of four categories: True Positive, True Negative, False Positive, and False Negative.

“True or false” indicates whether the prediction was correct, and “positive or negative” indicates the predicted class. In other words, the four counts show:

(1) True positives: the number of cases predicted to be in the positive class whose actual class was also positive
(2) True negatives: the number of cases predicted to be in the negative class whose actual class was also negative
(3) False positives: the number of cases predicted to be in the positive class whose actual class was negative
(4) False negatives: the number of cases predicted to be in the negative class whose actual class was positive

True Positives and True Negatives are the cases the machine learning model answered correctly; False Positives and False Negatives are the cases it got wrong.

[Figure: layout of the confusion matrix]

Implement a confusion matrix

Let's actually look at the counts of each component of the confusion matrix using the confusion_matrix function in the sklearn.metrics module.

The confusion_matrix function can be used as follows.

from sklearn.metrics import confusion_matrix

# y_true: array of actual classes, y_pred: array of predicted classes
confmat = confusion_matrix(y_true, y_pred)

y_true stores the actual classes of the correct data as an array, and y_pred stores the predicted classes as an array. The counts are stored in the layout shown in the confusion matrix figure above.
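
As a minimal runnable sketch (the labels here are hypothetical data made up for illustration):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 0, 1, 1, 0]  # actual classes
y_pred = [1, 0, 1, 1, 0, 0]  # predicted classes

# Rows are the true classes and columns are the predicted classes,
# ordered by label (here 0, then 1)
confmat = confusion_matrix(y_true, y_pred)
print(confmat)
# [[2 1]
#  [1 2]]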

Correct answer rate

Once you have actually built a classification model, you need a clear standard to evaluate whether it is better or worse than other classification models.

Let's check the correct answer rate (accuracy) first. The correct answer rate is the proportion of all cases in which the prediction was correct (those classified as TP or TN), and it can be calculated as follows.

Correct answer rate = (TP + TN) / (TP + TN + FP + FN)
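
scikit-learn provides the accuracy_score function for this; a minimal sketch, reusing the hypothetical labels from above:

from sklearn.metrics import accuracy_score

y_true = [1, 0, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0]

# 4 correct predictions out of 6: (TP + TN) / total = 4 / 6
print("Accuracy: {:.3f}".format(accuracy_score(y_true, y_pred)))  # 0.667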

Precision and recall

Precision is the percentage of the data predicted to be positive that is actually positive (how trustworthy a positive prediction is). Recall is the percentage of the actually positive data that is successfully predicted to be positive (how comprehensively the positives are found).

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

F value

The F value combines precision and recall as their harmonic mean:

F value = 2 × Precision × Recall / (Precision + Recall)

In practice, check not only the correct answer rate but also the precision, recall, and F value to confirm that the model is really reliable.

Implementing the performance evaluation indices

Let's use the performance evaluation indices implemented in scikit-learn.

# Precision, recall, and F value
from sklearn.metrics import precision_score, recall_score, f1_score

# Store the data. This time 0 is the positive class and 1 is the negative class
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# Pass the correct labels as y_true and the predicted labels as y_pred.
# scikit-learn treats 1 as the positive label by default, so pass
# pos_label=0 to match the class definition above
print("Precision: {:.3f}".format(precision_score(y_true, y_pred, pos_label=0)))
print("Recall: {:.3f}".format(recall_score(y_true, y_pred, pos_label=0)))
print("F1: {:.3f}".format(f1_score(y_true, y_pred, pos_label=0)))

PR curve

Relationship between recall and precision

[Figure: trade-off between recall and precision]

These two performance evaluation indices are in a trade-off relationship: if you try to increase recall, precision will decrease, and if you try to increase precision, recall will decrease.

For example, if a hospital screening is strict and declares many patients positive, recall will be higher but precision will be lower.

Choose among recall, precision, and the F value according to the problem you are handling.


What is a PR curve?

A PR curve is a graph that plots the data with recall on the horizontal axis and precision on the vertical axis.

Let me give an example. Suppose that for 10 patients who have undergone cancer screening, we calculate the likelihood of cancer for each and then declare each patient positive or negative based on it.

In this case, precision is the percentage of patients declared positive who really have cancer, and recall is the percentage of patients who truly have cancer who were declared positive.

The question here is: when the 10 patients are sorted in descending order of their likelihood of cancer, how many from the top should be declared positive?

Depending on how many people are declared positive, both recall and precision will change.

If we calculate precision and recall for each case (declaring only the first person positive, then the first two, and so on) and plot all of the resulting points, the figure we obtain is the PR curve. The process of plotting is as follows.

[Figure: calculating precision and recall at each cut-off]
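
As a sketch of this process in code (the labels below are hypothetical, with the 10 patients already sorted in descending order of cancer likelihood):

# 1 = patient actually has cancer, 0 = patient does not;
# the list is sorted in descending order of predicted cancer likelihood
y_true_sorted = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

n_cancer = sum(y_true_sorted)  # patients who truly have cancer

# Declare the top k patients positive, for k = 1..10
for k in range(1, len(y_true_sorted) + 1):
    tp = sum(y_true_sorted[:k])  # true positives among the top k
    print("top {:2d}: precision={:.2f}, recall={:.2f}".format(
        k, tp / k, tp / n_cancer))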

Plotting all of these precision and recall values gives the figure below. Note that the shape of the PR curve changes depending on the prediction results.

[Figure: example PR curves]

From the above figure, it can be seen that recall and precision are in a trade-off relationship.
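
In practice you rarely build the curve by hand: scikit-learn's precision_recall_curve function computes all of the points from predicted scores. A minimal sketch, with hypothetical labels and scores:

from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

# Hypothetical true labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]

# Precision and recall at every threshold over the scores
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("PR curve")
plt.show()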

Evaluation of the model using the PR curve

To make the most effective use of the PR curve, let's first review its two axes.

Ideally, both precision and recall would be high. However, because of the trade-off, trying to raise one lowers the other.

However, there is a point on the PR curve where precision and recall are equal. This point is called the break-even point (BEP).

The BEP is an important point in business, because it allows costs and profits to be optimized while keeping precision and recall in balance. We touched on the F value earlier; you can keep the break-even point in mind as a similar concept.

[Figure: break-even point on the PR curve]
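
As a rough sketch of locating the BEP on a computed curve (reusing the hypothetical labels and scores from above; this simply picks the computed point where precision and recall are closest, an approximation rather than an exact intersection):

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]

precision, recall, _ = precision_recall_curve(y_true, y_score)

# The BEP is where precision and recall match, so take the computed
# point that minimizes their difference
idx = np.argmin(np.abs(precision - recall))
print("BEP: precision={:.3f}, recall={:.3f}".format(precision[idx], recall[idx]))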

Let's evaluate a model using the PR curve. Judging models by their PR curves works as follows: the better the model, the farther its BEP sits toward the upper right. This is because a BEP farther to the upper right means higher precision and higher recall at the same time.

[Figure: PR curves and BEPs of better and worse models]
