[PYTHON] About the confusion matrix

What is a confusion matrix?

A matrix that summarizes, for the predictions produced by a model and the actual values, how many cases were determined correctly and how many were determined incorrectly.

When is the confusion matrix used?

Generally, in binary classification problems.

Why is the confusion matrix used?

For example, suppose you want to predict from an image whether a person has cancer or not, and the actual values are: 98 out of 100 people do not have cancer (0), and 2 out of 100 people have cancer (1).

In this case, if the model predicts 0 for everyone, the accuracy is 98%. That looks like a good number if you only look at accuracy, but is it really a good evaluation? Isn't missing those two people a fatal mistake?

The confusion matrix is used so that the evaluation works properly even in cases like this.
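
As a quick check of the arithmetic above, here is a minimal sketch with made-up arrays: 98 non-cancer labels, 2 cancer labels, and a "model" that always predicts 0.

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0] * 98 + [1] * 2)  # 98 people without cancer (0), 2 with cancer (1)
y_pred = np.zeros(100, dtype=int)      # predictions are all 0 (non-cancer)

print(accuracy_score(y_true, y_pred))  # 0.98 -- looks good, but both cancer cases are missed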

Let's use the confusion matrix

In general, the horizontal axis is the model's prediction and the vertical axis is the actual value, and the results are summarized into the 2 × 2 = 4 combinations shown in the table below.

                    Predicted Positive      Predicted Negative
Actual Positive     TP (True Positive)      FN (False Negative)
Actual Negative     FP (False Positive)     TN (True Negative)

True: the prediction matched the actual value
False: the prediction did not match the actual value
positive: predicted to have the disease (= 1)
negative: predicted to have no disease (= 0)

matrix.py



import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Create the confusion matrix
# y_true: the objective variable of the evaluation (test) data
# y_pred: the result predicted from X_test with the model's predict() function
cm = confusion_matrix(y_true=y_test, y_pred=y_pred)

# Turn the confusion matrix into a DataFrame
# np.rot90(cm, 2) rotates the matrix 180 degrees so that the positive class (1) comes first
df_cm = pd.DataFrame(np.rot90(cm, 2), index=["actual_Positive", "actual_Negative"], columns=["predict_Positive", "predict_Negative"])
print(df_cm)

# Visualize the confusion matrix with a heatmap
sns.heatmap(df_cm, annot=True, fmt="2g", cmap='Blues')
plt.yticks(va='center')  # vertically center the y-axis tick labels
plt.show()
(Heatmap of the confusion matrix produced by the code above)
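
If you just need the four counts rather than a DataFrame, the matrix returned by scikit-learn can be unpacked with ravel(); for binary labels 0/1 the flattened order is TN, FP, FN, TP. A minimal sketch, assuming the y_test and y_pred from the block above:

from sklearn.metrics import confusion_matrix

# For labels 0/1, confusion_matrix flattens to TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true=y_test, y_pred=y_pred).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")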

From here, let's look at the evaluation metrics that measure the model's performance.

Accuracy (correct answer rate)

First, check what proportion of the whole data set was classified correctly.

Accuracy = \dfrac{TP + TN}{TP + FP + FN + TN}

Precision

Of the cases predicted as positive (1), check how many were actually positive.

Precision=\dfrac{TP}{TP + FP}

Recall, True Positive Rate

Of the data that is actually positive (1), how much was correctly predicted as positive? The higher this value, the better the performance: fewer actual positives are mistakenly judged negative.

Recall=\dfrac{TP}{TP + FN}

True Negative Rate

Of the data that is actually negative (0), how much was correctly predicted as negative? The higher this value, the better the performance: fewer actual negatives are mistakenly judged positive.

True\ Negative\ Rate=\dfrac{TN}{FP + TN}

False Negative Rate

Of the data that is actually positive (1), how much was mistakenly predicted as negative? The lower this value, the better the performance: fewer actual positives are missed.

False\ Negative\ Rate=\dfrac{FN}{TP + FN}

False Positive Rate

Of the data that is actually negative (0), how much was mistakenly predicted as positive? The lower this value, the better the performance: fewer actual negatives are wrongly flagged as positive.

False\ Positive\ Rate=\dfrac{FP}{FP + TN}
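
To tie the formulas above together, here is a small sketch that computes every metric from the four counts TP, FP, FN, TN; the counts passed in at the bottom are hypothetical, and zero denominators would need extra handling in practice.

def classification_metrics(tp, fp, fn, tn):
    # Direct translations of the formulas above
    return {
        "Accuracy": (tp + tn) / (tp + fp + fn + tn),
        "Precision": tp / (tp + fp),
        "Recall / TPR": tp / (tp + fn),
        "TNR": tn / (fp + tn),
        "FNR": fn / (tp + fn),
        "FPR": fp / (fp + tn),
    }

print(classification_metrics(tp=40, fp=10, fn=5, tn=45))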

Let's measure the true positive rate and the true negative rate for the example from the section "Why is the confusion matrix used?": 98 people without cancer (0), 2 people with cancer (1), and a model that predicts 0 for everyone.

                    Predicted positive    Predicted negative
Actually positive          0                     2
Actually negative          0                    98

Accuracy = \dfrac{0 + 98}{0 + 0 + 2 + 98}=0.98

The accuracy (correct answer rate) is 98%.

True\ Positive\ Rate=\dfrac{0}{0 + 2}=0

0% => not a single person who actually has cancer was detected

True\ Negative\ Rate=\dfrac{98}{0 + 98}=1

100% => everyone who does not have cancer was classified correctly

So even though the accuracy is 98%, the true positive rate of 0% exposes that this model is useless for finding cancer.
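
The same figures can be reproduced in code. A minimal sketch with the same made-up arrays as before; recall_score with pos_label=0 gives the true negative rate.

import numpy as np
from sklearn.metrics import recall_score

y_true = np.array([0] * 98 + [1] * 2)  # 98 non-cancer (0), 2 cancer (1)
y_pred = np.zeros(100, dtype=int)      # the model always predicts 0

print(recall_score(y_true, y_pred))               # 0.0 -> true positive rate
print(recall_score(y_true, y_pred, pos_label=0))  # 1.0 -> true negative rate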

Summary

To use a binary classification machine learning model in business, calculate metrics that measure its performance, and it is important to understand those metrics and use the one that suits your purpose.
