[PYTHON] ROC curve for multiclass classification

Purpose

A memo on drawing ROC curves for multiclass classification, using pandas to read predicted values saved in xlsx or csv format. It also serves as practice in writing articles for Qiita.

About ROC curve

This article does not cover the basics. For background on the ROC curve, the article here is easy to understand.

Label format change

First, check the data format.

pandas1.JPG

- GT: has the numbers 0, 1, 2, which correspond to each label
- C1, C2, C3: each label, where C1 = 0.0, C2 = 1.0, C3 = 2.0
- M1, M2, M3: each model

An ROC curve requires binary (0/1) labels, so the ground-truth labels are converted first:

from sklearn.preprocessing import label_binarize
y_test = label_binarize(df.iloc[:, 0], classes=[0,1,2])

sklearn's label_binarize performs the binary conversion.

binary.JPG

The output is long, so only the top rows are shown. After the conversion, C1 = [1, 0, 0], C2 = [0, 1, 0], C3 = [0, 0, 1].
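
As a small illustration of the conversion described above, the following sketch binarizes a handful of made-up labels. The list `gt` is hypothetical and only stands in for the GT column of the article's data.

```python
from sklearn.preprocessing import label_binarize

# Hypothetical ground-truth labels standing in for the GT column
gt = [0, 1, 2, 1, 0]

# One output column per class, in the order given by `classes`
y_demo = label_binarize(gt, classes=[0, 1, 2])
print(y_demo)
```

Each row contains a single 1, in the column of the class that the sample belongs to.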

List of predicted values

After binarizing the labels, the predicted values for each model must also be converted to the corresponding list.

M1_y_score = []
M2_y_score = []
M3_y_score = []
for i in df.index:
    M1_y_score.append(([df.iloc[i, 1], df.iloc[i, 2], df.iloc[i, 3]]))
    M2_y_score.append(([df.iloc[i, 4], df.iloc[i, 5], df.iloc[i, 6]]))
    M3_y_score.append(([df.iloc[i, 7], df.iloc[i, 8], df.iloc[i, 9]]))
M1_y_score = M1_y_score
M2_y_score = M2_y_score
M3_y_score = M3_y_score

This loop stores the predicted values of each model row by row. At this point, the multiclass AUC can already be computed:

from sklearn.metrics import roc_auc_score
auc_m1 = roc_auc_score(y_test, M1_y_score, multi_class="ovo")
print(auc_m1)

This prints the multiclass AUC. For multiclass targets, the multi_class argument must be set to either "ovo" or "ovr"; the default, "raise", throws an error. More details can be found in the sklearn documentation.
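
The behaviour of `multi_class` can be checked with a small sketch. The data below is made up for illustration, and the sketch passes the original 1-D labels rather than the binarized matrix, since that is the case in which `multi_class` applies.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical data: 6 samples, 3 classes; each score row sums to 1
y_true = np.array([0, 1, 2, 0, 1, 2])
y_prob = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
])

# The default multi_class="raise" rejects multiclass targets
try:
    roc_auc_score(y_true, y_prob)
    raised = False
except ValueError:
    raised = True

# One-vs-rest and one-vs-one averaging both need to be requested explicitly
auc_ovr = roc_auc_score(y_true, y_prob, multi_class="ovr")
auc_ovo = roc_auc_score(y_true, y_prob, multi_class="ovo")
print(raised, auc_ovr, auc_ovo)
```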

Calculation of FPR and TPR

This is the part where I stumbled.

M1_fpr = dict()
M1_tpr = dict()
M1_roc_auc = dict()
M2_fpr = dict()
M2_tpr = dict()
M2_roc_auc = dict()
M3_fpr = dict()
M3_tpr = dict()
M3_roc_auc = dict()

After creating empty dictionaries to store the results, compute the FPR and TPR for each class:

n_classes = 3
from sklearn.metrics import roc_curve, auc
for i in range(n_classes):
    M1_fpr[i], M1_tpr[i], _ = roc_curve(y_test[:, i], M1_y_score[:, i])
    M1_roc_auc[i] = auc(M1_fpr[i], M1_tpr[i])

    M2_fpr[i], M2_tpr[i], _ = roc_curve(y_test[:, i], M2_y_score[:, i])
    M2_roc_auc[i] = auc(M2_fpr[i], M2_tpr[i])

    M3_fpr[i], M3_tpr[i], _ = roc_curve(y_test[:, i], M3_y_score[:, i])
    M3_roc_auc[i] = auc(M3_fpr[i], M3_tpr[i])

The loop runs over the labels and stores the fpr and tpr of each model.

error1.JPG

For some reason, I got an error! The cause was that I had not converted the predicted values to an ndarray when storing them: a Python list does not support the [:, i] slice. So, change the code above slightly:

import numpy as np
M1_y_score = np.array(M1_y_score)
M2_y_score = np.array(M2_y_score)
M3_y_score = np.array(M3_y_score)
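
The difference between the two containers can be reproduced with a minimal sketch. The scores here are made up; only the indexing behaviour matters.

```python
import numpy as np

# Hypothetical scores: 2 samples x 3 classes, stored as a list of lists
scores_list = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]

try:
    scores_list[:, 0]  # lists do not support (row, column) indexing
    failed = False
except TypeError:
    failed = True

# After conversion, the same slice returns one column
scores_arr = np.array(scores_list)
first_column = scores_arr[:, 0]  # scores of class 0 for every sample
print(failed, first_column)
```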

Once the scores are ndarrays, the column slice [:, i] works and the results can be stored in the dictionaries. After that, computing the macro average as in the official documentation is all that is left:

M1_all_fpr = np.unique(np.concatenate([M1_fpr[i] for i in range(n_classes)]))
M2_all_fpr = np.unique(np.concatenate([M2_fpr[i] for i in range(n_classes)]))
M3_all_fpr = np.unique(np.concatenate([M3_fpr[i] for i in range(n_classes)]))
M1_mean_tpr = np.zeros_like(M1_all_fpr)
M2_mean_tpr = np.zeros_like(M2_all_fpr)
M3_mean_tpr = np.zeros_like(M3_all_fpr)

for i in range(n_classes):
    M1_mean_tpr += np.interp(M1_all_fpr, M1_fpr[i], M1_tpr[i])
    M2_mean_tpr += np.interp(M2_all_fpr, M2_fpr[i], M2_tpr[i])
    M3_mean_tpr += np.interp(M3_all_fpr, M3_fpr[i], M3_tpr[i])

M1_mean_tpr /= n_classes
M2_mean_tpr /= n_classes
M3_mean_tpr /= n_classes

M1_fpr["macro"] = M1_all_fpr
M1_tpr["macro"] = M1_mean_tpr
M1_roc_auc["macro"] = auc(M1_fpr["macro"], M1_tpr["macro"])

M2_fpr["macro"] = M2_all_fpr
M2_tpr["macro"] = M2_mean_tpr
M2_roc_auc["macro"] = auc(M2_fpr["macro"], M2_tpr["macro"])

M3_fpr["macro"] = M3_all_fpr
M3_tpr["macro"] = M3_mean_tpr
M3_roc_auc["macro"] = auc(M3_fpr["macro"], M3_tpr["macro"])
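
Since the same steps are repeated for every model, they could also be wrapped in a function. The following is only a sketch of that idea: `macro_roc`, `demo_df`, and its column names are hypothetical, and the random table merely imitates the layout assumed in this article (GT in column 0, then three score columns per model).

```python
import numpy as np
import pandas as pd
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize


def macro_roc(y_bin, y_score):
    """Return (fpr, tpr, roc_auc) dicts with per-class and "macro" entries."""
    n_classes = y_bin.shape[1]
    fpr, tpr, roc_auc = {}, {}, {}
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(y_bin[:, i], y_score[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Interpolate every class curve onto the union of FPR points, then average
    all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))
    mean_tpr = np.zeros_like(all_fpr)
    for i in range(n_classes):
        mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])
    mean_tpr /= n_classes

    fpr["macro"], tpr["macro"] = all_fpr, mean_tpr
    roc_auc["macro"] = auc(all_fpr, mean_tpr)
    return fpr, tpr, roc_auc


# Hypothetical table: GT, then 3 normalized random score columns per model
rng = np.random.default_rng(0)
data = {"GT": np.repeat([0, 1, 2], 20)}
for m in ("M1", "M2", "M3"):
    raw = rng.random((60, 3))
    raw /= raw.sum(axis=1, keepdims=True)
    for c in range(3):
        data[f"{m}_C{c + 1}"] = raw[:, c]
demo_df = pd.DataFrame(data)

y_bin = label_binarize(demo_df["GT"], classes=[0, 1, 2])
results = {}
for k, name in enumerate(("M1", "M2", "M3")):
    # Columns 1-3 belong to M1, 4-6 to M2, 7-9 to M3
    scores = demo_df.iloc[:, 1 + 3 * k : 4 + 3 * k].to_numpy()
    results[name] = macro_roc(y_bin, scores)
    print(name, results[name][2]["macro"])
```

Slicing the DataFrame and calling `to_numpy()` yields an ndarray directly, so the row-by-row loop and the later `np.array` conversion are not needed in this variant.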

ROC curve drawing

Once that is done, all that remains is to plot the curves with matplotlib.

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import cm
lw = 1
# Pick three colors from the gist_ncar colormap, one per model
colors = [cm.gist_ncar(190), cm.gist_ncar(30), cm.gist_ncar(10)]
sns.color_palette(colors)
sns.set_palette(colors, desat=1.0)

plt.figure(figsize=(6, 6))

plt.plot(M1_fpr["macro"], M1_tpr["macro"],
         label='M1',
         color=colors[0], 
         linestyle='-', 
         linewidth=2)

plt.plot(M2_fpr["macro"], M2_tpr["macro"],
         label='M2',
         color=colors[1], 
         linestyle='-', 
         linewidth=2)

plt.plot(M3_fpr["macro"], M3_tpr["macro"],
         label='M3',
         color=colors[2], 
         linestyle='-', 
         linewidth=2)

plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.show()

<img src="https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/784518/0a44adaf-50eb-9bd7-55bc-1a7a0adaf28b.jpeg" width="50%">

I was able to draw an ROC curve for multiclass classification using the macro average.
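
The three nearly identical plt.plot calls could also be written as a loop. The sketch below assumes made-up curves in a hypothetical `curves` dictionary; with the real data, each entry would hold the "macro" fpr and tpr arrays of one model. The Agg backend is chosen only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import auc

# Hypothetical macro-average curves standing in for the real fpr/tpr arrays
grid = np.linspace(0.0, 1.0, 50)
curves = {
    "M1": (grid, grid ** 0.3),
    "M2": (grid, grid ** 0.5),
    "M3": (grid, grid ** 0.8),
}

fig, ax = plt.subplots(figsize=(6, 6))
for name, (fpr, tpr) in curves.items():
    # Put the macro AUC into the legend entry of each model
    ax.plot(fpr, tpr, label=f"{name} (macro AUC = {auc(fpr, tpr):.2f})", linewidth=2)

ax.plot([0, 1], [0, 1], "k--", linewidth=1)  # chance line
ax.set_xlim(0.0, 1.0)
ax.set_ylim(0.0, 1.05)
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.legend(loc="lower right")
```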
