University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the task (11)

Previous post: University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (10) https://github.com/legacyworld/sklearn-basic

Exercise 5.4 Creating ROC curves and tuning regularization parameters

Explanation on YouTube: Lecture 6 (1), 58 minutes 30 seconds. This exercise uses sklearn's iris dataset, simplified under the following conditions:

- Use only sepal width and sepal length; petal width/length are not used.
- Consider only whether each sample is virginica or not. (The dataset actually contains three classes: setosa, versicolor, and virginica.)

5.4_irisdata.png

The source code is here:

python:Homework_5.4.py


import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm,metrics
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris  
iris = load_iris()
# Use only sepal width and sepal length; petal width/length are not used
features = [0,1]
# Consider only whether each sample is virginica or not
target = 2
X = iris.data[:,features]
y = iris.target.copy()  # copy so that iris.target itself is not modified
# Set y = -1 where the class is not virginica (target != 2), 1 where it is
y[np.where(np.not_equal(y,target))] = -1
y[np.where(np.equal(y,target))] = 1
# Plot the raw data
plt.figure(0,figsize=(5,5))
plt.scatter(X[:, 0][np.where(y==1)], X[:, 1][np.where(y==1)], color='b',label=iris.target_names[target])
plt.scatter(X[:, 0][np.where(y==-1)], X[:, 1][np.where(y==-1)], color='r',label=f"not {iris.target_names[target]}")
plt.xlabel(iris.feature_names[features[0]])
plt.ylabel(iris.feature_names[features[1]])
plt.legend()
plt.savefig("5.4_irisdata.png ")
# SVM (C=0.01)
clf = svm.SVC(kernel='linear', C=0.01)
clf.fit(X, y)

#ROC curve
fpr,tpr,th = metrics.roc_curve(y,clf.decision_function(X))
plt.clf()
plt.plot(fpr,tpr)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC")
plt.savefig("5.4_ROC_curve.png ")
#False Positive Rate is 0.True Positive Rate near 2
index = np.where(fpr>=0.2)[0].tolist()[0]
print(f"FPR = {fpr[index-1]} TPR = {tpr[index-1]}")
print(f"FPR = {fpr[index]} TPR = {tpr[index]}")

# Draw the decision boundary while shifting the intercept
for i in range(0,110,10):
    dec = clf.decision_function(X)
    # Shift the intercept from the maximum to the minimum of the decision function
    intercept = np.max(dec) - i/100*(np.max(dec)-np.min(dec))
    # TPR = TP / actual positives, FPR = FP / actual negatives
    TP = np.where((np.signbit(dec-intercept)==False) & (np.signbit(y)==False))
    FP = np.where((np.signbit(dec-intercept)==False) & (np.signbit(y)==True))
    FPR = len(FP[0])/np.sum(y==-1)
    TPR = len(TP[0])/np.sum(y==1)
    # Indices of misclassified samples
    index = np.where(np.signbit(dec-intercept)!=np.signbit(y))
    x_miss = X[index]
    y_miss = y[index]
    # Plot
    plt.clf()
    plt.figure(0,figsize=(5,5))
    plt.scatter(X[:, 0][np.where(y==1)], X[:, 1][np.where(y==1)], color='b',label=iris.target_names[target])
    plt.scatter(X[:, 0][np.where(y==-1)], X[:, 1][np.where(y==-1)], color='r',label=f"not {iris.target_names[target]}")
    plt.xlabel(iris.feature_names[features[0]])
    plt.ylabel(iris.feature_names[features[1]])
    plt.title(f"Decision boundary Moved {i}% {-intercept:.3f}\nFPR = {FPR:.3f} TPR = {TPR:.3f}")
    #Drawing the decision boundary
    xlim = plt.xlim()
    ylim = plt.ylim()
    #Make a 30x30 grid
    xx = np.linspace(xlim[0], xlim[1], 30)
    yy = np.linspace(ylim[0], ylim[1], 30)
    YY, XX = np.meshgrid(yy, xx)
    xy = np.vstack([XX.ravel(), YY.ravel()]).T
    # Evaluate the decision function on each grid point
    Z = clf.decision_function(xy).reshape(XX.shape)-intercept
    # Draw the decision boundary as the contour line at level 0
    plt.contour(XX,YY,Z,levels=[0])
    plt.scatter(x_miss[:,0][np.where(y_miss==1)],x_miss[:,1][np.where(y_miss==1)],s=100,linewidth=1, facecolors='none', edgecolors='k')
    plt.scatter(x_miss[:,0][np.where(y_miss==-1)],x_miss[:,1][np.where(y_miss==-1)],s=100,linewidth=1, facecolors='none', edgecolors='k')
    plt.legend()
    plt.savefig(f"5.4_decision_boundary_{i}.png ")

# AUC score as C is varied from 0.01 to 1000
c_values = [0.01,0.02,0.05,0.1,0.2,0.5,1,2,5,10,100,1000]
aucs = []
# Split the data three ways: TRAIN / VALIDATION / TEST
X_tr_val,X_test,y_tr_val,y_test = train_test_split(X,y,test_size=0.2,random_state=1)
X_tr,X_val,y_tr,y_val = train_test_split(X_tr_val,y_tr_val,test_size=0.2,random_state=1)
print(len(X_tr_val),len(X_test),len(X_tr),len(X_val))
for c_value in c_values:
    clf = svm.SVC(kernel='linear',C=c_value).fit(X_tr,y_tr)
    dec = clf.decision_function(X_val)
    auc = metrics.roc_auc_score(y_val,dec)
    aucs.append(auc)

#Recalculate with the best C
best_c_index = np.argmax(aucs)
best_clf = svm.SVC(kernel='linear', C=c_values[best_c_index]).fit(X_tr_val,y_tr_val)
test_predict = best_clf.predict(X_test)
print(f"Best value of parameter C = {c_values[best_c_index]}")
print(f"Test accuracy = {metrics.accuracy_score(y_test,test_predict)}")

plt.clf()
plt.plot(c_values,aucs,label="AUC Scores")
plt.plot(c_values[best_c_index],aucs[best_c_index],marker ='o',markersize=10,label="Best Score")
plt.legend()
plt.xscale("log")
plt.savefig("5.4_AUC.png ")

ROC curve

In the lecture, the intercept (i.e. the threshold) was moved continuously; in this program the decision boundary is drawn at every 10% step of the intercept's range. A very detailed explanation of why moving the threshold like this traces out the ROC curve can be found here: https://qiita.com/TsutomuNakamura/items/ef963381e5d2768791d4
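To make this concrete, here is a minimal sketch (not part of the assignment script) of the computation the loop above performs at a single intercept position. The threshold chosen here is illustrative rather than one of the 10% steps, and clf, X, and y are assumed to be as defined in the script.

```python
# A minimal sketch: one point on the ROC curve, computed by hand.
scores = clf.decision_function(X)
intercept = np.median(scores)               # illustrative threshold
pred_pos = scores - intercept >= 0          # predicted as virginica
TPR = np.sum(pred_pos & (y == 1)) / np.sum(y == 1)    # TP / actual positives
FPR = np.sum(pred_pos & (y == -1)) / np.sum(y == -1)  # FP / actual negatives
print(f"TPR = {TPR:.3f} FPR = {FPR:.3f}")
```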

The predictions at each intercept position, with misclassified points circled, are shown below. 5.png 5 (2).png 5 (3).png

The higher the threshold, the fewer false positives: samples that are not virginica are almost never classified as virginica, but more actual virginica samples end up classified as non-virginica. Conversely, if the threshold is lowered, virginica samples are reliably identified as virginica, but non-virginica samples also get classified as virginica (see the sketch after the next paragraph).

The former is like a criminal trial (some criminals may be overlooked, but false convictions are unacceptable); the latter is like cancer screening (a benign tumor may be flagged as possible cancer and frighten the patient unnecessarily, but the only harm is the scare).
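To see the trade-off numerically, here is a hedged sketch that prints confusion matrices at two illustrative thresholds (the values -0.5 and 0.5 are made up for illustration; clf, X, y, and metrics are assumed from the script above):

```python
# A hedged sketch: confusion matrices at two illustrative thresholds.
for threshold in (-0.5, 0.5):
    pred = np.where(clf.decision_function(X) >= threshold, 1, -1)
    # rows = actual [virginica, not virginica], columns = predicted
    print(threshold)
    print(metrics.confusion_matrix(y, pred, labels=[1, -1]))
```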

The ROC curve looks like this: 5.4_ROC_curve.png

Now for the question in the exercise: "When the probability of predicting virginica for a sample that is not virginica is kept at 0.2 or below, what is the maximum probability of correctly predicting virginica samples as virginica?" In short: what is the maximum True Positive Rate when the False Positive Rate is at most 0.2? This is the output from the program:

FPR = 0.18 TPR = 0.8
FPR = 0.2 TPR = 0.8

Around an FPR of 0.2 the TPR is about 0.8, so the probability of a correct prediction is 80%.
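The same answer can also be read off programmatically from the arrays returned by roc_curve. A minimal sketch, reusing fpr and tpr from the script above:

```python
# A minimal sketch: largest TPR among ROC points with FPR <= 0.2.
mask = fpr <= 0.2
print(f"max TPR with FPR <= 0.2: {tpr[mask].max():.2f}")  # about 0.8
```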

AUC

The final problem is to vary the regularization parameter C and plot the AUC. As preparation, the data is split into three parts:

- X_tr_val, y_tr_val: 120 samples, used to refit the SVC once the best C has been chosen
- X_test, y_test: 30 samples, used for the final accuracy_score calculation
- X_tr, y_tr: 96 samples, training data for finding the optimal C
- X_val, y_val: 24 samples, used to compute the AUC score
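As an aside, the same search over C could also be done with cross-validation instead of a single train/validation split. This is not the article's method, just a hedged sketch using sklearn's GridSearchCV with AUC as the scoring metric:

```python
# A hedged alternative sketch (not the article's method): tune C by
# 5-fold cross-validated AUC on the train+validation portion.
from sklearn.model_selection import GridSearchCV
search = GridSearchCV(svm.SVC(kernel='linear'),
                      param_grid={'C': c_values},
                      scoring='roc_auc', cv=5)
search.fit(X_tr_val, y_tr_val)
print(search.best_params_, search.best_score_)
```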

AUC graph 5.4_AUC.png

predict_proba and decision_function

In the lecture, predict_proba is used for the AUC part, but I used decision_function instead, because I had seen articles reporting that predict_proba can give misleading results. https://qiita.com/rawHam/items/3bcb6a68a533f2b82a85
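For reference, here is a minimal sketch comparing the two inputs to roc_auc_score. With probability=True, SVC fits an extra Platt-scaling model on top of the decision function, which is why the probability-based ranking can differ slightly. C=1 is an illustrative choice, and the variables reuse the script's split:

```python
# A minimal sketch: AUC from decision_function vs. predict_proba.
# probability=True makes SVC fit an internal Platt-scaling model.
clf_p = svm.SVC(kernel='linear', C=1, probability=True).fit(X_tr, y_tr)
print(metrics.roc_auc_score(y_val, clf_p.decision_function(X_val)))
print(metrics.roc_auc_score(y_val, clf_p.predict_proba(X_val)[:, 1]))
```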

Past posts

University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (1)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (2)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (3)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (4)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (5)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (6)
University of Tsukuba Machine Learning Course: Study sklearn while making the Python script part of the task (7) Make your own steepest descent method
University of Tsukuba Machine Learning Course: Study sklearn while making the Python script part of the task (8) Make your own stochastic steepest descent method
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (9)
https://github.com/legacyworld/sklearn-basic
https://ocw.tsukuba.ac.jp/course/systeminformation/machine_learning/
