[PYTHON] About SVC in the svm module (math omitted)

Good evening, everyone. This is my post for 12/19. Something came up today, so I'm writing late...

Today I'd like to summarize classification with the linear SVC model that I studied yesterday.

Purpose of classification

2019_12_19_blog.png

See the picture above. Classification means automatically finding the line that separates the blue dots from the orange crosses. In two dimensions this is very easy for a human, isn't it? What makes the SVC model impressive is that it can find a separating expression even in high dimensions. "So what is this useful for?" many of you may be wondering. How about the image below?

2019_12_19_pic.png

Did you see the difference? Yes, Gmail filters out junk mail as a matter of course, and this is the power of AI. (The actual algorithm is more complicated.) But what it is doing is essentially the same: drawing a straight line (a separating plane) to decide whether a message is spam or valid email. It's elegant and amazing, isn't it?
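As a toy illustration of that idea (the mails and words below are completely made up by me, and this is of course nothing like what Gmail actually runs), a linear SVC can separate word-count vectors of spam and valid mail. A minimal sketch:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Hypothetical toy data -- a real spam filter uses far richer features.
mails = ["win money now", "free prize click here",
         "meeting moved to noon", "lunch tomorrow?"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = valid mail

# One dimension per word: each mail becomes a word-count vector.
vec = CountVectorizer()
X = vec.fit_transform(mails)

# The "straight line" becomes a separating hyperplane in this high-dimensional space.
clf = SVC(kernel='linear')
clf.fit(X, labels)

print(clf.predict(vec.transform(["free money, click now"])))  # likely [1] = spam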

Now, let me show you an image of today's final goal. It looks like this.

Figure_1.png

⚠︎ One thing to be careful about: in an example like this, infinitely many separating lines can be drawn. SVC therefore makes the margin (the width between the dotted lines) as wide as possible, and draws the solid line (the decision boundary) down the middle of it.

Now let's look at the code to find this decision boundary.

Sample code

import numpy as np
from sklearn.svm import SVC
import matplotlib.pyplot as plt
xmin = 0
xmax = 2
ymin = 0
ymax = 2

# 100x2 matrix (100 rows, 2 columns = x & y) of values drawn uniformly from [0, 1)
X0 = np.random.uniform(size=(100, 2))
y0 = np.repeat(0, 100)

X1 = np.random.uniform(low=1.0, high=2.0, size=(100, 2))
y1 = np.repeat(1, 100)

# C is huge here, so this is effectively a hard-margin SVM
svc = SVC(kernel='linear', C=1e8)

# Training: fit(train_features, train_labels)
svc.fit(np.vstack((X0, X1)), np.hstack((y0, y1)))

fig, ax = plt.subplots()

# Scatter plot: ax.scatter(x, y)
ax.scatter(X0[:, 0], X0[:, 1], marker='o', label='class 0')
ax.scatter(X1[:, 0], X1[:, 1], marker='x', label='class 1')

# Prepare grid points
xx, yy = np.meshgrid(np.linspace(xmin, xmax, 100), np.linspace(ymin, ymax, 100))
# Stack the grid into a single (10000, 2) matrix for decision_function
xy = np.vstack([xx.ravel(), yy.ravel()]).T

# z is the decision-function value (the signed "height" of each grid point
# relative to the boundary), computed from the fitted coefficients of svc
# and reshaped back into a 100x100 matrix.
z = svc.decision_function(xy).reshape((100, 100))
# Plot the decision boundary and margins
ax.contour(xx, yy, z, colors='k', levels=[-1, 0, 1], alpha=0.3, linestyles=['--', '-', '--'])
plt.show()
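Incidentally, the margin story from earlier can be checked numerically. For a linear kernel, the fitted weight vector w and bias b are exposed as svc.coef_ and svc.intercept_, and the width between the two dotted lines is 2 / ||w||. A small check, continuing from the svc fitted above:

# The solid line is w.x + b = 0; the dotted lines are w.x + b = +1 and -1.
w = svc.coef_[0]        # weight vector of the fitted linear SVC
b = svc.intercept_[0]   # bias term
print('margin width:', 2 / np.linalg.norm(w))
print('support vectors per class:', svc.n_support_)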

As a flow,

  1. Prepare the data
  2. Create an instance of SVC
  3. Train the model by passing the input data and its associated labels to the fit function.
  4. Create grid points (Reference article; see also the small meshgrid sketch after this list).
  5. Compute Z by applying decision_function of the model trained in fit to the grid points.
  6. Draw the decision boundary by drawing its contour lines with contour.
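Here is that meshgrid sketch: a tiny version of the reshaping done in steps 4 and 5, showing how meshgrid, ravel, and vstack turn a grid into the (N, 2) matrix that decision_function expects.

import numpy as np

# A tiny 3x3 grid instead of the 100x100 one used above
gx, gy = np.meshgrid(np.linspace(0, 2, 3), np.linspace(0, 2, 3))
print(gx)  # the x coordinate of every grid point, as a 3x3 matrix
print(gy)  # the y coordinate of every grid point, as a 3x3 matrix

# ravel + vstack + T: one row per grid point -> shape (9, 2)
pts = np.vstack([gx.ravel(), gy.ravel()]).T
print(pts.shape)  # (9, 2)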

The important functions are fit and decision_function. I understand how to use fit, but for decision_function all I understand so far is how to use it to compute Z from the trained classifier. I'm going to study it a little more and write another article.
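One thing I did manage to confirm: for a linear kernel, the value decision_function returns can be reproduced by hand as w·x + b (the sign picks the class, and the dotted margin lines sit at ±1). A small check, again using the svc fitted above:

pt = np.array([[1.5, 1.5]])  # a point that should fall on the class-1 side

# decision_function returns the signed "height" of the point...
print(svc.decision_function(pt))

# ...which, for a linear kernel, equals w . x + b computed by hand.
print(pt @ svc.coef_.T + svc.intercept_)

# The sign decides the class: positive -> class 1, negative -> class 0.
print(svc.predict(pt))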

It's late tonight, so I'll stop here. Good night.
