[PYTHON] I tried to understand the support vector machine carefully (Part 1: I tried the polynomial / RBF kernel using MakeMoons as an example).

Introduction

This time, I would like to summarize a machine learning model using a support vector machine. The outline is below.

Understand the concept of support vector machines

The support vector machine is a machine learning algorithm that optimizes its model based on a criterion called "margin maximization". It is used mainly for classification and regression problems. Typical algorithms (as far as I am aware) for solving such classification / regression problems are as follows.

Except for logistic regression, these algorithms share the feature that non-linear representations are possible. The support vector machine covered this time seems to be popular because it is practical and easy to handle.

In the original Japanese, the name is sometimes written with a different transliteration of "vector"; the difference simply comes from how the word "Vector" is rendered.

What is support vector and margin maximization?

Now, let me explain the terms. As shown in the figure below, the margin is the distance between the separating boundary (a line of the form $w^T x + b$) and the elements (△, ▲) closest to it. These closest elements are called support vectors.

Maximizing this margin is the optimization performed by the support vector machine (hereinafter abbreviated as SVM).
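For reference, the standard hard-margin formulation of this optimization (not spelled out in this article) can be written, for training points $({\bf x}^{(i)}, y^{(i)})$ with labels $y^{(i)} \in \{-1, +1\}$, as

\min_{{\bf w}, b} \ \frac{1}{2}\|{\bf w}\|^2 \quad \text{subject to} \quad y^{(i)}({\bf w}^T {\bf x}^{(i)} + b) \geq 1 \quad (i = 1, \dots, N)

Maximizing the margin $2/\|{\bf w}\|$ is equivalent to minimizing $\|{\bf w}\|^2$, which is why the problem is usually stated in this minimization form.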

image.png

Try a nonlinear SVM classifier based on the MakeMoons dataset

Linear classifiers work well for data that is easy to separate, like the example above, but in practice the data often requires much more complex decision boundaries. In that case, you need a non-linear classifier. With SVM, it is possible to create and use a non-linear classifier easily.

As an example for SVM analysis, we use the make_moons dataset. It can be generated with a function imported from scikit-learn's datasets module, as shown below, and produces so-called moon-shaped data. Let's classify it with a non-linear SVM.

SVM.ipynb


import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

def plot_dataset(X, y, axes):
    # Plot class 0 as blue squares and class 1 as green triangles
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs")
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "g^")
    plt.axis(axes)
    plt.grid(True, which='both')
    plt.xlabel(r"$x_1$", fontsize=20)
    plt.ylabel(r"$x_2$", fontsize=20, rotation=0)

plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.show()

010.png

SVM classifier with polynomial kernels of different dimensions

In an SVM with a polynomial kernel, the degree parameter specifies the polynomial degree (referred to here as the "dimension"), and the performance of the classifier changes depending on this value.

SVM.ipynb


from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Degree-3 polynomial kernel
poly_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
    ])
poly_kernel_svm_clf.fit(X, y)

# Degree-12 polynomial kernel with a larger coef0 (prone to overfitting)
poly100_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="poly", degree=12, coef0=100, C=5))
    ])
poly100_kernel_svm_clf.fit(X, y)

plt.figure(figsize=(11, 4))

plt.subplot(121)
plot_predictions(poly_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.title(r"$d=3, r=1, C=5$", fontsize=18)

plt.subplot(122)
plot_predictions(poly100_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.title(r"$d=12, r=100, C=5$", fontsize=18)

plt.show()

011.png
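The plot_predictions helper called above is not defined in this article. A minimal sketch of such a helper (my own, in the style commonly used with these examples), which shades the predicted class regions over the plotting area, could look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_predictions(clf, axes):
    # Evaluate the classifier on a dense grid covering the plot area
    x0s = np.linspace(axes[0], axes[1], 100)
    x1s = np.linspace(axes[2], axes[3], 100)
    x0, x1 = np.meshgrid(x0s, x1s)
    X_grid = np.c_[x0.ravel(), x1.ravel()]
    y_pred = clf.predict(X_grid).reshape(x0.shape)
    # Shade the predicted class regions
    plt.contourf(x0, x1, y_pred, cmap=plt.cm.brg, alpha=0.2)
```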

In the above example, the left figure uses a degree-3 polynomial kernel and the right figure a degree-12 kernel. With degree 12, the model is overfitting, so in this case the degree of the model should be lowered. Conversely, if you lower the degree to 2, the data can no longer be classified, as shown in the figure below.

012.png
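For reference, the degree-2 result above can be reproduced by just changing the degree parameter (assuming the same coef0=1 and C=5 as the degree-3 example; the name poly2_kernel_svm_clf is my own):

```python
# Degree-2 polynomial kernel: too low-order to separate the moon shapes
poly2_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="poly", degree=2, coef0=1, C=5))
    ])
poly2_kernel_svm_clf.fit(X, y)

plot_predictions(poly2_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.title(r"$d=2, r=1, C=5$", fontsize=18)
plt.show()
```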

Understand scikit-learn Pipeline

The above program uses a class called Pipeline, so I will briefly describe it. Machine learning models require detailed pre-processing, such as missing-value handling and standardization, before prediction. scikit-learn's Pipeline is a mechanism that lets you bundle these steps together. In this case, the following processing is performed.


"scaler", StandardScaler()#Standardize the value (= subtract by mean and divide by variance)
"svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5)#Set the SVM classifier

My understanding is that it bundles the processing steps I want to treat as a single model.
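As a rough illustration (my own sketch, not from the original program), fitting this pipeline is essentially equivalent to chaining the steps by hand:

```python
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Roughly what Pipeline.fit(X, y) does internally
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # step 1: standardize the features
svm_clf = SVC(kernel="poly", degree=3, coef0=1, C=5)
svm_clf.fit(X_scaled, y)             # step 2: fit the classifier on the scaled data

# At prediction time, the same fitted scaler is applied before the classifier
y_pred = svm_clf.predict(scaler.transform(X))
```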

Try the Gaussian RBF kernel

Now, we will use one of the SVM classification methods, the Gaussian RBF kernel. The terms were all difficult for me at first, so I would like to go through the meaning of each one.

What is a kernel function

First, about kernel functions. These are very important for non-linear representation, and I already used one casually in the make_moons example earlier. As an overview, as shown in the figure below, the idea is to map the data into a higher-dimensional space where it can be separated by a plane (called a **hyperplane** when it has three or more dimensions); the function that makes this possible is called a kernel function.

image.png
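As a small concrete example (my own, not in the original article): for two-dimensional inputs, the explicit feature map $\phi(x_1, x_2) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ lifts the data into three dimensions, and its inner product equals $({\bf a}^T {\bf b})^2$, i.e. the degree-2 polynomial kernel, so the kernel lets us work in the lifted space without ever constructing it:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

a, b = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(phi(a) @ phi(b))   # inner product in the lifted 3-D space -> 16.0
print((a @ b) ** 2)      # degree-2 polynomial kernel on the original 2-D inputs -> 16.0
```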

Regression using this kernel function is called kernel regression. The formula that represents kernel regression is as follows.

f({\bf x}) = \sum_{i=1}^{N} \alpha_i k({\bf x}^{(i)}, {\bf x})

$\alpha_i$ is the coefficient to be optimized, and $k({\bf x}^{(i)}, {\bf x})$ is called the kernel function. Several types of kernel functions are used. One of them, shown below, is called the Gaussian kernel.

k({\bf x}, {\bf x}') = \exp(-\beta \|{\bf x} - {\bf x}'\|^2)

$\|{\bf x} - {\bf x}'\|$ is the norm, i.e. the distance. $\beta$ is a real hyperparameter satisfying $\beta > 0$, and it is chosen by the person using the model.
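A minimal NumPy sketch of this kernel (my own, not from the article):

```python
import numpy as np

def gaussian_kernel(x, x_prime, beta=1.0):
    # exp(-beta * ||x - x'||^2), with beta > 0 chosen by the user
    return np.exp(-beta * np.sum((x - x_prime) ** 2))

print(gaussian_kernel(np.array([0.0, 0.0]), np.array([1.0, 1.0]), beta=0.5))  # exp(-1) ≈ 0.368
```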

What is RBF

RBF stands for Radial Basis Function. There seem to be various functions that can serve as an RBF, but the most commonly used one is the Gaussian function.

φ({\bf x}) = \exp\left(-\frac{({\bf x} - {\bf c})^T ({\bf x} - {\bf c})}{2\sigma^2}\right)

001.jpg

${\bf x}$ is the input vector and ${\bf c}$ is the center of the Gaussian function.

Let's implement

SVM.ipynb


from sklearn.svm import SVC

# Four (gamma, C) combinations to compare
gamma1, gamma2 = 0.1, 5
C1, C2 = 0.001, 1000
hyperparams = (gamma1, C1), (gamma1, C2), (gamma2, C1), (gamma2, C2)

svm_clfs = []
for gamma, C in hyperparams:
    rbf_kernel_svm_clf = Pipeline([
            ("scaler", StandardScaler()),
            ("svm_clf", SVC(kernel="rbf", gamma=gamma, C=C))
        ])
    rbf_kernel_svm_clf.fit(X, y)
    svm_clfs.append(rbf_kernel_svm_clf)

plt.figure(figsize=(11, 11))

for i, svm_clf in enumerate(svm_clfs):
    plt.subplot(221 + i)
    plot_predictions(svm_clf, [-1.5, 2.5, -1, 1.5])
    plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
    gamma, C = hyperparams[i]
    plt.title(r"$\gamma = {}, C = {}$".format(gamma, C), fontsize=16)

plt.show()

013.png

Both γ and C must be chosen as hyperparameters, and it turns out that the model overfits if either value is too large.
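One common way to choose them (a sketch of my own using scikit-learn's GridSearchCV, not part of the original notebook) is cross-validated grid search:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rbf_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf"))
])

# Search a small grid of gamma and C with 5-fold cross-validation
param_grid = {"svm_clf__gamma": [0.1, 1, 5], "svm_clf__C": [0.001, 1, 1000]}
grid_search = GridSearchCV(rbf_pipeline, param_grid, cv=5)
grid_search.fit(X, y)
print(grid_search.best_params_)
```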

At the end

While SVM is a very easy-to-use model, the mathematical theory behind it is very profound.

I referred to this article this time.

Linear method and kernel method (regression analysis) https://qiita.com/wsuzume/items/09a59036c8944fd563ff

The full program is stored here. https://github.com/Fumio-eisan/SVM_20200417
