Overview

The support vector machine (SVM) is known as a machine learning method with high classification accuracy. To get the best accuracy out of an SVM, its hyperparameters have to be tuned on the training data. In this article, I explain how the decision boundary changes as the hyperparameters of an SVM with the RBF kernel (Gaussian kernel) are adjusted.

Hyperparameters to decide

An SVM with the RBF kernel has the following two hyperparameters to adjust.

Cost parameter: $ C $
RBF kernel parameter: $ \gamma $
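In scikit-learn, both hyperparameters are passed as keyword arguments to `svm.SVC`. The following is a minimal sketch, assuming a reasonably recent scikit-learn; the values of `C` and `gamma` are arbitrary placeholders, not recommendations.

```python
from sklearn import svm

# kernel="rbf" is SVC's default; it is spelled out here for clarity.
# C and gamma below are placeholder values chosen only for illustration.
clf = svm.SVC(kernel="rbf", C=1.0, gamma=0.1)

# get_params() returns the hyperparameters the estimator was built with
params = clf.get_params()
print(params["kernel"], params["C"], params["gamma"])
```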

About the cost parameter

An SVM determines a hyperplane that separates the set of data points mapped into the feature space. However, the points in the feature space are not always separable. For example, in the figure below, it is not possible to draw a straight line that perfectly separates the two types of symbols.

So let's tolerate some misclassification and draw a straight line that divides the point set anyway. For example, the two types of symbols in the previous figure can be divided by the straight line shown below.

The cost parameter $ C $ determines how much misclassification is tolerated. $ C $ appears in the objective of the quadratic programming problem solved by the SVM.

\min_{\beta}\frac{1}{2}\|\beta\|^2+C\sum_{i=1}^{N}\xi_i

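Here $ \xi_i $ is the slack of the $ i $-th training point, that is, how much the point violates the margin. With a smaller $ C $, the hyperplane is chosen so that misclassification is tolerated; with a larger $ C $, it is chosen so that misclassification is hardly tolerated at all.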
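To see how $ C $ weights the misclassification term, the objective can be evaluated by hand for a fixed hyperplane. This is only a sketch under assumptions that are not in the formula above: labels are in {-1, +1}, the hyperplane has a bias term `b`, and the slack is the usual hinge form $ \xi_i = \max(0, 1 - y_i(\beta \cdot x_i + b)) $. The function name `soft_margin_objective` and the toy data are made up for illustration.

```python
import numpy as np

def soft_margin_objective(beta, b, X, y, C):
    """Return (objective, slacks) for a fixed linear hyperplane.

    Assumes y is in {-1, +1} and xi_i = max(0, 1 - y_i * (beta . x_i + b)).
    """
    margins = y * (X @ beta + b)
    slacks = np.maximum(0.0, 1.0 - margins)
    objective = 0.5 * np.dot(beta, beta) + C * slacks.sum()
    return objective, slacks

# Toy 1-D data: the last point lies on the wrong side of the boundary x = 0.
X = np.array([[2.0], [-2.0], [-1.0]])
y = np.array([1.0, -1.0, 1.0])
beta = np.array([1.0])
b = 0.0

obj_small, slacks = soft_margin_objective(beta, b, X, y, C=0.1)
obj_large, _ = soft_margin_objective(beta, b, X, y, C=10.0)
print(slacks)      # only the misclassified point has non-zero slack
print(obj_small)   # the slack barely contributes when C is small
print(obj_large)   # the same slack dominates the objective when C is large
```

The same hyperplane and the same single misclassified point cost far more under the large $ C $, which is why the optimizer then prefers hyperplanes that avoid misclassification.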

About the RBF kernel parameter

The RBF kernel parameter $ \gamma $ appears in the following expression for the RBF kernel.

K(x, x')=\exp(-\gamma\|x-x'\|^2)

As shown in the experiment described later, the smaller the value of $ \gamma $, the simpler the decision boundary, and the larger the value, the more complicated the decision boundary.
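The kernel expression is simple enough to evaluate directly, which gives some intuition for the role of $ \gamma $. The sketch below uses a hand-written helper `rbf` (a name introduced here for illustration) and compares it with `sklearn.metrics.pairwise.rbf_kernel`; the two sample points are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(x, x_prime, gamma):
    """K(x, x') = exp(-gamma * ||x - x'||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

x = np.array([0.0, 0.0])
x_prime = np.array([1.0, 1.0])  # squared distance to x is 2

# Small gamma: the kernel stays close to 1 even for distant points
k_small = rbf(x, x_prime, gamma=2 ** -15)
# Large gamma: the kernel drops towards 0 unless the points nearly coincide
k_large = rbf(x, x_prime, gamma=2 ** 3)
# Cross-check against scikit-learn's implementation (expects 2-D inputs)
k_sklearn = rbf_kernel(x.reshape(1, -1), x_prime.reshape(1, -1), gamma=2 ** 3)[0, 0]

print(k_small, k_large, k_sklearn)
```

With a small $ \gamma $ every training point influences a wide region, so the boundary varies slowly; with a large $ \gamma $ each point only influences its immediate neighbourhood, so the boundary can bend around individual points.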

Experiment

Let's draw the decision boundaries when $ C $ and $ \gamma $ are set to extreme values. $ C $ is set to $ 2^{-5} $ and $ 2^{15} $, and $ \gamma $ to $ 2^{-15} $ and $ 2^{3} $. I use the SVM implemented in scikit-learn (0.15), which uses [LIBSVM](http://www.csie.ntu.edu.tw/~cjlin/libsvm/) internally. The dataset is iris, which contains 3 class labels and 4 features. This time only 2 class labels and 2 features are used. To make the problem more difficult, noise is added to each of the two features.

Source code

# -*- coding: utf-8 -*-

import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt
from itertools import product

if __name__ == '__main__':
    iris = datasets.load_iris()
    # Use only the first two features and the first two class labels
    X = iris.data[:100, :2]
    # Add uniform noise in [0, 1) to each feature
    E = np.random.uniform(0, 1.0, size=np.shape(X))
    X += E
    y = iris.target[:100]
    # Step size of the mesh used to draw the decision regions
    h = 0.02
    # Cost parameters
    Cs = [2 ** -5, 2 ** 15]
    # RBF kernel parameters
    gammas = [2 ** -15, 2 ** 3]
    
    svms = [svm.SVC(C=C, gamma=gamma).fit(X, y) for C, gamma in product(Cs, gammas)]
    titles = ["C: small, gamma: small", "C: small, gamma: large",
        "C: large, gamma: small", "C: large, gamma: large"]
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    
    for i, clf in enumerate(svms):
        plt.subplot(2, 2, i + 1)
        plt.subplots_adjust(wspace=0.4, hspace=0.4)
        # Predict the class of every mesh point to colour the decision regions
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
        Z = Z.reshape(xx.shape)
        plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
        # Overlay the training points, coloured by their true label
        plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
        plt.xlabel("Sepal length")
        plt.ylabel("Sepal width")
        plt.xlim(xx.min(), xx.max())
        plt.ylim(yy.min(), yy.max())
        plt.xticks(())
        plt.yticks(())
        plt.title(titles[i])
    plt.show()

Execution result

The horizontal and vertical axes represent the two features (sepal length and sepal width). When $ C $ is small, many points fall in the wrong decision region, while when $ C $ is large, few points do. When $ \gamma $ is small, the decision boundary is simple (a straight line), while when $ \gamma $ is large, the decision boundary has a complicated shape.
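The visual impression can be backed by numbers. The sketch below refits the same four combinations and prints the training accuracy and the number of support vectors for each. It is an addition to the experiment, not part of it: the fixed random seed is an assumption (the source code above is unseeded), so the exact figures will not match the plots, and training accuracy says nothing about how well each model generalizes.

```python
import numpy as np
from itertools import product
from sklearn import svm, datasets

iris = datasets.load_iris()
# Fixed seed for reproducibility (an assumption; the original script is unseeded)
rng = np.random.RandomState(0)
X = iris.data[:100, :2] + rng.uniform(0, 1.0, size=(100, 2))
y = iris.target[:100]

results = {}
for C, gamma in product([2 ** -5, 2 ** 15], [2 ** -15, 2 ** 3]):
    clf = svm.SVC(C=C, gamma=gamma).fit(X, y)
    # score() on the training data is the fraction of points classified correctly
    results[(C, gamma)] = (clf.score(X, y), int(clf.n_support_.sum()))

for (C, gamma), (acc, n_sv) in results.items():
    print("C=%g gamma=%g train accuracy=%.2f support vectors=%d" % (C, gamma, acc, n_sv))
```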

Other

By adjusting $ C $ and $ \gamma $, the RBF kernel seems able to produce decision boundaries similar to those of a linear kernel. So if you are unsure which kernel to choose, the RBF kernel seems like a reasonable choice, although tuning its parameters takes time. (´・ω・｀)
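One common way to automate the tuning is a cross-validated grid search. The following is a rough sketch, assuming a recent scikit-learn where `GridSearchCV` lives in `sklearn.model_selection` (in 0.15 it was in `sklearn.grid_search`). The grid spacing, the 5-fold setting, and the fixed seed are my own choices for illustration, not something taken from the experiment above.

```python
import numpy as np
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
rng = np.random.RandomState(0)  # fixed seed, an assumption for reproducibility
X = iris.data[:100, :2] + rng.uniform(0, 1.0, size=(100, 2))
y = iris.target[:100]

# Exponentially spaced candidates covering the extreme values used above
param_grid = {
    "C": [2.0 ** k for k in range(-5, 16, 4)],
    "gamma": [2.0 ** k for k in range(-15, 4, 3)],
}

# Every (C, gamma) pair is evaluated with 5-fold cross-validation
search = GridSearchCV(svm.SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

The cost grows with the number of grid points times the number of folds, which is where the tuning time goes.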
