[Python] k-means clustering failed, so what should I do? (An implementation of kernel k-means)

What is this article?

k-means is a standard clustering algorithm. Because it is so simple, it sometimes produces disappointing clusterings. This article therefore introduces an implementation of kernel k-means, which maps the data into a higher-dimensional space with a nonlinear function and clusters it there.

k-means failure example

I tried clustering the following data with k-means.

origin.png

linear.png

At a glance, the data appears to form two clusters, a central one and an outer one, but k-means splits the data along a straight line instead.
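The failure can be reproduced with a few lines of NumPy. The sketch below is not the author's code: `make_rings` is a hypothetical generator for data resembling the figure (a central blob surrounded by a ring), and `kmeans` is a plain Lloyd iteration written only for illustration.

```python
import numpy as np


def make_rings(n_inner=100, n_outer=100, seed=0):
    """Hypothetical stand-in for the article's data: a central blob and an outer ring."""
    rng = np.random.default_rng(seed)
    inner = rng.normal(scale=0.3, size=(n_inner, 2))
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_outer)
    radii = rng.normal(loc=3.0, scale=0.2, size=n_outer)
    outer = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
    X = np.vstack([inner, outer])
    y = np.array([0] * n_inner + [1] * n_outer)  # 0 = centre, 1 = ring
    return X, y


def kmeans(X, n_clusters=2, n_iter=100, seed=0):
    """Plain Lloyd-style k-means, initialised from randomly chosen data points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre: shape (n, k).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers


X, y = make_rings()
labels, centers = kmeans(X)
# How many outer-ring points landed in each k-means cluster.
print(np.bincount(labels[y == 1], minlength=2))
```

Because every point is assigned to its nearest centroid, the boundary between two clusters is always a straight line, so the outer ring ends up divided between both clusters.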

Kernel k-means

In kernel k-means, the data space is mapped to a higher-dimensional space by a nonlinear function, and clustering is performed there. In other words, given data points $x \in X$ and a nonlinear function $\phi$, clustering is performed on $\phi(x)$. There are many ways to choose the nonlinear function $\phi$, but rather than choosing $\phi$ directly, it is common to choose a kernel function $k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ instead (the kernel method).
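Since only inner products of mapped points are needed, the distance between a mapped point and a cluster mean can be written in terms of the kernel alone. As a sketch, write $C_c$ for the set of indices currently assigned to cluster $c$ and $\mu_c = \frac{1}{|C_c|} \sum_{j \in C_c} \phi(x_j)$ for its mean in the mapped space (this notation is assumed here and does not come from the original post). Expanding the squared norm gives

$$
\| \phi(x_i) - \mu_c \|^2 = k(x_i, x_i) - \frac{2}{|C_c|} \sum_{j \in C_c} k(x_i, x_j) + \frac{1}{|C_c|^2} \sum_{j \in C_c} \sum_{l \in C_c} k(x_j, x_l)
$$

Each iteration assigns every point to the cluster that minimizes this quantity, so $\phi$ itself never has to be evaluated.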

Common kernel functions include the linear kernel and the Gaussian kernel used later in this article.

Choosing the linear kernel makes kernel k-means equivalent to ordinary k-means.
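The equivalence with the linear kernel can be checked numerically. The sketch below assumes the usual definitions $k(x_i, x_j) = x_i^T x_j$ for the linear kernel and $k(x_i, x_j) = \exp(-\gamma \| x_i - x_j \|^2)$ for the Gaussian kernel; the function names and the small random data set are illustrative only.

```python
import numpy as np


def linear_kernel(X):
    """Gram matrix of the linear kernel: K[i, j] = x_i . x_j."""
    return X @ X.T


def gaussian_kernel(X, gamma=0.1):
    """Gram matrix of the Gaussian kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def feature_space_sq_dist(K, members):
    """Squared distance from every mapped point to the mean of the mapped `members`."""
    members = np.asarray(members)
    return (np.diag(K)
            - 2.0 * K[:, members].mean(axis=1)
            + K[members][:, members].mean())


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
members = [0, 2, 5]  # an arbitrary cluster, chosen only for the check

# Distance computed purely from the linear-kernel Gram matrix ...
via_kernel = feature_space_sq_dist(linear_kernel(X), members)
# ... and the ordinary squared Euclidean distance to the cluster centroid.
direct = ((X - X[members].mean(axis=0)) ** 2).sum(axis=1)
print(np.allclose(via_kernel, direct))
```

With the linear kernel the two distances agree up to floating-point error, so the assignment step is the same as in ordinary k-means. Swapping in `gaussian_kernel` changes only the Gram matrix, not the distance formula.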

Clustering with kernel k-means

I clustered the same data with kernel k-means, using a Gaussian kernel as the kernel function with $\gamma = 0.1$. The source code has been uploaded here.

kernel.png

You can see that the data is now separated into the central part and the outer part.
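For readers who want to experiment, here is a minimal sketch of kernel k-means with a Gaussian kernel and $\gamma = 0.1$. It is not the author's uploaded code: `make_rings` is a hypothetical generator for ring-shaped data, the Gaussian kernel is assumed to be $\exp(-\gamma \| x_i - x_j \|^2)$, and the random initial assignment is one choice among many.

```python
import numpy as np


def make_rings(n_inner=100, n_outer=100, seed=0):
    """Hypothetical stand-in for the article's data: a central blob and an outer ring."""
    rng = np.random.default_rng(seed)
    inner = rng.normal(scale=0.3, size=(n_inner, 2))
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_outer)
    radii = rng.normal(loc=3.0, scale=0.2, size=n_outer)
    outer = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
    X = np.vstack([inner, outer])
    y = np.array([0] * n_inner + [1] * n_outer)  # 0 = centre, 1 = ring
    return X, y


def gaussian_kernel(X, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_kmeans(K, n_clusters=2, n_iter=100, seed=0):
    """Lloyd-style kernel k-means on a precomputed Gram matrix K."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)  # random initial assignment
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            if not mask.any():
                continue  # empty cluster: its distance stays at infinity
            # Squared distance in the mapped space, written with the kernel only.
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].mean(axis=1)
                          + K[mask][:, mask].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels


X, y = make_rings()
K = gaussian_kernel(X, gamma=0.1)
labels = kernel_kmeans(K, n_clusters=2, seed=0)
# Agreement with the centre/ring split, ignoring which cluster is called 0 or 1.
agreement = max(np.mean(labels == y), np.mean(labels != y))
print(f"agreement with the centre/ring split: {agreement:.2f}")
```

Whether a given run recovers the centre/ring split depends on the seed and on $\gamma$, so it is worth trying several values of `seed` and comparing the results.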

Other

Both k-means and kernel k-means depend heavily on their initial values, so clustering does not always succeed this cleanly. Kernel k-means additionally requires choosing a kernel function and setting its hyperparameters ...

(2015/7/2: Corrected a figure that was slightly wrong.)
