[PYTHON] Anomaly detection using MNIST by Autoencoder (PyTorch)

Overview

Hello everyone. The state of emergency has finally been lifted, but the situation is still unpredictable, so it seems I will continue to stay at home.

In this article, I would like to implement and verify an anomaly detection program for MNIST using a simple Autoencoder. The model is shown below.

(What this article covers: the basic flow of Autoencoders and anomaly detection, the flow of MNIST anomaly detection with PyTorch, and the verification results.)

Several articles on anomaly detection using MNIST have already been published on Qiita, so where is the demand for this one? I think the point that distinguishes it from the others is that **it is implemented in PyTorch**.

When I search, I mostly find implementation examples using Keras. Since I recently switched to PyTorch, I looked for a PyTorch implementation but could not find one, so I implemented it myself. Now, on to the explanation.

**All the code implemented for this article is available here.**

Autoencoder and anomaly detection

Let's take a quick look at how the Autoencoder is applied to anomaly detection tasks. (If you are already familiar with this, feel free to skip ahead to the implementation chapter.)

About Autoencoder

The structure of the model is shown below.

The idea of the Autoencoder is very simple: it is a model that encodes high-dimensional data such as images into latent variables with an encoder, and reconstructs the image from them with a decoder. What is the benefit of mapping to a latent space? It is based on the manifold hypothesis. See the distribution below.

Source: [here](https://blog.albert2005.co.jp/2014/12/11/%E9%AB%98%E6%AC%A1%E5%85%83%E3%83%87%E3%83%BC%E3%82%BF%E3%81%AE%E5%8F%AF%E8%A6%96%E5%8C%96%E3%81%AE%E6%89%8B%E6%B3%95%E3%82%92swiss-roll%E3%82%92%E4%BE%8B%E3%81%AB%E8%A6%8B%E3%81%A6%E3%81%BF%E3%82%88/)

The above is called the Swiss roll distribution. The image is three-dimensional, but think of it as an example of high-dimensional data. If you look closely, you can see that some parts of the data are quite sparse. If this can be mapped into two dimensions (imagine unrolling it flat onto a plane), then the original distribution can evidently be expressed in a low-dimensional space. More generally, the assumption that data in a high-dimensional space actually lies on a low-dimensional manifold is called the manifold hypothesis. (A quick sketch for generating such a distribution is shown below.)
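As a side note, a distribution like this can be generated with scikit-learn's make_swiss_roll. This is just an illustrative sketch of my own and is not part of the article's code.

from sklearn.datasets import make_swiss_roll
import matplotlib.pyplot as plt

# 3,000 points in 3D that actually lie on a rolled-up 2D sheet
X, t = make_swiss_roll(n_samples=3000, noise=0.1)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=t, s=5)
ax.set_title('Swiss roll: 3D data on a 2D manifold')
plt.show()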

Returning to the Autoencoder: the encoder maps from a high-dimensional space to a low-dimensional latent space. In other words, the "features" of high-dimensional data such as images are extracted and treated as latent variables. The original data is then decoded from these low-dimensional "features".

Application to anomaly detection

This Autoencoder framework is often applied to anomaly detection [1]. The goal of anomaly detection is for the model to recognize whether input data is "normal" or "abnormal". Framed this way, it looks like an ordinary supervised pattern-recognition problem, but unfortunately, in many settings where anomaly detection is actually applied (e.g., visual inspection in factories), anomalous data is usually not available. Therefore, supervised approaches such as pattern recognition cannot be applied. However, a large amount of "normal" data can generally be obtained at sites such as factories. The Autoencoder is introduced to take advantage of this and incorporate it into anomaly detection. As explained above, an Autoencoder can extract features from the distribution of high-dimensional data and map them to a low-dimensional latent space.

In other words, by training the model on a large amount of normal data, it acquires the characteristics of normal data. **Consequently, if you feed "normal" data into the model, the decoder can of course reconstruct the original input. When "abnormal" data is fed in, however, it cannot be reconstructed well, because the model has not acquired features that can express the anomalous data. Anomaly detection exploits this trick: by taking the difference between input and output and using it as the degree of anomaly, anomalies can be detected.** In addition, most practical applications of anomaly detection are "unsupervised", or "semi-supervised" using a small amount of anomalous data.
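To make this concrete, the degree of anomaly can be computed as the mean absolute reconstruction error per sample. Here is a minimal sketch of my own (assuming the Autoencoder model defined later in this article):

import torch

def anomaly_score(model, x):
    # x: a batch of flattened images, shape (N, 784)
    model.eval()
    with torch.no_grad():
        xhat = model(x)
    # Average |x - xhat| over the pixels -> one anomaly score per image
    return (x - xhat).abs().mean(dim=1)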

Next, let's deepen our understanding by looking at the actual implementation and the MNIST experiments.

Implementation / loading MNIST

Anomaly detection is performed using MNIST (a dataset of handwritten digits). Out of the digits 0 to 9, **images labeled "1" are treated as normal data and used for training.** Images labeled "9" are then used as anomalous data, and we verify whether they can be detected.

First, the MNIST data can easily be loaded using the MNIST class in torchvision's datasets module. As it is, however, it contains all the digits from 0 to 9, so we need to narrow it down to only the desired labels. I defined the following class for this purpose.

main.py


from torch.utils.data import Dataset


class Mnisttox(Dataset):
    """Wraps an MNIST dataset and keeps only the images whose
    label is contained in `labels`."""

    def __init__(self, datasets, labels: list):
        # Keep only the images (element 0 of each (image, label) pair)
        # whose label is in the given list
        self.dataset = [datasets[i][0] for i in range(len(datasets))
                        if datasets[i][1] in labels]
        self.labels = labels

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        img = self.dataset[index]
        # The label is not needed for training, so an empty list is returned
        return img, []

In the initializer, only the samples whose labels appear in the list given as an argument are kept as an instance variable. Otherwise it behaves like an ordinary Dataset class. A usage sketch follows.
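For reference, usage might look like the following sketch (my own illustration; the normalization to [-1, 1] is an assumption chosen to match the Tanh output of the model defined next, not necessarily what the original code uses):

import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Scale pixels to [-1, 1] so they match the decoder's Tanh output range
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

mnist = torchvision.datasets.MNIST(root='./data', train=True,
                                   download=True, transform=transform)

# Keep only the images labeled "1" as normal training data
train_data = Mnisttox(mnist, labels=[1])
train_loader = DataLoader(train_data, batch_size=128, shuffle=True)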

The Autoencoder itself is defined as follows.

main.py


import torch.nn as nn


class Autoencoder(nn.Module):
    def __init__(self, z_dim):
        super(Autoencoder, self).__init__()
        # Encoder: 784 -> 256 -> 128 -> z_dim
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 256),
            nn.ReLU(True),
            nn.Linear(256, 128),
            nn.ReLU(True),
            nn.Linear(128, z_dim))

        # Decoder: z_dim -> 128 -> 256 -> 784
        self.decoder = nn.Sequential(
            nn.Linear(z_dim, 128),
            nn.ReLU(True),
            nn.Linear(128, 256),
            nn.ReLU(True),
            nn.Linear(256, 28 * 28),
            nn.Tanh()
        )

    def forward(self, x):
        z = self.encoder(x)      # map the image into the latent space
        xhat = self.decoder(z)   # reconstruct the image from the latent variable
        return xhat

Both the encoder and the decoder are simple three-layer networks. Training takes the MSE between the input and the output, and by minimizing it the model learns to reconstruct its input.
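A minimal training loop might look like the following (a sketch under my own assumptions: the Adam optimizer, learning rate, epoch count, and z_dim value are illustrative, not necessarily what the original code uses):

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Autoencoder(z_dim=16).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):
    total_loss = 0.0
    for img, _ in train_loader:
        # Flatten the 28x28 images into 784-dimensional vectors
        x = img.view(img.size(0), -1).to(device)
        xhat = model(x)
        loss = criterion(xhat, x)  # reconstruction error
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f'epoch {epoch + 1}: loss = {total_loss / len(train_loader):.4f}')

Now, let's start the experiment.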

Experiment / Discussion

As mentioned above, the training data consists only of images of "1", roughly 6,000 of them. For the test data, we mix images of "1" and "9" and check whether "9" is correctly identified as anomalous. The absolute value of the difference between input and output is used as the degree of anomaly.

The transition of loss is shown below.

The model input (upper), its output (middle), and its difference image (lower) are shown below.

result.png

As expected, you can see that the trained digit "1" is reconstructed successfully, while the "9" mixed in as anomalous data is not reconstructed well. Even though the model uses only simple fully connected layers, it seems to have worked.

Also, if you look at the scores in the bottom row, you can see that the values become large when anomalous data is fed in. In practice, anomaly detection is performed by setting a threshold on the anomaly score, and in most cases the threshold is set by domain experts; a simple data-driven starting point is sketched below.
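For example (again a sketch of my own, reusing the anomaly_score function above; x_val_normal and x_test are hypothetical tensors of flattened validation and test images, not names from the original code), a high percentile of the scores on held-out normal data can serve as an initial threshold:

import torch

# Scores on held-out *normal* images (x_val_normal is a hypothetical tensor)
normal_scores = anomaly_score(model, x_val_normal)

# Flag anything above the 99th percentile of normal scores as anomalous;
# the percentile is a tunable choice that would normally be set by experts
threshold = torch.quantile(normal_scores, 0.99)

test_scores = anomaly_score(model, x_test)
is_anomaly = test_scores > threshold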

Summary

In this article, we implemented and verified unsupervised anomaly detection on MNIST using PyTorch, and explained the basic flow of anomaly detection with an Autoencoder. In terms of technical novelty this article is modest, but I think there is demand for MNIST anomaly detection in PyTorch (perhaps I am the only one who thinks so, laughs). I did not touch on quantitative evaluation of detection performance (AUROC, etc.) to keep the article from becoming too long, but I would like to summarize it in the near future. I would also like to examine anomaly detection frameworks based on GANs. Recently, frameworks that utilize various GANs, such as AnoGAN, EfficientGAN, and AnoVAEGAN, have appeared as developments beyond the Autoencoder, achieving SOTA results one after another. The future trends look promising.

References

[1] http://cedro3.com/ai/keras-autoencoder-anomaly/
