[PYTHON] Try an autoencoder with PyTorch

Introduction

AutoEncoder is a type of unsupervised learning in machine learning, and anomaly detection is a well-known application. This time I would like to try an AutoEncoder on MNIST using a CNN (convolutional neural network). I don't know how many times this has been done before, but this is a memo of my PyTorch study. This time I referred to here

About AutoEncoder

Here is my rough understanding of an AutoEncoder: [Figure: オートエンコーダ.png] The encoder and decoder are trained so that the output data matches the input data. When the input differs from the data seen during training, the network cannot produce an output that matches the input. Therefore, a large difference between the input data and the output data can be judged as abnormal (or so it seems).

Practice

The environment is Python 3.6.9, PyTorch 1.3.1, and NumPy 1.17.4.

First, load MNIST. In the code below (as a personal preference), scikit-learn is used to read the data as an ndarray, but it is easier to use torchvision, which reads the data as a Tensor. Only the images of the digit 0 are extracted (about 7,000 images), and 6,000 of them are used for training.


import numpy as np
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import fetch_openml

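# Download the MNIST data
# as_frame=False returns NumPy arrays; newer scikit-learn versions return a DataFrame by default
mnist = fetch_openml('mnist_784', version=1, data_home='data/src/download/', as_frame=False)
X, y = mnist["data"], mnist["target"]
X_0 = X[y.astype(np.int32) == 0]   # keep only the images whose label is 0
X_0 = (2 * X_0) / 255.0 - 1.0      # scale pixel values to the range [-1.0, 1.0]
# Note: the decoder below ends in Sigmoid (output range [0, 1]);
# scaling with X_0 / 255.0 instead would match that range.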

X_0 = X_0.reshape([X_0.shape[0], 1, 28, 28])
X_0 = torch.tensor(X_0, dtype = torch.float32)
X_0_train, X_0_test = X_0[:6000, :, :, :], X_0[6000:, :, :, :]
train_loader = DataLoader(X_0_train, batch_size = 50)

Next, create the network with PyTorch. The encoder uses nn.Conv2d for ordinary convolution, followed by max pooling. The input image is 1x28x28 (784 dimensions), and after passing through the encoder it is compressed to 4x7x7 (196 dimensions). The decoder uses nn.ConvTranspose2d, a transposed convolution that here doubles the spatial size at each layer, so the output finally returns to 1x28x28 (784 dimensions).

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super(ConvAutoencoder, self).__init__()
        #Encoder Layers
        self.conv1 = nn.Conv2d(in_channels = 1, out_channels = 16,
                               kernel_size = 3, padding = 1)
        self.conv2 = nn.Conv2d(in_channels = 16, out_channels = 4,
                               kernel_size = 3, padding = 1)
        #Decoder Layers
        self.t_conv1 = nn.ConvTranspose2d(in_channels = 4, out_channels = 16,
                                          kernel_size = 2, stride = 2)
        self.t_conv2 = nn.ConvTranspose2d(in_channels = 16, out_channels = 1,
                                          kernel_size = 2, stride = 2)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        # The comments show the tensor shape when i grayscale 28x28 images are input.
        #encode#                          #in  [i, 1, 28, 28] 
        x = self.relu(self.conv1(x))      #out [i, 16, 28, 28]  
        x = self.pool(x)                  #out [i, 16, 14, 14]
        x = self.relu(self.conv2(x))      #out [i, 4, 14, 14]
        x = self.pool(x)                  #out [i ,4, 7, 7]
        #decode#
        x = self.relu(self.t_conv1(x))    #out [i, 16, 14, 14]
        x = self.sigmoid(self.t_conv2(x)) #out [i, 1, 28, 28]
        return x
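
The shape comments in forward can be checked by hand. The following is a minimal sketch, assuming the standard output-size formulas for convolution, max pooling, and transposed convolution with dilation 1 and no output padding. The helper names conv2d_out and conv_transpose2d_out are made up for this illustration and are not part of PyTorch.

```python
def conv2d_out(size, kernel_size, stride=1, padding=0):
    """Output length along one spatial axis for nn.Conv2d or nn.MaxPool2d (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv_transpose2d_out(size, kernel_size, stride=1, padding=0):
    """Output length along one axis for nn.ConvTranspose2d (dilation 1, output_padding 0)."""
    return (size - 1) * stride - 2 * padding + kernel_size


# Encoder: follow one side of the 28x28 image through each layer.
size = 28
size = conv2d_out(size, kernel_size=3, padding=1)   # conv1 keeps the size
after_conv1 = size
size = conv2d_out(size, kernel_size=2, stride=2)    # pool halves it
size = conv2d_out(size, kernel_size=3, padding=1)   # conv2 keeps the size
size = conv2d_out(size, kernel_size=2, stride=2)    # pool halves it again
latent_side = size
latent_dims = 4 * latent_side * latent_side         # 4 channels in the bottleneck

# Decoder: each transposed convolution doubles the side length.
size = conv_transpose2d_out(size, kernel_size=2, stride=2)
size = conv_transpose2d_out(size, kernel_size=2, stride=2)
output_side = size

print(after_conv1, latent_side, latent_dims, output_side)
```

With kernel_size = 2 and stride = 2, each transposed convolution exactly undoes the halving done by one pooling step, which is why two of them bring 7x7 back to 28x28.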

Now that we have a network, let's train it. The loss_fn calculation compares the input image with the output image of the network.

def train_net(n_epochs, train_loader, net, optimizer_cls = optim.Adam,
              loss_fn = nn.MSELoss(), device = "cpu"):
    """
    n_epochs ... number of training epochs
    net ... network
    device ... "cpu" or "cuda:0"
    """
    losses = []         # history of the mean loss per epoch
    optimizer = optimizer_cls(net.parameters(), lr = 0.001)
    net.to(device)
    
    for epoch in range(n_epochs):
        running_loss = 0.0  
        net.train()         #Network training mode
    
        for i, XX in enumerate(train_loader):
            XX = XX.to(device)            # .to() returns a new tensor, so reassign it
            optimizer.zero_grad()
            XX_pred = net(XX)             # reconstruct the input with the network
            loss = loss_fn(XX_pred, XX)   # compare the reconstruction with the original input
            loss.backward()
            optimizer.step()              # update the parameters
            running_loss += loss.item()
        
        mean_loss = running_loss / (i + 1)   # i is the last batch index, so there are i + 1 batches
        losses.append(mean_loss)
        print("epoch", epoch, ": ", mean_loss)
        
    return losses

net = ConvAutoencoder()   # create the network before training
losses = train_net(n_epochs = 30,
                   train_loader = train_loader,
                   net = net)

Graph the output loss with matplotlib.
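
The plotting code is not shown in this memo, so the following is only a minimal sketch of how the list returned by train_net could be graphed. The function name plot_losses and the axis labels are my own choices for illustration.

```python
import matplotlib.pyplot as plt


def plot_losses(losses):
    """Plot the mean training loss of each epoch and return the figure."""
    fig, ax = plt.subplots()
    ax.plot(range(len(losses)), losses)
    ax.set_xlabel("epoch")
    ax.set_ylabel("mean MSE loss")
    return fig


# Usage with the list returned by train_net:
#   fig = plot_losses(losses)
#   fig.savefig("loss.png")
```

Returning the figure instead of calling plt.show() inside the function leaves it up to the caller whether to display the graph or save it.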

Next, predict with the network on data that was not used for training. Here is what happens when an image of 0 is input to net:

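img_num = 4
net.eval()                 # switch the network to evaluation mode
with torch.no_grad():      # gradients are not needed for prediction
    pred = net(X_0_test[img_num:(img_num + 1)])
pred = pred.numpy()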
pred = pred[0, 0, :, :]

origin = X_0_test[img_num:(img_num + 1)].numpy()
origin = origin[0, 0, :, :]

plt.subplot(211)
plt.imshow(origin, cmap = "gray")
plt.xticks([])
plt.yticks([])
plt.text(x = 3, y = 2, s = "original image", c = "red")

plt.subplot(212)
plt.imshow(pred, cmap = "gray")
plt.text(x = 3, y = 2, s = "output image", c = "red")
plt.xticks([])
plt.yticks([])
plt.savefig("0_auto_encoder")
plt.show()

The top is the input image and the bottom is the output image. The 0 is reproduced well.

When I input an image of 6, the network could not reproduce the 6 well. When the data cannot be reproduced well, it can be judged as abnormal (probably).
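
Judging by eye can be replaced with a number. The following is a minimal sketch, assuming the anomaly score is the mean squared error per image between input and output. The function names and the way the threshold is chosen are assumptions for illustration, not something measured in this memo.

```python
import numpy as np


def reconstruction_errors(originals, reconstructions):
    """Return one mean squared error per image for arrays shaped [N, ...]."""
    originals = np.asarray(originals, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    # Flatten every image so the mean is taken over all of its pixels.
    diff = (originals - reconstructions).reshape(len(originals), -1)
    return (diff ** 2).mean(axis=1)


def flag_anomalies(errors, threshold):
    """Mark images whose error exceeds the threshold (the threshold is an assumption)."""
    return np.asarray(errors) > threshold


# Hypothetical usage with the objects defined in this article:
#   pred = net(X_0_test).detach().numpy()
#   errors = reconstruction_errors(X_0_test.numpy(), pred)
#   flags = flag_anomalies(errors, threshold=errors.mean() + 3 * errors.std())
```

The threshold in the usage comment (mean plus three standard deviations of the errors on normal images) is only one possible choice and would need tuning on real data.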

By the way, this model trained relatively quickly even on a notebook PC, so I would like to try a deeper network next.