An autoencoder is a type of unsupervised learning model in machine learning, and anomaly detection is a well-known application. In this post I try an autoencoder on MNIST using a CNN (convolutional neural network). This has surely been done many times before, but these are my notes from studying PyTorch. I referred to an existing article for this post.
Here is my rough understanding of an autoencoder: the encoder and decoder are trained so that the output data matches the input data. If the input differs from the data seen during training, the network cannot produce output that matches the input. Therefore, a large difference between the input and the output can be judged as abnormal (or so it seems).
The environment is Python 3.6.9, PyTorch 1.3.1, and NumPy 1.17.4.
First, load MNIST. In the code I use scikit-learn to read the data in ndarray format (a personal preference), but it is easier to use torchvision, which reads the data in Tensor format. Roughly 7,000 images of the digit 0 are extracted, and 6,000 of them are used for training.
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from sklearn.datasets import fetch_openml

# Download the MNIST data
# (newer scikit-learn versions return a DataFrame by default;
#  pass as_frame=False there to get ndarrays)
mnist = fetch_openml('mnist_784', version=1, data_home='data/src/download/')
X, y = mnist["data"], mnist["target"]
X_0 = X[y.astype(np.int32) == 0]  # keep only the images whose label is 0
X_0 = (2 * X_0) / 255.0 - 1.0  # scale pixel values from [0, 255] to [-1.0, 1.0]
X_0 = X_0.reshape([X_0.shape[0], 1, 28, 28])  # [N, channels, height, width]
X_0 = torch.tensor(X_0, dtype=torch.float32)
# First 6000 images for training, the rest for testing
X_0_train, X_0_test = X_0[:6000, :, :, :], X_0[6000:, :, :, :]
train_loader = DataLoader(X_0_train, batch_size=50)
Next, build the network with PyTorch. The encoder uses nn.Conv2d for ordinary convolution. The input image is 1x28x28 (784 dimensions); after passing through the encoder it is compressed to 4x7x7 (196 dimensions). The decoder uses nn.ConvTranspose2d, a transposed convolution that here acts roughly as the reverse of the convolution and pooling steps by upsampling the feature map. The output finally returns to 1x28x28 (784 dimensions).
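The size bookkeeping can be checked by hand. The sketch below is only an illustration: the helper names `conv_out`, `pool_out`, and `tconv_out` are hypothetical, and the formulas are the usual output-size formulas assuming dilation 1 and no output padding. The layer settings (3x3 convolution with padding 1, 2x2 max pooling with stride 2, 2x2 transposed convolution with stride 2) are the ones used by the network in this post.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution (dilation 1 assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output width/height of max pooling without padding."""
    return (size - kernel) // stride + 1


def tconv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a transposed convolution
    (dilation 1 and output_padding 0 assumed)."""
    return (size - 1) * stride - 2 * padding + kernel


size = 28
size = conv_out(size, kernel=3, padding=1)   # conv1:   28 -> 28
size = pool_out(size, kernel=2, stride=2)    # pool:    28 -> 14
size = conv_out(size, kernel=3, padding=1)   # conv2:   14 -> 14
size = pool_out(size, kernel=2, stride=2)    # pool:    14 -> 7
encoded_size = size
encoded_dims = 4 * encoded_size * encoded_size  # 4 channels of 7x7

size = tconv_out(size, kernel=2, stride=2)   # t_conv1: 7 -> 14
size = tconv_out(size, kernel=2, stride=2)   # t_conv2: 14 -> 28
decoded_size = size

print(encoded_size, encoded_dims, decoded_size)  # 7 196 28
```

The pooling layers are what shrink the image here; the two convolutions keep the width and height unchanged because of padding = 1.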
class ConvAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16,
                               kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=4,
                               kernel_size=3, padding=1)
        # Decoder layers
        self.t_conv1 = nn.ConvTranspose2d(in_channels=4, out_channels=16,
                                          kernel_size=2, stride=2)
        self.t_conv2 = nn.ConvTranspose2d(in_channels=16, out_channels=1,
                                          kernel_size=2, stride=2)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(2, 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # The comments show the shape when i 28x28 grayscale images are input.
        # encode                                # in  [i, 1, 28, 28]
        x = self.relu(self.conv1(x))            # out [i, 16, 28, 28]
        x = self.pool(x)                        # out [i, 16, 14, 14]
        x = self.relu(self.conv2(x))            # out [i, 4, 14, 14]
        x = self.pool(x)                        # out [i, 4, 7, 7]
        # decode
        x = self.relu(self.t_conv1(x))          # out [i, 16, 14, 14]
        # Note: Sigmoid outputs lie in (0, 1), while the inputs are scaled to [-1, 1]
        x = self.sigmoid(self.t_conv2(x))       # out [i, 1, 28, 28]
        return x
Now that we have a network, let's train it. The loss function compares the input image with the network's output image.
def train_net(n_epochs, train_loader, net, optimizer_cls=optim.Adam,
              loss_fn=nn.MSELoss(), device="cpu"):
    """
    n_epochs … number of training epochs
    net … network
    device … "cpu" or "cuda:0"
    """
    losses = []  # record how the loss changes per epoch
    optimizer = optimizer_cls(net.parameters(), lr=0.001)
    net.to(device)

    for epoch in range(n_epochs):
        running_loss = 0.0
        net.train()  # put the network in training mode

        for i, XX in enumerate(train_loader):
            XX = XX.to(device)  # .to() returns a new tensor, so reassign it
            optimizer.zero_grad()
            XX_pred = net(XX)  # reconstruct the input with the network
            loss = loss_fn(XX_pred, XX)  # compare the reconstruction with the original
            loss.backward()
            optimizer.step()  # update the weights
            running_loss += loss.item()

        # i is zero-based, so the number of batches is i + 1
        epoch_loss = running_loss / (i + 1)
        losses.append(epoch_loss)
        print("epoch", epoch, ": ", epoch_loss)

    return losses

net = ConvAutoencoder()
losses = train_net(n_epochs=30,
                   train_loader=train_loader,
                   net=net)
Plot the recorded loss with matplotlib.
Next, run the net on data that was not used for training. First, input an image of 0 to the net.
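The plotting code itself is not shown in this post, so here is a minimal sketch of how the loss curve could be drawn. The function name `plot_losses`, the file names, and the example values are assumptions for illustration only; in practice you would pass the `losses` list returned by `train_net`.

```python
import matplotlib.pyplot as plt


def plot_losses(losses, path="loss.png"):
    """Plot the per-epoch training loss and save the figure to `path`."""
    fig, ax = plt.subplots()
    ax.plot(range(len(losses)), losses)
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss (MSE)")
    fig.savefig(path)
    return fig


# Placeholder values just to exercise the function (not real training results)
fig = plot_losses([0.9, 0.5, 0.3, 0.2], path="loss_example.png")
```

With the training code from this post, `plot_losses(losses)` followed by `plt.show()` would display the curve interactively.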
img_num = 4
net.eval()  # switch the network to evaluation mode
with torch.no_grad():  # gradients are not needed for prediction
    pred = net(X_0_test[img_num:(img_num + 1)])
pred = pred.numpy()
pred = pred[0, 0, :, :]  # drop the batch and channel axes -> [28, 28]
origin = X_0_test[img_num:(img_num + 1)].numpy()
origin = origin[0, 0, :, :]

# Top: input image
plt.subplot(211)
plt.imshow(origin, cmap="gray")
plt.xticks([])
plt.yticks([])
plt.text(x=3, y=2, s="original image", color="red")

# Bottom: image reconstructed by the network
plt.subplot(212)
plt.imshow(pred, cmap="gray")
plt.text(x=3, y=2, s="output image", color="red")
plt.xticks([])
plt.yticks([])
plt.savefig("0_auto_encoder")
plt.show()
The top is the input image and the bottom is the output image. The 0 is reproduced well.
When I input an image of 6, the network could not reproduce the 6 well. When the data cannot be reproduced well, it can be judged as abnormal (probably).
By the way, this model trained relatively quickly even on a laptop, so I would like to try a deeper network as well.
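To turn "could not reproduce well" into a number, one option is to compute the reconstruction error per image and compare it with a threshold. The sketch below is an assumption on my part, not something measured in this post: the function names `reconstruction_errors` and `is_anomalous`, the threshold value, and the toy arrays are all hypothetical stand-ins.

```python
import numpy as np


def reconstruction_errors(originals, reconstructions):
    """Mean squared error per image for arrays shaped [N, 1, 28, 28]."""
    diff = originals - reconstructions
    return (diff ** 2).mean(axis=(1, 2, 3))


def is_anomalous(errors, threshold):
    """Flag images whose reconstruction error exceeds the threshold."""
    return errors > threshold


# Toy stand-in data: two "images" and their "reconstructions"
originals = np.zeros((2, 1, 28, 28), dtype=np.float32)
reconstructions = originals.copy()
reconstructions[0] += 0.1  # small error: reconstructed well
reconstructions[1] += 0.8  # large error: reconstructed poorly

errors = reconstruction_errors(originals, reconstructions)
flags = is_anomalous(errors, threshold=0.1)  # hypothetical threshold
print(errors, flags)
```

With the model in this post, `originals` could be `X_0_test.numpy()` and `reconstructions` could be `net(X_0_test).detach().numpy()`. How to choose the threshold is left open here; one common approach is to look at the distribution of errors on held-out normal images.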