[PYTHON] A Keras lover tried PyTorch

Introduction

Until now I had written deep learning programs only in Keras, but I now need to write them in PyTorch, so after some studying I put together something that works, using MNIST as the example.

[Correction history]
- Flatten() and Softmax() are now used in the model.
- Correspondingly, the loss function was changed to CrossEntropyLoss().

First tutorial

Since the obvious first step is a tutorial, I started the official PyTorch Tutorial from the beginning.

However, it is really hard to follow. I couldn't get an intuitive grasp of "Autograd", and my motivation dropped below 10%.

In hindsight, I think it would have been better to start with "What is torch.nn really?" instead of from the very beginning.

Backup

With that feeling, I kept going through the tutorial and also bought some books. After researching various options, I chose the following two.

-"Can be used in the field! Introduction to PyTorch development" ――Since keras was easy to understand in the same series, I'm thinking of buying credit
first, I'll read from this book. -"Learn while making! Deep learning by PyTorch" ――It has a good reputation on the net, and the author's Qiita article was easy to understand.
However, when it arrives, it is thick and a little difficult to carry.

I will read these carefully while typing the code out by hand.

And MNIST

So in the end, the first thing is MNIST, right? Referring to the official PyTorch example code, I extracted only the parts I needed and modified them until I could understand what was going on.

Environment

For the time being, I'm running on Windows 10 + Anaconda + CUDA.

Imports

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torchvision import datasets, transforms

from torchsummary import summary

This is basically still the sample, but I added torchsummary so I can check the model. Unfortunately it cannot be installed with conda, so I installed it with pip. Also, the torchvision version did not match and I could not install it from the GUI (I installed it with conda).

Parameters

seed = 1

epochs = 14
batch_size = 64
log_interval = 100

lr = 1.0
gamma = 0.7

In the sample these were command-line arguments, but I extracted only the variables I needed and turned them into fixed values. The values are the defaults (I changed only log_interval).

Model

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Input: (N, 1, 28, 28) MNIST images
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1)   # -> (N, 32, 26, 26)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1)  # -> (N, 64, 24, 24)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=2)                # -> (N, 64, 12, 12)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.flatten = nn.Flatten()                               # -> (N, 9216)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)                             # 10 digit classes
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout1(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout2(x)
        output = self.softmax(x)
        return output

I defined everything as instance attributes in __init__() as much as possible. The last layer is Softmax().

I also considered writing the model with the Sequential API, but since I expect to tinker with it based on this version, I kept the explicit forward() style; a rough Sequential sketch follows below for comparison.
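For comparison, here is a rough sketch of roughly the same network written with nn.Sequential. This is just my own illustration, not code I'm actually using, and I've swapped the Dropout2d layers for plain Dropout since the tensor is already flattened at that point.

# Rough nn.Sequential sketch of the same network (my own illustration).
# Shapes assume a 1x28x28 MNIST input; Dropout2d is replaced by plain Dropout here.
seq_model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=3, stride=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(9216, 128),
    nn.ReLU(inplace=True),
    nn.Dropout(0.25),
    nn.Linear(128, 10),
    nn.ReLU(inplace=True),
    nn.Dropout(0.5),
    nn.Softmax(dim=1),
)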

Learning

def train(model, loss_fn, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            # CrossEntropyLoss already averages over the batch, so report loss.item() directly
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

This is basically the sample code, except that the loss function is now passed in by the caller.

I like the structure where the epoch loop runs on the caller side and the batch loop runs here; it is easy to follow.

Evaluation

def test(model, loss_fn, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += loss_fn(output, target).item() * len(data)  # sum up batch loss (loss_fn returns the batch mean)
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

Here too, I decided to pass the loss function in from the caller.

PyTorch does not seem to have a Keras-style "predict" method, so I expect this code to be a useful reference when I later run inference with the trained model (I haven't written the inference code yet; a rough sketch of what it might look like is below).
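A minimal inference sketch, assuming a trained model and a hypothetical preprocessed tensor image_batch of shape (N, 1, 28, 28); this is my own guess at the usage, not code from the sample.

# Minimal inference sketch (assumed usage, not part of the original sample).
# image_batch is a hypothetical tensor of shape (N, 1, 28, 28),
# preprocessed with the same transform as the training data.
model.eval()                     # switch dropout to evaluation mode
with torch.no_grad():            # no gradients needed for inference
    output = model(image_batch.to(device))
    pred = output.argmax(dim=1)  # predicted class index for each image
print(pred.cpu().tolist())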

view_as() is surprisingly convenient.

Main processing

torch.manual_seed(seed)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

First, fix the random seed and select the execution device (CUDA if it is available).

transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.0,), (1.0,))
    ])
kwargs = {'batch_size': batch_size,
          'num_workers': 1,
          'pin_memory': True,
          'shuffle': True}

dataset1 = datasets.MNIST('../data', train=True, download=True, transform=transform)
dataset2 = datasets.MNIST('../data', train=False, transform=transform)
train_loader = torch.utils.data.DataLoader(dataset1,**kwargs)
test_loader = torch.utils.data.DataLoader(dataset2, **kwargs)

Then load the MNIST dataset. The transform normalizes with a mean of 0 and a standard deviation of 1, so the ToTensor() output in the [0, 1] range is effectively left unchanged (I changed this from the sample).
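As an aside, if I wanted to normalize with the dataset's actual statistics instead, I could presumably compute them with something like the following (my own sketch, not part of the sample):

# Sketch (my own addition): compute MNIST's actual mean/std over the training images.
raw = datasets.MNIST('../data', train=True, download=True, transform=transforms.ToTensor())
pixels = torch.stack([img for img, _ in raw])  # shape: (60000, 1, 28, 28)
print(pixels.mean().item(), pixels.std().item())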

Next, prepare the data loaders. These are quite convenient. I'm hoping they can also be used for data augmentation, though I haven't looked into that yet; a rough guess at what that might look like is sketched below.
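A rough guess at augmentation, just adding transforms to the training-side Compose (my own sketch, untested here; the specific transforms and values are only examples):

# Hypothetical augmented transform for the training set only (untested sketch).
train_transform = transforms.Compose([
    transforms.RandomRotation(10),                     # small random rotations
    transforms.RandomAffine(0, translate=(0.1, 0.1)),  # small random shifts
    transforms.ToTensor(),
    transforms.Normalize((0.0,), (1.0,)),
])
# dataset1 = datasets.MNIST('../data', train=True, download=True, transform=train_transform)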

I'm also glad that the target labels do not have to be one-hot encoded.

model = Net().to(device)
summary(model, (1,28,28))

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adadelta(model.parameters(), lr=lr)
scheduler = StepLR(optimizer, step_size=1, gamma=gamma)

I created the model and displayed its contents. torchsummary is excellent; if you like Keras, it is a must. I can't relax until I have seen this. By the way, this time it is displayed like this.

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 32, 26, 26]             320
              ReLU-2           [-1, 32, 26, 26]               0
            Conv2d-3           [-1, 64, 24, 24]          18,496
              ReLU-4           [-1, 64, 24, 24]               0
         MaxPool2d-5           [-1, 64, 12, 12]               0
           Flatten-6                 [-1, 9216]               0
            Linear-7                  [-1, 128]       1,179,776
              ReLU-8                  [-1, 128]               0
         Dropout2d-9                  [-1, 128]               0
           Linear-10                   [-1, 10]           1,290
             ReLU-11                   [-1, 10]               0
        Dropout2d-12                   [-1, 10]               0
          Softmax-13                   [-1, 10]               0
================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 1.04
Params size (MB): 4.58
Estimated Total Size (MB): 5.62
----------------------------------------------------------------

The familiar display makes me happy. One thing to note is that the array order is different: I had always worked with (batch, height, width, channel), so (batch, channel, height, width) still feels a little off.
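As a tiny illustration of that difference (my own addition): a Keras-style (batch, height, width, channel) array can be rearranged into PyTorch's (batch, channel, height, width) order with a permute.

import numpy as np

# Hypothetical Keras-style batch: (batch, height, width, channel)
nhwc = np.zeros((64, 28, 28, 1), dtype=np.float32)

# Rearrange to PyTorch's (batch, channel, height, width) layout
nchw = torch.from_numpy(nhwc).permute(0, 3, 1, 2)
print(nchw.shape)  # torch.Size([64, 1, 28, 28])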

Don't forget to prepare the loss function. Since my model ends in Softmax(), I am using CrossEntropyLoss().

After that, choose the optimization algorithm and how to adjust the learning rate. It is nice that a learning-rate scheduling mechanism can be incorporated so easily.

for epoch in range(1, epochs + 1):
    train(model, loss_fn, device, train_loader, optimizer, epoch)
    test(model, loss_fn, device, test_loader)
    scheduler.step()

Training then loops over the epochs: train → evaluate → adjust the learning rate. Very easy to follow.
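If I wanted to confirm that StepLR is actually decaying the rate, a variant of the loop (my own sketch) could log the current learning rate from the optimizer's param_groups each epoch:

# Variant of the epoch loop that also logs the learning rate (my own sketch).
for epoch in range(1, epochs + 1):
    print('epoch {}: lr = {:.4f}'.format(epoch, optimizer.param_groups[0]['lr']))
    train(model, loss_fn, device, train_loader, optimizer, epoch)
    test(model, loss_fn, device, test_loader)
    scheduler.step()  # StepLR multiplies the lr by gamma (0.7) after each epoch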

Save / load trained model

torch.save(model.state_dict(), "mnist_cnn.pt")
model.load_state_dict(torch.load("mnist_cnn.pt"))

As a bonus, here is how to save and load the trained model. Out of curiosity I looked at the file contents with a binary editor, which was interesting.
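One note to my future self (my own addition, not from the sample): when loading the weights on a machine without a GPU, map_location can be passed to torch.load, and eval() should be called before inference.

# Loading on a CPU-only machine (sketch, my own addition)
cpu_model = Net()
cpu_model.load_state_dict(torch.load("mnist_cnn.pt", map_location=torch.device('cpu')))
cpu_model.eval()  # switch dropout to evaluation mode before inference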

Summary

I'm not very familiar with it yet, but my impression is that PyTorch has the following characteristics:

- Easy to grasp the overall flow
- Things can be described in fine detail
- The amount of code increases

I'll study harder!
