[PYTHON] Summary of basic implementation with PyTorch

0. Introduction

In this article, I will summarize the basic implementation patterns in PyTorch (partly as a memorandum for myself), taking the classification of CIFAR10 (a color image classification dataset) as an example.

1. About the programs posted

1.1. MIT license

Copyright 2020 shun310

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

1.2. Full view of the program

The full program is shown below.

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

ave = 0.5               #Normalized average
std = 0.5               #Normalized standard deviation
batch_size_train = 256  #Learning batch size
batch_size_test = 16    #Test batch size
val_ratio = 0.2         #Ratio of validation data to total data
epoch_num = 30          #Number of learning epochs

class Net(nn.Module):
    #Definition of network structure
    def __init__(self):
        super(Net, self).__init__()
        self.init_conv = nn.Conv2d(3,16,3,padding=1)
        self.conv1 = nn.ModuleList([nn.Conv2d(16,16,3,padding=1) for _ in range(3)])
        self.bn1 = nn.ModuleList([nn.BatchNorm2d(16) for _ in range(3)])
        self.pool = nn.MaxPool2d(2, stride=2)
        self.fc1 = nn.ModuleList([nn.Linear(16*16*16, 128), nn.Linear(128, 32)])
        self.output_fc = nn.Linear(32, 10)

    #Forward calculation
    def forward(self, x):
        x = F.relu(self.init_conv(x))
        for l,bn in zip(self.conv1, self.bn1):
            x = F.relu(bn(l(x)))
        x = self.pool(x)
        x = x.view(-1,16*16*16) # flatten
        for l in self.fc1:
            x = F.relu(l(x))
        x = self.output_fc(x)
        return x

def set_GPU():
    #GPU settings
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)
    return device

def load_data():
    #Data loading
    transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((ave,),(std,))])
    train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

    #Validation data split
    n_samples = len(train_set)
    val_size = int(n_samples * val_ratio)
    train_set, val_set = torch.utils.data.random_split(train_set, [(n_samples-val_size), val_size])

    #DataLoader definition
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size_train, shuffle=True, num_workers=2)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size_train, shuffle=False, num_workers=2)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size_test, shuffle=False, num_workers=2)

    return train_loader, test_loader, val_loader

def train():
    device = set_GPU()
    train_loader, test_loader, val_loader = load_data()
    model = Net()
    model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)

    min_loss = 999999999
    print("training start")
    for epoch in range(epoch_num):
        train_loss = 0.0
        val_loss = 0.0
        train_batches = 0
        val_batches = 0
        model.train()   #Training mode
        for i, data in enumerate(train_loader):   #Read by batch
            inputs, labels = data[0].to(device), data[1].to(device) # data is the list [inputs, labels]

            #Gradient reset
            optimizer.zero_grad()

            outputs = model(inputs)    #Forward calculation
            loss = criterion(outputs, labels)   #Loss calculation
            loss.backward()                     # Backward pass (gradient calculation)
            optimizer.step()                    #Parameter update

            #Cumulative history
            train_loss += loss.item()
            train_batches += 1

        # Validation loss calculation
        model.eval()    #Inference mode
        with torch.no_grad():
            for i, data in enumerate(val_loader):   #Read by batch
                inputs, labels = data[0].to(device), data[1].to(device) # data is the list [inputs, labels]
                outputs = model(inputs)               #Forward calculation
                loss = criterion(outputs, labels)   #Loss calculation

                #Cumulative history
                val_loss += loss.item()
                val_batches += 1

        #History output
        print('epoch %d train_loss: %.10f' %
              (epoch + 1,  train_loss/train_batches))
        print('epoch %d val_loss: %.10f' %
              (epoch + 1,  val_loss/val_batches))

        with open("history.csv",'a') as f:
            print(str(epoch+1) + ',' + str(train_loss/train_batches) + ',' + str(val_loss/val_batches),file=f)

        #Save the best model
        if min_loss > val_loss/val_batches:
            min_loss = val_loss/val_batches
            PATH = "best.pth"
            torch.save(model.state_dict(), PATH)

        #Dynamic change of learning rate
        scheduler.step(val_loss/val_batches)

    #Save model of final epoch
    print("training finished")
    PATH = "lastepoch.pth"
    torch.save(model.state_dict(), PATH)

if __name__ == "__main__":
    train()

2. Explanation of each implementation

2.0. Library import and constant definition

The libraries used in this article are as follows.

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

Define constants to be used later.

ave = 0.5               #Normalized average
std = 0.5               #Normalized standard deviation
batch_size_train = 256  #Learning batch size
batch_size_test = 16    #Test batch size
val_ratio = 0.2         #Ratio of validation data to total data
epoch_num = 30          #Number of learning epochs

2.1. GPU settings

If you use GPU, you need to specify device before various settings.

def set_GPU():
    #GPU settings
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)
    return device

The GPU is used through this device object. For example,

data.to(device)
model.to(device)

By doing so, the data and the neural network model are placed on the GPU (or kept on the CPU if none is available).
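
As a minimal sketch (the tensor below is just a dummy stand-in for a batch of data), moving a tensor to the selected device looks like this. Note that for tensors, .to() returns a copy on the target device, while for models it moves the parameters in place.

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 3, 32, 32)  # dummy batch standing in for real data
x = x.to(device)               # for tensors, .to() returns a copy on the target device
print(x.device)                # cuda:0 if a GPU is available, otherwise cpu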

2.2. Data preparation

Several datasets are already available in PyTorch. For example, with CIFAR10, you can prepare as follows.

def load_data():
    #Data loading
    transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((ave,),(std,))])
    train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

    #Validation data split
    n_samples = len(train_set)
    val_size = int(n_samples * val_ratio)
    train_set, val_set = torch.utils.data.random_split(train_set, [(n_samples-val_size), val_size])

    #DataLoader definition
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size_train, shuffle=True, num_workers=2)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size_train, shuffle=False, num_workers=2)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size_test, shuffle=False, num_workers=2)

    return train_loader, test_loader, val_loader

Let me explain in order. First, transform represents a series of operations that transform the data.

    transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((ave,),(std,))])

In the above example, transforms.ToTensor() converts the data to a Tensor (PyTorch's data type), and then transforms.Normalize((ave,), (std,)) normalizes it with mean ave and standard deviation std. Compose chains this series of operations together. There are several other types of data transformations; see the documentation.
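
As a minimal sketch of what this transform does (using a dummy PIL image instead of an actual CIFAR10 sample):

import numpy as np
from PIL import Image
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),                 # PIL image (H, W, C), values 0-255 -> float tensor (C, H, W), values in [0, 1]
    transforms.Normalize((0.5,), (0.5,)),  # (x - 0.5) / 0.5 -> values in roughly [-1, 1]
])

img = Image.fromarray(np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8))  # dummy 32x32 RGB image
x = transform(img)
print(x.shape, x.min().item(), x.max().item())  # torch.Size([3, 32, 32]), values within [-1, 1]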

Next, read the data of CIFAR10.

    train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

root specifies the download destination. PyTorch's built-in datasets come pre-divided into train and test sets; which one you get is selected with the train option. Passing the transform defined earlier as an argument makes the dataset apply that transformation whenever an item is read.

We also want to carve validation data out of the loaded training data. For that, we use torch.utils.data.random_split.

    train_set, val_set = torch.utils.data.random_split(train_set, [(n_samples-val_size), val_size])

It randomly splits the dataset into subsets whose sizes are given by the list in the second argument.
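
As an aside, if you want the split to be reproducible, recent PyTorch versions accept a seeded generator; a small sketch with a dummy dataset:

import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))  # dummy dataset of 100 samples
train_set, val_set = random_split(
    dataset, [80, 20],
    generator=torch.Generator().manual_seed(0)  # fixed seed so the 80/20 split is reproducible
)
print(len(train_set), len(val_set))  # 80 20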

When training with PyTorch, DataLoader is very convenient. Create one as follows.

    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size_train, shuffle=True, num_workers=2)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size_train, shuffle=False, num_workers=2)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size_test, shuffle=False, num_workers=2)

Pass the dataset as the first argument. batch_size specifies the batch size, shuffle toggles shuffling of the data, and num_workers specifies the number of subprocesses (degree of parallelism) used for loading. Under the hood, a DataLoader is an iterable, so during training you fetch one batch at a time with a for statement.
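
As a minimal sketch of that iteration (the TensorDataset and shapes below are just dummies for illustration):

import torch
from torch.utils.data import TensorDataset, DataLoader

dummy_set = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))
loader = DataLoader(dummy_set, batch_size=32, shuffle=True, num_workers=0)

for inputs, labels in loader:          # one batch per iteration
    print(inputs.shape, labels.shape)  # torch.Size([32, 3, 32, 32]) torch.Size([32])
    break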

If you want to use your own data

There are many situations where you want to use your own data. In that case, define your own dataset class as follows. (Reference: https://qiita.com/mathlive/items/2a512831878b8018db02)

class MyDataset(torch.utils.data.Dataset):
    def __init__(self, data, label, transform=None):
        self.transform = transform
        self.data = data
        self.data_num = len(data)
        self.label = label

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.data[idx]
        out_label =  self.label[idx]
        if self.transform:
            out_data = self.transform(out_data)

        return out_data, out_label

At a minimum, define __len__ (a function that returns the size of the dataset) and __getitem__ (a function that retrieves one item) in the class. With this,

dataset = MyDataset(input_data, teacher_labels, transform=transform_to_apply)

you can create a dataset. As an aside, looking at the contents of MyDataset gives a rough idea of what PyTorch is doing internally: when idx is specified, the data at that index is returned, and if a transform is given, it is applied to that item first. Various behaviors can be implemented by modifying __getitem__ (for example, applying two different transforms), but that is omitted here.
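
As a minimal usage sketch with dummy NumPy data (the shapes, labels, and transform here are assumptions purely for illustration):

import numpy as np
import torch
import torchvision.transforms as transforms

data = np.random.randint(0, 256, (100, 32, 32, 3), dtype=np.uint8)  # 100 dummy 32x32 RGB images (HWC, uint8)
label = np.random.randint(0, 10, 100)                               # dummy integer labels

dataset = MyDataset(data, label, transform=transforms.ToTensor())   # ToTensor converts each HWC uint8 array to a CHW float tensor
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

inputs, labels = next(iter(loader))
print(inputs.shape, labels.shape)  # torch.Size([16, 3, 32, 32]) torch.Size([16])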

2.3. Creating a neural network

PyTorch makes it easy to define neural networks using classes. For the CIFAR10 classifier, for example:

class Net(nn.Module):
    #Definition of network structure
    def __init__(self):
        super(Net, self).__init__()
        self.init_conv = nn.Conv2d(3,16,3,padding=1)
        self.conv1 = nn.ModuleList([nn.Conv2d(16,16,3,padding=1) for _ in range(3)])
        self.bn1 = nn.ModuleList([nn.BatchNorm2d(16) for _ in range(3)])
        self.pool = nn.MaxPool2d(2, stride=2)
        self.fc1 = nn.ModuleList([nn.Linear(16*16*16, 128), nn.Linear(128, 32)])
        self.output_fc = nn.Linear(32, 10)

    #Forward calculation
    def forward(self, x):
        x = F.relu(self.init_conv(x))
        for l,bn in zip(self.conv1, self.bn1):
            x = F.relu(bn(l(x)))
        x = self.pool(x)
        x = x.view(-1,16*16*16) # flatten
        for l in self.fc1:
            x = F.relu(l(x))
        x = self.output_fc(x)
        return x

Let me explain each part.

Network construction

The network is created by inheriting from nn.Module. Each layer is defined as a member in __init__.

    def __init__(self):
        super(Net, self).__init__()
        self.init_conv = nn.Conv2d(3,16,3,padding=1)
        self.conv1 = nn.ModuleList([nn.Conv2d(16,16,3,padding=1) for _ in range(3)])
        self.bn1 = nn.ModuleList([nn.BatchNorm2d(16) for _ in range(3)])
        self.pool = nn.MaxPool2d(2, stride=2)
        self.fc1 = nn.ModuleList([nn.Linear(16*16*16, 128), nn.Linear(128, 32)])
        self.output_fc = nn.Linear(32, 10)

For example, nn.Conv2d takes the following arguments.

nn.Conv2d(number of input channels, number of output channels, kernel size, padding=padding size, stride=stride)

See the official documentation for details. (https://pytorch.org/docs/stable/nn.html)
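
As a quick sketch of the shape behavior (the input size below just mimics a batch of CIFAR10 images):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, 3, padding=1)  # 3 input channels -> 16 output channels, 3x3 kernel, padding 1
x = torch.randn(8, 3, 32, 32)          # dummy batch of 8 CIFAR10-sized images
print(conv(x).shape)                   # torch.Size([8, 16, 32, 32]); padding=1 keeps the 32x32 spatial size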

A little technique

Network layers can also be defined as a list using nn.ModuleList.

        self.conv1 = nn.ModuleList([nn.Conv2d(16,16,3,padding=1) for _ in range(3)])

This is especially useful when defining a large network with a repeating structure. Note that if you hold the layers in a plain Python list instead of nn.ModuleList, their parameters are not registered with the model and will not be updated. (Reference: https://qiita.com/perrying/items/857df46bb6cdc3047bd8) Be sure to use nn.ModuleList in such cases.
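
A small sketch of the difference (with toy layer sizes, purely for illustration):

import torch.nn as nn

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])  # registered as submodules

class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(8, 8) for _ in range(3)]  # NOT registered; the optimizer never sees these parameters

print(len(list(WithModuleList().parameters())))  # 6 (weight and bias for each of the 3 layers)
print(len(list(WithPlainList().parameters())))   # 0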

Definition of forward calculation

The forward calculation is defined in the forward method.

    def forward(self, x):
        x = F.relu(self.init_conv(x))
        for l,bn in zip(self.conv1, self.bn1):
            x = F.relu(bn(l(x)))
        x = self.pool(x)
        x = x.view(-1,16*16*16) # flatten
        for l in self.fc1:
            x = F.relu(l(x))
        x = self.output_fc(x)
        return x

The layers held in nn.ModuleList are taken out with a for statement, which is another convenient point. Also, partway through forward,

        x = x.view(-1,16*16*16) # flatten

appears. Here, the image-like data is flattened into a one-dimensional vector (the second argument is the number of channels x image height x image width). The first argument is -1 so that the batch dimension is inferred automatically.
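
A tiny sketch of what this view does (the batch size of 8 is arbitrary):

import torch

x = torch.randn(8, 16, 16, 16)  # (batch, channels, height, width) after pooling
x = x.view(-1, 16 * 16 * 16)    # -1 lets PyTorch infer the batch dimension
print(x.shape)                  # torch.Size([8, 4096])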

2.4. Loss function and update method

Loss functions are provided in torch.nn and optimizers in torch.optim; we simply instantiate and use them. This time we use CrossEntropyLoss as the loss function for classification, and Adam as the optimizer.

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

Dynamic learning rate setting

To make training efficient, we may want to adjust (typically reduce) the learning rate dynamically. In that case, use an lr_scheduler. For example,

scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)

Define the scheduler like this. Then, after computing the loss on the validation data, call

scheduler.step(val_loss)

If there is no improvement for patience epochs, the learning rate is then reduced automatically. This helps prevent training from stagnating.
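
A small sketch of the behavior (with a dummy model and an artificially constant validation loss; the exact epoch at which the rate drops depends on the scheduler's internal counters):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)  # dummy model
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5)

for epoch in range(20):
    val_loss = 1.0            # pretend the validation loss never improves
    scheduler.step(val_loss)
    print(epoch, optimizer.param_groups[0]["lr"])  # the lr is cut (by factor=0.1 by default) once patience epochs pass without improvement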

Self-made loss function

As with datasets, you may want to create your own loss function. This can be done by inheriting from a PyTorch class or by defining it as a plain function. (Reference: https://kento1109.hatenablog.com/entry/2018/08/13/092939) For simple regression and classification tasks, MSELoss and CrossEntropyLoss often work well, but many machine learning papers improve performance by devising the loss function, so it is a fairly important part of the implementation.
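
As a minimal sketch, here is a hand-written mean squared error defined by inheriting nn.Module (the class name MyMSELoss is just an example, not a PyTorch API):

import torch
import torch.nn as nn

class MyMSELoss(nn.Module):
    # A self-made loss: mean squared error written by hand
    def forward(self, outputs, targets):
        return ((outputs - targets) ** 2).mean()

criterion = MyMSELoss()
outputs = torch.randn(4, 1, requires_grad=True)
targets = torch.randn(4, 1)
loss = criterion(outputs, targets)
loss.backward()             # autograd works as long as the loss is built from differentiable tensor operations
print(outputs.grad.shape)   # torch.Size([4, 1])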

2.5. Actual learning

With the preparation done, we can finally start training. First, apply each of the settings defined so far.

    device = set_GPU()
    train_loader, test_loader, val_loader = load_data()
    model = Net()
    model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)

The training loop is as follows; it works basically as the comments describe.

    min_loss = 999999999
    print("training start")
    for epoch in range(epoch_num):
        train_loss = 0.0
        val_loss = 0.0
        train_batches = 0
        val_batches = 0
        model.train()   #Training mode
        for i, data in enumerate(train_loader):   #Read by batch
            inputs, labels = data[0].to(device), data[1].to(device) # data is the list [inputs, labels]

            #Gradient reset
            optimizer.zero_grad()

            outputs = model(inputs)    #Forward calculation
            loss = criterion(outputs, labels)   #Loss calculation
            loss.backward()                     # Backward pass (gradient calculation)
            optimizer.step()                    #Parameter update

            #Cumulative history
            train_loss += loss.item()
            train_batches += 1

        # Validation loss calculation
        model.eval()    #Inference mode
        with torch.no_grad():
            for i, data in enumerate(val_loader):   #Read by batch
                inputs, labels = data[0].to(device), data[1].to(device) # data is the list [inputs, labels]
                outputs = model(inputs)               #Forward calculation
                loss = criterion(outputs, labels)   #Loss calculation

                #Cumulative history
                val_loss += loss.item()
                val_batches += 1

        #History output
        print('epoch %d train_loss: %.10f' %
              (epoch + 1,  train_loss/train_batches))
        print('epoch %d val_loss: %.10f' %
              (epoch + 1,  val_loss/val_batches))

        with open("history.csv",'a') as f:
            print(str(epoch+1) + ',' + str(train_loss/train_batches) + ',' + str(val_loss/val_batches),file=f)

        #Save the best model
        if min_loss > val_loss/val_batches:
            min_loss = val_loss/val_batches
            PATH = "best.pth"
            torch.save(model.state_dict(), PATH)

        #Dynamic change of learning rate
        scheduler.step(val_loss/val_batches)

    #Save model of final epoch
    print("training finished")
    PATH = "lastepoch.pth"
    torch.save(model.state_dict(), PATH)

In PyTorch, the loss computation and the backpropagation of the error have to be written explicitly (there are also libraries that wrap this boilerplate).

For testing the trained model and other follow-up steps, see the official tutorial (https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html). With this model, the per-class accuracy comes out at roughly 40 to 80% (the variation is quite large...).
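
As a rough evaluation sketch (not part of the original program; it reuses the Net, set_GPU, and load_data defined above and assumes best.pth was saved by train()):

def test():
    # Load the best model and measure overall accuracy on the test set
    device = set_GPU()
    _, test_loader, _ = load_data()
    model = Net()
    model.load_state_dict(torch.load("best.pth", map_location=device))
    model.to(device)
    model.eval()

    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)  # index of the highest score = predicted class
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    print("test accuracy: %.3f" % (correct / total))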

About train mode and eval mode

Batch Normalization and Dropout need to switch their behavior between training and inference. Therefore, call

model.train()
model.eval()

before training and before inference, respectively. In addition to this, there is torch.no_grad(). This is a context in which gradient information is not recorded. Since no backward pass is performed during validation, gradient information is unnecessary there; skipping it speeds up computation and saves memory.
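
A small sketch of both points, using Dropout as an example:

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8, requires_grad=True)

drop.train()     # training mode: dropout is active
print(drop(x))   # roughly half the elements are zeroed, the rest are scaled by 2

drop.eval()      # inference mode: dropout does nothing
print(drop(x))   # all ones

y = x * 2
print(y.requires_grad)   # True: the operation is recorded for backpropagation

with torch.no_grad():    # gradients are not recorded inside this block
    y = x * 2
print(y.requires_grad)   # False: faster and lighter on memory, suitable for validation/inference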

3. At the end

I ran out of steam partway through, so I may add more content later. I hope you find this useful as a reference.

4. References

About overall usage

fukuit: https://qiita.com/fukuit/items/215ef75113d97560e599
perrying: https://qiita.com/perrying/items/857df46bb6cdc3047bd8

Construction of classifier (CIFAR10)

Official Tutorial: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

transform

Official documentation: https://pytorch.org/docs/stable/torchvision/transforms.html

Existing dataset

Official documentation: https://pytorch.org/docs/stable/torchvision/datasets.html

Around data processing

Official documentation: https://pytorch.org/docs/stable/data.html

About self-made data set

mathlive: https://qiita.com/mathlive/items/2a512831878b8018db02

About self-made loss function

kento1109: https://kento1109.hatenablog.com/entry/2018/08/13/092939

ModuleList official

Official documentation: https://pytorch.org/docs/stable/generated/torch.nn.ModuleList.html

Learning rate Scheduler

Official documentation: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

Save and load the model

Official Tutorial: https://pytorch.org/tutorials/beginner/saving_loading_models.html
jyori112: https://qiita.com/jyori112/items/aad5703c1537c0139edb

eval mode and no_grad

Official Tutorial: https://pytorch.org/tutorials/beginner/saving_loading_models.html
PyTorch Forum: https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615
s0sem0y: https://www.hellocybernetics.tech/entry/2018/02/20/182906
